Multiple Sclerosis (MS) is a complex autoimmune disease of the central nervous system in which risk emerges from the interplay between environment (for example, latitude-related sun exposure, infections, smoking) and many genetic variants acting together, rather than from a single “culprit” mutation. The authors’ starting point is practical: France has a high MS prevalence, yet detailed, nationwide genomic characterizations of French MS cohorts are rare because of historical, ethical, and legal sensitivities around ancestry research. This preprint presents the genetic “map” of a major French registry cohort, aiming to make future MS genetics in France more accurate, more inclusive, and easier to reproduce (with an added twist: a privacy-preserving synthetic dataset for sharing).
The OFSEP-HD cohort: a national microscope on MS in France
The study centers on OFSEP-HD (the high-definition arm of the French MS registry), a multicenter observational cohort initiated in 2018 with deep longitudinal follow-up (clinical, biological, and imaging) and biobanked samples. The genetic arm includes 2,667 MS patients, all meeting contemporary diagnostic criteria at inclusion, with consent and standardized data capture across France. After genotype quality control (filters on missingness, sex checks, exclusion of sex and mitochondrial chromosomes, minor allele frequency (MAF) and Hardy-Weinberg equilibrium (HWE) thresholds, and relatedness), 2,542 individuals remained for the core analyses, still a large national sample for detailed population genetic work.
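To make the variant-level filters concrete, here is a minimal sketch of missingness, MAF, and HWE filtering on a genotype matrix. The function name `variant_qc`, the thresholds, and the toy data are illustrative assumptions; the summary above does not give the preprint's cutoffs. The chi-square HWE test is a simplification (dedicated tools often use an exact test), and sample-level steps such as sex checks and relatedness are not covered.

```python
import math
import numpy as np

def variant_qc(genotypes, max_missing=0.05, min_maf=0.01, min_hwe_p=1e-6):
    """Boolean mask of variants (columns) passing missingness, MAF and HWE filters.

    genotypes: (n_samples, n_variants) array of 0/1/2 allele counts, NaN = missing call.
    The thresholds are illustrative defaults, not the preprint's values.
    """
    g = np.asarray(genotypes, dtype=float)
    keep = np.zeros(g.shape[1], dtype=bool)
    for j in range(g.shape[1]):
        called = g[~np.isnan(g[:, j]), j]
        n = called.size
        if n == 0 or 1.0 - n / g.shape[0] > max_missing:
            continue  # too many missing calls
        p = called.sum() / (2 * n)  # frequency of the counted allele
        maf = min(p, 1.0 - p)
        if maf == 0 or maf < min_maf:
            continue  # monomorphic or too rare
        observed = np.array([(called == k).sum() for k in (0, 1, 2)])
        expected = n * np.array([(1 - p) ** 2, 2 * p * (1 - p), p ** 2])
        chi2 = float(np.sum((observed - expected) ** 2 / expected))
        # Survival function of a chi-square distribution with 1 degree of freedom
        p_value = math.erfc(math.sqrt(chi2 / 2))
        keep[j] = p_value >= min_hwe_p
    return keep

# Toy data (hypothetical): 100 samples, 4 variants
hwe_ok = np.repeat([0.0, 1.0, 2.0], [25, 50, 25])  # genotype counts match HWE at p = 0.5
monomorphic = np.zeros(100)                         # no variation at all
all_het = np.ones(100)                              # every sample heterozygous: strong HWE violation
too_missing = hwe_ok.copy()
too_missing[:10] = np.nan                           # 10% missing calls
toy = np.column_stack([hwe_ok, monomorphic, all_het, too_missing])
qc_mask = variant_qc(toy)
print(qc_mask)
```

Only the first toy variant survives: the others fail on MAF, HWE, and missingness respectively.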
From genotyping chips to millions of variants: building a usable genome dataset
Genotyping was performed using the Affymetrix Precision Medicine Research Array, yielding 888,799 assayed variants, followed by imputation (via the TOPMed Michigan Imputation Server) to expand variant coverage to the multi-million range used in downstream analyses. Conceptually, this is the difference between reading selected “sentences” in the genome versus reconstructing much more of the “book” using a reference library. For ancestry and population-structure inference, the team emphasized overlap with external references (1000 Genomes and a dedicated North African dataset), and used a PCA framework built on shared, independent SNPs to avoid artifacts from linkage or platform differences.
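The idea of a PCA built on shared SNPs can be sketched as follows: restrict both datasets to the variants they have in common, fit principal components on the reference panel, then project cohort samples into that space. This is a sketch under stated assumptions, not the authors' pipeline: the function names are hypothetical, the data are simulated, fitting on the reference and projecting the cohort is one common design rather than a detail confirmed by the summary, and the pruning that makes SNPs “independent” is assumed to have been done upstream.

```python
import numpy as np

def shared_variants(cohort_ids, reference_ids):
    """Variant IDs present in both datasets, kept in cohort order."""
    reference_set = set(reference_ids)
    return [v for v in cohort_ids if v in reference_set]

def fit_reference_pca(ref_genotypes, n_components=2):
    """Fit PCA on reference genotypes (samples x variants, coded 0/1/2)."""
    g = np.asarray(ref_genotypes, dtype=float)
    mean = g.mean(axis=0)
    p = mean / 2.0                        # allele frequency per variant
    scale = np.sqrt(2.0 * p * (1.0 - p))  # binomial standard deviation
    scale[scale == 0] = 1.0               # avoid dividing by zero for monomorphic variants
    z = (g - mean) / scale
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    return mean, scale, vt[:n_components].T

def project(genotypes, mean, scale, loadings):
    """Project samples onto reference PCs using the reference mean and scale."""
    z = (np.asarray(genotypes, dtype=float) - mean) / scale
    return z @ loadings

# Simulated reference panel: two populations with very different allele frequencies
rng = np.random.default_rng(0)
n_snps = 200
ref_a = rng.binomial(2, 0.1, size=(50, n_snps))
ref_b = rng.binomial(2, 0.9, size=(50, n_snps))
mean, scale, loadings = fit_reference_pca(np.vstack([ref_a, ref_b]))

# Simulated cohort samples drawn from the same two populations
cohort_a = rng.binomial(2, 0.1, size=(10, n_snps))
cohort_b = rng.binomial(2, 0.9, size=(10, n_snps))
pc1_a = project(cohort_a, mean, scale, loadings)[:, 0]
pc1_b = project(cohort_b, mean, scale, loadings)[:, 0]

common = shared_variants(["rs1", "rs2", "rs3"], ["rs3", "rs1", "rs9"])
```

In this toy setup the two simulated groups land on opposite ends of the first component. Real ancestry structure is far subtler, which is why the choice of reference panels and of shared SNPs matters.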
Ancestry isn’t a checkbox: seven genetic clusters and the limits of self-report
Using PCA plus supervised clustering, the cohort separated into seven ancestry clusters: a dominant European group (cluster 1), a smaller admixed European/North African group (cluster 7), and additional clusters reflecting North African, African, Afro-Caribbean, East Asian, and South Asian ancestries. In the results, roughly 88% of patients fell in European-leaning clusters (1 and 7), while 232 patients formed a North African genetic cluster—an important minority in a national cohort. The most human—and clinically relevant—observation is that identity labels and genetics often disagree: about half of genetically North African patients did not self-report North African origins, illustrating how “origin” questions can capture culture, birthplace, or family narrative rather than genomic ancestry. The admixture analysis reinforces this nuance by showing that many “North African cluster” genomes are substantially admixed, often with a large European component.
HLA: zooming into the immune system’s most influential real estate
Because MS risk is heavily influenced by the Major Histocompatibility Complex (MHC)—especially HLA genes—the authors imputed classical HLA alleles (via a multi-ethnic reference approach) and then reconstructed multi-locus haplotypes (HLA-A, -B, -C, -DRB1, -DQB1). A key result: the well-known MS risk allele HLA-DRB1*15:01 appeared more frequently in the European genetic cluster (reported at 48.8%) than in the North African genetic cluster (33.2%), consistent with ancestry-dependent risk architecture. Among frequent haplotypes carrying at least one MS-associated allele, many included the protective HLA-A*02:01, while some also carried DRB1*15:01, underscoring that “risk” and “protection” can travel together on common haplotypic backgrounds.
Polygenic risk scores: when European discovery doesn’t transfer cleanly
The paper also tackles polygenic risk scores (PRS), using a published MS susceptibility variant set from which the authors could recover 210 variants in their processed data (including 20 in the MHC region). They compared a log-additive PRS to a simpler “sum of risk alleles” approach and found high agreement for non-MHC SNPs, but a marked drop in agreement once MHC variants were included: MHC effects are larger and uneven, so treating them as equal-weight “votes” can distort the signal. Importantly, PRS distributions differed significantly between European and North African genetic ancestry groups, reinforcing a central caution in translational genomics: PRS built largely from European GWAS can underperform, or mislead, when applied to individuals with different genetic backgrounds, potentially widening health disparities if used uncritically.
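A toy comparison shows why the two scores diverge once a large-effect variant enters. The log-additive score weights each risk-allele dosage by the natural log of its odds ratio, while the simple score just counts risk alleles. The odds ratios and dosages below are invented for illustration and are not taken from the preprint or from any published variant set.

```python
import numpy as np

def weighted_prs(dosages, odds_ratios):
    """Log-additive PRS: sum over variants of dosage * ln(odds ratio)."""
    return np.asarray(dosages, dtype=float) @ np.log(np.asarray(odds_ratios, dtype=float))

def count_prs(dosages):
    """Unweighted PRS: total number of risk alleles carried."""
    return np.asarray(dosages, dtype=float).sum(axis=1)

# Hypothetical risk-allele dosages (rows = 4 individuals)
non_mhc = np.array([[2, 1, 1],
                    [1, 1, 0],
                    [0, 1, 2],
                    [1, 0, 0]])
mhc = np.array([[0], [2], [0], [1]])

# Hypothetical odds ratios: modest outside the MHC, large inside it
or_non_mhc = [1.10, 1.15, 1.08]
or_mhc = [3.0]

# Agreement between the two scores without the MHC variant
r_non_mhc = np.corrcoef(weighted_prs(non_mhc, or_non_mhc), count_prs(non_mhc))[0, 1]

# Agreement once the large-effect MHC variant is included
all_dosages = np.hstack([non_mhc, mhc])
r_all = np.corrcoef(weighted_prs(all_dosages, or_non_mhc + or_mhc),
                    count_prs(all_dosages))[0, 1]

print(f"non-MHC agreement: {r_non_mhc:.2f}, with MHC: {r_all:.2f}")
```

When effect sizes are similar, counting alleles is nearly proportional to the weighted sum, so the two scores correlate closely. A single variant with a much larger effect dominates the weighted score but still counts as only one or two “votes” in the simple score, so the agreement falls.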
Synthetic data for sensitive genetics: sharing without exposing individuals
Finally, the authors address a real barrier in genetics: data sharing. Individual-level genotypes are inherently identifying, so they generated an anonymized synthetic dataset using an adaptation of an “avatarization” methodology designed to balance privacy (reducing singling out and linkability) with signal fidelity (retaining usable statistical structure). They report privacy metrics exceeding recommended thresholds (for example, very high hidden rate) alongside fidelity checks showing small distributional differences overall; only a small number of features significantly diverged after multiple-testing correction, and a global distance-based test did not detect systematic deviation beyond chance. In practical terms, they are proposing a way for the community to experiment, benchmark, and reproduce analyses on a realistic stand-in dataset—while protecting patients—an approach that may become increasingly standard as genomics scales.
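As a rough intuition for a metric like the hidden rate, the sketch below assumes one common definition: the fraction of original records whose closest synthetic record is not the one generated from them, so that a distance-based linkage attempt would pick the wrong match. This definition, the Euclidean distance, the function name, and the toy data are all assumptions for illustration; the preprint's exact metric and implementation may differ.

```python
import numpy as np

def hidden_rate(original, synthetic):
    """Fraction of originals whose nearest synthetic record is NOT their own.

    Row i of `synthetic` is assumed to be the record generated from row i of `original`.
    """
    x = np.asarray(original, dtype=float)
    y = np.asarray(synthetic, dtype=float)
    # Pairwise squared Euclidean distances, shape (n_original, n_synthetic)
    d2 = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.argmin(axis=1)
    return float(np.mean(nearest != np.arange(x.shape[0])))

# Hypothetical toy data: three individuals described by two features
original = np.array([[0.0, 0.0], [10.0, 10.0], [20.0, 20.0]])
synthetic = np.array([[9.0, 9.0], [1.0, 1.0], [20.0, 21.0]])
rate = hidden_rate(original, synthetic)
print(rate)
```

Here the first two individuals are “hidden” because each one's own synthetic record sits closer to the other individual, while the third is not. A higher value means naive nearest-neighbor linkage fails more often.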
Disclaimer: This blog post is based on the information provided in the cited scientific article. It aims to provide an accessible summary of the research findings and should not be considered as definitive medical advice. For any health concerns, please consult with a qualified healthcare professional.
Reference:
Paris, J., Silva, N. S., Faddeenkov, I., Morin, M., Boussamet, L., Demuth, S., ... & Gourraud, P. A. (2025). Genetic architecture of Multiple Sclerosis patients in the French national OFSEP-HD cohort. medRxiv, 2025-04.