Human Leukocyte Antigen (HLA) A-C-B-DRB1-DQB1 Haplotype Segregation Analysis Among 2152 Families in China and the Comparison to Expectation-Maximization Algorithm Result

Human leukocyte antigen (HLA) is the major histocompatibility complex in humans, playing a critical role in immune responses and transplantation outcomes. High-resolution HLA matching has been shown to reduce graft-versus-host disease (GVHD) and improve the success of hematopoietic stem cell transplantation (HSCT). For patients without HLA-identical sibling donors, unrelated donor HSCT becomes a crucial option. However, the likelihood of finding a suitable unrelated donor varies significantly among patients, largely due to the diversity and regional disparities in HLA haplotypes across different populations. While previous studies in Western countries have demonstrated the utility of HLA haplotype frequency (HF) in predicting the probability of identifying HLA-matched unrelated donors, the HLA system exhibits significant ethnic and regional variations. Therefore, establishing a reliable HLA haplotype database for the Chinese population is essential for improving the effectiveness of HSCT in this demographic.

This study focuses on the segregation analysis of five-locus HLA A-C-B-DRB1-DQB1 high-resolution haplotypes among 2152 families in China, representing the largest sample size of its kind in the country to date. The primary objective was to compare the observed HFs derived from family segregation analysis with the expected HFs calculated using the expectation-maximization (EM) algorithm from unrelated individuals. The goal was to identify a suitable method for establishing an HLA haplotype database in China, considering the unique genetic characteristics of the population.

Study Design and Methodology

The study included 2152 families, all of which had four confirmed haplotypes (a, b, c, and d) determined by descent. These families were categorized into three groups: (1) Families with both parents (n=1531); (2) Families with one parent and one or more siblings (n=175); and (3) Families without parents but with haplotypes confirmed by two or more siblings (n=446). The majority of the families were from East China (n=1907), with smaller numbers from Central China (n=173) and South China (n=72). High-resolution HLA typing for A, B, C, DRB1, and DQB1 loci was performed using sequence-based typing and sequence-specific oligonucleotide probe methods. Ambiguities were resolved through additional testing, and genomic DNA was extracted from peripheral blood samples. The study was approved by the ethics committee of the hospital, and informed consent was obtained from all participants.

Segregation Analysis and Haplotype Frequency Calculation

The observed HFs were calculated using segregation analysis with Arlequin software version 3.5.2.2. Only four haplotypes per family were counted to avoid repetition. A total of 3274 five-locus A-C-B-DRB1-DQB1 haplotypes were identified, with 285 classified as common (HF ≥0.1%) and the majority being less common or rare. The top 20 haplotypes from the segregation analysis were compared to the results of an unrelated individual study using the EM algorithm. No statistically significant differences were found between the two datasets (P values >0.5), indicating a high level of consistency. The most common haplotypes showed the best agreement, particularly those from East China, which accounted for the majority of the study population.
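The counting rule described above (exactly four haplotypes per family, each counted once) can be sketched in a few lines. This is a minimal illustration and not the Arlequin procedure used in the study; the function names, the toy family data, and the labels H1 to H5 are hypothetical, and only the 0.1% threshold for "common" comes from the text.

```python
from collections import Counter

COMMON_THRESHOLD = 0.001  # HF >= 0.1% is treated as "common" in the text


def count_haplotype_frequencies(families):
    """Direct counting of phase-known haplotypes.

    `families` is a list of 4-tuples (a, b, c, d): the four parental
    haplotypes confirmed by descent. Each family contributes exactly four
    haplotypes, however many children were typed.
    """
    counts = Counter()
    for haplotypes in families:
        if len(haplotypes) != 4:
            raise ValueError("each family must contribute exactly four haplotypes")
        counts.update(haplotypes)
    total = sum(counts.values())
    return {h: n / total for h, n in counts.items()}


def is_common(frequency, threshold=COMMON_THRESHOLD):
    """Classify a haplotype frequency as common (True) or less common (False)."""
    return frequency >= threshold


# Toy data: two hypothetical families, haplotypes labelled H1..H5.
families = [
    ("H1", "H2", "H1", "H3"),
    ("H1", "H4", "H2", "H5"),
]
freqs = count_haplotype_frequencies(families)
for name, f in sorted(freqs.items()):
    print(name, f, "common" if is_common(f) else "less common")
```

With real data the denominator would be 4 × 2152 = 8608 haplotypes, so a haplotype would need to be counted at least nine times to reach the 0.1% threshold.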

Allele Frequency and Linkage Disequilibrium Analysis

The same dataset was used to estimate allele frequencies (AFs) and pairwise linkage disequilibrium (LD) using Arlequin. The comparison between HFs and AFs revealed that they were not always positively correlated. For example, the most frequent haplotype, A*30:01-C*06:02-B*13:02-DRB1*07:01-DQB1*02:02, had constituent alleles ranked 6, 3, 3, 3, and 4, respectively, in terms of AF. This discrepancy is attributed to the strong positive associations between alleles, as evidenced by high D′ values in the LD test. Similarly, haplotypes such as A*11:01-C*01:02-B*46:01-DRB1*09:01-DQB1*03:03 and A*11:01-C*07:02-B*40:01-DRB1*09:01-DQB1*03:03, ranked 16 and 17 in HF, had constituent alleles ranked 1 or 2 in AF. This suggests that positive associations between alleles are not always strong, and some two-locus haplotypes even exhibit negative associations.

Supplementary data revealed 11 A-B, 5 A-C, 27 B-C, 23 DRB1-DQB1, 3 A-DRB1, 5 B-DRB1, 4 C-DRB1, 2 A-DQB1, and 4 C-DQB1 two-locus haplotypes with strong positive associations (HF ≥0.1%, D′ >0.5, r² >0.1). Additionally, three-locus haplotypes such as A*30:01-C*06:02-B*13:02, A*02:07-C*01:02-B*46:01, A*33:03-C*03:02-B*58:01, A*29:01-C*15:05-B*07:05, and A*69:01-C*12:02-B*52:01 showed very strong linkages, accounting for 6.02%, 5.63%, 5.30%, 0.49%, and 0.34% of the haplotypes, respectively.
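For readers unfamiliar with the LD measures used here, the sketch below computes the standard pairwise statistics D′ and r² for one allele pair from phase-known two-locus haplotypes and applies the three thresholds quoted above. It uses the textbook definitions, not Arlequin's code; the function names are hypothetical, and the ten toy haplotypes are invented for illustration (only the allele labels are borrowed from the text).

```python
def pairwise_ld(haplotypes, allele_a, allele_b):
    """Return (HF, D', r^2) for one allele pair.

    `haplotypes` is a list of (allele_at_locus_1, allele_at_locus_2) tuples
    with known phase. D' is signed: negative values mean the two alleles
    occur together less often than expected under independence.
    """
    n = len(haplotypes)
    p_a = sum(1 for a, _ in haplotypes if a == allele_a) / n
    p_b = sum(1 for _, b in haplotypes if b == allele_b) / n
    p_ab = sum(1 for a, b in haplotypes if a == allele_a and b == allele_b) / n

    d = p_ab - p_a * p_b  # raw disequilibrium coefficient
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0

    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return p_ab, d_prime, r2


def is_strong_positive(hf, d_prime, r2):
    """Thresholds quoted in the text: HF >= 0.1%, D' > 0.5, r^2 > 0.1."""
    return hf >= 0.001 and d_prime > 0.5 and r2 > 0.1


# Toy phase-known A-B haplotypes (counts invented for illustration).
toy = (
    [("A*30:01", "B*13:02")] * 4
    + [("A*30:01", "B*46:01")] * 1
    + [("A*11:01", "B*46:01")] * 5
)
hf, d_prime, r2 = pairwise_ld(toy, "A*30:01", "B*13:02")
print(hf, d_prime, r2, is_strong_positive(hf, d_prime, r2))
```

In the toy data every B*13:02 haplotype also carries A*30:01, so D′ reaches its maximum of 1, while r² stays below 1 because A*30:01 also occurs with another B allele.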

Comparison of Observed and Expected Haplotype Frequencies

The patients’ typing results were used as phase-known and phase-unknown data to obtain observed and expected HFs by direct counting and the EM algorithm, respectively. A total of 2050 observed and 1852 expected haplotypes were identified, with 1228 overlapping. The remaining 822 observed and 624 expected haplotypes did not overlap, and their HFs were all less than 0.1%. Among the overlapping haplotypes, less-common haplotypes were more prevalent, but the numbers of common and less-common haplotypes were not equal between the observed and expected groups. Seventeen haplotypes that were common in the observed group were less common in the expected group, while 41 haplotypes that were less common in the observed group were common in the expected group.
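The bookkeeping behind these counts is a set comparison between two frequency tables. The sketch below shows one way to do it, assuming each table is a dictionary from haplotype name to frequency; the function name, the dictionary keys, and the toy frequencies are hypothetical, and only the 0.1% threshold comes from the text.

```python
COMMON_THRESHOLD = 0.001  # HF >= 0.1%


def compare_haplotype_sets(observed, expected, threshold=COMMON_THRESHOLD):
    """Split two {haplotype: frequency} tables into overlap and discordance sets."""
    shared = observed.keys() & expected.keys()
    return {
        "overlap": shared,
        "observed_only": observed.keys() - expected.keys(),
        "expected_only": expected.keys() - observed.keys(),
        # common by direct counting, less common by EM
        "common_obs_less_common_exp": {
            h for h in shared if observed[h] >= threshold > expected[h]
        },
        # less common by direct counting, common by EM
        "less_common_obs_common_exp": {
            h for h in shared if observed[h] < threshold <= expected[h]
        },
    }


# Toy tables with invented frequencies.
observed = {"H1": 0.05, "H2": 0.0012, "H3": 0.0004, "H4": 0.0002}
expected = {"H1": 0.048, "H2": 0.0008, "H3": 0.0015, "H5": 0.0003}

result = compare_haplotype_sets(observed, expected)
for key, value in result.items():
    print(key, sorted(value))
```

Applied to the study's tables, the sizes of these sets would correspond to the reported 1228, 822, 624, 17, and 41 haplotypes; the reported totals are internally consistent, since 2050 − 1228 = 822 and 1852 − 1228 = 624.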

A chi-square test for trend was conducted for the overlapping haplotypes using GraphPad Prism 6 software. The results showed no statistically significant differences between observed and expected haplotypes in the total overlapping data (P=0.2424), the common data (HF ≥0.1%, P=0.3698), or the less-common data (HF <0.1%, P=0.1582). This indicates that the tendencies of observed and expected haplotypes are consistent, with the best concordance in the common data, followed by the total overlapping data and then the less-common data.

Implications for Haplotype Database Setup

The study highlights the importance of family segregation analysis in identifying rare haplotypes that may be missed by the EM algorithm. While unrelated data are easier to obtain and show good consistency with observed haplotypes, the EM algorithm can miss less-common real haplotypes and incorrectly construct less-common haplotypes. For example, the haplotype A*02:07-C*03:04-B*40:01-DRB1*10:01-DQB1*05:01 was incorrectly built by EM despite its constituent alleles being common. This is likely due to the presence of eight negative pairwise LDs, making it difficult for this haplotype to appear in the data.

Therefore, family segregation analysis must be used to check and supplement the EM algorithm, particularly for identifying less-common haplotypes. This approach ensures the accuracy and comprehensiveness of the HLA haplotype database, which is crucial for both unrelated HSCT and haplo-HSCT.
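Why EM favors common haplotypes can be seen in a toy two-locus version of the algorithm. The sketch below is a generic textbook EM for unphased genotypes, not the Arlequin implementation used in the study; the function names, the allele labels A1, A2, B1, B2, and the seven toy genotypes are all hypothetical.

```python
from collections import defaultdict


def compatible_pairs(genotype):
    """All haplotype pairs consistent with an unphased two-locus genotype."""
    (a1, a2), (b1, b2) = genotype
    pairs = {
        tuple(sorted([(a1, b1), (a2, b2)])),
        tuple(sorted([(a1, b2), (a2, b1)])),
    }
    return sorted(pairs)


def em_haplotype_frequencies(genotypes, iterations=50):
    """Estimate haplotype frequencies from phase-unknown genotypes by EM."""
    haplotypes = {h for g in genotypes for pair in compatible_pairs(g) for h in pair}
    freq = {h: 1.0 / len(haplotypes) for h in haplotypes}  # uniform start
    for _ in range(iterations):
        counts = defaultdict(float)
        for g in genotypes:
            pairs = compatible_pairs(g)
            # E-step: posterior weight of each phase resolution
            weights = [
                (1 if h1 == h2 else 2) * freq[h1] * freq[h2] for h1, h2 in pairs
            ]
            total = sum(weights)
            for (h1, h2), w in zip(pairs, weights):
                counts[h1] += w / total
                counts[h2] += w / total
        # M-step: expected counts become the new frequencies
        n = 2 * len(genotypes)
        freq = {h: counts[h] / n for h in haplotypes}
    return freq


# Toy sample: six homozygotes fix the phase of A1-B1 and A2-B2;
# one double heterozygote has ambiguous phase.
genotypes = (
    [(("A1", "A1"), ("B1", "B1"))] * 3
    + [(("A2", "A2"), ("B2", "B2"))] * 3
    + [(("A1", "A2"), ("B1", "B2"))]
)
freq = em_haplotype_frequencies(genotypes)
for h, f in sorted(freq.items()):
    print("-".join(h), round(f, 6))
```

In this toy sample EM assigns the double heterozygote almost entirely to A1-B1 plus A2-B2, because those haplotypes are already frequent. If that individual actually carried A1-B2 and A2-B1, the two rare haplotypes would be lost, which is the kind of error that phase-known family data can catch.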

Applications in Hematopoietic Stem Cell Transplantation

In unrelated HSCT, the HLA haplotype tool can predict the likelihood of finding an HLA-matched unrelated donor and identify potential mismatching alleles. Patients with common haplotypes have a higher probability of finding a suitable donor, while those with less common haplotypes face greater challenges. Additionally, the tool can predict C and DQB1 typing results based on A, B, and DRB1 typing, aiding clinicians in selecting suitable donors during the confirmatory typing stage.

In haplo-HSCT, the HLA haplotype tool can help determine whether a donor is a true 2-haplo-match or 1-haplo-match, even when complete family data are not available. For example, a sibling with a 5/10 allele match may not be a true 1-haplo-match, while a parent or child with a 10/10 allele match is still a 1-haplo-match. The tool can also predict whether a 10/10 allele match sibling is a true 2-haplo-match or 1-haplo-match, guiding clinicians in treatment decisions.
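The gap between allele-level and haplotype-level matching can be made concrete with a small sketch. It assumes phase is already known for both patient and donor and compares haplotypes by identity of their allele strings only (identity by state), so it cannot capture relationships by descent such as the parent or child case mentioned above. The function names and the allele labels are hypothetical.

```python
from collections import Counter

LOCI = ("A", "C", "B", "DRB1", "DQB1")


def allele_matches(patient, donor):
    """Count matching alleles across the five loci (maximum 10).

    `patient` and `donor` are each a pair of phased haplotypes; a haplotype
    is a 5-tuple of allele names ordered as in LOCI.
    """
    total = 0
    for i in range(len(LOCI)):
        p = Counter(h[i] for h in patient)
        d = Counter(h[i] for h in donor)
        total += sum((p & d).values())  # multiset intersection per locus
    return total


def shared_haplotypes(patient, donor):
    """Count whole haplotypes that are identical in patient and donor (0 to 2)."""
    return sum((Counter(patient) & Counter(donor)).values())


# Hypothetical allele labels; the donor shares one allele at every locus,
# but the shared alleles are split across two different haplotypes.
patient = (
    ("A1", "C1", "B1", "DR1", "DQ1"),
    ("A2", "C2", "B2", "DR2", "DQ2"),
)
donor = (
    ("A1", "C1", "B1", "DR3", "DQ3"),
    ("A4", "C4", "B4", "DR2", "DQ2"),
)
print(allele_matches(patient, donor), "/ 10 alleles matched")
print(shared_haplotypes(patient, donor), "haplotypes shared")
```

Here the donor is a 5/10 allele match yet shares no complete haplotype with the patient, which is why allele counts alone do not establish a 1-haplo-match.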

Conclusion

This study underscores the importance of establishing a reliable HLA haplotype database for the Chinese population, particularly for improving outcomes in HSCT. The comparison between family segregation analysis and the EM algorithm highlights the strengths and limitations of each method. While the EM algorithm is efficient for estimating common haplotypes, family segregation analysis is essential for identifying rare and less-common haplotypes. The integration of both approaches ensures the accuracy and comprehensiveness of the HLA haplotype database, ultimately enhancing the effectiveness of HSCT in China.

doi.org/10.1097/CM9.0000000000001458
