A comprehensive immune repertoire study for patients with pulmonary tuberculosis

Abstract Background Tuberculosis (TB) is a major global health problem and has replaced HIV as the leading cause of death from a single infectious agent. Methods Here, we applied high-throughput sequencing to study the immune repertoire of nine pulmonary tuberculosis patients and nine healthy controls. Results Tuberculosis patients and healthy controls displayed significantly different highly expressed clones and distinguishable patterns of CDR3 sequence sharing. TRBV and TRBJ gene usage showed more highly expressed clones in patients than in controls, and we also found specific highly expressed TRBV and TRBJ gene clones in different groups. In addition, six highly expressed TRBV/TRBJ combinations were detected in the CD4 group, 21 in the CD8 group and 32 in the tissue group. Conclusion We studied patients with tuberculosis and healthy control individuals to characterize the immune repertoire. CDR3 sequence sharing and differential gene expression were found among the patients with tuberculosis, which could inform the development of potential vaccines and targeted treatments.


pathogens during the infection and reproduction of Mycobacterium tuberculosis has been studied, which has led to developments in vaccines, diagnosis and drug resistance (Horwitz, Lee, Dillon, & Harth, 1995; Lindenstrøm et al., 2009; Meintjes et al., 2009; Vanham et al., 1997). Since the TCR repertoire is a mirror of the human immune response, its characteristics have been widely investigated in infectious and other diseases to study the state of the immune system and the progression of these diseases (Chaudhry, Cairo, Venturi, & Pauza, 2013). The diversity within the TCR repertoire is ensured through somatic recombination of germline-encoded variable (V), diversity (D), and joining (J) gene segments. Nucleotide deletions at the coding ends and nucleotide additions at the V(D)J junctions also contribute substantially to TCR repertoire diversity (Nikolich-Žugich, Slifka, & Messaoudi, 2004). TCR diversity is a function of the third hypervariable complementarity-determining region (CDR3), which lies at the junction of the V, D and J gene segments in the TCRβ chain and of the V and J gene segments in the TCRα chain. The CDR3 region encodes the part of the TCR that predominantly interacts with antigenic peptide/MHC complexes. Thus, even when T cell clones express the same V/J gene rearrangement, they can be identified by the unique combination of their CDR3 sequences (TCR clonotypes) (Toivonen, Arstila, & Hänninen, 2015). Accordingly, the complexity and distribution of TCRs within specific T cell populations reflect the degree of complexity of the T cell response.
In the present study, we examined the immune repertoire of CD4+ and CD8+ T cells from patients and healthy controls, and of tissue samples from patients, to elucidate the effect of tuberculosis on the patients' immune system. Diversity and stability, CDR3 length distribution and CDR3 sequence sharing were analyzed. In addition, the usage of TRBJ and TRBV genes and of TRBV/TRBJ combinations was studied. The repertoire features that differed between tuberculosis patients and controls were identified as targets for further study.

| Clinical samples
Tuberculosis tissue samples and blood samples of nine patients and blood samples of nine healthy controls were collected at the Second Clinical Medical College of Jinan University (Shenzhen People's Hospital, Guangdong, China). All patients gave written informed consent and the present study was approved by the Medical Ethics Committee of Shenzhen People's Hospital.

| DNA extraction and mixing
T cells were isolated using superparamagnetic polystyrene beads (Miltenyi) coated with monoclonal antibodies specific for T cells. DNA was prepared from 0.5 to 2 × 10^6 T cells from each sample (patients and controls), which was sufficient for analyzing the diversity of TCR in the T cell subsets. DNA was extracted from PBMCs using GenFIND DNA (Agencourt, Beckman Coulter, Brea, CA) extraction kits following the manufacturer's instructions.
Ten milligrams of tuberculosis tissue was obtained from each patient sample and DNA was extracted using standard methods. Briefly, tissues were dewaxed using xylene, followed by overnight proteinase K digestion. The QIAamp DNA Mini kit (Qiagen GmbH, Hilden, Germany) was then used for DNA extraction following the manufacturer's instructions. DNA quality was evaluated by loading on a 0.8% agarose gel.

| Multiplex-PCR amplification of TCR-β CDR3 regions
The human TCR-β sequences were downloaded from IMGT (http://www.imgt.org/). A relatively conserved region in framework region 3, upstream of CDR3, was selected as the putative forward primer region. A cluster of primers corresponding to the majority of the V gene family sequences was selected. Similarly, reverse primers corresponding to the J gene family were designed. In total, 30 forward primers and 13 reverse primers were used for multiplex PCR to amplify the rearranged TCR-β CDR3 regions. The reaction mixtures (50 μl total) comprised 2 μl of pooled TCR-β variable gene primers (TRBV; 10 μM), 2 μl of pooled TCR-β joining gene primers (TRBJ; 10 μM), 25 μl of 2X Qiagen Multiplex PCR Master Mix, 5 μl of 5X Q-solution, 500 ng of template DNA (10 μl) and 6 μl of H2O. The PCR conditions comprised 95 °C for 15 min; followed by 25 cycles of 94 °C for 15 s and 60 °C for 3 min; followed by a final extension for 10 min at 72 °C. The PCR products were purified using AMPure XP beads (Beckman Coulter, Inc., Brea, CA, USA) to remove primer sequences. A second round of PCR was performed to add a sequencing index to each sample. In this round, each reaction mixture (50 μl total) consisted of 13.5 μl of H2O, 0.5 μl of 2X Q5 DNA polymerase, 10 μl of 5X Q5 buffer, 1 μl of dNTPs (10 mM), 1 μl of P1 (10 μM), 23 μl of DNA, and 1 μl of index (10 μM). The PCR conditions comprised 98 °C for 1 min; followed by 25 cycles of 98 °C for 20 s, 65 °C for 30 s and 72 °C for 30 s; and a final extension for 5 min at 72 °C. The library was separated on an agarose gel, and the target region was isolated and cleaned using QIAquick Gel Extraction kits (Qiagen).
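As a quick consistency check on the two reaction mixtures described above, the following minimal sketch sums the component volumes and derives the final concentration of each primer pool from the dilution relation C1·V1 = C2·V2. The dictionary keys and function names are illustrative only and do not come from the original protocol.

```python
# Component volumes in microliters, copied from the protocol text above.
FIRST_ROUND_UL = {
    "pooled TRBV primers (10 uM)": 2.0,
    "pooled TRBJ primers (10 uM)": 2.0,
    "2X Multiplex PCR Master Mix": 25.0,
    "5X Q-solution": 5.0,
    "template DNA (500 ng)": 10.0,
    "H2O": 6.0,
}

SECOND_ROUND_UL = {
    "H2O": 13.5,
    "Q5 DNA polymerase": 0.5,
    "5X Q5 buffer": 10.0,
    "dNTPs (10 mM)": 1.0,
    "P1 primer (10 uM)": 1.0,
    "first-round DNA": 23.0,
    "index primer (10 uM)": 1.0,
}


def total_volume(mix):
    """Sum of all component volumes in a reaction mix (microliters)."""
    return sum(mix.values())


def final_concentration(stock_conc, stock_volume_ul, total_volume_ul):
    """Dilution relation C1 * V1 = C2 * V2, solved for C2."""
    return stock_conc * stock_volume_ul / total_volume_ul


# Both mixes should add up to the stated 50 microliters.
first_total = total_volume(FIRST_ROUND_UL)
second_total = total_volume(SECOND_ROUND_UL)

# Final concentration (uM) of each pooled primer set in the first round.
primer_pool_um = final_concentration(10.0, 2.0, first_total)
```

Under these figures both mixtures total 50 μl, and each 10 μM primer pool is diluted to 0.4 μM in the first-round reaction.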

| NGS and data analysis
The library was quantitated using the Agilent 2100 Bioanalyzer instrument (Agilent DNA 1000 reagents) and real-time quantitative PCR (TaqMan probes), and sequenced on the Illumina MiSeq. Briefly, adaptor reads and low-quality reads were filtered from the raw data, and the clean data was used in further alignments. Subsequently, the clean data was aligned to the human IGH database and analyzed using the online IMGT/HighV-QUEST tool. The analysis included V and J assignment, CDR3 length distribution, clustering and other analyses.

| Diversity and stability of repertoire in different groups
The distribution characteristics of the sequences and clonal expansion were analyzed first. In the current study, CDR3 clones whose expression level exceeded 0.5% of total clones were defined as high expansion clones (HECs). In tuberculosis patients, the HEC number was higher in the tissue group than in the CD8 or CD4 group, while the comparison between the CD4 and CD8 groups showed no statistical difference. In the control groups, the HEC number in the CD8 group was significantly higher than in the CD4 group (Figure 1a). For the HEC ratio, the patients' tissue group showed a higher ratio than the CD8 or CD4 groups, and the ratio in the CD8 group was higher than in the CD4 group. Consistent with the HEC number, the HEC ratio in controls was higher in the CD8 group than in the CD4 group (Figure 1b). The Shannon entropy measures the diversity of the immune repertoire. Here it ranges from 0 to 1, where 1 represents the greatest diversity and 0 the least. In tuberculosis patients, Shannon entropy in the CD4 group was higher than in the CD8 or tissue groups; the tissue group showed the lowest Shannon entropy, although the entropy of the CD8 group was not statistically higher than that of the tissue group. In controls, Shannon entropy in the CD4 group was also higher than in the CD8 group (Figure 1c). The Gini coefficient was then calculated to further assess the stability of the patients' immune system. In patients, the Gini coefficient was higher in the CD8 group than in the CD4 group, while other comparisons showed no significant change. In controls, the Gini coefficient in the CD8 group was higher than in the CD4 group (Figure 2).
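The four summary measures used in this section can be sketched as follows. This is a minimal illustration rather than the analysis code of the study: it assumes that the HEC ratio is the summed frequency of all HECs, that the Shannon entropy is normalized by the logarithm of the number of observed clones so that it lies between 0 and 1, and that the Gini coefficient is computed over clone sizes. All function names are hypothetical.

```python
import math

HEC_THRESHOLD = 0.005  # 0.5% of total clones, as defined in the text


def clone_frequencies(counts):
    """Convert raw clone counts into relative frequencies."""
    total = sum(counts)
    return [c / total for c in counts]


def hec_stats(counts, threshold=HEC_THRESHOLD):
    """Return (HEC number, HEC ratio).

    Assumption: the HEC ratio is the summed frequency of all clones
    whose frequency exceeds the threshold.
    """
    hecs = [f for f in clone_frequencies(counts) if f > threshold]
    return len(hecs), sum(hecs)


def normalized_shannon_entropy(counts):
    """Shannon entropy divided by log(number of clones), giving a value in [0, 1]."""
    freqs = [f for f in clone_frequencies(counts) if f > 0]
    if len(freqs) < 2:
        return 0.0
    entropy = -sum(f * math.log(f) for f in freqs)
    return entropy / math.log(len(freqs))


def gini_coefficient(counts):
    """Gini coefficient of clone sizes: 0 means all clones are equally abundant."""
    values = sorted(counts)
    n = len(values)
    total = sum(values)
    weighted = sum((rank + 1) * v for rank, v in enumerate(values))
    return 2 * weighted / (n * total) - (n + 1) / n


# Toy repertoire: one dominant clone plus 500 singletons (hypothetical data).
toy_counts = [500] + [1] * 500
toy_hec_number, toy_hec_ratio = hec_stats(toy_counts)
toy_entropy = normalized_shannon_entropy(toy_counts)
toy_gini = gini_coefficient(toy_counts)
```

In the toy repertoire a single clone accounts for half of all sequences, so it is the only HEC; the entropy falls well below 1 and the Gini coefficient rises towards 0.5, which is the direction of change reported here for the tissue samples.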

| CDR3 length distribution mode analysis
In addition, we analyzed the CDR3 length distribution in all samples and the differences between groups. We first fitted a Gaussian distribution to each sample and compared the R² values between samples and between groups. The R² value ranges from 0 to 1, from the worst to the best Gaussian fit. According to the R² value, the length distribution of all samples fitted a Gaussian distribution, although no statistically significant difference was found between groups, as shown in Figures 3 and 4. The nucleotide and amino acid lengths of all samples were analyzed. As shown in Figures 3 and 5, the length distribution of CDR3 sequences ranged from 1 to 30 nucleotides and followed a Gaussian distribution. In addition, both the nucleotide and the amino acid length distributions of the highly expressed clones differed significantly between controls and tuberculosis patients. In both tuberculosis patients and controls, the amino acid sequences ranged from 1 to 30 amino acids in length, and the most frequent length in both was 13 amino acids. The CDR3 length of the CD4, CD8 and tissue groups was analyzed, and all showed a pattern similar to that of the whole patient group. However, we found that amino acid lengths of 1, 2, 5, 25, 27, 28 and 29 were absent in more than seven samples in the tissue group (n = 9), which was rare in the CD4 and CD8 groups. In healthy controls, the amino acid sequences also ranged from 1 to 30 amino acids and the most frequent length was 13 amino acids. For the distribution of CDR3 length, no statistically significant differences were found between groups. All samples showed a Gaussian distribution, with the highest percentage centered at 13 amino acids. In addition, the CDR3 length distribution of the tissue group was more skewed, as lengths 1, 2, 5, 25, 27, 28 and 29 were absent in most tissue samples (Figure 6).
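The goodness-of-fit comparison described above can be illustrated with a short sketch. The fitting procedure used in the study is not specified, so this example substitutes a simple moment-matching fit (the Gaussian takes the weighted mean and variance of the observed lengths) and then computes R² between the observed and predicted length frequencies. The function name and the toy histograms are hypothetical. Note that with this simplified fit, R² can fall below 0 for strongly non-Gaussian histograms.

```python
import math


def gaussian_r_squared(length_counts):
    """Fit a Gaussian to a CDR3 length histogram by moment matching; return R^2.

    length_counts maps CDR3 length (amino acids) to the number of sequences.
    Assumption: at least two distinct lengths with differing counts are
    present, so that the variance and total sum of squares are non-zero.
    """
    total = sum(length_counts.values())
    lengths = sorted(length_counts)
    observed = [length_counts[x] / total for x in lengths]

    # Moment matching: weighted mean and variance of the observed lengths.
    mean = sum(x * p for x, p in zip(lengths, observed))
    var = sum((x - mean) ** 2 * p for x, p in zip(lengths, observed))
    sd = math.sqrt(var)

    # Gaussian density at each integer length (bin width of 1).
    predicted = [
        math.exp(-((x - mean) ** 2) / (2 * var)) / (sd * math.sqrt(2 * math.pi))
        for x in lengths
    ]

    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    mean_obs = sum(observed) / len(observed)
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


# Hypothetical bell-shaped histogram peaking at 13 amino acids.
bell_shaped = {9: 1, 10: 5, 11: 15, 12: 30, 13: 40, 14: 30, 15: 15, 16: 5, 17: 1}
# Hypothetical bimodal histogram, which a single Gaussian describes poorly.
bimodal = {5: 50, 10: 1, 15: 50}

r2_bell = gaussian_r_squared(bell_shaped)
r2_bimodal = gaussian_r_squared(bimodal)
```

A least-squares fit of the amplitude, mean and width would be the more usual choice and would keep R² within the 0 to 1 range quoted above; moment matching is used here only to keep the sketch free of external dependencies.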

| CDR3 sequence sharing modes analysis
Different individuals sharing an identical TCR sequence corresponding to the same antigenic epitope, termed a public T cell response, have been observed in a variety of immune responses, including tumorigenesis, autoimmunity, and viral infections. We therefore counted the public T cell clones in each group based on the nucleotide and amino acid sequences of CDR3 (Li, Ye, Ji, & Han, 2012). To understand the immunological reaction to the common tuberculosis pathogens, the sharing pattern of CDR3 sequences was analyzed between patients. According to the sequence data, there were 586,248 nucleotide sequences and 504,126 amino acid sequences in the CD4 group of patients, and 697,706 nucleotide sequences and 618,480 amino acid sequences in the CD4 group of controls. There were 253,466 nucleotide sequences and 210,566 amino acid sequences in the CD8 group of patients, and 349,470 nucleotide sequences and 294,076 amino acid sequences in the CD8 group of controls. In addition, in the tissue samples of patients, we obtained 108,817 nucleotide sequences and 96,008 amino acid sequences.
To elucidate the characteristics of the shared sequences, we compared the amino acid and nucleotide sequences of highly expressed clones, defined as those expressed at more than 0.5% in either the patient group or the control group (Figure 7). In the patient group, eight amino acid sequences and eight nucleotide sequences were shared across all samples from the CD4, CD8 and tissue groups. By comparison, 35 amino acid sequences and 33 nucleotide sequences were shared between the CD4 and CD8 samples of patients, while 61 amino acid sequences and 61 nucleotide sequences were shared between the CD4 and CD8 samples of controls. No amino acid or nucleotide sequences were shared by the CD4 and CD8 groups of both patients and controls. All shared sequences are displayed in Table 2.
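The sharing analysis amounts to intersecting the sets of highly expressed CDR3 sequences across samples. A minimal sketch is given below, assuming each sample is available as a mapping from CDR3 sequence to read count; the function names and the toy CDR3 amino acid sequences are hypothetical and are not taken from Table 2.

```python
def highly_expressed(clone_counts, threshold=0.005):
    """Return the CDR3 sequences whose frequency exceeds the threshold (0.5%)."""
    total = sum(clone_counts.values())
    return {seq for seq, n in clone_counts.items() if n / total > threshold}


def shared_clonotypes(samples):
    """Return the CDR3 sequences that are highly expressed in every sample."""
    sets = [highly_expressed(sample) for sample in samples]
    if not sets:
        return set()
    return set.intersection(*sets)


# Hypothetical toy data: CDR3 amino acid sequence -> read count.
patient_cd4 = {"CASSLGQGYEQYF": 60, "CASSPDRGNTEAFF": 30, "CASSQETQYF": 10}
patient_cd8 = {"CASSLGQGYEQYF": 80, "CASRTGGYGYTF": 20}
patient_tissue = {"CASSLGQGYEQYF": 50, "CASSPDRGNTEAFF": 50}

shared_all = shared_clonotypes([patient_cd4, patient_cd8, patient_tissue])
shared_cd4_tissue = shared_clonotypes([patient_cd4, patient_tissue])
```

The same functions apply unchanged to nucleotide sequences, since only exact string identity is compared.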

| Significance of TRBV and TRBJ usage in patients and controls
In addition, to elucidate the potential specific immune reaction to tuberculosis, usage of TRBV and TRBJ was analyzed in all groups.
In patients, TRBJ1-3 and TRBJ2-6 were expressed at significantly higher levels in the CD8 group than in the CD4 group, whereas TRBJ1-5 showed lower expression in the CD8 group than in the CD4 group. The expression of TRBJ1-3, TRBJ1-5, TRBJ2-4 and TRBJ2-5 was also higher in the CD4 group than in the tissue group, while TRBJ2-7 exhibited lower expression in the CD4 group than in the tissue group. TRBJ1-2 showed lower expression and TRBJ2-5 showed higher expression in the CD8 group than in the tissue group. In controls, TRBJ1-2, TRBJ1-6, TRBJ2-4 and TRBJ2-5 exhibited significantly higher expression in the CD4 group than in the CD8 group. In addition, TRBJ1-6 and TRBJ2-7 showed statistically higher expression in the controls' CD4 group than in the patients' CD4 group, while TRBJ1-3 showed lower expression in the controls' CD4 group than in the patients'. TRBJ2-5 exhibited higher expression in the patients' CD8 group than in the controls' (Figure 9). The top 20 genes in each group are shown in Figure 10.
Additionally, we combined the expression data of all samples on TRBV or TRBJ to understand the correlation between the expression in the samples. The heatmaps are shown in Figure 11.
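Gene usage of this kind is typically derived by counting how often each TRBV or TRBJ gene is assigned among the sequences of a sample and normalizing by the sample total. The sketch below illustrates that step and the ranking of the most used genes; it assumes one gene assignment per sequence, the function names are hypothetical, and the statistical comparison between groups is not reproduced because the test used is not stated.

```python
from collections import Counter


def gene_usage(assignments):
    """Map each gene name to its fraction of all assigned sequences.

    assignments is an iterable with one gene name per sequence,
    for example the J gene reported for each read.
    """
    counts = Counter(assignments)
    total = sum(counts.values())
    return {gene: n / total for gene, n in counts.items()}


def top_genes(usage, n=20):
    """Return the n most used genes as (gene, fraction) pairs, highest first."""
    return sorted(usage.items(), key=lambda item: item[1], reverse=True)[:n]


# Hypothetical J gene assignments for one sample.
toy_assignments = ["TRBJ1-3", "TRBJ1-3", "TRBJ2-7", "TRBJ2-5"]
toy_usage = gene_usage(toy_assignments)
toy_top = top_genes(toy_usage, n=1)
```

Per-sample usage vectors computed this way can then be stacked into a sample-by-gene matrix, which is the form of input a correlation heatmap such as Figure 11 would require.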

| Combination of usage of TRBV and TRBJ in tuberculosis patients and control samples
TRBV/TRBJ combination is an important source of CDR3 sequence diversification. Among all TRBV/TRBJ combinations, we first counted the highly expressed ones, defined as those representing more than 0.5% of all combinations in each group. In the CD4 group of controls, six TRBV/TRBJ combinations were used at more than 0.5%, and the number was 22 in the CD8 group of controls. In the tuberculosis patient group, there were also six TRBV/TRBJ combinations used at more than 0.5% in the CD4 group, 21 highly expressed TRBV/TRBJ combinations in the CD8 group, and 32 in the tissue group (Table 3).
To examine the potential contribution of specific TRBV/TRBJ combinations to disease progression, we compared the relative frequencies of TRBV/TRBJ combinations between patients and controls. Comparison of the CD4 groups of patients and controls revealed 46 up-regulated and 10 down-regulated TRBV/TRBJ combinations. Comparison of the CD8 groups of patients and controls revealed 12 up-regulated and 12 down-regulated combinations (Figure 12). We then compared the TRBV/TRBJ combinations among the CD4, CD8 and tissue groups of patients, as shown in Figure 13.
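The combination analysis can be sketched in the same way as single-gene usage, with each sequence contributing one (TRBV, TRBJ) pair. The example below counts pair frequencies, selects those above the 0.5% threshold, and labels each combination by the direction of its frequency change between two groups. The direction labels are a simplification introduced for illustration: the criteria used in the study to call a combination up- or down-regulated are not described, and no significance test is applied here. All names and toy data are hypothetical.

```python
from collections import Counter

THRESHOLD = 0.005  # 0.5% of all combinations, as defined in the text


def combination_usage(vj_pairs):
    """Map each (TRBV, TRBJ) pair to its fraction of all sequences."""
    counts = Counter(vj_pairs)
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}


def highly_used(usage, threshold=THRESHOLD):
    """Keep only the combinations used more often than the threshold."""
    return {pair: f for pair, f in usage.items() if f > threshold}


def compare_usage(patient_usage, control_usage):
    """Label each combination 'up' or 'down' in patients relative to controls.

    Assumption: direction of the frequency difference only; combinations
    with identical frequency in both groups are omitted.
    """
    directions = {}
    for pair in set(patient_usage) | set(control_usage):
        p = patient_usage.get(pair, 0.0)
        c = control_usage.get(pair, 0.0)
        if p > c:
            directions[pair] = "up"
        elif p < c:
            directions[pair] = "down"
    return directions


# Hypothetical toy data: one (V gene, J gene) pair per sequence.
patient_pairs = [("TRBV20-1", "TRBJ2-7")] * 6 + [("TRBV5-1", "TRBJ1-3")] * 4
control_pairs = [("TRBV20-1", "TRBJ2-7")] * 3 + [("TRBV5-1", "TRBJ1-3")] * 7

patient_usage = combination_usage(patient_pairs)
control_usage = combination_usage(control_pairs)
directions = compare_usage(patient_usage, control_usage)
```

With real data the comparison would be made per sample and followed by a statistical test across the nine patients and nine controls, rather than on pooled frequencies as in this toy example.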

| DISCUSSION
Tuberculosis, a well-known infectious disease, is closely related to the immune reaction in its development, diagnosis and treatment (Janis, Kaufmann, Schwartz, & Pardoll, 1989; Cooper, 2009; Andersen et al., 2000; MacMicking, Taylor, & McKinney, 2003). The immune repertoire has a complex, dynamic, highly organized and coherent structure, and characterizing it helps in understanding the generation and selection of TCRs. Thus, fully measuring the diversity of the T cell repertoire, which determines the flexibility and specificity of the cellular immune response, could provide new insights into the underlying disease process (Burgos, 1996; Liu et al., 2017). In 1996, Li et al. investigated disease-specific changes in the gamma-delta T cell repertoire of pulmonary tuberculosis patients by flow cytometric analysis of blood and bronchoalveolar lavage gamma-delta T cells (Li et al., 1996). Their results supported the hypothesis that gamma-delta T cells play a role in the protective immune response to Mtb infection (Li et al., 1996). In 2018, Cheng et al. found that CDR3δ tended to be more polyclonal and CDR3γ tended to be longer in TB patients; in their NGS study of the repertoire, γδ T cells expressing CDR3 sequences with a Vγ9-JγP rearrangement expanded significantly during Mtb infection (Cheng et al., 2018). However, a fuller understanding of the immune reaction, and of repertoire diversity and stability in tuberculosis patients and controls, still requires more comprehensive studies. Here, we present an extensive characterization of the repertoires of tuberculosis patients and comparable controls. HEC number, HEC ratio, Shannon entropy and Gini coefficient were applied to evaluate the general characteristics of the repertoires. In patients, the HEC number and HEC ratio were higher in tissue samples than in CD8 or CD4 samples. In addition, the HEC ratio was higher in CD8 than in CD4 samples in both patients and controls, as was the HEC number in controls. This suggests a more focused and stronger immune reaction in tissue samples than in the CD4+ or CD8+ cell samples, which provides a starting point for elucidating the underlying mechanism in depth. The Shannon entropy, a measure that originated in information theory, was used in this study to describe the diversity of the immune repertoire. By this criterion, the tissue group's repertoire showed the lowest complexity in patients, which also suggests a stronger immune reaction in tuberculosis tissue than in the other sample types.
FIGURE 13 TRBV/TRBJ combination in groups. (a) Comparison between CD8+ cell group in patients and CD4+ cell group in patients. (b) Comparison between tissue group in patients and CD4+ cell group in patients. (c) Comparison between CD8+ cell group in patients and tissue group in patients
The length distribution of CDR3 in each sample fitted a Gaussian distribution, which provides an evenly distributed data set. Consistent with the previous study, we found significant differences between the patient and control groups. Since CDR3 sequences that are commonly expressed in patient samples could provide a solid clue for research into the disease-specific immune reaction, we evaluated all amino acid sequences shared across samples. The CD4 and CD8 groups showed similar sequence sharing, which was quite different from that of the tissue group. In the analysis of TRBV and TRBJ gene usage, we found significant differences between the CD4, CD8 and tissue groups of patients. These differentially expressed genes showed a disease-specific gene expression profile, which provides further information for studies of tuberculosis patients and controls. To find further disease-specific CDR3 sequences, we also investigated TRBV/TRBJ combinations in patients and controls. In the within-group analysis, we found highly expressed combinations in each group. We also found differential expression levels of recombined sequences between patient samples and normal control samples, which reveals the important function of TRBV/TRBJ combination in the immune response to tuberculosis and provides a basis for further study of diagnostic or therapeutic applications.
In conclusion, our study first elucidated the immune repertoire characteristics of tuberculosis patients using NGS-based methods. We found that CDR3 sequences were far more highly expressed in tuberculosis patients' tissue samples than in other sample types, which suggests a specific and strong immune reaction during the development of tuberculosis. We then analyzed CDR3 sequence sharing in all samples and in each group. Finally, we elucidated TRBV and TRBJ usage and TRBV/TRBJ combinations in all samples. This study provides a broad profile of the immune repertoire of tuberculosis patients and identifies specific recombined CDR3 sequences that differ between patients and controls. Although the sample number is relatively small, the study provides a useful resource for further work on the diagnosis, prognosis and prevention of tuberculosis based on candidate immune repertoire features.

ACKNOWLEDGMENT
Not applicable.

CONFLICT OF INTEREST
The authors have no conflict of interests to declare regarding this manuscript.

AUTHORS' CONTRIBUTIONS
YF, BL, SL and YD designed the study and drafted the manuscript. YL, MW, YY, LX, ShL and QH acquired and interpreted the data. SL, YF and YD revised the manuscript for important content. All authors read and approved the final manuscript.

ETHICS APPROVAL AND CONSENT TO PARTICIPATE
All patients gave written informed consent and the present study was approved by the Medical Ethics Committee of Shenzhen People's Hospital.

CONSENT FOR PUBLICATION
Not applicable.

DATA AVAILABILITY STATEMENT
The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.