Gene encoding the CTP synthetase as an appropriate molecular tool for identification and phylogenetic study of the family Bifidobacteriaceae

Abstract An alternative molecular marker with respect to the 16S rRNA gene demonstrating better identification and phylogenetic parameters has not been designed for the whole Bifidobacteriaceae family, which includes the genus Bifidobacterium and scardovial genera. Therefore, the aim of the study was to find such a gene in available genomic sequences, suggest appropriate means and conditions for asmplification and sequencing of the desired region of the selected gene in various strains of the bacterial family and verify the importance in classification and phylogeny. Specific primers flanking the variable region (~800 pb) within the pyrG gene encoding the CTP synthetase were designed by means of gene sequences retrieved from the genomes of strains belonging to the family Bifidobacteriaceae. The functionality and specificity of the primers were subsequently tested on the wild (7) and type strains of bifidobacteria (36) and scardovia (7). Comparative and phylogenetic studies based on obtained sequences revealed actual significance in classification and phylogeny of the Bifidobacteriaceae family. Gene statistics (percentages of mean sequence similarities and identical sites, mean number of nucleotide differences, P‐ and K‐distances) and phylogenetic analyses (congruence between tree topologies, percentages of bootstrap values >50 and 70%) indicate that the pyrG gene represents an alternative identification and phylogenetic marker exhibiting higher discriminatory power among strains, (sub)species, and genera than the 16S rRNA gene. Sequences of the particular gene fragment, simply achieved through specific primers, enable more precisely to classify and evaluate phylogeny of the family Bifidobacteriaceae including, with some exceptions, health‐promoting probiotic bacteria.

Representatives of the scardovial genera differ from bifidobacteria based on genotypic, phenotypic characteristics, and habitat (Killer et al., 2013a). In most cases, they are adapted to an oxygen-containing environment. They were found to be present in the human and animal oral cavity, human clinical samples, urinary tract, porcine cecum, and intestinal tract of wild pigs (Downes et al., 2011;Huys et al., 2007;Killer, Havlik, Bunesova, Vlkova, & Benada, 2014;Simpson, Ross, Fitzgerald, & Stanton, 2004). In contrast to bifidobacteria, scardovia inhabiting the human body are presented as opportunistic pathogens causing oral diseases and urinary tract infection (Mahlen & Clarridge, 2009;Tanner et al., 2011).
The classification and phylogenetic studies of bifidobacteria were based on the determination of carbohydrate fermentation patterns, enzyme activities, morphological and physiological characteristics, peptidoglycan type, DNA G + C content, DNA-DNA relatedness, and 16S rRNA gene sequences by the end of the last century (Dong, Xin, Jian, Liu, & Ling, 2000;Scardovi, Trovatelli, Biavati, & Zani, 1979). Even though all of the above classification techniques, and especially the latter two, are still in use for the differentiation of taxonomic units and primary evaluation of phylogeny (Mattarelli et al., 2014), they have been reported to have several shortcomings. Biochemical and physiological characteristics can be similar among the strains of separated species and, on the other hand, vary among strains of the same species. The results of DNA G + C determination using traditional methods may differ from laboratory to laboratory, depending on the technique used (Fournier, Suhre, Fournous, & Raoult, 2006). DNA-DNA heteroduplexes are formed between strands having at least 80% sequence complementarity. For this reason, DNA reassociation values may not replace the comparison of whole-genome sequences and gene content differences . Phylogeny, classification, and ecological studies on bifidobacteria based on the 16S rRNA gene can be misleading due to the presence of different copies in genomes (Satokari, Vaughan, Akkermans, Saarela, & de Vos, 2001;Větrovský & Baldrian, 2013;Wu, Jospin, & Eisen, 2013). Moreover, there are examples of bifidobacterial species sharing very similar 16S rRNA gene sequences (Delcenserie et al., 2007;Lugli et al., 2014). The application of variable regions within housekeeping genes belonging to COGs (Clusters of Orthologous Genes/Groups of proteins) family in the phylogeny of bifidobacteria is considered to be an alternative to 16S rRNA-derived phylogeny (Jian, Zhu, & Dong, 2001;Killer, Sedláček, Rada, Havlík, & Kopečný, 2013b;Ventura et al., 2006).
A phylogenetic/identification marker applicable to almost the entire Bifidobacteriaceae family has not been devised yet. This led us to find a candidate gene in the genomes of representatives of the family with more robust discriminatory power among taxonomic units and comparable or better phylogenetic features than the 16S rRNA gene. The gene encoding the CTP (cytidine triphosphate) synthase (catalyzing the ATP-dependent amination of UTP to CTP with either l-glutamine or ammonia as the source of nitrogen), which plays an irreplaceable role in the synthesis of RNA in the process of transcription seems to be an appropriate candidate. It is ubiquitous in bacteria, homologous, exists in a single copy in the genome, is subject to stabilizing selection, stable with respect to rapid genetic modification and able to produce a robust phylogenetic tree that reflects the evolution of the species as much as possible (Glaeser & Kämpfer, 2015;Wu et al., 2013).
Below, we discuss the applicability of the variable region within the pyrG gene encoding the CTP synthase in the differentiation of taxonomic units and phylogenetic assessment of the Bifidobacteriaceae family compared with the 16S rRNA gene.

| Strains, culture conditions, and DNA extraction
Forty-three type strains of the Bifidobacteriaceae family and seven wild strains of bifidobacteria (Table 1)  were also included in the study because results of our recently accepted study (Pechar, Killer, Mekadim, Geigerová, & Rada, 2017a) confirmed a significant genetic difference between the strain and B.
thermophilum DSM 20210 T . Thus, we wanted to find out whether these two strains, both referred to as type, also differ based on the sequences of the given gene. All strains (except of B. thermophilum JCM 1207 T ) were routinely cultivated in anaerobic TPY broth (Scardovi, 1986) at 37°C for 24 h. Genomic DNA was extracted for PCR purposes from 1 ml of vital cultures using a DNeasy Blood & Tissue kit (Qiagen) following the manufacturer′s instructions for gram-positive bacteria.

| Isolation and identification of wild strains of bifidobacteria
Seven strains of bifidobacteria originating from the feces of infants and calves (Table S1) were isolated from colonies grown in modified TPY agar (Rada & Petr, 2000) under conditions previously reported (Killer et al., 2013a). They were then identified based on 16S T A B L E 1 (Continued) (Continues) rRNA gene sequences that were obtained after amplification using the 27f -1541R primer pair (Dong et al., 2000). The sequencing of almost complete 16S rRNA genes and the others in the study was performed by the company SEQme (CZ) on the basis of both primers. Complete sequences derived from forward and reverse primers were then stacked in the program Geneious v7.1.7 (Biomatters Ltd.) and stored in the GenBank database of the NCBI (National Centre for Biotechnology Information) through the application Banklt. The EzBioCloud database (Yoon et al., 2017) was used to find the closest related taxa.

| Designing primers flanking the variable region of the pyrG gene
To design primers for the amplification and sequencing of the pyrG gene region of the family Bifidobacteriaceae, the corresponding sequences derived from the complete genome sequences belonging to 15 representatives of the family were applied (Table S2). The sequences were aligned using the Alignment tool in Geneious, which also automatically determined the direction to acquire the consensus sequence. This was used to design primers defining a variable fragment of the particular gene using the application Primer3 in the same software. In order to increase the specificity and functionality of the primers, PCR parameters such as the range of primer size, melting temperature (Tm), % GC, and then maximum Tm difference, maximum dimer Tm, maximum 3′ stability and GC clamp were properly set (Untergasser et al., 2012).

| PCR amplification, sequencing, and deposition in the GenBank database
The pyrG gene fragment in representatives of the family

| Determination of gene characteristics
The alignment of sequences was carried out using the ClustalW algorithm in the program BioEdit v7.2.6 (http://www.mbio.ncsu.edu/ BioEdit/page2.html). The partial pyrG and also 16S rRNA gene sequences were applied. The latter were retrieved for type strains from complete genomes (the GenBank/EMBL/DDBJ accession numbers are shown in Table 1). The 16S rRNA gene, generally considered to be a standard identification and phylogenetic marker in prokaryotes, was applied as a comparative baseline to verify whether the pyrG gene represents an appropriate molecular marker for the Bifidobacteriaceae family. The resulting alignment was subjected to the removal of hypervariable positions using the Gblocks algorithm with the default setting (Castresana, 2000). The percentages of mean pairwise identities, pairwise distances, identical sites, and CG contents were computed in Geneious v7.1.7 (Biomatters Ltd.).
The other gene parameters, consisting of the mean number of nucleotide differences, P-distance (number of base substitution per site), K-distance, number of conserved and variable sites, number of parsimonious-informative, and singleton variable sites were calculated in the software package MEGA v5.05 (Tamura et al., 2011). The latter four parameters were applied to the pyrG gene and amino-acid sequences that were generated using the toggle translation tool in the program BioEdit. The program DnaSP v5 (Librado & Rozas, 2009) was used for calculation the number of synonymous sites (substitution in the coding region that causes no amino acid changes), nonsynonymous sites (substitution in the coding region that causes amino acid changes), nonsynonymous changes per nonsynonymous site (dN), synonymous changes per synonymous site (dS), and the ratio of dN/dS in the pyrG region.

| Phylogenetic analyses
The

| Testing of recombination
Potential recombination events in the partial pyrG gene sequences of examined Bifidobacteriaceae strains were scanned using the automated RDP analysis (Martin & Rybicki, 2000) in the program RDP4 v4.67 (Martin et al., 2010).

| Comparison of 16S rRNA and pyrG gene characteristics and statistics
The pyrG (length of 798 pb) gene fragment was amplified and sequenced in Bifidobacteriaceae strains under the conditions described above. The assigned GenBank/EMBL/DDBJ accession numbers of the gene fragment and 16S rRNA gene sequences are listed in Table 1.
Basic gene parameters calculated in bifidobacteria, scardovia, and the whole Bifidobacteriaceae family are shown in Table 2. The stretch of pyrG gene fragment examined did not contain any indels in bifidobacterial species. Only five insertions and deletions in a short region T A B L E 2 Comparison of basic gene and phylogenetic parameters between the 16S rRNA and pyrG sequences of the Bifidobacteriaceae strains examined were observed in Parascardovia denticolens, Scardovia inopinata, and Alloscardovia species, as illustrated in Figure S1. Although the 16S rRNA gene alignments covered a much longer segment, the shorter pyrG gene region exhibited a much higher sequence variability among representatives of the investigated family. This is documented by lower percentages of mean pairwise identities, identical sites and, on the other hand, higher values of mean number of nucleotide differences, P-and K-distances. As expected, the highest sequence variability was found among the scardovial species (

| Classification of wild strains of bifidobacteria
The classification of seven wild strains of bifidobacteria through 16S rRNA and pyrG gene comparative analysis is documented in Table S1. catenulatum), better differentiation from type strains was found using the pyrG gene than with the 16S rRNA gene sequences.

| Phylogenetic analyses
The phylogeny of the family Bifidobacteriaceae reconstructed based on 16S rRNA and pyrG gene sequences is presented in Figure 1. The maximum-likelihood algorithm and the best-fit ML evolutionary model (Table 2) were applied. The 16S rRNA-derived phylogenetic tree, created primarily by sequences found in complete genomes to make the topology as accurate as possible, includes a separated scardovial cluster and B. adolescentis, B. bifidum, B. boum, B. longum, and B. pseudolongum phylogenetic groups (Figure 1a). The same phylogenetic groups, plus a cluster including species of bifidobacteria isolated from important pollinators (honeybees and bumblebees) are visible in the pyrG phylogenetic tree (Figure 1b). The higher confidence, more accurate topology and robustness of pyrG-based phylogeny are documented by higher percentages of bootstrap values >50 and >70 ( Table 2). The much higher discriminatory power among Bifidobacteriaceae species in the pyrG-derived phylogeny is shown by the length of branches and the scale referring to substitutions per nucleotide position.
A comparative analysis revealing the degree of differentiation of bifidobacterial subspecies based on 16S rRNA and pyrG gene sequences was also performed. A higher resolution was determined in the pyrG gene sequences, as shown in Table S4.
The phylogeny based on pyrG gene-derived amino-acids (265 aa) sequences revealed a separated cluster of scardovia, then the B.
boum and B. pseudolongum phylogenetic groups and a cluster including

| Assesment of recombination events
Recombination events in the pyrG gene sequences of the examined Bifidobacteriaceae strains were not observed using the automated RDP analysis.
Average nucleotide identity of pyrG gene sequences (  (Jian et al., 2001;Ventura et al., 2006). Nonetheless, it is important to note that shorter gene segments (540-690 nt) and a lower number of Bifidobacterium species were considered. A lower content of CG determined in scardovia for both genes is consistent with previous findings based on the values of this parameter in genomic DNA (Killer et al., 2010(Killer et al., , 2013a. The average GC composition of the pyrG gene calculated in all strains tested (62.0%) is typical for bacteria belonging to the Actinobacteria phylum (Delétoile et al., 2010).
The calculated ratio of dN/dS < 1 on the basis of pyrG gene sequences among Bifidobacteriaceae strains tested (Table 3) suggesting stabilizing selection and slow evolution together with no detection of recombination events allow the particular gene to be considered a suitable molecular marker (Glaeser & Kämpfer, 2015;Nuñez et al., 2014). Genes that are not prone to horizontal transmission and recombination meet one of the basic requirements for their use as identification and phylogenetic markers (Adékambi et al., 2011;Roux et al., 2011). It is necessary to point out that mainly type strains (with the exceptions of seven strains) were employed and thus intraspecies recombinations may not be ruled out.
Individual phylogenetic groups are better defined based on pyrG gene sequences in the contrast to the 16S rRNA-derived phylogeny and the topology of the phylogenetic tree more accurately reflects those observed in phylogenetic studies of the genus Bifidobacterium and the Bifidobacteriaceae family on the basis of whole-genomic assays Milani et al., 2014Milani et al., , 2015Sun et al., 2015;Zhang, Gao, Adeolu, Khadka, & Gupta, 2016 longum group using the whole-genomic analyses Zhang et al., 2016). As illustrated in both trees (Figure 1a,b) Classification and phylogenetic position of wild strains of bifidobacteria using pyrG gene sequences correlated with those found on the basis of 16S rRNA gene sequences (Table S1, Figure 1a,b). This allows to consider the pyrG gene as an appropriate, easily accessible identification tool for isolates of bifidobacteria of different origin.
All phylogenetic branches and nodes reconstructed based on the concatenation of 16S rRNA and pyrG gene sequences using the Templeton test implemented in NJ localized incongruence difference analysis are congruent ( Figure S3). This indicates that the pyrG gene can be used as an alternative phylogenetic marker of the 16S rRNA gene in the Bifidobacteriaceae family. Remarkably, the robustness and topology of the tree encompassing particular taxonomic groups correspond to a large extent to the phylogenomic analyses of the family (Lugli et al., 2017), although 16S rRNA and a coding housekeeping gene-phylogeny should not usually be applied together.
Although the application of a single gene fragment for classification and phylogenetic purposes in particular taxonomic groups of bacteria may in many ways be confusing and inaccurate, the results presented above provide evidence that the pyrG gene may be used in the family Bifidobacteriaceae as an alternative classification and phylogenetic marker to the 16S rRNA gene. In contrast to the phylogenetic marker used as the 'gold standard' in the phylogeny of prokaryotes, the higher degree of sequence divergence of the pyrG gene at the (sub)species level (Table S4) is superior for identification purposes.
The gene is expected to be applicable for the classification of new

CONFLICT OF INTEREST
The authors declare that they have no competing interests.