High genetic diversity and stable Pleistocene distributional ranges in the widespread Mexican red oak Quercus castanea Née (1801) (Fagaceae)

Abstract The Mexican highlands are areas of high biological complexity where taxa of Nearctic and Neotropical origin and different population histories are found. To gain a more detailed view of the evolution of the biota in these regions, it is necessary to evaluate the effects of historical tectonic and climate events on species. Here, we analyzed the phylogeographic structure, historical demographic processes, and the contemporary period, Last Glacial Maximum (LGM) and Last Interglacial (LIG) ecological niche models of Quercus castanea, to infer the historical population dynamics of this oak distributed in the Mexican highlands. A total of 36 populations of Q. castanea were genotyped with seven chloroplast microsatellite loci in four recognized biogeographic provinces of Mexico: the Sierra Madre Occidental (western mountain range), the Central Plateau, the Trans‐Mexican Volcanic Belt (TMVB, mountain range crossing central Mexico from west to east) and the Sierra Madre del Sur (SMS, southern mountain range). We obtained standard statistics of genetic diversity and structure and tested for signals of historical demographic expansions. A total of 90 haplotypes were identified, and 29 of these haplotypes were restricted to single populations. The within‐population genetic diversity was high (mean h S = 0.72), and among‐population genetic differentiation showed a strong phylogeographic structure (N ST = 0.630 > G ST = 0.266; p < .001). Signals of demographic expansion were identified in the TMVB and the SMS. The ecological niche models suggested a considerable percentage of stable distribution area for the species during the LGM and connectivity between the TMVB and the SMS. High genetic diversity, strong phylogeographic structure, and ecological niche models suggest in situ permanence of Q. castanea populations with large effective population sizes. The complex geological and climatic histories of the TMVB help to explain the origin and maintenance of a large proportion of the genetic diversity in this oak species.


| INTRODUC TI ON
The Mexican Highlands encompass the main mountain systems of Mexico (i.e., Sierra Madre Occidental, Sierra Madre Oriental, Trans-Mexican Volcanic Belt, Sierra Madre del Sur and Altos de Chiapas) and are located between the Nearctic and Neotropical regions.
Biogeographically, the Mexican Highlands comprise most of the Mexican Transition Zone which also includes the Central American mountains forming the Chiapas Highlands province (Morrone, 2014).
In the Mexican Transition zone, key evolutionary processes have been studied such as species diversification and historical demography including the role of paleoclimatic and geological events in the evolutionary history of the species (Mastretta-Yanes, Moreno-Letelier, Piñero, Jorgensen, & Emerson, 2015).
The Mexican Transition Zone is characterized by an impressive biological and physical complexity in terms of biogeographical and macroecological patterns (Morrone, 2010;Ríos-Muñoz & Navarro-Sigüenza, 2012;Sosa, Ornelas, Ramírez-Barahona, & Gándara, 2016), geological evolution (De Cserna, 1989;Ferrari, Orozco-Esquivel, Manea, & Manea, 2012), and paleoclimatic dynamics (Caballero, Lozano-García, Ortega-Guerrero, & Correa-Metrio, 2019;Caballero-Rodríguez et al., 2018). Therefore, the study of the evolution of the biota in the Mexican Transition Zone requires the consideration of both biotic and abiotic factors, as the phylogeographic approaches do. Even though initial syntheses concerning multitaxa phylogeographic patterns in the Mexican Transition Zone suggest some common responses of species to geographic barriers and historical climate changes (Ramírez- Barahona, & Eguiarte, 2013;Ornelas et al., 2013), we are still far from attaining a detailed view of the processes that have shaped the diversity and evolution of the species in the region.
The most important facts that phylogeographic studies have shown for species in the Mexican Transition Zone include pre-Quaternary divergence times, east-west phylogeographical breaks in the Trans-Mexican Volcanic Belt that are incongruent with geographic distance or topography, and north-south phylogeographical breaks within the Sierra Madre Occidental (Gutiérrez-García & Vázquez-Domínguez, 2013;Mastretta-Yanes et al., 2015;Ornelas et al., 2013 and references therein). However, shortcomings such as differences in sampling schemes among studies, circumstantial choice of the studied taxa and the prevalence of both allopatric and parapatric genetic divergence (Mastretta-Yanes et al., 2015) have made the identification of general phylogeographic patterns a difficult task.
On this basis, for this study we chose Quercus castanea Neé (1801) as a focal species. This species has high ecological importance and has been intensively studied for gene flow estimation in fragmented landscapes (Herrera-Arroyo et al., 2013;Oyama et al., 2017) and community genetics (Tovar-Sánchez et al., 2013). Additionally, Q. castanea has one of the broadest geographical and altitudinal distributions among Mexican oaks. Therefore, focusing on this species may enable us to study historical processes through a large area that experienced a wide range of tectonic and paleoclimate events and shed light into the complex evolutionary processes of Nearctic and Neotropical biotas.
Specifically, the objectives of the present study were to (a) determine the patterns of genetic diversity and differentiation of populations of Q. castanea across its entire geographical range using simple sequence repeats of the chloroplast DNA (cpSSR), (b) test for signals of historical demographic expansions and to determine the timing of the population expansion, (c) contrast the conclusions derived from genetic data with models of the contemporary period, LGM and LIG ecological niche of Q. castanea, and (d) contrast the phylogeographic patterns revealed in this study with the regionalization of the biogeographic provinces based on the physiographic features.
from April to May and is wind-pollinated. Acorns mature from July to November, and dispersal occurs by gravity, birds, and squirrels (Valencia-A., 1995). We sampled 36 populations with approximately 10 individuals per population across the Q. castanea geographical distribution in Mexico (Table 1, Figure 1). Within populations, individuals were randomly selected with at least 10 m of separation between consecutive samples. From each tree, five young intact leaves were collected and stored at −80°C for molecular analysis, and a branch was pressed for further confirmation of species identity.

| Molecular methods
Genomic DNA was extracted from 100 mg of leaf material using the modified CTAB protocol of Lefort and Douglas (1999). Seven cpDNA microsatellite loci were selected and amplified in multiplex polymerase chain reactions (PCR). Two groups of primers were arranged according to allele size and fluorescent labels. The first group was formed by the primer pairs UDT1, UDT3, UCD5, UKK3, and UKK4 previously designed by Deguilloux, Dumolin-Lapègue, and Gielly (2003). The second group of primers included CMCS6 and CMCS10 previously designed by Sebastiani, Carnevale, and Vendramin (2004). PCRs were performed using the QIAGEN Multiplex PCR kit (QIAGEN) in a volume of 5 µl containing 1X Multiplex PCR Master Mix, 2 µM each primer, dH 2 0, and 20 ng template DNA. The thermal cycling conditions consisted of an initial denaturation step for 15 min at 95°C, followed by 40 cycles, each at 95°C for 1 min, annealing at 50°C for 1 min, and extension at 72°C for 2 min. A final extension at 72°C for 10 min was included. Multiplex PCR products were combined with a GeneScan-500 LIZ size standard and then run in an ABI-PRISM 3100-Avant sequencer (Applied Biosystems).
Fragments were analyzed and sized with the Peak Scanner program 1.0 (Applied Biosystems).

| Genetic diversity
Genetic diversity parameters were estimated for each population and for four regions commonly recognized as morphotectonic and biogeographic units for Mexico (the Sierra Madre Occidental, the Trans-Mexican Volcanic Belt, the Central Plateau and the Sierra Madre del Sur) (Ferrusquía-Villafranca, 1993;Morrone, 2005Morrone, , 2014. Population-level and regional haplotype richness (corrected for unequal sample sizes with rarefaction), number of exclusive haplotypes and gene diversity (h S ) were calculated using SPAGeDi 1.1 (Hardy & Vekemans, 2002). D 2 SH , which is the mean pairwise genetic distance among individuals within a population under a stepwise mutation model (Goldstein, Ruiz Bares, Cavalli-Sforzaf, & Feldman, 1995), was calculated using the LMSE program (Navascués, Hardy, & Burgarella, 2009). To assess the patterns of the geographical distribution of genetic variation in Q. castanea, genetic diversity indexes were correlated with elevation, latitude, and longitude using a correlation test implemented in R 3.2.2 (R Core Team, 2017). To depict genealogical relationships among haplotypes, a median-joining network was constructed using the software NETWORK 4.5 (Bandelt, Forster, Sykes, & Richards, 1995). This method combines the topology of a minimum spanning tree with a maximum parsimony search (Polzin & Daneshmand, 2003) to identify and remove unnecessary links between haplotypes.

| Genetic and phylogeographic structure
Tests of population genetic structure were performed using analyses of molecular variance (AMOVA) with ARLEQUIN 3.5 (Excoffier, Laval, & Schneider, 2007) to obtain estimates of both F ST (based on the infinite alleles mutation model, IAM) and R ST (based on the stepwise mutation model, SMM). We also performed a hierarchical grouping to obtain the variance components of genetic variation among groups, among populations within groups, and within populations. The categories were defined by the four above-mentioned biogeographic provinces. The significance of partitions was tested using 10 4 permutations.
Phylogeographic structure was determined by comparing the differentiation coefficients for unordered and ordered alleles (N ST and G ST , respectively) in SPAGeDi 1.1 (Hardy & Vekemans, 2002). If N ST , which considers the genetic differences among the haplotypes, is significantly higher than G ST , then there is phylogeographic structure in the populations (Pons & Petit, 1996). To determine whether phylogeographic structure was determined by regional diversity, differences between mean N ST and mean G ST were calculated through regular distance intervals. This analysis was implemented in SPAGeDi 1.1 (Hardy & Vekemans, 2002) using four distance intervals. Geographic intervals were defined by calculating a distance that homogenizes the number of populations represented in each interval.
We analyzed the spatial distribution of genetic variation and identified genetic discontinuities between Q. castanea populations using GENELAND 4.0.7 (Guillot, Renaud, Ledevin, Michaux, & Claude, 2012), which considers genetic structure together with the geographic location of populations using a model of spatial Bayesian clustering under a MCMC method, to assign individuals into genetic clusters (Guillot et al., 2012;Safner, Miller, McRae, Fortin, & Manel, 2011). GENELAND was implemented using a number of potential spatial clusters (K) ranging from two to 35. The analyses were set up considering ten independent runs, correlated allele frequencies, one million generations with a thinning value of 100 and a burn-in value of 1,000. Finally, the distribution of GENELAND spatial clusters was mapped using ArcMap 10.3 (ESRI Inc.).

| Historical population demography
Tests for population expansions were performed at the population and regional levels using F S (Fu, 1997) and mismatch distributions (Rogers & Harpending, 1992). The F S statistic evaluates the TA B L E 1 Geographical location and estimates of genetic diversity for 36 populations of Quercus castanea in four biogeographic provinces of Mexico Note: n, sample size; h S , within-population genetic diversity; D 2 sh, mean pairwise genetic distance among individuals within a population under a stepwise mutation model. Allelic richness is reported after a rarefaction analysis to standardize for unequal sample sizes. Expanding populations are also expected to have unimodal distributions for differences in repeat number in pairwise comparisons among individuals in a sample (i.e., the mismatch distribution), while in stationary populations, the distribution is predicted to be ragged and multimodal (Rogers & Harpending, 1992). To use ARLEQUIN 3.5 for these analyses, the cpSSR data were binary coded following Navascués and Emerson (2005) and Navascués et al. (2009). The number of repeats was coded as "1," and shorter alleles were coded filling the differences in repeats with "0." The significance of F S values was evaluated with 10 4 data bootstraps (Excoffier et al., 2007).
For the mismatch distributions, the raggedness index of Harpending l is the number of loci evaluated, and N 0 and N 1 are the previous effective population sizes and the sizes after the expansion, respectively (Rogers & Harpending, 1992). The pseudolikelihood method (Navascués et al., 2009) implemented in the LMSE program was used to estimate these parameters. This method takes homoplasy, which is common for cpSSRs, into account in the calculations. Since no reliable estimates of the mutation rate for cpSSRs have been reported, the translation of these values into years is subjected to considerable uncertainty. However, mutation rates in the range of 10 -5 to 10 -4 mutations per generation per locus have been usually assumed in the literature, as well as generation times for trees between 25 and 100 years (Heuertz, Teufel, & González-Martínez, 2010;Navascués et al., 2009). Hence, we used those values of mutation and generation times to calculate t. To avoid redundancy among variables, the initial set of 20 variables was reduced to a group of seven variables using a correlation matrix implemented with the corrplot (Wei & Simko, 2017) function in R 3.2.2 (R Core Team, 2017). When combinations showed r values over .7, the global variables were retained over the particular ones (e.g., annual mean temperature was chosen over mean temperature of the wettest month). In addition to elevation, the bioclimatic variables retained represent annual trends (mean annual temperature, annual precipitation), seasonality (temperature seasonality, precipitation seasonality, annual range in temperature), and extreme or limiting environmental factors (precipitation of the driest month).

| Ecological niche modeling
As conventionally used, a niche model was produced under contemporary climatic conditions and then projected into the past climatic scenarios (Nogués-Bravo, 2009). To run the model, a subset of 70% of Q. castanea records were used as training data, and the remaining 30% occurrences were used to validate it. As a threshold-independent method for model validation, we used the area under the receiver operating characteristic curve (AUC) (Fielding & Bell, 1997). Fifty independent runs using bootstrap replication were performed for each model to assure convergence on similar solutions. A jackknife test for the model in the contemporary period was used to measure the relative importance of each climatic variable on the occurrence prediction.
To evaluate whether the models suggest changes in the distribution of Q. castanea between the LIG, LGM, and the contemporary periods, we defined the presence of the species using the 10 percentile training presence threshold rule (Phillips et al., 2006). In order to identify the areas within the distribution of Q. castanea that have remained stable through the different time periods (LIG, LGM and contemporary), binomial outputs calculated using the above-mentioned threshold rule were summed using the raster calculator tool in ARCMAP 10.4 (ESRI Inc.) and the extent of the stable areas was compared to the entire area of the contemporary distribution model.   Table 2). In contrast, if distances among haplotypes are considered (SMM), the genetic differentiation among populations within regions is 46.33%, while variation residing within populations is 37.22% and variation among the four regions is 16.45% (Table 2).

| Genetic diversity and structure
The network of the 90 haplotypes identified in Q. castanea is shown in Figure 1a. In most cases, haplotypes were separated by one mutational step. Some groups of closely related haplotypes were restricted to specific geographical regions. This restriction was particularly clear for the gray group haplotypes (Figure 1a

| Historical population demography
We obtained nonsignificant negative and positive F S values for all populations probably due to limited sampling sizes; therefore, mismatch distributions were not calculated at the population level (Table 3). However, when grouping samples at the regional level, a signal of recent demographic expansion was detected by both F S and

Harpending's raggedness indexes for the Trans-Mexican Volcanic
Belt and the Sierra Madre del Sur (Table 3)

| Past and contemporary ecological niche modeling
The AUC values (averaged over the 100 runs) for the training and test data for the Maxent modeling were 0.971 and 0.963, indicating an appropriate model performance. The jackknife analysis indicated that the climatic variables with the highest relative contributions to the contemporary distribution model were related to elevation and temperature seasonality. The estimation of the distribution extent of Q. castanea suggests that 19% of the species suitable habitat has remained stable from the LIG to the contemporary period (Figures 3 and 4).

| D ISCUSS I ON
The analysis of the evolutionary patterns reported for the Mexican Highlands (e.g., Bryson, Linkem, & Pavón-Vázquez, 2017;Bryson et al., 2018;Caviedes-Solis & Leaché, 2018;Sanginés-Franco et al., 2015;Villaseñor, Delgadillo, & Ortiz, 2006) have allowed to state hypothesis that describe the evolution of montane forests (e.g., cloud F I G U R E 2 Geographical location of the most important genetic clusters for Quercus castanea using GENELAND. White polygons represent unique genetic clusters, blue scale polygons represent genetic clusters within the TMVB, and gray scale polygons represent genetic clusters formed by populations distributed in the TMVB and SMS. White circles represent Q. castanea studied populations. Each circle includes its population ID from Table 1. Bottom left corner includes the changes in the phylogeographic structure in terms of G ST and N ST per distance intervals TA B L E 3 Tests for historical population expansions and estimation of demographic parameters for 36 populations of Quercus castanea in four biogeographic provinces of Mexico according to Table 1 notation

| Quercus castanea genetic diversity
Interestingly, genetic variation that has been found in oak species From an evolutionary perspective, at the intraspecific level,

Q. castanea could illustrate the rapid diversification of the genus
Quercus in the Mexican highlands that has been related to ecological opportunity (Hipp et al., 2018) since this species currently has one of the widest ecological and geographical distributions, reflecting the species capacity to colonize and establish through a wide range of heterogeneous physiographic and climatic conditions.

| Quercus castanea genetic structure
As has been reported for most of the oak species studied in Cavender- Bares, 2013;Gugger, Ikegami, & Sork, 2013;McCauley et al., 2019;Rodríguez-Gómez et al., 2018;Tovar-Sánchez et al., 2008), Q. castanea also exhibited a well-defined phylogeographic structure. However, unlike several of the mountainous oak and other tree species in Mexico and Central America (McCauley et al., 2019;Ornelas et al., 2013;Rodríguez-Correa et al., 2017), the distribution of the genetic variation in this case is not explained by major biogeographic regions (Table 2; Figure 2

| Quercus castanea evolution in the Mexican highlands
Recently Finally, our results indicate a similar pattern to previous studies in the Mexican highlands that have reported pre-Quaternary divergence times (Bryson et al., 2017(Bryson et al., , 2018Bryson & Riddle, 2012;Ornelas & González, 2014) and to patterns of pre-Quaternary diversification for Neotropical species, characterized by complex mixtures of taxa that radiated and spread over a wide range of timescales (Bennett et al., 2012). These findings indicate that the origin and maintenance of a large proportion of the genetic diversity in Q. castanea have occurred within the Trans-Mexican Volcanic Belt, which is also a hotspot for oak species diversity and has acted as a natural bridge between other biogeographic provinces (Bryson et al., 2018).

| CON CLUS IONS
Our study suggests that together tectonics and paleoclimate have shaped the genetic structure of Q. castanea, as has been found for other species in the Mexican highlands (Mastretta-Yanes et al., 2015). By focusing on species with a wide distribution, we were able to reveal in detail aspects of historical population dynamics such as old population expansion predating the Pleistocene and the maintenance of considerable areas that probably sustained stable populations over the Pleistocene climatic changes. The presence of these stable areas favored the genetic connectivity between biogeographic provinces (mostly between the TMVB and the SMS) and the maintenance of genetic variation. These PAPIIT-DGAPA, UNAM grants IV201015 to KO and IA208017 to HRC.

CO N FLI C T O F I NTE R E S T
Authors have no competing interests to declare.

AUTH O R CO NTR I B UTI O N S
JMPR and KO conceived and designed the study. JMPR and VRR performed field and laboratory work. JMPR, HRC, AGR, and KO analyzed and interpreted the data. JMPR, HRC, and KO drafted the article. All authors contributed equally to subsequent drafts and gave final approval for publication.