Identifying hybrids & the genomics of hybridization: Mallards & American black ducks of Eastern North America

Abstract Resolving evolutionary relationships and establishing population structure depends on molecular diagnosability that is often limited for closely related taxa. Here, we use 3,200 ddRAD‐seq loci across 290 mallards, American black ducks, and putative hybrids to establish population structure and estimate hybridization rates. We test between traditional assignment probability and accumulated recombination events based analyses to assign hybrids to generational classes. For hybrid identification, we report the distribution of recombination events complements ADMIXTURE simulation by extending resolution past F4 hybrid status; however, caution against hybrid assignment based on accumulated recombination events due to an inability to resolve F1 hybrids. Nevertheless, both analyses suggest that there are relatively few backcrossed stages before a lineage's hybrid ancestry is lost and the offspring are effectively parental again. We conclude that despite high rates of observed interspecific hybridization between mallards and black ducks in the middle part of the 20th century, our results do not support the predicted hybrid swarm. Conversely, we report that mallard samples genetically assigned to western and non‐western clusters. We indicate that these non‐western mallards likely originated from game‐farm stock, suggesting landscape level gene flow between domestic and wild conspecifics.

Eurasian mallards largely possess OW A haplotypes, NW mallard clade taxa have significant representation of both OW A and NW B haplotypes (Avise et al., 1990;Johnson & Sorenson, 1999;Kulikova et al., 2005;Kulikova, Zhuravlev, & McCracken, 2004;Lavretsky, McCracken et al., 2014). Competing hypotheses regarding the cause for the presence of both major haplogroups, as well as observed paraphyly within New World taxa include the following: (a) historical secondary contact between New World (NW) monochromatic species with Eurasian mallards resulted in bi-directional gene flow, (b) an ancestral mallard invaded and speciated throughout the NW and the present paraphyly is the result of incomplete lineage sorting within NW taxa, and (c) NW mallards and allies were monophyletic for the B haplotype, but more recent gene flow with occasional Eurasian mallards and/or influx of feral mallards (hypothesized to be of OW origin) resulted in mtDNA paraphyly. However, conclusively testing between these competing hypotheses has been stifled due to the inability to genetically identify individuals to species, and thus estimate true rates of hybridization and gene flow using bi-parentally inherited nuclear markers.
Our primary objective is to determine the rate of hybridization and extent of gene flow between mallards and black ducks using high-throughput DNA sequencing methods. Whereas hybrids have been well documented between mallards and black ducks in the wild, we aim to determine whether hybridization has resulted in gene flow, including whether backcrossing is unidirectional (toward either black ducks or mallards) or bi-directional (toward both black ducks and mallards). We use two methods to identify hybrids (F1) and generational backcrosses (≥F2): (a) traditional approaches in estimating assignment probabilities across samples and (b) novel techniques that utilize information regarding local ancestry across chromosomal haplotype blocks to assign hybrid status (Janzen, Nolte, & Traulsen, 2018;Leitwein, Gagnaire, Desmarais, Berrebi, & Guinand, 2018). Comparing assignments between the two methods will determine whether traditional methods struggle to assign late generational hybrids that often possess only small fraction of the genome as admixed (Lawson, Dorp, & Falush, 2018).
Additionally, we provide an empirical test to determine the utility of accumulated recombination analyses for species that are at the earliest stages of species divergence and largely differentiated by small frequency differences. If hybridization has resulted in extensive gene flow between species, then we expect to find few, if any "pure" individuals, warranting one or both species to be considered a hybrid swarm. Alternatively, if sufficient isolating mechanisms have built up between the two species, then we expect the majority of samples to be assigned with high probability to their respective species or first-generation hybrids (F1), and little evidence of generational backcrosses (≥F2). Such a scenario would be consistent with the reinforcement hypothesis in which taxa retain species boundaries during secondary contact due to viability limitations of any potential hybrids (Servedio & Noor, 2003).
Next, if released game-farm mallards established a viable feral population in Eastern North America, we expect to identify a unique genetic signature of such population structure in eastern mallards.
Additionally, if feral mallards tended to breed with black ducks, then we also expect to find eastern black ducks-their closest abundant relative in the first part of the 20th century-with some assignment to a secondary, non-western mallard population. Alternatively, if the presumed survival of released feral mallards is low (Osborne et al., 2010;USFWS, 2013), then we expect to find little or no indication of a second mallard population and thus, no evidence of gene flow from domestic mallard variants into wild populations of mallards or black ducks.
Finally, by genetically vetting sampled individuals as pure individuals, hybrids, or backcrossed hybrids, we aim to determine the effectiveness of the current set of phenotypic characters (Kirby, Reed, Dupuis, Obrecht, & Quist, 2000) used to assign individuals to species or establish a hybrid status. A recent study that genetically vetted phenotypic traits between mallards, mottled ducks (A. fulvigula), and their hybrids reported that a key character used to identify hybrids was in fact found in 10% of genetically "pure" mottled ducks , suggesting that at least some of the phenotypic characters may not at all be entirely diagnostic. Thus, assessing genetic assignments will provide important information that will either validate current practices or identify which speciescohort requires re-evaluation. In addition, identifying discrepancies between per sample phenotypic and genetic assignment will allow us to compare and test the extent to which results are biased by incorrectly identified samples. In general, without genetically vetting a phenotypic trait, individuals may be incorrectly assigned to species, including the misidentification of a hybrid. Such a bias has the potential to impact downstream analyses and estimates of various summary statistics, rates of gene flow, evolutionary histories, etc., and perhaps resulting in skewed conclusions.
F I G U R E 1 Map of sample locations for American black ducks (ABDU), mallards (MALL), and hybrids (MBDH)-taxonomic or hybrid assignments based on original USFWS phenotypic-based assignments (also see Supporting Information Table S1; N = number of samples). The Mississippi flyway (MISS; striped) and Atlantic flyway (ATL; dotted) are denoted, with all areas west of the Mississippi River considered "WEST." Scatter plots of PC1 (x-axis) and PC2 (y-axis) are plotted Raw Illumina reads were demultiplexed and processed using the computational pipeline described by DaCosta and Sorenson (2014; Python scripts available at http://github.com/BU-RAD-seq/ ddRAD-seq-Pipeline) and following steps outlined in Lavretsky, Dacosta et al. (2015). The pipeline clusters demultiplexed and filtered reads into putative loci based on sequence similarity and genomic position as determined by BLAST, aligns reads within each putative locus, and infers genotypes for individual samples at each locus. Final output files (e.g., fasta, ADMIXTURE) were generated with custom python scripts that set a higher minimum sequencing depth to score an allele . To limit any biases due to sequencing error and/or allelic dropout, alleles with less than 5x coverage were scored as missing, such that a minimum of 10 reads were required to score a locus as heterozygous. Finally, loci with <20% missing genotypes were retained for downstream analyses. Chromosomal positions across markers were attained by aligning a reference sequence across ddRAD markers to the mallard genome (Kraus et al., 2011;Huang et al., 2013;chromosomal assembly provided by T. Farault, unpubl. data). Doing so allowed us to separately analyze autosomal and Z-linked markers in all downstream analyses.

| Mitochondrial DNA
Primers L78 and H774 were used to PCR amplify and sequence 653 bp of the mtDNA control region (Sorenson, Ast, Dimcheff, Yuri, & Mindell, 1999;Sorenson & Fleischer, 1996) following dideoxy sequencing methods described in Lavretsky, McCracken et al. (2014). The PCR products were sequenced on an ABI 3730 at the Yale University DNA Analysis Facility. Sequences were aligned and edited using Sequencher v. 4.8 (Gene Codes, Inc). All sequences have been submitted to GenBank (accession numbers MK425222-MK425495). The New World mallard clade is characterized by two divergent mtDNA haplo-groups: Old world (OW) A and New World (NW) B (Ankney et al., 1986;Avise et al., 1990;Lavretsky, Hernández Baños et al., 2014). We evaluated and assigned mtDNA sequences of each sample to either the OW (A) or NW (B) haplogroup, and tested for longitudinal trends in haplogroup presence, as well as association with ddRAD nuclear-based genetic assignments.

| Population structure
Prior to estimating various descriptive statistics, we explored and visualized population structure using bi-allelic SNPs with singletons (i.e., rare SNPs observed in only one individual) excluded and without a priori assignment of individuals to populations or species. Maximum likelihood estimates of population assignments for each individual were obtained with ADMIXTURE v.1.3 Alexander, Novembre, & Lange, 2009). Autosomal and Z-linked SNPs were formatted for analyses using plink v. 0.67 (Purcell et al., 2007), following steps outlined in Alexander, Novembre, and Lange (2012). Updates to ADMIXTURE now permit for the effective analysis of sex-linked markers (Shringarpure, Bustamante, Lange, & Alexander, 2016) without the increased concern of how heterogamy at sex chromosomes may impact results if homozygosity for all heterogametic samples is assumed by using the "-haplotype" function and designating the heterogametic sex. In birds, the female is the heterogametic sex (ZW), and the male is the homogametic sex (ZZ). Analyzing autosomal and Z-linked markers separately, each ADMIXTURE v.1.3 analysis was run with a 10-fold cross-validation, and with a quasi-Newton algorithm employed to accelerate convergence (Zhou, Alexander, & Lange, 2011). To limit any possible stochastic effects from single analyses, we ran 100 iterations at each population of K (from K of 1-10). Each analysis used a block relaxation algorithm for point estimation and terminated once the change (i.e., delta) in the log-likelihood of the point estimations increased by <0.0001. The optimum K was based on the average of CV errors across the 100 analyses per K; however, additional Ks were analyzed for further population structure resolution. We then used the program CLUMPP v.1.1 (Jakobsson & Rosenberg, 2007) to determine the robustness of the assignments of individuals to populations at each K. First, the R program PopHelper (Francis, 2016) was used to efficiently convert ADMIXTURE outputs into CLUMPP input files at each K. In CLUMPP, we employed the Large Greedy algorithm and 1,000 random permutations. Final admixture proportions for each K and per sample assignment probabilities (Q estimates; the loglikelihood of group assignment) were based on CLUMPP analyses of all 100 replicates per K. Additionally, population structure was also visualized using a Principal Component Analysis (PCA) in R (i.e., "prcomp"), with scoring of bi-allelic SNPs as described by Novembre and Stephens (2008), which accommodate heterogamy when analyzing Z-linked markers (also see Lavretsky, Dacosta et al., 2015).
Next, composite pairwise estimates of relative divergence (Φ ST ), nucleotide diversity (π), and Watterson's θ for mtDNA, autosomal, and Z-linked ddRAD-seq loci were calculated in the R package PopGenome (Pfeifer, Wittelsbürger, Ramos-Onsins, & Lercher, 2014) using concatenated datasets for each marker-type and with indel positions treated as missing. If a substantial amount of samples showed discrepancies between the original phenotypic and genetic assignment (Supporting Information Table S1), then data were reanalyzed assuming original identifications by USFWS personnel and based on genetic assignments from ADMIXTURE analyses. These comparisons permitted us to test whether incorrect taxonomy (black duck <> mallard) or hybrid status (pure <> hybrid) based on phenotypic characters biased results.

| Establishing hybrid indices & identifying hybrids
First, we employed methods outlined in Lavretsky et al. (2016) to simulate expected assignment probabilities for first-generation hybrids (F1) and nine generations of backcrosses (F2-F10) into either the mallard or black duck parental population for ddRAD-seq markers. In short, a total of ten F1 hybrids were first generated by randomly sampling an allele from the mallard and black duck gene pool across bi-allelic SNP positions-each position was randomly sampled based on a probability proportional to the allelic frequency in each respective gene pool. Five hybrids were then backcrossed to either the mallard or black duck for nine generations. To limit potential biases in simulations, hybrid indices were reconstructed using only individuals with ADMIXTURE based probabilities of ≥95% assignment to either black duck or mallard. We ran a total of ten independent simulations, with data subsequently inputted into ADMIXTURE to estimate assignment probabilities for a K of 2 and 3. At each K, 25 iterations were run per simulation for a total of 250 ADMIXTURE outputs generated per K, which were then combined and converted in PopHelper (Francis, 2016) into CLUMPP input files. We employed the Large Greedy algorithm and 1,000 random permutations with final admixture proportions for each K and per sample assignment probabilities based on CLUMPP analyses of all 250 replicates per K.
Per generation expected assignment probabilities were based on the average of either all ten (F1) or each of the five (F2-F10) backcrosses, along with each lower and upper limit.

| Accumulated recombination events
Next, due to some potential limitations of likelihood or Bayesian methods (Lawson et al., 2018), samples were also categorized into hybrid and parental classes based on the number of accumulated recombination events. First, we followed recent methods to simulate the expected number of recombination events (termed "junctions"; Fisher, 1949Fisher, , 1954 based on the idea that new junctions are formed when a crossover takes place at a site that is heterogenic for ancestry (Janzen et al., 2018). Subsequently, we measured the number of junctions in our samples and used this information to categorize each sample as parental or generational backcross. All analyses were based on the largest Chromosomes (1-7) as these provided the greatest number of markers (Supporting Information Figure S1), and thus, the highest likelihood of detecting junctions.
First, expectations of junctions across hybrid classes were simulated under two differing assumptions: (a) assuming a randomly mating hybrid swarm as done in Janzen et al. (2018), or (b) backcrossing with one parental species only (similar to above ADMIXTURE simulations). Under the first hybrid swarm scenario, we assumed a randomly mating hybrid swarm, where initial frequencies followed Hardy-Weinberg proportions and the initial heterozygosity for a first-generation offspring of two randomly mating ancestors would HMM assumes a well-mixed hybrid population, rather than a backcrossing population, and hence we opted to only use the inferred ancestry. Pilot runs showed that for some samples, extremely high hybrid age was inferred, causing an overestimation of the number of switches in ancestry (which are correlated with the age of the hybrid), and reducing overall confidence in local ancestry. This effect was most likely due to the relatively flat likelihood surface of hybrid age versus local ancestry. In order to avoid this problem, we modified the code of ANCESTRY_HMM by adding an exponential prior with a mean of 10 generations to the inferred generation time.
Inferring local ancestry across putative hybrid samples using ANCESTRY_HMM analyses requires SNPs from a potential parental pool from which allele frequencies are derived. First, an ancestry panel was created based on samples with ≥99% assignment to either black duck (N = 82) or mallard (N = 65) in ADMIXTURE analyses (i.e., assumed to be pure parental) (also see Leitwein et al., 2018). Using this ancestry panel, we applied ANCESTRY_HMM on the individuals within the ancestry panel as a cross-validation, with the expectation that individuals used for the black duck panel were genetically 100% black duck, and individuals used for the mallard panel were genetically 100% mallard. Surprisingly, we found that of these individuals, only very few were 100% genetically black duck or mallard (28 out of 82 black ducks, and 6 out of 65 mallards), with many individuals showing at least one chromosome containing recombination event(s) suggesting the presence of "introgressed" genetic material. We therefore opted to use three different ancestry panels: (a) using the assignment based on ADMIXTURE analyses, (b) using only the 100% pure individuals as detected through ANCESTRY_HMM analyses, and (c) using all individuals that had at most one recombined chromosome (this yielded 64 out of 82 black ducks and 16 out of 65 mallards).
Preliminary analyses (results not shown, pers. comm. Corbett-Dettig) have shown that strong differences in panel sizes between species can potentially bias results, to mitigate this effect, we subsampled alleles from the more frequent species in order to obtain equal allele counts.
Using these ancestry panels, we inferred ancestry for each of the 143 potentially hybrid individuals separately across their respective chromosomes 1-7. Lacking a recombination map, we assumed a constant recombination rate across the chromosome, such that the relative distance between markers (e.g., the distance is base pairs divided by the total size of the chromosome) was equal to the relative recombination rate (e.g., if two SNPs are separated by 0.1% of all base pairs, then the recombination rate is 0.1 cM, assuming a chromosome size of 1 Morgan). Chromosome sizes were corrected for their total recombination rate, such that sizes for chromosomes 1-7 were 3.17, 2.26, 1.12, 0.93, 0.79, 1.20, and 0.98 Morgan, respectively (Huang et al., 2006). The total number of junctions per chromosome was determined by assessing the most likely ancestry within non-overlapping windows along the chromosome; changes in most likely ancestry between windows were recorded as a junction. Pilot explorations showed that 20 windows per chromosome provided best results. Obtained results were visually verified to check against artefacts. Junction determination was performed blind, without prior knowledge about the inference of generation based on ADMIXTURE results. Given a number of observed junctions, we then obtained ten likelihood values across potential generations (F 1 -F 10 ). The likelihood for each chromosome was calculated as the probability of observing j junctions after t generations, given the size of the chromosome in Morgan (e.g., Supporting Information Table S2). Then, given the number of observed junctions across the seven chromosomes, the full likelihood is the product of the seven separate likelihoods.
In the absence of an analytical expectation for the likelihood of observing j junctions after t generations, instead, we used the observed frequency in simulations, based on 1,000 replicates.
Standard errors of the mean frequency were very small, indicating that this approximation of the likelihood performs well.
To assess which generation fits the data best, we then calculated the AIC value, where we assumed one degree of freedom (t), such that: AIC = 2*df -2 *log P(t) (Akaike, 1974), and where P(t) is the approximated likelihood as discussed above. Then, we calculated AIC weights (Wagenmakers & Farrell, 2004) to obtain the relative probability of the observed distribution of junctions being from that number of generations. The generation with the highest AIC weight was subsequently selected as the generation that best explained the data.

| ANCESTRY_HMM simulations
To verify correctness of the junctions method and to test the power of the used markers we also performed simulations. We performed the same simulations as used to obtain approximate likelihoods, but instead of tracking junctions, individuals were artificially genotyped at each generation. In order to genotype individuals, we used the same SNP positions (relative along the chromosome) as used in the data, and determined "true" ancestry within the simulation for that individual at that position. Then, given true ancestry, the corresponding observed allele was drawn from the distribution of alleles as observed in the data. For example, for a SNP at location 0.1 cM, where in the data the observed allele counts are [40,10] for black duck ([reference allele, alternative allele]) and [10,40] for Mallard, then the corresponding allele (reference or alternative allele) is drawn from the matching distribution depending on the observed ancestry in the simulation. For example, if in the simulation the individual picked for "genotyping" is of ABDU ancestry at location 0.1 cM, an allele is drawn from the black duck distribution (i.e., [40,10]).
After collecting alleles at the same SNP positions as in the data, the obtained alleles are analyzed using ANCESTRY_HMM to calculate local ancestry, and subsequently to count the number of Furthermore, we compared the degree of heterozygosity between simulated data, ancestry inferred from simulated data, and ancestry inferred from the empirical data. Doing so permitted us to determine whether there were any apparent biases toward certain ancestry.
Finally, we repeated our analysis using a set of artificial SNPs where allele frequencies were artificially constructed to be strongly diagnostic (e.g., a scenario in which allele frequencies were differentially fixed between two species). We performed this analysis in order to quantify the expected uncertainty in hybrid status estimation, given high-quality data.

| Outlier analyses
We tested for statistical outliers that are putatively under selection with the program BayeScan v. 2.1 (Foll & Gaggiotti, 2008). burn-in and 100,000 sampling steps with a thinning interval of 10 for a total of 1,100,000 iterations. The prior odds parameter for the neutral model was set at 10, which equals log10 (PO > 1.0).
We allowed a probability of false discovery (qval) of 0.05. Finally, to determine whether demographic and/or selective processes are the cause of outlier prominence, we calculated Tajima's D (Tajima, 1983), nucleotide diversity, and absolute divergence (i.e., d XY ;Nei & Li, 1979) across markers in the R package PopGenome (Pfeifer et al., 2014). Specifically, for markers in which selection is the cause for high relative divergence (Φ ST ), we expect a high estimate of absolute divergence, and a negative Tajima's D and low nucleotide diversity within the population that is most likely affected by directional selection.

| RE SULTS
We recovered 3,200 ddRAD-seq loci that met our coverage and missing data criteria. Of those, 3,037 loci (362,644 base pairs; 68,187 single-nucleotide polymorphisms [SNPs]) and 163 loci (19,873 base pairs; 2,511 SNPs) were assigned to autosomes and the Z-chromosome, respectively (Supporting Information Figure S2). Marker coverage corresponded to chromosomal size (Supporting Information Figure S1). Final datasets comprised loci with an average median sequencing depth of 80 reads per locus per individual (median range = 24-241 reads/locus/individual), and on average, 98% (minimum of 86%) of alleles per individual per locus were scored. Finally, of the total ddRAD markers, 2,603 and 125 autosomal and Z-linked markers, respectively, were successfully mapped to chromosomal position. Based on a genome size of 1.1 Gbp, marker density was every ~400 Kbp.

| Population structure & hybrids
Both PCA and ADMIXTURE analyses for autosomal markers were  Figure S4), we explored other K values to test for additional resolution as such analyses tend to bias toward lower K values in cases of close ancestry (Janes et al., 2017;Lavretsky, Dacosta et al., 2015). First, K of 2 recovered black ducks as distinct from mallards. Next, K of 3 revealed a high degree of assignment to a second mallard group in the Mississippi and Atlantic flyways that corresponded to those samples in the long tail of the PCA analysis ( Figure 1). We report no identifiable genetic structure within black ducks. There did not appear to be additional resolution at higher K values. The closer association in PCA analysis and lower resolution in ADMIXTURE results is likely the result of the 54-fold difference in the number of analyzed SNPs. However, there was a significant correlation between Z-chromosome and autosomal assignment probabilities (R 2 = 0.84, p < 0.0001) that provided confidence in attaining an overall genomic perspective using autosomal markers.

| Hybrid simulations based on assignment probability
Given the significant correlation between assignments from autosomal and Z-linked markers, and the lower resolution for Z-chromosome-based ADMIXTURE results, we focused on autosomal markers to build expected hybrid indices to assign empirical samples to hybrid or pure classes. Moreover, with the novel structure recovered within eastern mallards, and interspecific assignments within phenotypically assigned black ducks being to western mallards ( Figure 1), hybrid indices were simulated using black ducks and western mallards with ≥95% assignment at autosomal markers to their respective species. Within simulations, assignment probabilities between K runs were significantly correlated (R 2 > 0.99, p < 0.0001; t test p value = 0.78), however, they differed in final expected assignment probabilities. At K of 2, assignment probabilities eventually plateaued at ~99% assignment for backcrosses to their respective parental population. Whereas assignment probabilities for mallard-backcrossed simulations still plateaued at ~99% assignment at K of 3, small interspecific assignments remained across generational classes when evaluating simulation for black duck backcrosses; although, F4-F10 generations reached a consistent assignment probability of ≥95% to their black duck-backcrossed parental population. Thus, while slight interspecific assignments may indicate hybrid status in empirical data, our simulations suggest that this may not be the case when evaluating K of 3 in our dataset . Instead, the small interspecific assignment seen across black ducks (Figure 1) likely represent shared ancestry and perhaps forcing data into a population K of 3. Nevertheless, regardless of K value, lower and upper limits of expected assignment probabilities consistently overlapped one another for each respective species (Figure 2). In general, expected assignment probabilities during backcrossing differed, with a plateau in "purity" reached at the F3 versus the F4 stage for mallard or black duck backcrosses, respectively. Given the expected assignment probabilities for F1-F3 generations and the "purity" cut-offs set based on simulations for either black duck or mallard backcrosses (  Table S1).

| Hybrid simulations based on recombination junctions
Although the genetic divergence between the two parental species is limited and few strongly diagnostic SNPs are apparent in the data, ANCESTRY_HMM was able to resolve local ancestry for all hybrid samples (Figure 3; Supporting Information Figure S5). Supporting Information Figure S5). Thus, while using the number of accumulated recombination events (junctions) to independently assess hybrid status extended identification of hybrids into the F7 category, as compared to ADMIXTURE analyses, it did lack in detection of F1 hybrids.
We found a strong effect of the reference panel used. When only parents that contained pure ancestry were used, only few junctions were detected. However, given the lack of heterozygosity, these individuals were not inferred as F1 individuals (which would also lack junctions), but rather were assumed to be higher generation backcrosses (F4 and higher). The lack of detection of junctions is most likely due to the limited sample size of the reference panel, which makes detection of rare alleles problematic. Including parental individuals that showed at most one recombination event across seven chromosomes increased the ability to detect junctions, but still the match with ADMIXTURE analyses is low, and results tend to be biased toward higher generations. Found heterozygosity rates also reflected differences between the used reference panels; with the default reference panel being significantly different from both the pure and one-recombination reference panels (Tukey's HSD pair-  Figure 4). In the data, we observe a slight upturn in heterozygosity around generations 3-4, but do not F I G U R E 2 (a) The average and range of assignment probabilities from ADMIXTURE results at K of 2 and 3 across 25 simulated replications of hybridization (F1) and nine generations of backcrossing (F2-F10) using genetically vetted American black ducks (ABDU) and mallards (MALL) (Supporting Information Table S1)-each K is based on 250 independent ADMIXTURE analyses. Simulations established assignment probability bins for parental American black ducks, mallards, F1 hybrids, two (F2-ABDU & F3-ABDU) categories for black duck backcrosses, and one (F2-MALL) category of a mallard-backcross (Supporting Information Tables S1 and S4). (b) Empirical data for western (WEST), Mississippi flyway (MISS), and Atlantic flyway (ATL) ABDU, MALL, and hybrid samples-taxonomic or hybrid assignments based on original USFWS phenotypic-based assignments (also see Supporting Information Table S1). Above and below plotted assignments are corresponding ADMIXTURE assignment probabilities from K of 3 analyses across samples (Figure 1). Pure ABDU (no color) are denoted as having ≤5% assignment to the gray or black population. Pure MALL are denoted as having ≥98% assignment to and/or gray and black population(s). Note that western mallards are identified as a single (gray) population, and the prominence of the second (black) mallard population geographically increasing eastwardly. Finally, bold or non-bold circles denote samples with Old World A or New World B mitochondrial haplogroups, respectively (Supporting Information Table S1) from generation three onwards, we do not recover the excess heterozygosity in the first few generations. As a control, we find that we do recover heterozygosity rates similar to the backcrossing expectation in the simulations using perfect markers, but that this relationship breaks down when using markers from the empirical data.
This shows that it is not the distribution of diagnostic markers that hinders detection of heterozygosity, nor the pipeline applied. Rather, it is the diagnostic power of the markers that results in an inability to detect heterozygosity sufficiently.
Comparing hybrid status assignment across reference panels uncovers another apparent bias: it seems that as the "purity" of reference individuals in the reference panel increases, inference of local ancestry is increasingly biased toward black duck ancestry, where using the "Pure" ancestry panel almost all individuals are recovered as higher generation backcrosses toward black duck. This reference panel effect is most likely due to the higher sample size in black duck (even though sample sizes were subsampled to recover similar allele counts for analysis), which makes detection of rare (and often diagnostic) alleles more likely.
Next, results obtained using a hybrid swarm mating scheme were similar to results obtained using the backcrossing scheme (Supporting Information Figure S6), although both schemes report very different hybrid status for chromosomes which show a lack of recombination. For individuals with many chromosomes lacking recombination (for instance chromosomes 1-3 of sample PL010314; Supporting Information Figure S5), we found striking differences between the two mating schemes. The backcrossing scheme consistently infers these individuals to be the result of recurrent backcrossing, where the lack of recombination along the genome is the result of continuously mating again with the same parent. Conversely, the hybrid swarm scheme infers these individuals to be young hybrids, of F2 hybrid status. Although it is TA B L E 1 Simulation-based indices for "pure" black ducks, "pure" mallards, F1 hybrids, F2-black duck and mallard backcrosses, as well as F3-black duck backcrosses (Figure 2; Supporting Information Table S1). Per index, assignment probabilities are based on the proportion of intra-and inter-specific assignment. Purity assignments based on percentage assigned to black duck populations. Regions include WEST (west of the Mississippi River), the Mississippi flyway, and Atlantic flyway (Figure 1), with the number of samples per region provided (N). The total number and proportion of the total per region recovered per group, as well as the percent of samples within each region and per group having ≥5% assignment to a secondary mallard population and mitochondrial (mtDNA) Old World A haplogroup are also provided (Figures 1  and 2

| Mitochondrial DNA & the non-western mallard
Only two genetically vetted black ducks from the Mississippi flyway  Table 1).
Significantly associated (R 2 > 0.23, p value <0.0001), samples possessing an A haplogroup tended to have ≥5% assignment to a non-western mallard group, particularly in the Atlantic flyway (Table 1). Focusing on mallards, western mallards were characterized by a substantial number of samples with A haplotypes but only 3% having ≥5% assignment to a non-western mallard group (Figure 2). In contrast, ~40% of Mississippi flyway mallards possessed an A haplotype and/or significant assignment to a second mallard group, with the highest prevalence of samples with A haplotypes (74%) and/or assignment to a secondary mallard group (92%) found in Atlantic flyway mallards. Similar trends were found for F2-mallard backcrosses in which either half or all samples had A haplotype and/or assignment to a secondary mallard group ( Figure 2; Table 1).

| Testing for biases & summary statistics
Given the discrepancy in the number of mallards, black ducks, and hybrids being identified based on phenotype or genetics, we calcu-  were on average 1.5-times larger than those estimated for non-western mallards (Supporting Information Table S4).  Figure S6), we first report that none of the outliers were explained by the highest absolute divergence; however, this is likely a poor proxy given the strong correlation with nucleotide diversity (Martin, Davey, & Jiggins, 2015).

| Genomic differentiation
Nevertheless, we recover negative Tajima autosomal outliers identified between black ducks and non-western mallards that were also recovered when comparing black ducks and western mallards. Thus, those markers recovered when comparing black ducks or western mallards to non-western mallards suggests that these outliers are the result of demographic or selective processes within non-western mallards. Finally, we find an overall genomic shift toward positive values of Tajima's D in outlier and non-outlier markers within non-western mallards. This is in comparison to black ducks (Supporting Information Figure S8) and western mallards (Supporting Information Figure S9), which had a more even distribution of Tajima's D values across the genome, and largely negative values for outlier markers. Although, no significant outliers on the Z-Sex Chromosome were found when comparing western and non-western mallards, comparing black ducks and non-western mallards demarcated markers within the same outlier region as within western mallards (Figure 6), which showed relatively lower nucleotide diversity in the non-western mallard (Supporting Information Figure S8).

| Assigning hybrids: assignment probabilities versus junctions
Here, we used two separate simulation methods to infer hybrid status for all black duck, mallard, and putative hybrid samples.
First, we simulated allele sorting during backcrossing, and maximum likelihood assignment probabilities with the program ADMIXTURE. Secondly, we inferred local ancestry along chromosomes 1-7 and used the distribution of recombination events across these chromosomes to infer hybrid status, following the theory of junctions (Janzen et al., 2018). Across these two methods, we find congruence for at least 37% of all samples and report limitations in both analyses. First, ADMIXTURE simulations were only able to resolve up to F3/F4 generation as compared to junction simulations that resolved hybrid status up to F9. These results confirm that traditional population assignment programs are reliable in determining several generations of hybrids but can miss-assign late generational hybrids as parental (Leitwein et al., 2018). Conversely, junction simulations were unable to detect early hybrids (i.e., F1 hybrids) and were biased toward F3 hybrid status. Thus, while using the distribution of recombination events complements ADMIXTURE simulation by extending resolution past F4 hybrid status, we caution interpretations and hybrid assignment based on junctions for recently radiated taxa in which shared variation appears to limit hybrid identification. Most striking and concerning was the inability to identify F1 hybrids.
In theory, the chromosomes of true F1 hybrids is comprised of one chromosome provided by each parental lineage, which results in excessive rates of heterozygosity across sites (Leitwein et al., 2018). It is possible that detecting recombination events (or in this case, the lack thereof) was problematic for F1 hybrids because the software used to infer local ancestry was not designed there really are few F1 hybrids and early backcrosses. Although possible, the latter scenario is unlikely given the clear assignment of F1 hybrids in ADMIXTURE analyses (Figures 1 and 2) and observed levels of hybridization between mallards and black ducks.
Additionally, we find a strong effect of the reference panel used in ANCESTRY_HMM analyses (Supporting Information Tables S1 and   S2). For example, restricting the reference panels to include only 100% genetically "pure" individuals caused a strong bias toward black duck ancestry when inferring local ancestry, leading to a loss of the ability to detect junctions. This seems to indicate that for systems in which few diagnostic SNPs exist, as the case for very closely related taxa, the detection of rare alleles requires a large reference panel to work accurately. Future work will require full genome sequencing of parental and putative generational hybrid individuals to fully understand whether the potential limitations based on junction numbers and heterozygosity from ddRAD markers is a result of methodology or a true biological pattern.
Hybrid status assignment assuming a hybrid swarm mating scheme (following Janzen et al., 2018), or using an exclusive backcrossing scheme were similar (Supporting Information Tables S1 and   S2), barring some striking differences for individuals with highly purified chromosomes, where ancestry along the chromosomes was strongly biased toward one of the parents and crossovers were lacking. The backcrossing scheme resolved such genomic patterns by inferring that these individuals are the result of many consecutive generations of backcrossing. Alternatively, the hybrid swarm mating scheme could only resolve such genomic patterns by assuming that these were very young (F2) hybrids, having experienced very little recombination events (Supporting Information Table S1). However, although the hybrid swarm mating scheme inferred F2 to be the most likely hybrid status for these individuals (higher hybrid status would imply even more recombination events), F2 hybrid status in itself also seems unlikely, as this implies that both F1 parents experienced no recombination whatsoever during meiosis, often on multiple chromosomes. Therefore, these patterns seem to reinforce the idea that some of the hybrids analyzed are the result of repeated backcrossing with one of the parental species. Furthermore, the existence of these highly non-recombined individuals seems to validate our intuition that the hybrids are backcrossing with one of the parental species. Although recombinatory simulations identified more backcrossed stages (up to F7) as compared to ADMIXTURE simulations (up to F3), both suggest that there are relatively few backcrossed stages before a lineage's hybrid ancestry is lost and the offspring are effectively genetically parental again. In general, we find an exponential decrease in hybrid assignment, with each subsequent backcross, thus becoming increasingly indistinguishable from its backcrossed parental population (Figures 2 and 3a).
Thus, although our presumed scenario of backcrossing into a single population is un-vetted with field observations, congruence across analyses regarding the prevalence of few backcrossed generations suggests that this scenario may represent the majority rule in which backcrossing occurs with the parental that is most geographically prevalent. In fact, our results support breeding experiments in which backcrossing into black ducks resulted in indistinguishability of offspring and the parental population ≥F3 stage (Kirby, Sargeant, & Shutler, 2004).
Summarizing, we find that for the black duck-mallard system the recombinatory analyses were difficult to apply and might have provided misleading results and need to be applied with caution to other taxa with largely conserved genomes. Low differentiation between the two species caused a lack in diagnostic SNPs, which in turn made F I G U R E 7 Aligned pairwise Φ ST estimates of ddRAD markers for chromosomes that were identified to have non-neutral regions between genetically vetted black ducks (ABDU) and western mallards (MALL.W)-Markers identified in BayeScan analyses as putatively under divergent selection are denoted in red in first Manhattan plot. Comparisons between genetically vetted black duck and western mallard samples with each hybrid class (Supporting Information Table  S1) are presented it difficult to detect junctions across the genome. Furthermore, low density of diagnostic SNPs lead to an underestimation of heterozygotic ancestry, which disqualified this method to detect F1 individuals. Although recombinatory analyses provide a promising future avenue to detect hybrid status of individuals as more molecular data is obtained, we conclude that the current dataset based on ddRAD sequences appears to provide too unreliable results, and we subsequently base our conclusions on ADMIXTURE results.

| The genomics of mallards and black ducks in Eastern North America
We present the most comprehensive molecular study of mallards and black ducks to date, and are able to differentiate between these two previously closely related and genetically undiagnosable species (Figures 1 and 2; Lavretsky, Hernández Baños et al., 2014). First, whereas recent work identified migratory structure within black ducks (Lavretsky, Miller, Bahn, & Peters, 2014), we report no identifiable genetic structure within black ducks and suggest this species to be treated as a single genetic population. Conversely, we find evidence for two genetically distinct mallard stocks (Figures 1 and   2), which we characterize as western and non-western. The North American mallard's history is known to be potentially complicated by the well documented and intense release of game-farm mallards on the east coast since the early part of the 20th century (USFWS, 2013). Until recently, molecular methods did not distinguish among mallards across their Holarctic distribution (Kraus et al., 2013).
However, Söderquist et al. (2017) used genomic sequencing and reported feral European mallards not only have a distinct genetic signature, but also showed them to have a significant impact on wild European mallard stocks. Although the authors did not find the same trend in North America, they did not adequately sample east of the Mississippi River and thus likely missed the non-western genetic signature captured here. Unfortunately, without the presence of known game-farm/feral mallards in our dataset, we are unable to formally test whether the non-western mallard signal is the result of a game-farm parental population. However, even without a known feral parental population to compare to, the genetic diagnosability of many of the geographically eastern samples strongly suggests these to be of domestic ancestry.
We provide strong evidence that supports differing evolutionary histories of samples genetically assigned as non-western mallards (Supporting Information Table S1) when compared to western mallards or black ducks, including the following: (a) overall elevated genomic differentiation (  Figure S8) or western mallards (Supporting Information Figure S9). Finally, when analyzing "pure" non-western mallards (≥ 98% assignment; Figures 1 and 2), we find these to have approximately half the calculated effective population size of western mallards or black ducks (black ducks (Avg. Watterson's θ Aut = 0.013, Watterson's θ Z = 0.0047); western mallards (Avg. Watterson's θ Aut = 0.012, Watterson's θ Z = 0.0047); non-western mallards (Avg. Watterson's θ Aut = 0.0082, Watterson's θ Z = 0.0029; Supporting Information Table S4). In general, the lower genetic diversity coupled with the positive shift in Tajima's D values in non-western mallards are likely signatures of bottlenecking that is often experienced during domestication (Innan & Kim, 2004;Makino et al., 2018;Tufto, 2017). Together, these results strongly support these non-western mallards to be of alternative stock, and likely the result of a century of releasing game-farm mallards in Eastern North America.
Given that all collected samples were taken from wild birds, and not from shooting preserves, our data strongly suggest that released game-farm mallards have established a viable wild [feral] population that is significantly contributing to the genetics of their wild [native] counterparts. We also note that two female mallards (from Ohio) for which we obtained full bodies displayed male-or feral-like phenotypic characters (e.g., green in head, white neck-ring, red breast) and were identified as "pure" non-western mallards (assignment probability of >99%), as well as carried A (OW) haplotypes. These samples suggest that feral female mallards are successfully surviving on the landscape. How gene flow from these non-western, putatively feral birds affect fitness of wild populations remains to be determined; however, if unabated, the chance of negative impact(s) on wild populations may be significant and requires careful consideration (Randi, 2008;Tufto, 2017).
Although feral mallards pose a genetic threat to black ducks, we report overall lower levels of non-western population assignment and OW A haplotypes in black ducks (Figure 2; Supporting Information Table S1). Conversely, 50%-100% of samples identified as first-generation mallard-backcrossed samples had substantial nuclear assignment to the non-western mallard group and carried the OW A mtDNA haplotype (Table 1). Together, these data suggest that hybrids tend to backcross with mallards, and that these mallard backcrosses are largely into the non-western or "feral" mallard population ( Figure 2). The proximate cause for such a scenario remains to be determined. However, reduced representation of non-western assignment and OW A haplotypes in black ducks (i.e., tendency of F1 black duck x feral hybrids to backcross into black ducks) may be due to black ducks showing assortative mating, in general, or simply tend to not overlap the primary habitat used by feral mallards (Osborne et al., 2010). Specifically, feral mallard success to adapt to human-disturbed habitat (Diefenbach & Owen, 1989;Maisonneuve et al., 2006;Rogers & Patterson, 1984) and the black duck's natural reclusiveness and evasion of human-disturbed habitats (Hepp et al., 1988) may simply limit the chance of contact between black ducks and feral mallards.
Additionally, we hypothesize that mallard-black duck hybrids may be using human-dominated landscapes more often than their wild parentals, and thus are primarily coming into contact with other feral mallards (e.g., Hubb's Principle; Hubbs, 1955). Future conservation efforts will benefit from understanding the extent of gene flow from these putatively feral mallard birds into black ducks and native, wild mallards.

| Old world mtDNA haplotypes are associated with the non-western mallard group
Among markers, mitochondrial DNA (mtDNA) has been extensively used to infer gene flow among ducks (Ankney et al., 1986;Avise et al., 1990;Lavretsky, Hernández Baños et al., 2014). However, we find that many mallards and black ducks that are genomically "pure," possess an A haplotype ( Figure 2). Specifically, backcrossing can lead to mitochondrial capture, in which the OW A mtDNA haplotype is retained in a backcrossed lineage that is otherwise indistinguishable from their parental population at their nuclear genome ( Figure 8). For example, two black duck samples from the Mississippi flyway carry OW A haplotypes despite having ≥98% genetic assignment to the black duck population (Figure 2). This suggests that introgressed mtDNA can persist longer than introgressed nuclear variants, and in turn suggests overestimated rates of hybridization that need to be carefully considered. Such a scenario may explain why a proportion of all five New World mallard-like taxa carry the OW A haplotype despite being genetically "pure" (Bonnet, Leblois, Rousset, & Crochet, 2017;Lavretsky, Dacosta et al., 2015;Lavretsky, Hernández Baños et al., 2014).
Next, hypotheses accounting for the presence of Old World (OW) A and New World (NW) B haplogroups within all NW mallard clade species, include bi-directional introgression and/or ancestral variation (Avise et al., 1990;Johnson & Sorenson, 1999;Lavretsky, McCracken et al., 2014;Livezey, 1991). Under a scenario of wild mallard introgression from birds dispersing eastward from Alaska, we expect a decrease of A haplotypes from west to east. Instead, we find an opposite trend, with the largest proportion of birds on the east coast with A haplotypes and a steady decay in A haplotype frequencies westward ( Figure 2). Additionally, the presence of A haplotypes was significantly associated with the autosomal assignment probability to a non-western mallard group, rather than randomly distributed as expected under a scenario of sto-

| Signatures of demographic and selective processes across the genomes of North American black ducks and mallards
We identify several genomic regions that harbor markers with elevated relative divergence and signatures of divergent selection-as determined by BayeScan-between genetically vetted black ducks, western mallards, and non-western mallards ( Figure 6). In an attempt to distinguish between demographic and selective processes across putative outlier, non-neutral markers, we calculated and compared Tajima's D, nucleotide diversity, and absolute divergence (d XY ). In general, we find evidence for both demographic and selective processes influencing genetic variation and driving observed relative differentiation at outliers. First, as expected, most putative outliers determined using relative differentiation were marked by low nucleotide diversity in at least one of the compared taxa (Supporting Information Figures S7-S9). Next, our comparison of absolute divergence was complicated due to a strong correlation between nucleotide diversity and absolute divergence. The obtained correlation is most likely due to insufficient time since divergence (~150,000 years before present; Lavretsky, Hernández Baños et al., 2014) that is required to drive differences in these two estimates (Martin et al., 2015). Thus, comparing absolute divergence and relative differentiation as a test between selection and demographic processes acting on the genomes is likely not appropriate in cases of very recent divergence. Nevertheless, comparing putative outlier markers between genetically vetted western mallards and black ducks (Figure 6), we find overall lower nucleotide diversity and negative Tajima's D in western mallards across outlier regions on the Z-Sex chromosome, Chromosome 1, and Chromosome 2 (Supporting Information Figure   S7), which is evidence for, and consistent with divergent selection in western mallards. This is in contrast to findings when comparing nonwestern mallards to black ducks (Supporting Information Figure S8) or western mallards (Supporting Information Figure S9), in which we report an overall shift toward positive Tajima's D across outlier and non-outlier markers. We argue that the shift toward positive Tajima's D in non-western mallards is most likely the influence of processes associated with domestication (Makino et al., 2018;Randi, 2008).
We conclude that divergence between black ducks and mallards, including within mallards, is likely the result of both demographic and selective processes acting in each of them. Future work will greatly benefit from full genome resequencing to fully understand the size and location of outlier regions.
ADMIXTURE analysis of our dataset suggests a minimum ~25% hybridization rate between mallards and black ducks (Table 1), which is substantially higher than rates between mallards and either Mexican ducks (~2%; Lavretsky, Dacosta et al., 2015) or mottled ducks (~5%; Alternative to assortative mating limiting gene flow, we hypothesize that the possibility to reestablish the parental lineage through backcrossing in only a few generations may be playing an important role in maintaining black ducks in the face of relative high rates of hybridization. First, the genomes of black ducks and mallards lack any identifiable fixed differences (Figures 4 and 5), with only statistical differences at outlier and ≤1% differences in remaining regions. In fact, using genetically vetted black ducks and western mallards, we demarcated several prominent regions of clustered markers consistent with divergent selection on the Z-sex chromosome, as well as several autosomal chromosomes ( Figure 6). To test for the replacement rate of outlier regions during backcrossing, we compared genetically "pure" black ducks or western mallards to ducks within each hybrid class (Figure 7). We define the outlier replacement rate as the generations of backcrossing required until the region of interest is no longer an outlier between a backcrossed individual and pure parental, and thus providing a relative rate of gene flow (Figure 8). First, outlier comparisons suggest that hybrids backcrossing into black ducks or mallards become genetically indistinguishable from their "pure" parental population within one (F2) or two (F3) generations of backcrossing, respectively ( Figure 7) and corresponding to ADMIXTURE simulations ( Figure 2) and previous breeding experiments (Kirby et al., 2004). More specifically, outlier regions on the Z-chromosome and Chromosome-1 either steadily decayed or increased in overall differentiation when comparing each black duck-backcrossed generation to either parental black ducks or western mallards, respectively ( Figure 7). In fact, Z-chromosome and Chromosome 1 outlier regions show the near expected 50% decrease in Φ ST estimates when comparing F1-hybrids to pure mallards or black ducks (Figures 6   and 7). The 50% change in Φ ST estimates suggest that these regions recombine, with hybrids harboring both mallard and black duck variation. Conversely, regardless of hybrid class, outlier regions on chromosome 2, 5, & 21 maintained their original high differentiation regardless of comparison with western mallards and were entirely absent when compared to black ducks within one generation of backcrossing ( Figure 7). Together, these results suggest that latter regions are less likely, if at all able to introgress between the two species. In contrast, the steady decay or increase of outlier regions on the Zchromosome and Chromosome 1 suggests that these regions require a minimum of two backcrosses to restore the parental genotype. We acknowledge that ddRAD markers represent ~0.04% of the genome, making it highly probable that many important genomic regions have been missed. Importantly, with the ability to genetically vet wild samples as pure, F1, or generational backcrosses, now permits for the use of wild samples to further test how various genomic regions recombine and interact in intermediate forms.
We find that mallards and black ducks may be best explained by few key genomic regions that are under alternative evolutionary forces ( Figure 6) Specifically, we hypothesize that outside genomic regions that are essential to develop into a black duck or mallard, much of the genome codes for basic "Anas" functionality, and likely free to move during gene flow events ( Figure 8). Thus, while the rate of hybridization and hybrids may be prevalent, the detection of outlier regions suggests these are less likely to move between species, otherwise we would expect these to have been lost. However, whether demarcated outlier regions are on the path to being lost due to gene flow remains unknown. If gene flow between mallards and black ducks has amalgamated much of their genome, while reinforcing these outlier regions, then using historical museum specimens to comparing pre-versus post-secondary contact groups should identify increased differences at said outlier regions and decreased differences across the remaining genome. Alternatively, if black ducks and mallards have been a case of divergence-with-gene flow, then comparing pre-versus post-secondary contact should yield similar or increased divergence at outlier regions, while the remaining genome being at similar low levels of divergence.

| Conservation implications
Based on ADMIXTURE analyses, we determined that at least ~20% of all phenotypically identified mallards and black ducks were incorrect and possessed hybrid ancestry. Furthermore, only ~60% of phenotypically identified hybrids were true hybrids with ~12% and ~25% of remaining samples being actually "pure" mallards or black ducks, respectively ( Figure 2, Table 1; Supporting Information Table   S1). Our "error rates" are similar to those reported between mallards and Florida Mottled ducks before a genetically vetted field key was developed . With the exception of western mallards, all Mississippi and Atlantic flyway mallards, black ducks, and hybrids were sampled from the 2010 U.S. Fish and Wildlife Services' flyway Waterfowl Wingbee meetings. At the meeting, sex, age, and species/status assignments are based on previously characterized wing plumages believed to be diagnostic between black ducks, mallards, and their hybrids (see Kirby et al., 2000); however, the diagnosability of these traits, particularly for hybrids, had never been genetically vetted. For example, the presence of an anterior and/or posterior white-wing bar across the speculum is currently a primary character used to identify hybrids between mallards and black ducks. However, a recent study found the same character thought to be indicative of hybrids between mallards and mottled ducks in 10% of genetically "pure" mottled ducks . Thus, for scenarios of recent divergence in which ancestral characters may be maintained across lineages, it is critical to genetically vet phenotypic characters to confirm their diagnosability.
While outside the scope of this paper, using established genetic assignments, it is now possible to test and identify which species-cohort require(s) phenotypic re-evaluation, including identifying those characters that are truly diagnostic of hybrids. Without genetically vetting traits, the misidentification of samples can bias population estimates, hybrid identification, including over/underestimation of the impact of hybridization, behavioral studies, and any other study requiring proper species identification. We acknowledge that although breeding experiments would be ideal to validate simulations, our results closely correspond to previous breeding experiments (Kirby et al., 2004). Thus, we believe that outlined methods provide researchers a means to establish the number and generational hybrid classes in their datasets, allowing for a more accurate assessment of hybridization and introgression in wild populations.