Population genomics reveals a candidate gene involved in bumble bee pigmentation

Abstract Variation in bumble bee color patterns is well‐documented within and between species. Identifying the genetic mechanisms underlying such variation may be useful in revealing evolutionary forces shaping rapid phenotypic diversification. The widespread North American species Bombus bifarius exhibits regional variation in abdominal color forms, ranging from red‐banded to black‐banded phenotypes and including geographically and phenotypically intermediate forms. Identifying genomic regions linked to this variation has been complicated by strong, near species level, genome‐wide differentiation between red‐ and black‐banded forms. Here, we instead focus on the closely related black‐banded and intermediate forms that both belong to the subspecies B. bifarius nearcticus. We analyze an RNA sequencing (RNAseq) data set and identify a cluster of single nucleotide polymorphisms (SNPs) within one gene, Xanthine dehydrogenase/oxidase‐like, that exhibit highly unusual differentiation compared to the rest of the sequenced genome. Homologs of this gene contribute to pigmentation in other insects, and results thus represent a strong candidate for investigating the genetic basis of pigment variation in B. bifarius and other bumble bee mimicry complexes.


| INTRODUCTION
Pigment variation within and between populations and species has provided some of the most charismatic evidence for ecological and evolutionary forces involved in phenotypic differentiation (Kronforst et al., 2012). Although color variation can arise from neutral forces such as genetic drift, more often such patterns are the result of divergent selection on color morphs and can thus have implications for diverse processes from local adaptation to adaptive radiation of species (Rausher, 2008;Runemark, Hansson, Pafilis, Valakos, & Svensson, 2010;Wlasiuk, Carlos Garza, Lessa, & Smith, 2003). Insect pigmentation, in particular, has been studied in the context of adaptive strategies such as thermoregulation, crypsis, and warning coloration (Kronforst et al., 2012;True, 2003;Williams, 2007). Such color pattern variation is especially intriguing for its potential to reveal the genetic mechanisms of adaptive phenotypic change, and whether convergence across lineages occurs by shared versus independent molecular mechanisms (Oxford, 2005;Van Belleghem et al., 2017). For example, do species use the same basic tool sets to evolve pigmentation changes, co-opt genes that serve related functions in other species in novel ways, or does convergent evolution occur through unique sets of genes in different lineages with similar phenotypes?
Bumble bees (Hymenoptera: Apidae: Bombus) commonly exhibit adaptive color pattern variation within and between species (Plowright & Owen, 1980;Williams, 2007). Co-distributed species may converge on pigmentation patterns to produce Müllerian mimicry complexes (Müller, 1879), while intraspecific populations in different geographic regions can evolve different color patterns to match these local complexes (Hines & Williams, 2012;Plowright & Owen, 1980;Rapti, Duennes, & Cameron, 2014;Williams, 2007). Much of the flexibility appears to stem from partially independent "ground plan" color pattern elements that subdivide the abdomen inter-and intrasegmentally along anterioposterior and mediolateral axes, which are likely encoded by a combination of regulatory and downstream pigmentation genes (Rapti et al., 2014). Bumble bee color pattern complexes thus provide a model system in which to investigate genetic mechanisms underlying phenotypic polymorphism, the evolution and maintenance of aposematic coloration, and rapid phenotypic diversification.
Traditional genetic approaches for identifying molecular mechanisms underlying phenotypic variation is often challenging in bumble bees owing to difficulties of rearing and working with colonies in the laboratory (Plowright & Jay, 1966;Velthuis & van Doorn, 2006). Experimental research in bumble bees has thus often focused on a small set of species, especially domesticated species such as B. impatiens and B. terrestris (Amarasinghe, Clayton, & Mallon, 2014;Sadd et al., 2015;Stolle et al., 2011;Velthuis & van Doorn, 2006;Woodard et al., 2015) that may not be representative of the diversity of color pattern complexes in nature. Population genomics approaches have become a powerful tool for revealing adaptive genetic variation in nonmodel species where traditional genetics approaches are challenging and a priori knowledge of candidate genes is limited (Beaumont, 2005;Hohenlohe et al., 2010;Stapley et al., 2010). Phenotypically variable wild bumble bee populations are thus excellent targets for a population genomics approach (Lozier & Zayed, 2016), especially if discovered candidate genes can be cross-validated against functional data from model organisms.
Divergent selection for red versus black abdominal pigmentation in B. bifarius might be expected to produce unique detectable differentiation patterns at genomic regions controlling color compared to the majority of the genome (i.e., "outlier" loci) (Beaumont, 2005  . However, weak genomewide differentiation among the much more closely related B. b. nearcticus populations may permit more sensitive tests to discover loci linked to phenotype, despite the less dramatic color differences (Figure 1a).
In this study, we reanalyze an existing transcriptome data set for B. b. nearcticus  to identify outlier loci between black-banded and intermediate color forms. We identify a cluster of SNPs exhibiting unusual behavior compared to the rest of the sequenced genome. Annotation with the sequenced Bombus impatiens genome (Sadd et al., 2015) reveals that SNPs all fall within Xanthine dehydrogenase/oxidase-like (Xdh-like), part of a gene family with established pigmentation roles in other animals (Frost, Borchert, Thorsteinsdottir, & Robinson, 1985;McGraw, 2005;Watt, 1972).
Following suggested best practices (Quinn et al., 2013), reads with a map quality <20 were excluded and duplicate reads were removed with SAMTools (Li et al., 2009)

| Population genetics and outlier analysis
The fixation index F ST (Weir & Cockerham, 1984) and nucleotide diversity (π) were calculated per-SNP and in sliding windows (10 kb windows, 5 kb increments) using vcftools v0.1.12b (Danecek et al., 2011). Tajima's D (Tajima, 1989) was calculated as an average across whole scaffolds and in nonoverlapping 10 kb windows that contained at least ten SNPs. Narrow windows are required because of rapid linkage disequilibrium decay in Bombus (Sadd et al., 2015). We tested for extreme window-specific divergence using nested hierarchical bootstrapping to identify significantly elevated F ST accounting for SNP density per window (similar to Hohenlohe et al., 2010), as follows.
A null distribution of 500,000 pseudo-replicate data sets was simu-  (Jombart, 2008), grouping individuals into the two regional phenotypes, and retaining five principle components and a single discriminant axis. SNPs contributing most significantly to the discriminant analysis ("selected vs. unselected") were identified by applying the snpzip function to the DAPC followed by hierarchical clustering using the "average" clustering method (hclust function).
A second principal components analysis method, pcadapt, is specifically designed to detect adaptive outliers in the face of population struc- We next applied two outlier detection methods implemented in Arlequin 3.5 (Excoffier & Lischer, 2010), also filtered to remove loci with MAF < 0.05. The first approach is based on identifying outliers in a simulated distribution (100,000 coalescent simulations) of F ST as a function of heterozygosity under the finite island model of 100 demes (FDIST approach in Appendix S3; similar to Beaumont & Balding, 2004). The second approach is similar, but relies on a hierarchical model of population structure-here using four groups (two times the number sampled) with 100 demes each-and has a reduced false-positive rate under several more realistic demographic scenarios and should be more conservative (Excoffier, Hofer, & Foll, 2009) (HeirFDIST in Appendix S3). This approach has recently proven useful for detecting recent signatures of selection associated with recent divergence (e.g., Marques et al., 2016). For the hierarchical analysis, individuals were first grouped by site and then by regional color group. SNPs were removed from the hierarchical analysis if they were not present in any individual from a particular site.  (Sadd et al., 2015), assuming that strong synteny between these distantly related genomes (~20 mya) would reflect greater genome structure conservation between B. bifarius and B. impatiens (~4-6 mya) across the region of interest.

| Sanger sequencing validation of SNPs
To confirm allele frequency differentiation at our target gene, including possible errors in the RNAseq data for this region from limited sampling, missing data, and incorrect SNP calling or read mapping, we performed Sanger DNA sequencing of four high-F ST Xdh-like SNPs in a larger data set (53 individuals from 16 populations) (Appendix S1). Primer sets were designed using Geneious R7 such that one were manually inspected, edited, and aligned in Geneious R7 (Kearse et al., 2012). In all cases, heterozygotes were clearly identifiable from overlapping double peaks in the chromatograms. F ST was calculated in Arlequin 3.5 (Excoffier & Lischer, 2010).

| Phylogenetic analysis
We characterized phylogenetic relationships among B. bifarius Xdhlike region sequences, including B. b. bifarius-red, using statistical parsimony in the software TCS v1.21 (Clement, Posada, & Crandall, 2000). For this analysis, we extracted the RNAseq consensus sequences for the Xdh-like region (NT_176739.1:340,370-357,224) in IGV (Robinson et al., 2011), manually removed introns and other sites with missing data, and phased the sequences using the fastPHASE algorithm (Scheet & Stephens, 2006)   As expected from whole-genome comparisons showing strong synteny in bumble bees (Sadd et al., 2015), NT_176739.1 structure is highly conserved between subgenera (Appendix S4), suggesting that SNP positioning from B. impatiens is suitable for B. bifarius (both subgenus Pyrobombus). A spike in differentiation in Xdh-like is apparent for individual SNPs and windows (Figure 1c; Figs. S1-S4; Table   S1). F ST was significantly elevated (p < 2 × 10 −6 ) in all three windows overlapping Xdh-like. Xdh-like SNPs were recovered as significant in multiple outlier detection approaches (Appendix S3; Fig. S1), and no comparable patterns are observed elsewhere in the transcriptome.

| DISCUSSION
Color pattern variation is among the most striking examples of adaptive phenotypic evolution that is excellently represented in the high degree of polymorphism within and between species of the genus Bombus. One of the major goals for investigating color variation in bumble bees is to understand the genetic mechanism underlying F I G U R E 2 90% statistical parsimony TCS network for phased Xdh-like sequences from the complete RNAseq data extracted for all B. bifarius subspecies (4,049 bp with no missing sequence). Colored circles represent unique haplotypes, shaded by population of origin (largest circle = 12 observed haplotypes, smallest = 1 haplotype) with unobserved mutation steps as small gray dots. More stringent haplotype networks produce the same topology within B. bifarius, and so 90% was selected to include the B. impatiens reference

Bombus impatiens
Unobserved haplotype "color pattern elements" that facilitate adaptive divergence and convergence of color forms across the Bombus phylogeny (Rapti et al., 2014). Such loci have been identified in other color polymorphic mimicry complexes (e.g., optix in Heliconius) and are providing insights into evolutionary processes and genetic architectures underlying phenotypic variation (Kronforst et al., 2012;Van Belleghem et al., 2017).
We show that Xanthine dehydrogenase/oxidase-like contains unusual SNP density and divergence between B. b. nearcticus color variants.
Bombus b. nearcticus is genetically well-connected , but Xdh-like starkly contrasts with such genomewide patterns. Xdhlike homologs are known to be involved in pigmentation (Frost et al., 1985;Watt, 1972;Yasukochi, Kanda, & Tamura, 1998). Xdh-like is thus a strong candidate for one element contributing to bumble bee color pattern variation, representing a new example of selection using common pathways in novel evolutionary contexts.
nearcticus-int may be a unique state involving yellow, rather than a mix of pure red and black hairs. Although orange-red tinged hairs are present in B. b. nearcticus-int individuals, they are usually not as intense as in B. b. bifarius-red bees, possibly indicating a mixture of pigments that includes pterins. This could also explain the lack of any noteworthy patterns across this genomic region from comparisons with B. b. bifarius-red in our previous study .
Comparisons across B. bifarius do provide some insights into the dynamics of phylogeographic divergence and phenotypic divergence within the species, although the importance of Xdh-like in the evolution of the group remains to be fully understood. Bumble bee color pattern complexes are believed to be adaptive outcomes of Müllerian mimicry (Plowright & Owen, 1980;Williams, 2007). Color pattern may be more indicative of geography than phylogeny, as populations of the same species can diverge from one another and converge on a prominent local phenotype (Plowright & Owen, 1980). There is a clear relationship between geographic, phenotypic, and genetic divergence for B. b. bifarius-red and B. b. nearcticus, although any signal at genes important for color pattern differentiation in this comparison is swamped by the extent of genomewide divergence . In the case of the less extreme phenotypic differences within B. b. nearcticus, color variation is evident without differentiation across most of the genome, likely due to the greater range connectivity for these populations (Lozier et al., 2013 (Plowright & Owen, 1980;Williams et al., 2014), and it may therefore be advantageous to maintain alleles associated with color pattern variation to facilitate convergence on local phenotypes (Williams, 2007).
The unusual phylogenetic relationships among Xdh-like haplotypes are particularly intriguing from this perspective (Figure 2) (Sadd et al., 2015). The rapid decay of differentiation is highlighted by analysis of restriction site-associated DNA sequences that show no elevated F ST for intergenic SNPs flanking Xdh-like (Fig. S2).
It is also unclear, at this point, how Xdh-like alleles might interact with each other and other loci through dominance, epistasis, or regulatory differences. Additional work is thus necessary to determine which factors, either singly or in concert, are responsible for the development and maintenance of color pattern variation in B. bifarius and other bumble bee species. Examining differential expression of this gene and its alleles throughout development in different body segments will be a next step to determine how Xdh-like contributes to color pattern variation (Mallarino et al., 2016). Experimental work with laboratory colonies will be necessary for functional genomics; however, maintaining successful laboratory colonies with bumble bees over multiple generations can be challenging (Plowright & Jay, 1966;Velthuis & van Doorn, 2006 (Williams et al., 2014), could provide complementary evidence for the gene's role in adaptive color pattern variation.
In conclusion, we have identified a gene that may facilitate understanding the formation of some color pattern complexes in bumble bees. Knowledge that Xdh-like homologs affect pigmentation strengthens this hypothesis, but confirming mechanisms of action in B. bifarius will require additional experimentation. We also suspect that color in B. bifarius is likely to be complex, with multiple contributing loci. Our RNAseq data are likely to reveal only the strongest outlier loci, and by sequencing adult bees, will exclude sequence from genes expressed only during development. Developmental patterning genes (e.g., Hox-genes) are likely key players in bumble bee color pattern evolution, for instance (Rapti et al., 2014). Finally, it will be important to consider ecological pressures driving color variation, and why selection produces certain complexes in particular geographic regions.
Finer scale spatial sampling that incorporates whole-genome data will help resolve some of these issues, while functional genomics and multispecies comparative approaches will be useful for targeting Xdhlike's possible functions. Our findings provide a starting point for such studies of ontogenesis and evolution of color pattern in B. bifarius and other bumble bees.