Chloroplast genome sequence confirms distinctness of Australian and Asian wild rice

Cultivated rice (Oryza sativa) is an AA genome Oryza species that was most likely domesticated from wild populations of O. rufipogon in Asia. O. rufipogon and O. meridionalis are the only AA genome species found within Australia and occur as widespread populations across northern Australia. The chloroplast genome sequences of O. rufipogon from Asia and Australia, O. meridionalis, and O. australiensis (an Australian member of the genus very distant from O. sativa) were obtained by massively parallel sequencing and compared with the chloroplast genome sequence of domesticated O. sativa. Oryza australiensis differed at more than 850 sites (single nucleotide polymorphisms or indels) from each of the other samples. The other wild rice species had only around 100 differences relative to cultivated rice. The chloroplast genomes of Australian O. rufipogon and O. meridionalis were closely related, with only 32 differences. The Asian O. rufipogon chloroplast genome (with only 68 differences) was closer to O. sativa than the Australian taxa (both with more than 100 differences). The chloroplast sequences emphasize the genetic distinctness of the Australian populations and their potential as a source of novel rice germplasm. The Australian O. rufipogon may be a perennial form of O. meridionalis.


Introduction
Oryza sativa, the dominant cultivated rice species, shares the genus Oryza with approximately 22 other species (Khush 1997). Genetically, the primary factors that differentiate the species within this genus are genome organization and ploidy, as determined by hybrid chromosome pairing behavior. On the basis of this evidence, six diploid genomes, AA, BB, CC, EE, FF, and GG, and four allotetraploid genomes, BBCC, CCDD, HHKK, and HHJJ, have been identified (Lu et al. 2009). Oryza sativa belongs to the AA genome group along with the only other cultivated species, O. glaberrima, and the wild rice species O. rufipogon, O. nivara, O. longistaminata, O. glumaepatula, O. barthii, and O. meridionalis (Ge et al. 1999).
Within O. sativa there are two subspecies, indica and japonica. These subspecies are the product of either one (Molina et al. 2011) or two independent domestication events (Sweeney and McCouch 2007). In either case, domestication and, more recently, selection during plant breeding have left cultivated O. sativa resting on a relatively narrow genetic base. Because of this, AA genome wild rice species have been a valuable source of new genes and alleles for resistance to a range of pests and diseases (Brar and Khush 1997). However, Asian wild rice is in close contact with cultivated rice, and constant gene flow between the cultivated and wild populations contaminates the Asian wild rice gene pool with cultivated alleles; one example is the shattering gene being found in wild rice (Sweeney and McCouch 2007). In contrast, with the exception of failed attempts to establish commercial rice growing in the Northern Territory and Western Australia in the 1950s, the Burdekin irrigation area in the early 1990s (Anonymous 2005), and the more recent crop of 650 ha in Western Australia, Australian wild rice has been largely genetically isolated from cultivated rice. As a result, the Australian wild rice gene pool has not been contaminated with cultivated Oryza alleles to the same extent as the Asian wild rice gene pool, making Australian wild rice a potential source of valuable alleles for rice breeding.
The AA genome Oryza species found in northern Australia are O. meridionalis and O. rufipogon (Figs. 1 and 2). These species are primarily distinguished by anther size (Australian O. rufipogon does not share the small anthers of O. meridionalis) and by life history (O. meridionalis is an annual species while O. rufipogon is a perennial species). The two species grow in close proximity to each other: O. rufipogon grows in transient pools and ponds where some water persists during the dry season, while O. meridionalis grows on the periphery of these same bodies of water and survives the dry season as seed. This is analogous to the relationship between Asian O. rufipogon and O. nivara. Oryza nivara has been variously described as an annual species that grows in swamps that dry out during the dry season, unlike O. rufipogon, which grows in deep permanent water, or as an ecotype of O. rufipogon with which it shows a continuous distribution in location and morphology.
Despite many different approaches, the taxonomy of the AA genome Oryza remains a work in progress. The relationship between O. rufipogon, O. nivara, and O. meridionalis is unclear. Analysis of chromosome pairing has confirmed that O. meridionalis and O. rufipogon are AA genome species. In common with most early molecular taxonomic treatments, however, the O. rufipogon samples used by Lu et al. (1997) were sourced from Asia only and therefore did not provide evidence of the relationship between Australian O. rufipogon and O. meridionalis. Experimental crosses between Australian O. rufipogon and O. meridionalis produced interspecific hybrids, although fertility and seed set of the hybrids were low. Restriction fragment length polymorphism (RFLP) and short interspersed element (SINE) data derived from sample sets including both Australian and Asian O. rufipogon and O. meridionalis suggest these species are different (Wang et al. 1992; Xu et al. 2005), and that Australian O. rufipogon is more closely related to Asian O. rufipogon than it is to O. meridionalis (Wang et al. 1992).
Phylogenies derived from nuclear data can be problematic because recombination may confound phylogenetic resolution and lead to the construction of inconsistent trees (Poke et al. 2006; Takahashi et al. 2008). Plastid sequence data, in contrast, are haploid and offer the advantages of high copy number without recombination. As the number of informative characters increases, so does phylogenetic resolution. Next Generation (or massively parallel) sequencing can cost-effectively sample large numbers of informative characters and hence dramatically increase phylogenetic resolution. Whole chloroplast genome sequencing for phylogenetic analysis without prior isolation or amplification is now relatively straightforward for plant species (Nock et al. 2011). This approach captures a large quantity of chloroplast sequence data, and whole plastome sequences can be used to resolve phylogenetic relationships among even closely related species (e.g., Parks et al. 2009; Zhang et al. 2011). We have applied this approach to the analysis of the relationship between Australian and Asian AA genome wild rice populations and found the Australian wild rice species to be genetically distinct from closely related Asian AA genome wild rice.
Anther length was the primary morphological feature used for discrimination between Australian O. rufipogon (>3-7.4 mm in length) and O. meridionalis (1.5-2.5 mm in length).

DNA extraction and sequence analysis
DNA was extracted from leaf tissue of four individual plants from each accession using a Qiagen DNeasy Plant kit (Qiagen, Hilden, Germany). Approximately 3 μg of total DNA from each sample was prepared for sequencing according to the Illumina genomic paired-end sample preparation protocol (Part # 1005063 Rev. A). DNA was sheared using an adaptive focused acoustics method on a Covaris S2 device with the following settings: duty cycle 10%; intensity 5; cycles per burst 200 for 180 s at 6 °C.
Ligation products were purified by agarose gel electrophoresis (2% agarose, 120 V for 120 min). Fragments of predominantly 500 base pairs (bp) were excised from the gel and the products isolated with a QIAquick Gel Extraction kit (Qiagen, Hilden, Germany) without heating. PCR products were further purified with a QIAquick PCR Purification kit (Qiagen, Hilden, Germany) and quantified using a DNA 1000 [...] less than 30 bp in length were discarded. Trimmed short read sequences were assembled by read mapping to a cultivated rice (O. sativa ssp. japonica var. Nipponbare) chloroplast genome reference sequence (Genbank accession GU592207). Read mapping was undertaken in CLC Genomics Workbench with the following long-read parameters: global alignment, length fraction 0.9, similarity index 0.9, mismatch cost 3, and deletion and insertion costs 3. Match mode was random to allow for assembly of both inverted repeat regions and repetitive elements. To prevent less abundant nuclear and mitochondrial reads from contributing to the final consensus sequence, conflict resolution mode was vote majority. Consensus sequences for O. rufipogon and O. meridionalis were exported to Geneious 5.3 (www.geneious.com) and aligned with chloroplast genome sequences from Genbank (Fig. 1) using Mauve (Darling et al. 2004). Genbank accessions included in the alignment were O. sativa japonica GU592207, O. nivara AP006728, O. sativa indica AY522329, and O. australiensis GU592209.
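The read-length filter and the majority-vote conflict resolution described above can be illustrated with a minimal Python sketch. This is not the CLC Genomics Workbench implementation; the function names and the toy pileup are hypothetical, and the sketch assumes that the items discarded below 30 bp are trimmed reads and that each alignment column is available as a string of the bases placed there by mapped reads.

```python
from collections import Counter

MIN_READ_LENGTH = 30  # threshold taken from the text; applying it to reads is an assumption


def filter_reads(reads, min_length=MIN_READ_LENGTH):
    """Keep only reads that are at least min_length bases long."""
    return [read for read in reads if len(read) >= min_length]


def majority_consensus(columns):
    """Call one base per reference position by simple majority vote.

    columns: list of strings, one per reference position, each holding
    the bases that mapped reads place at that position.
    Positions with no coverage are reported as 'N'.
    """
    consensus = []
    for bases in columns:
        if not bases:
            consensus.append("N")
            continue
        base, _count = Counter(bases).most_common(1)[0]
        consensus.append(base)
    return "".join(consensus)


# Toy pileup (hypothetical): abundant chloroplast reads outvote a few
# nuclear or mitochondrial reads carrying a different base.
toy_columns = ["AAAAG", "CCCC", "TTA", ""]
toy_consensus = majority_consensus(toy_columns)
kept = filter_reads(["A" * 29, "C" * 30, "G" * 45])
```

Because chloroplast reads are far more abundant than nuclear or mitochondrial reads in total DNA, a simple majority vote of this kind tends to recover the chloroplast base at each position.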
Appropriate nucleotide substitution models were selected using Modeltest and MrModeltest (Posada and Crandall 1998). Aligned data were analyzed under maximum parsimony (MP) and maximum likelihood (ML) criteria using the TVM + I model (G = 0.92) in PAUP* (www.paup.csit.fsu.edu), with gaps treated as missing data. Heuristic searches were conducted with 200 random addition replicates and tree bisection-reconnection (TBR) branch swapping. Oryza australiensis was the outgroup in rooted trees, and 2000 bootstrap replicates were used to evaluate nodal support. Bayesian phylogenetic analysis was conducted in MrBayes 3.1 (Ronquist and Huelsenbeck 2004) using the GTR + I model. Two independent runs of 1 × 10^6 Markov chain Monte Carlo (MCMC) generations were performed following a burn-in of 1 × 10^5 generations, each starting with a different random tree. Nodal support for Bayesian consensus trees was evaluated by posterior probability distribution. Consensus sequences were annotated using Dual Organellar Genome Annotator (DOGMA) (Wyman et al. 2004) and manually adjusted as needed before submission to Genbank.
The alignment of seven chloroplast genomes was 134,701 bp in length. One of the inverted repeats (IR) was excluded from the alignment prior to phylogenetic analysis. The mod- [...] (Fig. 3). O. rufipogon and O. meridionalis chloroplast genomes from Australia differed at only 32 positions. The monophyly of Australian AA genome wild rice was supported by 38 shared derived characters, or synapomorphic SNPs (Table 2). Homoplasy in the dataset was not detected (homoplasy index = 0.00), and there were no derived characters shared between O. rufipogon chloroplast genome sequences from Asia and Australia. The monophyly of wild and cultivated Asian rice was supported by 16 synapomorphic SNPs.
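The two kinds of counts reported here, pairwise differences between aligned plastomes and synapomorphic SNPs supporting a clade, can be sketched in Python as follows. This is a simplified illustration rather than the PAUP* analysis: the taxon labels and ten-base sequences are invented toy data, gaps are skipped as missing data, and a site is called synapomorphic only when all clade members share one base that differs from the outgroup and occurs in no other taxon.

```python
def pairwise_differences(seq_a, seq_b):
    """Count aligned columns where two sequences carry different bases.

    Columns with a gap ('-') in either sequence are skipped, mirroring
    the treatment of gaps as missing data.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    return sum(
        1
        for a, b in zip(seq_a, seq_b)
        if a != b and a != "-" and b != "-"
    )


def synapomorphic_sites(alignment, clade, outgroup):
    """Return 0-based columns holding a shared derived base for the clade.

    alignment: dict mapping taxon label to aligned sequence.
    clade: set of taxon labels whose monophyly is being examined.
    outgroup: label of the taxon taken to show the ancestral state.
    """
    others = [name for name in alignment if name not in clade]
    sites = []
    for i in range(len(alignment[outgroup])):
        clade_bases = {alignment[name][i] for name in clade}
        if len(clade_bases) != 1:
            continue  # clade members disagree at this column
        derived = next(iter(clade_bases))
        if derived == "-" or derived == alignment[outgroup][i]:
            continue  # gap, or same state as the outgroup
        if all(alignment[name][i] != derived for name in others):
            sites.append(i)
    return sites


# Invented toy alignment (not data from this study).
toy_alignment = {
    "outgroup": "ACGTACGTAC",
    "asia_1": "ACGTACGTAT",
    "asia_2": "ACGTACGTAT",
    "aus_1": "ATGTACCTAC",
    "aus_2": "ATGAACCTAC",
}
aus_sites = synapomorphic_sites(toy_alignment, {"aus_1", "aus_2"}, "outgroup")
asia_sites = synapomorphic_sites(toy_alignment, {"asia_1", "asia_2"}, "outgroup")
aus_pair = pairwise_differences(toy_alignment["aus_1"], toy_alignment["aus_2"])
```

In the toy data the two "aus" sequences share derived bases at two columns and differ from each other at one, which is the same pattern, on a much smaller scale, as a clade supported by synapomorphic SNPs whose members differ at only a few positions.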

Discussion
Previous analyses suggest the perennial Australian wild rice O. rufipogon is more closely related to Asian O. rufipogon than it is to the annual Australian wild rice O. meridionalis (Wang et al. 1992; Xu et al. 2005). Here we show that the plastome of Australian O. rufipogon is more closely related to that of O. meridionalis than to that of Asian O. rufipogon.
This study utilized whole chloroplast sequence data, which brings particular advantages to the analysis. Plastid genomes do not undergo recombination and are present in high copy number relative to nuclear loci (Takahashi et al. 2008). These attributes have been exploited in many studies, including plant barcoding (CBOL 2009). However, until recently, a relatively small number of nucleotides has been routinely sampled for chloroplast-based plant identification. For example, approximately 1450 base pairs from rbcL and matK were used as the foundation for a DNA barcode for land plants. Although useful, this approach only allowed discrimination of 72% of species in a sample set of 907 species. The complete chloroplast genome has two orders of magnitude more information than the conventional rbcL and matK plant barcode loci, and by accessing a greater number of characters, greater phylogenetic resolving power is possible.
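The "two orders of magnitude" comparison can be checked with a short calculation. The Python sketch below uses the 134,701 bp alignment length reported above as a stand-in for the amount of plastome sequence compared, which is an assumption (the text does not say whether that figure includes one or both inverted repeats), together with the approximate 1450 bp barcode length.

```python
import math

BARCODE_BP = 1450          # approximate rbcL + matK barcode length from the text
ALIGNMENT_BP = 134_701     # alignment length from the text, used here as a proxy
BARCODE_SPECIES = 907      # species in the barcode sample set
BARCODE_SUCCESS = 0.72     # fraction of species discriminated by the barcode

# Fold increase in sampled nucleotides and its base-10 logarithm.
ratio = ALIGNMENT_BP / BARCODE_BP
orders = math.log10(ratio)

# Approximate number of species the two-locus barcode could discriminate.
discriminated = round(BARCODE_SUCCESS * BARCODE_SPECIES)

print(f"{ratio:.1f}-fold more sites, about {orders:.2f} orders of magnitude")
print(f"about {discriminated} of {BARCODE_SPECIES} species discriminated")
```

Under these assumptions the whole-plastome alignment samples roughly 93 times as many sites as the two-locus barcode, consistent with the stated two orders of magnitude.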
Chloroplast DNA is maternally inherited in most angiosperms (Hagemann 2010). Interspecific hybridization can lead to "chloroplast capture" whereby the plastome of one species introgresses into another, and this has been used to explain inconsistencies between chloroplast and nuclear gene trees. Historical or more recent hybridization between sympatric populations of O. meridionalis and O. rufipogon in Australia provides an alternative explanation for the observed results.
During domestication, Asian cultivated rice went through a significant bottleneck and brought with it only 10-20% of the genetic diversity found within its progenitor species, O. rufipogon (Kovach and McCouch 2008). The genetic diversity within wild rice has been exploited to enhance cultivated rice, primarily by improving yield and agronomic traits (Kovach and McCouch 2008). To exploit the genetic diversity within wild rice most effectively, the hybrid offspring need to be fertile. Oryza sativa is an AA genome species, and other AA genome wild species are the most accessible in terms of generating fertile hybrid offspring, including O. rufipogon, O. nivara, O. barthii, O. longistaminata, O. glumaepatula, and O. meridionalis. Crosses between O. meridionalis, Australian O. rufipogon, and other AA genome Oryza species generate fertile hybrids, and so the alleles within these species are available to O. sativa breeding programs following conventional crossing regimes. Because Australian AA genome wild rice has been largely isolated from O. sativa during the course of O. sativa domestication and cultivation, the Australian wild rice is a valuable source of novel alleles for rice improvement.
Oryza nivara is variously described as an annual ecotype of Asian O. rufipogon or as a separate species (Zheng and Ge 2010). The relationship between Australian O. rufipogon and O. meridionalis is somewhat similar, with O. meridionalis until recently being described as an annual form of Australian O. rufipogon (Wang et al. 1992). In both cases the key differentiating feature is the life history of these species or ecotypes. Our results suggest the divergence of the Australian and Asian AA genome rice predates the divergence of O. nivara from Asian O. rufipogon and of Australian O. rufipogon from O. meridionalis. If so, the annual and perennial habits arose separately in Australia and in Asia in each of these species or ecotypes. Genetic and genomic analysis of Asian and Australian O. rufipogon, O. nivara, and O. meridionalis may allow identification of loci or gene networks that differentiate between the perennial and annual species or ecotypes in each of these cases.