Assessment of microsatellite and SNP markers for parentage assignment in ex situ African Penguin (Spheniscus demersus) populations

Abstract Captive management of ex situ populations of endangered species is traditionally based on pedigree information derived from studbook data. However, molecular methods could provide a powerful set of complementary tools to verify studbook records and also contribute to improving the understanding of the genetic status of captive populations. Here, we compare the utility of single nucleotide polymorphisms (SNPs) and microsatellites (MS) and two analytical methods for assigning parentage in ten families of captive African penguins held in South African facilities. We found that SNPs performed better than microsatellites under both analytical frameworks, but a combination of all markers was most informative. A subset of combined SNP (n = 14) and MS loci (n = 10) provided robust assessments of parentage. Captive or supportive breeding programs will play an important role in future African penguin conservation efforts as a source of individuals for reintroduction. Cooperation among these captive facilities is essential to facilitate this process and improve management. This study provided us with a useful set of SNP and MS markers for parentage and relatedness testing among these captive populations. Further assessment of the utility of these markers over multiple (>3) generations and the incorporation of a larger variety of relationships among individuals (e.g., half‐siblings or cousins) is strongly suggested.


Introduction
The growing role of captive institutions in the conservation of threatened species requires that they maintain sustainable and genetically diverse ex situ populations that can meaningfully contribute to in situ conservation (Lacy et al. 2013). Molecular tools have the potential to complement and validate traditional studbook-based genetic management of captive populations, with the goal of reducing the negative effects of inbreeding and loss of genetic diversity (Putnam and Ivy 2013). Complete pedigrees are required to effectively manage the genetic status of captive populations (Ivy and Lacy 2010), but these are not always available, as the parentage of offspring is often uncertain (Putnam and Ivy 2013).
The endangered African penguin (Spheniscus demersus) is endemic to southern Africa, with 25 breeding colonies distributed along the coastline between central Namibia and St. Croix Island (Algoa Bay, South Africa). The population is declining despite multiple conservation interventions (IUCN Red List, BirdLife International, 2013), with an estimated 26,000 breeding pairs left (Crawford et al. 2011). Declines have been attributed to excessive egg and guano harvesting (Shelton et al. 1984), competition for food with seals (Crawford et al. 1992) and commercial fisheries (Frost et al. 1976), oil spills (Morant et al. 1981; Adams 1994; Underhill et al. 1999), loss of habitat, and climate change affecting prey distribution (Boersma 2008; Crawford et al. 2011).
African penguins breed well in captivity and are currently held in 11 zoos and aquariums across South Africa. Ex situ populations serve a number of different roles in conservation efforts, including public education, resources for scientific discovery, and sources for supplementation or restoration of in situ populations (Lacy 2009). The latter has recently been identified as a potentially valuable conservation action and looks likely to be implemented in the near future, necessitating a sound understanding of the genetic status of the captive populations. The African Association of Zoos and Aquaria (PAAZA) established a regional studbook under its African Preservation Programme as part of the ex situ management of this species. Like other studbooks, it uses the Single Population Analysis and Record Keeping System (SPARKS) developed by the International Species Information System (ISIS) and the PM2000 database program (Pollack et al. 2002). Studbook-based analyses indicated that 70.9% of the full pedigree information is known and that the population mean kinship is 0.02 (African Penguin Regional Studbook, 2011). The use of molecular methods to confirm parentage and analyze relatedness among ex situ individuals will complement studbook-based genetic management of the African penguin captive population.
Genealogical relationships among individuals in a population represent a simple concept in biology, but can be powerful when applied to answer evolutionary and ecological questions (Hauser et al. 2011). Pedigree information plays a central role in the study of diverse ecological and evolutionary topics, such as sexual selection, patterns of dispersal and recruitment, quantitative genetic variation, mating systems, and managing the conservation of populations of endangered species (Wang and Santure 2009; Jones et al. 2010). Molecular markers provide new possibilities in establishing genealogical relationships among individuals in populations where such information is difficult to collect from field observations (Pemberton 2008).
Microsatellites (MS) have been the marker of choice for parental assignment and reconstruction, owing to their high polymorphic information content (PIC) and wide availability (Glowatzki-Mullins et al. 1995; Hauser et al. 2011). However, these markers have several disadvantages, including homoplasy, complex mutational patterns, and susceptibility of data analysis to genotyping errors (Angers et al. 2000; Hoffman et al. 2005). Despite being bi-allelic, and therefore having lower resolving power per locus, single nucleotide polymorphisms (SNPs) are becoming increasingly popular (Baruch and Weller 2008; Hauser et al. 2011) due to their low genotyping error rate (<0.1%), their suitability for high-throughput screening, and the fact that SNPs are easier and cheaper to standardize between laboratories than microsatellites (Anderson and Garza 2006).
In parallel to the advances in genetic markers, many statistical methods have been proposed to analyze marker data for pedigree information (Jones and Ardren 2003). Jones et al. (2010) grouped parentage analysis techniques into six categories, namely exclusion, categorical allocation, fractional allocation, full probability parentage analysis, parental reconstruction, and sibship reconstruction. Exclusion-based methods compare the compatibility of offspring and parental genotypes with Mendelian inheritance, so that a putative parent is rejected as a true parent if both alleles at one locus mismatch with those of an offspring (Jones et al. 2010). Exclusion methods are appealing as they are simple in concept and implementation, quick in computation, and do not require allele frequency information (Wang 2012). However, exclusion methods suffer from several weaknesses: genotyping errors can cause false exclusions, valuable marker information is not fully utilized, and exclusion rules are necessary but insufficient for relationship inference (Jones et al. 2010; Wang 2012). A range of likelihood methods have been developed that seek to overcome these problems by determining probabilities of parentage assignment from simulations, Monte Carlo permutations, or Bayesian approaches (Jones et al. 2010). Likelihood-based methods employ Mendel's laws quantitatively to calculate the likelihoods of different candidate relationships among a set of individuals and choose the relationship that has the highest likelihood as the best inference (Wang 2012).
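The exclusion rule described above can be illustrated with a minimal Python sketch. It is not the implementation used by any parentage program; the genotype representation, the function names (count_mismatches, non_excluded), and the example genotypes are all assumptions made for illustration. A candidate parent is counted as mismatching at a locus when it shares no allele with the offspring.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: locus name -> pair of alleles.
Genotype = Dict[str, Tuple[str, str]]


def count_mismatches(offspring: Genotype, candidate: Genotype) -> int:
    """Count loci at which the candidate shares no allele with the offspring."""
    mismatches = 0
    for locus, offspring_alleles in offspring.items():
        candidate_alleles = candidate.get(locus)
        if candidate_alleles is None:
            continue  # locus not typed in the candidate: no evidence either way
        if not set(offspring_alleles) & set(candidate_alleles):
            mismatches += 1
    return mismatches


def non_excluded(offspring: Genotype,
                 candidates: Dict[str, Genotype],
                 max_mismatches: int = 0) -> List[str]:
    """Return the candidates with at most max_mismatches incompatible loci."""
    return [name for name, genotype in candidates.items()
            if count_mismatches(offspring, genotype) <= max_mismatches]


# Invented genotypes for illustration only (not data from this study).
chick = {"MS_1": ("150", "154"), "SNP_1": ("A", "G")}
candidates = {
    "dam_A": {"MS_1": ("150", "158"), "SNP_1": ("A", "A")},
    "dam_B": {"MS_1": ("160", "162"), "SNP_1": ("G", "G")},
}
print(non_excluded(chick, candidates))                    # strict exclusion
print(non_excluded(chick, candidates, max_mismatches=1))  # one mismatch tolerated
```

Allowing a small number of mismatches, as in the second call, is one common way to guard against false exclusion caused by genotyping error, at the cost of leaving more candidates non-excluded.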
In this study, we compare the power of parentage assignment of 31 SNPs and 12 MS markers in isolation and in combination in captive populations of African penguins. Development of a marker set that accurately determines parentage will provide information on the relationships and relatedness among individuals (e.g., extra-pair mating) and contribute to the management of captive African penguins worldwide.

Pedigrees and sampling
Blood samples were collected from 33 African penguins housed in three captive facilities in South Africa: the Two Oceans Aquarium (Cape Town), the National Zoological Gardens of South Africa (Pretoria), and uShaka Sea World (Durban). All penguins are part of the permanent breeding population. Ten family-group pedigrees were constructed based on the regional studbook data (SPARKS), as shown in Figures 1 and 2.

Molecular gender verification
For each individual, 30 µL of blood was collected on FTA paper. DNA was extracted using the Qiagen DNeasy® Blood and Tissue Kit (Qiagen, Valencia, CA) following the manufacturer's protocol. Chromo-helicase-DNA-binding (CHD) gene-based molecular sexing was conducted using the 2550F/2718R primer set (Fridolfsson and Ellegren 1999). Promega GoTaq® Flexi DNA polymerase (Promega Corporation, Madison, WI) was used for amplification in 25 µL reactions. The final reaction conditions were as follows: 1× PCR buffer, 1.5 mmol/L MgCl2, 200 µmol/L of each dNTP, 5 pmol of each of the forward and reverse primers, 0.25 U Taq DNA polymerase, and 10-20 ng genomic DNA template. A no-template control and positive controls for a male and a female bird of known sex were included. The conditions for PCR amplification were as follows: initial denaturation for 2 min at 95°C; 30 cycles of 30 sec at 95°C, 30 sec at 50°C, and 2 min at 72°C; and a final extension at 72°C for 10 min. The PCR was carried out in the BOECO TC-PRO Thermal Cycler. Amplicons were separated by electrophoresis in a 2% agarose gel for 45 min at 100 V in 1× Tris-borate-EDTA buffer. A single-band pattern was considered male (CHD-Z), while a two-band pattern was considered female (CHD-W/CHD-Z).

Microsatellite genotyping
A total of 12 microsatellite markers were typed as described in Schlosser et al. (2003) and Labuschagne et al. (2013). Promega GoTaq® Flexi DNA polymerase (Promega Corporation) was used for amplification in 12.5 µL reactions. The final reaction conditions were as follows: 1× PCR buffer, 1 mmol/L MgCl2, 200 µmol/L of each dNTP, 10 pmol of each of the forward and reverse primers, 1 U Taq DNA polymerase, and 50 ng genomic DNA template. The PCR was carried out in the BOECO TC-PRO Thermal Cycler. The conditions for PCR amplification were as follows: initial denaturation for 5 min at 95°C; 30 cycles of 30 sec at 95°C, 30 sec at 50-60°C, and 30 sec at 72°C; and a final extension at 72°C for 40 min. PCR products were pooled and run against a GeneScan™ 500 LIZ™ internal size standard on an ABI 3130 Genetic Analyzer (Applied Biosystems, Foster City, CA). Samples were genotyped using GeneMapper v.4.0 (Applied Biosystems, Inc.).

SNP genotyping
A total of 31 SNP markers were typed as described in Labuschagne et al. (2012). These markers were developed via screening of a random genomic library. Thus far, these are the only SNP markers that have been reported for the species. Amplification was achieved using DreamTaq™ Green PCR Master Mix (2×) supplied by Thermo Scientific, Lithuania. The PCR mix for each locus contained 12.5 µL of 2× DreamTaq™ PCR Master Mix (10× DreamTaq™ buffer; dATP, dCTP, dGTP, and dTTP, 0.4 mmol/L each; 4 mmol/L MgCl2; and 1.25 U DreamTaq™ polymerase), 1 µL (10 µmol/L) of each primer, 50 ng of template DNA, and nuclease-free water to reach a final volume of 25 µL. Sequencing of the resulting amplicons was conducted by Inqaba Biotechnical Industries (Pty) Ltd using the ABI Big Dye V3.1 kit and the ABI 3500XL Genetic Analyzer. Sequence data were screened and aligned using the Main Workbench from CLC Bio (Denmark).

Parentage analysis
Parentage assignment was evaluated with likelihood- and exclusion-based approaches, using the MS and SNP data sets individually and combined. To assign parentage using a likelihood approach, we used the software program CERVUS v3.03 (Kalinowski et al. 2007). The program uses multilocus parental exclusion probabilities (Selvin 1980) and pairwise likelihood to assign parent pairs to offspring. CERVUS calculates the log-likelihood of each candidate parent being the true parent relative to an arbitrary individual and then calculates the difference between the two most likely parents (Delta, Δ). Critical values of Δ are determined by computer simulation. Using the real data for allele frequencies, simulation parameters were set at 10,000 offspring, with 100% of candidate parents sampled, a total proportion of loci typed over all individuals of 0.99, a mistyping error rate of 0.01, and a likelihood calculation error rate of 0.01, permitting two unscored loci. Strict confidence was set to 95%, while the relaxed confidence level was 80%. CERVUS was also used to calculate summary statistics including allele number at each locus (k), observed heterozygosity (Hobs), expected heterozygosity (Hexp), polymorphic information content (PIC), average nonexclusion probability for one candidate parent (NE-1P), average nonexclusion probability for one candidate parent given the genotype of a known parent of the opposite sex (NE-2P), and significance of deviation from Hardy-Weinberg equilibrium (HW). Parentage assignment using exclusion was performed in PARFEX v1.0 (Sekino and Kakehi 2012). The exclusion method examines incompatibilities between putative parent and offspring genotypes based on Mendelian principles. Parentage assignments were made for zero, one, and two mismatches. PARFEX was further used, through the PFX_Mchoice macro, to calculate the minimum marker set required for optimal parentage assignment with the given data set.
The known parental genotypes are used to simulate offspring genotypes, which are then subjected to exclusion-based parentage testing with successive one-by-one addition of higher-ranked markers, from which the cumulative success rate of parentage allocation is obtained (Sekino and Kakehi 2012). Markers are ranked through one of three statistics (proportion of unique alleles, polymorphic information content [PIC], and exclusion probability), and the success rate of parentage allocation is defined as the number of simulated offspring whose true parental pair is unambiguously identified divided by the total number of offspring (Sekino and Kakehi 2012).
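As a hedged illustration of the marker statistics and ranking step described above, the following Python sketch computes expected heterozygosity and PIC from allele frequencies and sorts markers by PIC. It assumes the standard textbook formulas (Hexp = 1 - sum of squared allele frequencies, without a small-sample correction, and the PIC definition of Botstein et al. 1980); it does not reproduce the internals of CERVUS or PFX_Mchoice, and the marker names and frequencies are invented.

```python
from itertools import combinations
from typing import Dict, List, Sequence


def expected_heterozygosity(freqs: Sequence[float]) -> float:
    """Hexp = 1 - sum(p_i^2), without small-sample correction (an assumption)."""
    return 1.0 - sum(p * p for p in freqs)


def pic(freqs: Sequence[float]) -> float:
    """PIC = 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2."""
    homozygosity = sum(p * p for p in freqs)
    cross_term = sum(2.0 * (p * q) ** 2 for p, q in combinations(freqs, 2))
    return 1.0 - homozygosity - cross_term


def rank_by_pic(markers: Dict[str, Sequence[float]]) -> List[str]:
    """Return marker names sorted from most to least informative by PIC."""
    return sorted(markers, key=lambda name: pic(markers[name]), reverse=True)


# Invented allele frequencies for illustration only.
markers = {
    "snp_x": [0.5, 0.5],                # biallelic, balanced
    "ms_y": [0.25, 0.25, 0.25, 0.25],   # four equally frequent alleles
    "snp_z": [0.9, 0.1],                # biallelic, rare minor allele
}
for name in rank_by_pic(markers):
    print(name, round(expected_heterozygosity(markers[name]), 4),
          round(pic(markers[name]), 4))
```

A balanced biallelic SNP cannot exceed a PIC of 0.375 under this formula, which is consistent with a multi-allelic MS locus typically outranking a single SNP.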

Results
The 33 individuals used in this study represented 17 males and 16 females according to the studbook data. Molecular sexing using the CHD gene verified the gender of all individuals. All samples were successfully genotyped, with the exception of one MS marker for one sample, while the SNP data set had five SNP genotypes missing, affecting three samples. Genotyping was conducted once on all samples and was not repeated in cases of no amplification. Lack of amplification may be due to low sample quality. In total, 62 alleles were found over all 12 MS loci, with a mean PIC of 0.54 (Table 1). Thirty-one SNPs were identified with a mean PIC of 0.23 (Table 2). Deviations from HW and gametic disequilibrium were not observed for any of the markers. The NE-1P (average nonexclusion probability for one candidate parent) was 0.2126 for the SNP set, 0.0389 for the MS set, and 0.0082 for the combined data set. The mean expected heterozygosity was 0.2803 for the SNP marker set and 0.5952 for the MS marker set. For the 33 samples collected, 25 parent-offspring relationships could be inferred from the studbook data (Figs. 1, 2). Among these relationships, nine are sire/dam/offspring trios (Fig. 1C and F-J), seven are single parent/offspring pairs (Fig. 1A-E), four sets include full siblings (Fig. 1A, B, H and J), and two family groups include previous generations (Fig. 1C and H). All potential maternal and paternal candidates were used in the parentage analyses, with no prior exclusion of candidate subsets. Using the MS data set in PARFEX (Table 4), only 11 of the 25 relationships could be correctly assigned using the exclusion method (Fig. 3). The SNP data set performed better, with 14 of the 25 relationships being assigned. When combining both data sets, 20 of the relationships could be assigned using the exclusion method (Figs. 1, 3).
By applying the MS data in PARFEX, correct parents were mostly excluded due to a high number of mismatches, while in the SNP data set, there were often not enough differences to discern false parents from true parents (Fig. 2; Tables 3 and 4). Using the MS data set in CERVUS (Fig. 1; Table 3), 21 of the relationships could be correctly assigned when using a likelihood method. The SNP data set assigned 22 correct relationships with the same methodology (Fig. 1). When combining both data sets in CERVUS, all 25 relationships were correctly assigned (Fig. 1). Incorrect assignments with the MS data were limited to three family groups (Fig. 1B, D, and E), all single parent-offspring groups. All four assignments had low LOD scores (Fig. 1). Incorrect assignments with the SNP data were limited to two family groups (Fig. 1I and J). The incorrect assignment in group I was made with 95% confidence, while both assignments in group J had 80% confidence. In contrast with the CERVUS MS data, the correct parent was assigned to PNN156 in group B. Dam PNN149 was the closest match although it contained two mismatches (Table 4). The remaining incorrect CERVUS assignments were also incorrect in PARFEX. A similar disparity was noted in the SNP data set, where both parents are correctly assigned in group J for offspring PNN96 using PARFEX (Fig. 2). The incorrect assignments for groups I and J in CERVUS were nonexcluded in PARFEX. Several parents could be assigned without mismatches (Table 4).
Table 1. Parameters of genetic information content of 12 microsatellite loci estimated from ex situ population of African penguin. k = number of alleles; N = number of samples; Hobs = observed heterozygosity; Hexp = expected heterozygosity; PIC = polymorphic information content; NE-1P = average nonexclusion probability for one candidate parent; and NE-2P = average nonexclusion probability for one candidate parent given the genotype of a known parent of the opposite sex.
PFX_Mchoice reached only a 99% cumulative success rate when ranking markers through exclusion probability or proportion of unique alleles. Using exclusion probability, a 99% cumulative success rate was reached with 15 markers (10 MS and five SNPs). Using only these 15 markers, 22 of the 25 relationships could be assigned correctly. By ranking markers through the proportion of unique alleles, 99% cumulative success was achieved with 22 markers (11 MS and 11 SNPs). Using the 22-marker subset, 23 of the 25 relationships could be assigned accurately.
Ranking markers using PIC resulted in a 100% cumulative success rate with 24 markers (10 MS and 14 SNPs) (Fig. 4). All 25 relationships were assigned correctly when using these markers.
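The gain from combining marker types can be checked with a small worked example. Under the usual assumption that loci are independent, the multilocus nonexclusion probability is the product of the per-locus (or per-set) values. The sketch below applies this to the NE-1P values reported above for the SNP and MS sets; the function name is hypothetical, and the small difference from the reported combined value of 0.0082 is presumably due to rounding of the set-level figures.

```python
from math import prod
from typing import Iterable


def combined_nonexclusion(values: Iterable[float]) -> float:
    """Multiply nonexclusion probabilities, assuming independent loci or sets."""
    return prod(values)


ne_1p_snp = 0.2126  # NE-1P reported for the SNP set
ne_1p_ms = 0.0389   # NE-1P reported for the MS set

combined = combined_nonexclusion([ne_1p_snp, ne_1p_ms])
print(f"{combined:.4f}")  # 0.0083, close to the reported combined value of 0.0082
```

The same multiplication explains why adding loci of even modest individual power lowers the combined nonexclusion probability, although each additional locus contributes less in absolute terms.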

Discussion
As inaccuracies in the studbook can have implications for future genetic and demographic analysis and management of the captive population, a suitable validated marker set for genetic parentage verification is an important tool for captive management (Ivy and Lacy 2010). Such a marker set may not only exclude incorrectly recorded parents, but also help in assigning the correct individuals if sampled. We have described and verified a set of genetic markers for ascertaining parentage and sibling relationships in African penguins. Few published studies have investigated parentage or paternity in penguins, and to our knowledge, none have used SNP markers. Seven MS markers (including one, B3-2, employed in the present study) yielded a general exclusion probability (mother known) of 0.99 for little penguins (Billing et al. 2007), and eight MS markers (including one, Sh1Ca9, used in the present study) yielded paternity exclusions of 0.94-0.99 for captive Adelie penguins (Sakaoka et al. 2014).
Concerning the discrimination power of the two marker types, the MS markers, with multiple alleles possible at each locus, had an overall higher PIC value, as expected. Both marker sets had 62 independent alleles. However, with more loci, the optimized SNP marker set performed better than the MS marker set using both the exclusion and likelihood parental assignment methods. This study has indicated that the number of loci and their heterozygosity level may influence the power of markers for parentage exclusion approaches more than the number of independent alleles (Morin et al. 2004; Hauser et al. 2011). The power of molecular markers is also influenced by genotyping error (Kalinowski et al. 2007). The generally low error rate for SNPs is a definite advantage for parentage analysis over the higher rates reported for MS markers (Walling et al. 2010; Hauser et al. 2011). However, as each locus adds linearly to the multilocus error, but provides diminishing information for parentage, even low error rates may become problematic as the number of loci screened becomes very large (Christie 2010; Hauser et al. 2011). The optimum number of loci should therefore be determined in preliminary experiments, where the number of SNPs required may be less than commonly assumed (Christie 2010; Hauser et al. 2011). In the current study, we used PFX_Mchoice to establish whether a smaller subset of markers would achieve the same assignment power as the full combined marker set. A subset of 24 markers, consisting of 10 MS markers and 14 SNP markers, was identified that could accurately allocate all 25 parent-offspring relationships identified.
Such a priori knowledge about a minimum set of markers providing a high resolution of parentage assignment helps reduce the experimental cost and labor involved in the subsequent parentage testing. As parentage inference is not concerned with inference of evolutionary history, ascertainment bias through discovery, in particular populations or genomic regions, does not bias the results of parentage inference (Anderson and Garza 2006). In effect, such ascertainment typically leads to an overrepresentation of SNPs at intermediate allele frequencies, an advantage in parentage inference (Anderson and Garza 2006).
Table 3 (fragment).
(PNN168)* 0.65 n/a n/a (PNN168) −2.81 n/a n/a (PNN168)* 3.46 n/a n/a
PNN156 (PNN149)* 4.18 n/a n/a PNN135 −3.94 n/a n/a (PNN149)* 4.60 n/a n/a
PNN165 (PNN141)* 5.57 n/a n/a (PNN141)* 3.33 n/a n/a (PNN141)* 2.24 n/a n/a
PNN161 (PNN149) −2.36 n/a n/a PNN168 −7.12 n/a n/a (PNN149) 1.10 n/a n/a
PNN175 (PNN141)* 8.49 n/a n/a (PNN141)* 5.79 n/a n/a (PNN141)* 2.70 n/a n/a
PNN113 n/a n/a (PNN69)* 0.85 n/a n/a PNN80 −2.08 n/a n/a (PNN69)* 2.30
PNN122 n/a n/a (PNN74) −2.48 n/a n/a PNN80 −3.84 n/a n/a
Figure 3. Percentage correct parent-offspring assignments for all data sets using CERVUS and PARFEX.
Table 4 (fragment).
n/a n/a (PNN168) n/a 1 n/a n/a n/a 2 (PNN168) n/a PNN35, PNN135 n/a PNN43 n/a
PNN156 0 n/a n/a PNN81, PNN135, n/a 1 (PNN149) n/a n/a PNN44, PNN141, n/a 2 PNN135 n/a (PNN149) n/a n/a
PNN165 0 (PNN141) n/a (PNN141) n/a PNN35, PNN44, PNN81, PNN135, (PNN141), PNN168 n/a 1 n/a n/a n/a 2 PNN135 n/a n/a n/a
PNN161 0 n/a n/a (PNN149) n/a 1 n/a n/a PNN35, PNN82 n/a 2 (PNN149) n/a n/a PNN43, PNN81, PNN135, PNN168 n/a
PNN175 0 n/a n/a (PNN141) n/a 1 n/a n/a n/a 2 (PNN141) n/a PNN43, PNN135, n/a n/a
PNN113 0 n/a n/a n/a PNN68, (PNN69) 1 n/a n/a n/a 2 n/a n/a PNN45, PNN80 n/a PNN39, PNN41, PNN45
PNN122 0 n/a PNN45 n/a PNN45 n/a PNN39, PNN45, (PNN74) 1 n/a PNN39,
SNP markers with a minor allele frequency of 0.5 provide the most power for parentage inference, although little additional power is gained above frequencies of 0.4 (Anderson and Garza 2006). Choosing SNP markers with minor allele frequencies above 0.2 can achieve higher assignment power with fewer loci. Among the current 31 SNP markers, only 16 have heterozygosity above 0.3. Replacing the markers that fall below these thresholds with new, more heterozygous markers may greatly improve the ratio of assignment power to number of loci, as well as provide a SNP-only marker set that takes full advantage of the benefits of SNP markers over MS markers. These advantages, including low error rates, ease of typing, low-cost high-throughput genotyping, and SNP genotypes that are easily standardized across laboratories, are all important factors for a multi-institutional studbook.
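The relationship between minor allele frequency and heterozygosity referred to above can be made concrete with a short sketch. For a biallelic locus under Hardy-Weinberg proportions, the expected heterozygosity is 2p(1 - p), where p is the minor allele frequency. The function name and the chosen frequencies below are illustrative assumptions, not values from the study.

```python
def snp_expected_heterozygosity(maf: float) -> float:
    """Expected heterozygosity 2p(1 - p) for a biallelic SNP with minor allele frequency p."""
    if not 0.0 <= maf <= 0.5:
        raise ValueError("minor allele frequency must lie between 0 and 0.5")
    return 2.0 * maf * (1.0 - maf)


# Illustrative frequencies only.
for maf in (0.1, 0.2, 0.3, 0.4, 0.5):
    print(f"MAF {maf:.1f} -> Hexp {snp_expected_heterozygosity(maf):.2f}")
```

The curve flattens near its maximum: heterozygosity rises from 0.18 at a frequency of 0.1 to 0.32 at 0.2, but only from 0.48 to 0.50 between 0.4 and 0.5, which matches the observation that little power is gained above 0.4. A heterozygosity threshold of 0.3 corresponds to a minor allele frequency of roughly 0.18.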

Conclusion
The aim of this study was to generate molecular genetic information to verify and complement studbook-based pedigree data from ex situ populations of African penguins. In addition, we compared the relative and combined utility of MS and SNP markers for parentage assignment. We found that a combined subset of these two types of markers attained a >99% correct cumulative parentage assignment probability. Information derived from this "optimal" marker set will be useful for future captive management of African penguins.
Figure 4. The cumulative success rate (%) of parentage assignment based on exclusion with markers ranked on PIC value. The gray area encompasses all loci required to reach a 100% probability of assigning a correct parent-offspring relationship.