Population genetics and Geographic Information Systems

GEO 565: Annotated Bibliography

By Josef Uyeda

 

Phylogeography and especially the emerging field of landscape genetics are experiencing rapid growth due to increased availability of spatial data and improved methods for analyzing these data using GIS. The goal of phylogeography is to understand the distribution of a taxon or taxa in a biogeographic and phylogenetic context. Analysis of current distributions can reveal the geographic features which limit range distributions, promote speciation or lead to secondary contact. The emerging field of ecological niche modeling is particularly promising, as it uses locality records to extract data from overlaid layers containing climatic and ecological data and uses these data to construct a model of the ecological niche of a species. This allows for prediction of species distributions and secondary contact, and reveals the climatic variables which influence species distributions. Landscape genetics seeks to identify the landscape features that determine population structure on a much smaller scale than phylogeographic studies. My annotations are primarily focused on the analytical methods used in each paper so that I can quickly identify papers which will be useful for my own research applications. Consequently, I focus primarily on methods and results, including the names of software used when applicable.

 

 

DATA SOURCES

Global Biodiversity Information Facility

www.gbif.org

The Global Biodiversity Information Facility (GBIF) is a large catalog of locality records from museums across the world. The website provides open access to records for a wide variety of taxa, with particularly good representation of vertebrates due to integration with taxon-specific databases such as HerpNET, FishBase and MaNIS. While each of these databases has an extremely useful website of its own, the integration of all of these sites into one allows queries of entire communities of organisms to be run easily. The georeferenced data can be downloaded and imported into a GIS, complete with attributes such as coordinate precision, locality name, collector, museum where the specimen is held, year of collection and comments about the specimen (i.e. age, sex, gravidity etc.). Importing into a GIS allows for spatial analysis of population distributions, past and present, using existing data. The search option allows you to specify the scientific, common or English name of a taxon and to narrow the search to a single country of origin. In addition, there is a Google Earth page that allows you to obtain all the records of a taxon using its scientific name in a .kml file, which can then be opened in Google Earth for visualization of population distributions. The query builder allows you to choose predefined icons to represent localities, or import your own graphics. In addition, you can use the website to create your own .kml files from existing data or build advanced queries in which you can specify dates of collection and coordinate precision and display different values of these attributes differently in the .kml file. While these features are great for visualization of population distributions, attributes are not easily accessed once in the Google Earth environment.
Each data point will open a pop-up window in which you can click “View GBIF record”; however, this opens a web page that typically takes a while to load and does not allow all locality records to be opened in a single table (and so, clearly, does not allow spatial analyses).

 

 

JOURNAL ARTICLES

POPULATION GENETICS

Piertney SB., MacColl ADC., Bacon PJ., and JF. Dallas. 1998. Local genetic structure in red grouse (Lagopus lagopus scoticus): evidence from microsatellite DNA markers. Molecular Ecology 7:1645-1654.

Key methods/concepts: Mantel test, Kriging interpolation on allele frequency PC1 scores

In this study, the authors examine variation at 7 microsatellite loci in 14 populations of L. l. scoticus in Scotland, UK. Despite the fact that birds typically disperse long distances, strong philopatry among male red grouse and localized food sources result in strong population structure. Population genetic structure was tested for isolation by distance by using a Mantel test on Rst estimates. This procedure tests for correlation between the degree of genetic divergence and the logarithm of Euclidean geographical distance between populations. Principal component analysis (PCA) of population allele frequencies was used to cluster populations. The first principal component (PC1) was then used to delineate barriers to gene flow by employing a kriging interpolation procedure (Journel and Huijbregts 1978). The method interpolates hypothetical PC1 scores across the entire sampling area based on the scores at the actual data points. The software used was Surface III (Kansas Geological Survey). Geographic barriers were identified by locating the areas where the most change occurred in PC1 scores across the surface, and these boundaries were overlaid onto a variety of maps to determine which features were responsible (using the DeeCAMP GIS database). A significant Mantel test demonstrated an isolation by distance effect. PCA identified two clusters of populations, a group found north of the River Dee and a group south of the River Dee. A three dimensional contour plot of interpolated PC1 scores revealed 3 major “planes”, one north of the river, and two south of the river. The change in slope between these regions is greater than what would be predicted by isolation by distance alone, indicating barriers to gene flow; these barriers correspond to poor grouse habitat (agricultural and forestry land).
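The Mantel test described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure as run by the authors: the function name mantel_test, the coordinates and the genetic distances are all invented, and the genetic matrix is built as an exact multiple of log distance so that the correlation is known in advance.

```python
import numpy as np


def mantel_test(genetic, geographic, permutations=999, seed=0):
    """Permutation Mantel test between two square distance matrices.

    Returns the Pearson correlation of the upper triangles and a
    one-sided p-value (hypothetical helper, not from any package).
    """
    genetic = np.asarray(genetic, dtype=float)
    geographic = np.asarray(geographic, dtype=float)
    n = genetic.shape[0]
    upper = np.triu_indices(n, k=1)

    def corr(a, b):
        return np.corrcoef(a[upper], b[upper])[0, 1]

    observed = corr(genetic, geographic)
    rng = np.random.default_rng(seed)
    at_least_as_large = 1  # the observed arrangement counts once
    for _ in range(permutations):
        # Shuffle population labels: rows and columns move together.
        order = rng.permutation(n)
        shuffled = genetic[np.ix_(order, order)]
        if corr(shuffled, geographic) >= observed:
            at_least_as_large += 1
    return observed, at_least_as_large / (permutations + 1)


# Invented coordinates (km) for five populations.
coords = np.array([[0.0, 0.0], [10.0, 0.0], [25.0, 5.0],
                   [40.0, 10.0], [80.0, 20.0]])
diff = coords[:, None, :] - coords[None, :, :]
geo = np.sqrt((diff ** 2).sum(axis=2))
# Log of Euclidean distance; adding the identity keeps the diagonal at log(1) = 0.
log_geo = np.log(geo + np.eye(len(coords)))
# Invented genetic distances that increase with log distance.
gen = 0.05 * log_geo

r, p = mantel_test(gen, log_geo)
print(f"Mantel r = {r:.3f}, p = {p:.3f}")
```

Permuting rows and columns together, rather than shuffling individual cells, is what respects the non-independence of pairwise distances.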

 

Powell M., Accad A., and A. Shapcott. 2005. Geographic information system (GIS) predictions of past, present habitat distribution areas for re-introduction of the endangered subtropical rainforest shrub Triunia robusta (Proteaceae) from south-east Queensland, Australia. Biological Conservation 123:165-175.

Key methods/concepts: Habitat modeling, Ecological niche modeling, predicting species distributions, planning re-introduction sites

Extensive destruction of habitat for the endangered shrub Triunia robusta has resulted from clearing of the Queensland region. The authors use habitat modeling to predict the present and past distributions of T. robusta. The distribution of T. robusta is modeled given two different layers: one representing pre-European vegetation types and one representing current remnant vegetation types. They test the hypothesis that the current distribution of T. robusta is a remnant of a formerly widespread distribution that has been shaped by clearing of habitat. They also seek to identify new sites that have a high potential for containing undiscovered populations of T. robusta. The authors georeferenced all known populations of T. robusta by undertaking a field survey using GPS units. The aspect, elevation, distance from a watercourse (where within 50 m), vegetation type (determined by indicator species) and slope of each locality were recorded by the authors. The authors used ArcGIS with their own primary data as well as geological and drainage layers obtained from government agencies. The environmental envelope for T. robusta was determined using the data layers: elevation, aspect, slope, geology, distance to drainage and vegetation type. Error estimates were derived by comparing field derived values to sites with known values with a Mann-Whitney test. Attributes of the data layers were extracted from the known localities and queries built for the whole study site to determine areas likely to contain suitable habitat. Habitat was assigned into one of three probability levels based on how many attribute classes were in common with known population ranges. Sites suitable for re-introduction were determined by evaluating habitat suitability, likely pre-clearing distributions, and spatial relationship to current populations.
The model constructed yielded mixed results, as the resolution of the DEM used was not sufficient in highly heterogeneous areas and the model tended to overpredict potential habitat in the headwaters of creek systems. While 4 new populations were located based on predicted distributions, many areas of predicted habitat were unoccupied. The authors identified 7 areas as potential reintroduction sites that would reduce isolation between populations and improve connectivity.
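The envelope queries described above can be illustrated with a small Python sketch. It assumes only continuous variables (the paper also used categorical layers such as geology and vegetation type), and the thresholds for the three probability levels, the function names and all values are hypothetical rather than taken from the paper.

```python
import numpy as np


def envelope(known_sites):
    """Minimum and maximum of each variable across known population sites."""
    known_sites = np.asarray(known_sites, dtype=float)
    return known_sites.min(axis=0), known_sites.max(axis=0)


def suitability_class(cells, lower, upper):
    """Count how many variables of each cell fall inside the envelope
    and bin the count into three levels (thresholds are assumptions)."""
    cells = np.asarray(cells, dtype=float)
    inside = (cells >= lower) & (cells <= upper)
    matches = inside.sum(axis=1)
    n_vars = cells.shape[1]
    classes = np.where(matches == n_vars, "high",
                       np.where(matches >= n_vars - 1, "medium", "low"))
    return matches, classes


# Invented values: elevation (m), slope (degrees), distance to drainage (m).
known = [[120, 5, 10], [200, 12, 40], [160, 8, 25]]
lower, upper = envelope(known)

candidate_cells = [[150, 10, 30], [250, 10, 30], [300, 30, 30], [300, 30, 100]]
matches, classes = suitability_class(candidate_cells, lower, upper)
for cell, m, c in zip(candidate_cells, matches, classes):
    print(cell, int(m), c)
```

In a GIS the same logic would be applied to every raster cell of the study area rather than to a short list of candidate cells.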

 

Manier MK., and SJ. Arnold. 2006. Ecological correlates of population genetic structure: a comparative approach using a vertebrate metacommunity. Proceedings of the Royal Society B 273:3001-3009.

Key methods/concepts: Multiple regression analysis, Spatial analysis of molecular variance

Population genetic structure was determined from allelic variation at microsatellite loci for a vertebrate metacommunity consisting of the competing predators Thamnophis elegans and T. sirtalis and the prey species Bufo boreas in a Lassen County, California ecosystem. The authors used a Mantel test to test the correlation between genetic distance between populations and log geographic distance using the software Arlequin v. 2.000. Clustering of populations based on allele frequencies was performed using analysis of molecular variance (AMOVA) as implemented by the software SAMOVA v. 1.0 (Spatial Analysis of Molecular Variance). Latitude and longitude for each population were used to assign populations to a user-defined number of maximally differentiated groups. Values of Fst, effective population size (Ne) and migration rate (m) were obtained from microsatellite data. Multiple linear regression was used to determine which habitat characteristics best explained Fst, Ne and m. Habitat characteristics tested included site perimeter, elevation and pairwise distance between populations as well as abundance of the other species examined. Clustering of populations revealed that a 300 m escarpment separated populations of both species of Thamnophis. Multiple regression revealed that Ne in T. sirtalis was positively correlated with elevation and negatively correlated with nearest-neighbor elevation. T. elegans migrated mostly to nearby sites that have the most T. elegans, the least T. sirtalis, are lower in elevation and have shallower water depths. T. sirtalis migrated mostly to sites with more T. sirtalis. B. boreas migrated mostly to nearby sites that were deeper, larger and higher in elevation and that had fewer snakes. For all three species, variation in genetic distance was primarily related to geographic distance between populations.
In summary, multiple regression identified geographic distance, habitat variables and interacting species as important determinants of genetic parameters. 
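A multiple linear regression of this kind can be sketched with ordinary least squares. The example below is a minimal illustration with invented, noise-free data, so the recovered coefficients are known in advance; the function name fit_linear_model and the choice of predictors are assumptions and do not reproduce the authors' analysis.

```python
import numpy as np


def fit_linear_model(predictors, response):
    """Ordinary least squares with an intercept.

    Returns the coefficients (intercept first) and R squared.
    """
    predictors = np.asarray(predictors, dtype=float)
    response = np.asarray(response, dtype=float)
    design = np.column_stack([np.ones(len(response)), predictors])
    coef, *_ = np.linalg.lstsq(design, response, rcond=None)
    fitted = design @ coef
    ss_res = ((response - fitted) ** 2).sum()
    ss_tot = ((response - response.mean()) ** 2).sum()
    return coef, 1.0 - ss_res / ss_tot


# Invented pairwise data: distance between sites (km) and elevation difference (m).
distance = np.array([1.0, 2.0, 4.0, 7.0, 11.0, 16.0])
elevation_diff = np.array([50.0, 10.0, 300.0, 120.0, 80.0, 200.0])
# Invented response built from known coefficients so the fit can be checked.
fst = 0.01 + 0.002 * distance + 0.0001 * elevation_diff

coef, r_squared = fit_linear_model(np.column_stack([distance, elevation_diff]), fst)
print("intercept, distance, elevation:", coef)
print("R squared:", r_squared)
```

With real data the predictors are usually standardized first so that coefficient sizes can be compared across variables measured in different units.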

 

Spear SF., Peterson CR., Matocq MD., and A Storfer. 2005. Landscape genetics of the blotched tiger salamander (Ambystoma tigrinum melanostictum). Molecular Ecology 14:2553-2564.

Key methods/concepts: GIS modeling of dispersal paths, Least-cost paths using ArcINFO, Partial Mantel tests, BIOENV

The authors use microsatellite markers in the salamander Ambystoma tigrinum to identify the landscape characteristics that determine genetic structure via GIS data. A total of 10 ponds were sampled across Yellowstone National Park. A Mantel test was used to test for isolation by distance between populations as implemented by the software ARLEQUIN v. 2.0.1.1. Six possible models of movement between localities were proposed and tested based on how well each model explained the observed Fst values. The null model was a straight line path between sites. The other models were determined using a GIS DEM of the area’s topography and included: (1) a topographically corrected straight line distance; (2) a “stepping stone” model that utilized known localities of salamanders as intermediate steps between straight line transects (each route was digitized using ArcGIS); (3) a least cost route that minimized slope change; (4) a least cost route that moved along wetlands; and (5) a combination of (3) and (4). The authors used a GIS raster that quantified the probability of finding a wetland in each pixel. Least cost paths were developed in ArcINFO by creating cost-distance grids in which each cell surrounding a known locality site is assigned a cost based on the slope and the reverse of the wetland probability. Thus, cells with high slope and low wetland likelihood were assigned high cost, and paths were determined that would minimize the cost of the path. Each path was evaluated based upon mean wetland likelihood across the path, slope, vegetation cover type, and number of stream crossings. Partial Mantel tests were performed using the program FSTAT to obtain the percent of the variation in Fst explained by each landscape variable evaluated. Each of the path models was evaluated according to the Corrected Akaike Information Criterion (AICc). In addition to this method, they used a similar procedure by implementing BIOENV using the program PRIMER v. 5.2.4.
This method calculates weighted Spearman rank correlation coefficients between the genetic distance matrix and landscape variables. The partial Mantel tests found topographic distance to be the most consistent predictor of genetic structure, although inclusion of other variables significantly increased the amount of variation explained. Distance and elevation were positively correlated with genetic distance while rivers and open shrub habitat were negatively correlated. The full model (5) followed by the wetland likelihood model (4) were the best and second best models, respectively, while the straight line distance and least cost based on slope (3) performed the worst. The BIOENV procedure corroborated these results.
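The idea behind a least cost path on a cost-distance grid can be sketched with Dijkstra's algorithm. This is a generic illustration and not the ArcINFO routine the authors used: the way slope and wetland probability are combined into a cost, the 4-neighbour movement rule, the function names and the tiny grid are all assumptions.

```python
import heapq

import numpy as np


def cost_surface(slope, wetland_prob, slope_weight=1.0, wetland_weight=1.0):
    """Cell cost rises with slope and with the reverse of wetland probability.
    The weights and the scaling of slope to 0..1 are hypothetical choices."""
    slope = np.asarray(slope, dtype=float)
    wetland_prob = np.asarray(wetland_prob, dtype=float)
    scaled = slope / slope.max() if slope.max() > 0 else slope
    return slope_weight * scaled + wetland_weight * (1.0 - wetland_prob)


def least_cost_path(cost, start, goal):
    """Dijkstra's algorithm over a raster with 4-neighbour moves.
    The cost of a path is the sum of the costs of the cells entered."""
    rows, cols = cost.shape
    best = {start: 0.0}
    previous = {}
    queue = [(0.0, start)]
    while queue:
        dist, cell = heapq.heappop(queue)
        if cell == goal:
            break
        if dist > best[cell]:
            continue  # stale queue entry
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols:
                new_dist = dist + float(cost[nr, nc])
                if new_dist < best.get((nr, nc), float("inf")):
                    best[(nr, nc)] = new_dist
                    previous[(nr, nc)] = cell
                    heapq.heappush(queue, (new_dist, (nr, nc)))
    path = [goal]
    while path[-1] != start:
        path.append(previous[path[-1]])
    path.reverse()
    return path, best[goal]


# Invented 3 x 3 landscape: a steep, dry ridge separates two ponds.
slope = [[0, 10, 0],
         [0, 10, 0],
         [0, 0, 0]]
wetland = [[0.9, 0.1, 0.9],
           [0.9, 0.1, 0.9],
           [0.9, 0.9, 0.9]]
cost = cost_surface(slope, wetland)
path, total = least_cost_path(cost, (0, 0), (0, 2))
print(path, round(total, 3))
```

In this toy landscape the cheapest route goes around the ridge along the wet cells instead of crossing it directly, which is the behaviour that distinguishes models (3) to (5) from the straight line null model.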

 

Kidd DM, and MG Ritchie. 2000. Inferring the patterns and causes of geographic variation in Ephippiger ephippiger (Orthoptera, Tettigoniidae) using geographical information systems (GIS). Biological Journal of the Linnean Society 71:269-295.

Key concepts/methods: Principal component analysis of traits, surface creation, exploratory data heuristics, Idrisi surfaces

The Ephippiger bushcrickets of the Alps of Europe form a complex of species. This study combines trait data from previous studies, which were geocoded with varying levels of precision and accuracy (this study is a rather early one, and many papers rarely gave exact coordinates for localities). Digital elevation models (DEMs) were downloaded from USGS, along with solar irradiation and annual average precipitation data. Irradiation and precipitation were interpolated into continuous surfaces using the Idrisi linear contour interpolator, which is a distance weighting method. Using the Idrisi PCA function, the authors determined the axes across the trait surfaces that explained the most variation in the observed traits. PC2 consistently divided the populations into northern and southern groups. Discriminant analysis was performed to test classification of northern and southern populations. They also implemented multiple regression of body size on altitude, irradiation, precipitation, latitude and longitude, and distance from the sea to create a surface of body size across the landscape. The authors found several patterns of correlation between phenotypic traits and environmental clines. These patterns could be divided into two categories: general environmental clines causing ecotypic variation, and historical divergence resulting in trait divergence. This paper is also very important as it is the first instance I have seen of a journal article that includes a 3D titillator plot.
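Principal component analysis of trait data can be sketched with a singular value decomposition. The sketch below works on a small table of populations by traits rather than on raster surfaces as in the Idrisi PCA function, and the trait values, group labels and function name are invented for illustration.

```python
import numpy as np


def pca(traits):
    """PCA on standardized traits via singular value decomposition.

    Returns the scores (one row per population), the loadings (one row per
    component) and the proportion of variance explained by each component.
    """
    traits = np.asarray(traits, dtype=float)
    centered = traits - traits.mean(axis=0)
    standardized = centered / traits.std(axis=0, ddof=1)
    u, s, vt = np.linalg.svd(standardized, full_matrices=False)
    scores = u * s
    explained = s ** 2 / (s ** 2).sum()
    return scores, vt, explained


# Invented measurements (three traits) for three northern and three southern populations.
north = [[10.0, 5.0, 2.0], [11.0, 5.5, 2.1], [10.5, 5.2, 1.9]]
south = [[20.0, 9.0, 4.0], [21.0, 9.5, 4.2], [19.5, 8.8, 3.9]]
scores, loadings, explained = pca(north + south)

print("variance explained:", np.round(explained, 3))
print("PC1 scores:", np.round(scores[:, 0], 2))
```

In this invented data set the two groups separate along PC1; which component carries a geographic split in real data depends on the traits measured, as the PC2 result in the paper shows.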

 

Ritchie MG, Kidd DM, and JM Gleason. 2001. Mitochondrial DNA variation and GIS analysis confirm a secondary origin of geographical variation in the bushcricket Ephippiger ephippiger (Orthoptera: Tettigonioidea), and resurrect two subspecies. Molecular Ecology 10:603-611.

Key concepts/methods: RFLPs compared to predictions from GIS analysis for contact zones between geographical variants, partial Mantel tests

This paper examines concordance of interpolated character clines in putative secondary contact zones of the bushcricket. The data include behavioral, morphological and allozyme data that are interpolated across the geographic surface and compared with bioclimatic layers for covariance and concordance. Environmental variables explained only body size, and few clines were concordant with subspecific designations. In this paper, the authors generated matrices of geographic data including geographic distance (generated using ArcInfo and mapping predicted distances for cricket dispersal), environmental dissimilarity (obtained from Kidd and Ritchie 2000) and vicariant models based on potential refugia during the last ice age. The response matrix for this study was the genetic distance matrix based on RFLPs. Each independent variable was represented by its own matrix, and each was tested against the response matrix with a partial Mantel test. Isolation by distance is not supported, while environmental variables only approach significance. Historical refugial models perform the best, indicating past isolation has the greatest explanatory power for determining genetic distances. This paper has a cool figure of a neighbor-joining phylogeny overlaid on the geography of the region, which was likely created in GIS software. In conclusion, the authors find strong support for vicariant hypotheses using GIS and Mantel tests, thus resulting in their resurrection of subspecific names to indicate geographic variants.
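A partial Mantel test asks whether a predictor matrix still correlates with the response matrix once a third matrix has been controlled for. One common formulation, sketched here under assumptions, correlates the residuals of each matrix after regression on the control matrix and permutes the response matrix. The function names, coordinates and refugium assignments are invented, and the genetic distances are constructed so that the refugial signal is known in advance.

```python
import numpy as np


def _upper(matrix):
    """Upper triangle of a square matrix as a flat vector."""
    matrix = np.asarray(matrix, dtype=float)
    return matrix[np.triu_indices(matrix.shape[0], k=1)]


def _residuals(y, x):
    """Residuals of a simple linear regression of y on x."""
    slope, intercept = np.polyfit(x, y, 1)
    return y - (slope * x + intercept)


def partial_mantel(response, predictor, control, permutations=999, seed=0):
    """Correlation between response and predictor after removing the linear
    effect of control from both; significance by permuting the response."""
    response = np.asarray(response, dtype=float)
    n = response.shape[0]
    control_vec = _upper(control)
    predictor_res = _residuals(_upper(predictor), control_vec)

    def statistic(resp):
        return np.corrcoef(_residuals(_upper(resp), control_vec), predictor_res)[0, 1]

    observed = statistic(response)
    rng = np.random.default_rng(seed)
    at_least_as_large = 1
    for _ in range(permutations):
        order = rng.permutation(n)
        if statistic(response[np.ix_(order, order)]) >= observed:
            at_least_as_large += 1
    return observed, at_least_as_large / (permutations + 1)


# Invented example: six populations assigned to two hypothetical refugia.
coords = np.array([[0, 0], [1, 3], [4, 1], [10, 8], [13, 9], [11, 14]], dtype=float)
refugium = np.array([0, 0, 0, 1, 1, 1])
diff = coords[:, None, :] - coords[None, :, :]
geo = np.sqrt((diff ** 2).sum(axis=2))
vicariance = (refugium[:, None] != refugium[None, :]).astype(float)
# Invented genetic distances: mostly refugial history plus a small distance effect.
genetic = 0.1 * vicariance + 0.001 * geo

r_partial, p_value = partial_mantel(genetic, vicariance, geo)
print(f"partial Mantel r = {r_partial:.3f}, p = {p_value:.3f}")
```

Swapping the roles of the vicariance and distance matrices gives the complementary test of isolation by distance controlling for refugial history.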

 

ECOLOGICAL NICHE MODELING

Phillips SJ., Anderson RP., and RE. Schapire. 2006. Maximum entropy modeling of species geographic distributions. Ecological Modelling 190:231-259.

Key methods/concepts: Maximum entropy niche modeling, Maxent

A common problem in species distribution data is that only presence data are likely to be available, while absence data are almost always lacking. Consequently, the Maxent approach to modeling species distributions with only presence data is particularly valuable. The performance of this model is tested against the other commonly used presence-only data model, GARP, and other methods of estimating species ranges. An interesting complication with presence-only data presented in this paper is that localities are always assumed to be source populations, never sink populations, which affects the accuracy of predicted distributions. Furthermore, care must be taken when choosing layers and locality data (e.g. a current landcover layer would not work well with a collection locality from the 1700s!). Maxent seeks to approximate the desired species probability distribution using everything that is known via locality data about the habitat requirements (extracted from layers), and chooses the distribution of maximum entropy subject to the constraints of what is known. “It agrees with everything that is known, but carefully avoids assuming anything that is not known”. This paper goes over the many advantages of the Maxent approach, and paints a compelling picture for its use in habitat niche modeling. Maximum entropy modeling is a rapidly growing field of statistics with applications in numerous diverse fields, and thus has a robust literature associated with it. As an initial test of the Maxent approach, it is compared to the GARP model for Bradypus variegatus and Microryzomys minutus in South America. The study includes the data sources and climate, elevation and vegetation layers used. The authors tested models by using only a subset of the data to build the model and using the remaining localities to test it. Both Maxent and GARP outperformed random prediction, and Maxent generally outperformed GARP, with very few test localities omitted from the predicted distribution.
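The maximum entropy principle can be illustrated with a toy example. Among all probability distributions over grid cells whose expected feature values match the averages observed at presence localities, the one with maximum entropy has the form p(cell) proportional to exp(weights · features), and the weights can be found by gradient ascent. The sketch below is not the Maxent software: it omits regularization and the richer feature classes that program uses, and the temperature gradient, presence cells, learning rate and function name are all assumptions.

```python
import numpy as np


def fit_maxent(features, presence_idx, learning_rate=0.2, iterations=20000):
    """Fit p(cell) proportional to exp(weights . z), where z are standardized
    features, so that the expected features under p match the means observed
    at the presence cells (plain gradient ascent, no regularization)."""
    features = np.asarray(features, dtype=float)
    z = (features - features.mean(axis=0)) / features.std(axis=0)
    target = z[presence_idx].mean(axis=0)
    weights = np.zeros(z.shape[1])

    def distribution(w):
        logits = z @ w
        p = np.exp(logits - logits.max())  # subtract the max for numerical stability
        return p / p.sum()

    for _ in range(iterations):
        p = distribution(weights)
        # Gradient of the log-likelihood: observed minus expected features.
        weights += learning_rate * (target - p @ z)
    return distribution(weights), weights, z


# Invented one-dimensional landscape: ten cells along a temperature gradient.
temperature = np.arange(5.0, 25.0, 2.0)
centered = temperature - temperature.mean()
# Linear and quadratic features allow a hump-shaped response.
features = np.column_stack([centered, centered ** 2])
presence = [5, 6, 7, 8]  # hypothetical cells with locality records

prob, weights, z = fit_maxent(features, presence)
for t, p in zip(temperature, prob):
    print(f"{t:4.0f} C  {p:.3f}")
```

The fitted distribution peaks near the temperatures where the species was recorded and declines on both sides, even though no absence data were supplied.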

 

Kozak KH., and JJ. Wiens. 2006. Does niche conservatism promote speciation? A case study in North American salamanders. Evolution 60(12):2604-2621.

Key methods/concepts: Maxent, Utilizing both presence and absence data, testing hypotheses of speciation with a GIS

Ecological niche modeling could produce three possible outcomes for a pair of sister species isolated from one another: (1) niche conservatism results in spatial overlap of predicted ranges; (2) niche divergence results in no spatial overlap of predicted ranges; (3) other factors influencing range distributions result in spatial overlap in the current ranges as well as the intervening areas of species absence. The authors tested these predictions using 16 pairs of sister species of montane North American salamanders. Specimen localities for each species were imported into ArcGIS and ranges were estimated by enclosing the points with a minimum convex polygon. Degree of overlap was determined using Lynch’s method to predict ancestral distributions by summing the ranges of all species in a given clade, as well as ARCs using “the nested averages of pairwise overlaps between all species in a clade” (quotes because I don’t understand enough to paraphrase!). Degree of overlap was then regressed against age of most recent common ancestor. To test hypotheses about the factors that promote speciation, the authors utilized ecological niche modeling using maximum entropy methods as implemented by the software Maxent. This program computes probability distributions of habitat suitability over an entire grid by utilizing all the information contained in known species localities while avoiding unfounded constraints. It expresses the probability of finding a species in a grid cell as a function of environmental variables at known localities, which is then used to generate a species probability distribution over the entire data frame. Climatic data were extracted from the WorldClim dataset based on applicability to amphibian life history. Predicted ranges were imported from Maxent into DIVA-GIS v. 5.2.
Minimum convex polygons were then constructed for absence localities lying between the minimum convex polygons constructed earlier, using localities known to have salamander species but not the species pair of interest. To test the niche conservatism hypothesis, known localities of a species were mapped onto their sister species’ predicted range distribution and the cumulative probability of occurring at each site was extracted (a low probability of occurrence would reject the hypothesis). Similarity of climatic niches was evaluated using PCA to generate a “climate distance” between sister taxa. ARCs and Lynch’s method supported allopatric speciation as the primary mechanism for species divergence. Most allopatric taxa exhibited the pattern expected by niche conservatism. PC1 generally represents climatic variables that are typical of montane environments (temperature stability, precipitation and low temperatures) and explains the majority of the variation. However, some species pairs show significant niche divergence and support the niche divergence hypothesis. Parapatric sister species, in contrast to allopatric taxa, demonstrate niche divergence but have borders with relatively broad overlap. This study shows how the climatic variables that promote speciation can be determined using GIS and ecological niche modeling.
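Estimating a range as a minimum convex polygon and measuring the overlap of two ranges can be sketched in plain Python. The sketch uses Andrew's monotone chain algorithm for the hull and Sutherland-Hodgman clipping for the intersection; these are standard geometric routines chosen for illustration, not necessarily what ArcGIS does internally, and the localities are invented. Expressing overlap as a fraction of the smaller range is likewise an assumption made here.

```python
def convex_hull(points):
    """Minimum convex polygon (Andrew's monotone chain), counter-clockwise."""
    pts = sorted(set(map(tuple, points)))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def polygon_area(polygon):
    """Shoelace formula."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(polygon, polygon[1:] + polygon[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def clip(subject, clipper):
    """Intersection of two convex polygons (Sutherland-Hodgman).
    The clipper must be listed counter-clockwise."""
    def inside(p, a, b):
        return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0]) >= 0

    def intersection(p, q, a, b):
        dx1, dy1 = q[0] - p[0], q[1] - p[1]
        dx2, dy2 = b[0] - a[0], b[1] - a[1]
        t = ((a[0] - p[0]) * dy2 - (a[1] - p[1]) * dx2) / (dx1 * dy2 - dy1 * dx2)
        return (p[0] + t * dx1, p[1] + t * dy1)

    output = list(subject)
    for a, b in zip(clipper, clipper[1:] + clipper[:1]):
        if not output:
            break
        current, output = output, []
        for p, q in zip(current, current[1:] + current[:1]):
            if inside(q, a, b):
                if not inside(p, a, b):
                    output.append(intersection(p, q, a, b))
                output.append(q)
            elif inside(p, a, b):
                output.append(intersection(p, q, a, b))
    return output


# Invented localities for two sister species.
species_a = [(0, 0), (4, 0), (4, 4), (0, 4), (2, 2)]
species_b = [(2, 2), (6, 2), (6, 6), (2, 6), (4, 4)]
range_a, range_b = convex_hull(species_a), convex_hull(species_b)
shared = clip(range_a, range_b)
area_a, area_b = polygon_area(range_a), polygon_area(range_b)
overlap = polygon_area(shared) / min(area_a, area_b) if shared else 0.0
print(area_a, area_b, overlap)
```

Real localities are in geographic coordinates, so they would need to be projected to an equal-area system before areas are compared.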

 

Cicero, C. 2004. Barriers to sympatry between avian sibling species (Paridae: Baeolophus) in local secondary contact. Evolution 58(7):1573-1587.

Key methods/concepts: Bioclimatic modeling applied to identifying secondary contact, testing endogenous vs. exogenous adaptation

Cicero sought to determine whether local adaptation led to boundaries between species ranges and isolation, or whether interactions in secondary contact zones caused isolation and set range limits. To test these different models, Cicero examined sister species of titmouse, Baeolophus inornatus and B. ridgwayi, using ecological niche modeling. In a previous paper (Cicero 1996), canonical correlation analysis was used across the ranges of B. inornatus and B. ridgwayi and found significant correlation between species presence and a number of climatic and geological variables. A total of 15 climatic variables were extracted using the BIOCLIM algorithm at 1-km resolution using the program DIVA-GIS. Most of these variables were associated with the different precipitation and temperature regimes characteristic of the different habitats of the two species. Locality data from museum specimens were used to extract the variables. Bioclimatic envelopes were then generated that covered 90-95% of the range of bioclimatic variables extracted from the locality data. Significant differences were found for all but one of the climatic variables for the two species. The predicted distributions of the two species barely overlapped, including at the recently identified secondary contact zone, the Modoc Plateau. Models performed reasonably well in predicting gaps and breaks in population distributions that are evident via molecular methods. Examination of predicted contact zones revealed that these sites had steep clines in climatic variables, and the sites of contact represent ecotones between the wet coastal climate and the drier, more extreme temperature climate of the Great Basin. This study was also able to detect slight asymmetry in the abilities of some species to occupy the niche of the other. I have included the figure of the predicted distributions generated with DIVA-GIS as an example of the utility of this method.
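A percentile-based bioclimatic envelope of the kind described above can be sketched as follows. This is a simplified, BIOCLIM-style rule written for illustration and is not the DIVA-GIS implementation: the 90% coverage, the two climate variables, the function names and every value are assumptions.

```python
import numpy as np


def percentile_envelope(values, coverage=0.90):
    """Lower and upper limits per variable that enclose the central
    `coverage` fraction of the values observed at known localities."""
    tail = (1.0 - coverage) / 2.0 * 100.0
    values = np.asarray(values, dtype=float)
    return (np.percentile(values, tail, axis=0),
            np.percentile(values, 100.0 - tail, axis=0))


def predict_presence(cells, lower, upper):
    """A cell is predicted suitable only if every variable lies in the envelope."""
    cells = np.asarray(cells, dtype=float)
    return np.all((cells >= lower) & (cells <= upper), axis=1)


# Invented climate at museum localities: annual precipitation (mm), mean temperature (C).
coastal_species = np.column_stack([np.linspace(600, 1000, 21), np.linspace(10, 14, 21)])
interior_species = np.column_stack([np.linspace(200, 500, 21), np.linspace(4, 12, 21)])

env_coastal = percentile_envelope(coastal_species)
env_interior = percentile_envelope(interior_species)

# Invented grid cells to classify.
cells = [[800, 12], [300, 8], [550, 11], [990, 12]]
pred_coastal = predict_presence(cells, *env_coastal)
pred_interior = predict_presence(cells, *env_interior)
print(pred_coastal, pred_interior, pred_coastal & pred_interior)
```

Cells where both predictions are true would be candidate zones of contact; in this invented example there are none, mirroring the minimal overlap reported in the paper.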

 

Anderson RP, Peterson AT, and M Gomez-Laverde. 2002. Using niche-based GIS modeling to test geographic predictions of competitive exclusion and competitive release in South American pocket mice.

Key concepts/methods: Testing hypotheses of competitive exclusion and competitive release using GARP

The authors use the Genetic Algorithm for Rule-Set Prediction (GARP; http://biodi.sdsc.edu) to model the distribution of pocket mice in South America. GARP utilizes bioclimatic-envelope rules as well as logistic regression to generate predicted distributions of species. The competitive exclusion principle hypothesizes that two species occupying the same niche will not be found in the same geographic area. Thus, sister species that occupy the same niche will not occupy the same geographic region, even though their predicted ranges will overlap. Competitive release is the idea that if the dominant species is removed from such a site, then the weaker competitor will expand its range into habitat that it is not able to inhabit in the presence of the competitor. Physical, biotic and climatic variables such as elevation, slope, aspect, soil, vegetation, solar radiation, temperature and precipitation were extracted from GIS coverages using known localities for the two species of mice. Predicted distributions revealed areas of limited overlap between the two species. Although the two species occupied different bioclimatic regimes, there was considerable overlap in the predicted envelope for most variables examined. The results of this study supported the competitive release hypothesis for Heteromys anomalus. This was demonstrated by the significantly different bioclimatic variables between H. anomalus and H. australis in areas of predicted contact, while there is no significant difference in areas where H. australis is absent due to historical reasons. In areas of predicted sympatry, H. australis was the predominant species found, again supporting competitive exclusion of H. anomalus by H. australis.
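The comparison between predicted sympatry and actual records can be sketched with boolean rasters. The example below is only a schematic of that bookkeeping, with invented grids and a hypothetical function name; it does not run GARP or reproduce the statistical tests in the paper.

```python
import numpy as np


def overlap_summary(pred_a, pred_b, records_a, records_b):
    """Count cells of predicted sympatry and the records of each species inside them."""
    pred_a = np.asarray(pred_a, dtype=bool)
    pred_b = np.asarray(pred_b, dtype=bool)
    records_a = np.asarray(records_a, dtype=bool)
    records_b = np.asarray(records_b, dtype=bool)
    sympatry = pred_a & pred_b
    return {
        "predicted_sympatry_cells": int(sympatry.sum()),
        "a_records_in_sympatry": int((records_a & sympatry).sum()),
        "b_records_in_sympatry": int((records_b & sympatry).sum()),
    }


# Invented 3 x 4 rasters: 1 marks predicted presence or a collection record.
pred_anomalus = [[1, 1, 1, 0],
                 [1, 1, 1, 0],
                 [0, 1, 1, 0]]
pred_australis = [[0, 0, 1, 1],
                  [0, 1, 1, 1],
                  [0, 1, 1, 1]]
records_anomalus = [[1, 0, 0, 0],
                    [1, 0, 0, 0],
                    [0, 0, 0, 0]]
records_australis = [[0, 0, 1, 0],
                     [0, 1, 0, 1],
                     [0, 0, 1, 0]]

summary = overlap_summary(pred_anomalus, pred_australis,
                          records_anomalus, records_australis)
print(summary)
```

A pattern like the one in this invented example, where only one species is actually recorded in the cells both are predicted to occupy, is the signature the authors interpret as competitive exclusion.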

 

SOFTWARE

Coming soon:

DIVA-GIS

 

MAXENT