Summary

Microsatellite markers are short, highly variable, multi-repeat DNA sequences (also known as short tandem repeats) that occur throughout the genome and can be used to estimate population genetic metrics (Silva, Liu, and Blanton 2006; Vieira et al. 2016). These markers are frequently evaluated using fragment analysis, which is based on Sanger sequencing. The pooledpeaks R package provides tools to analyze fragment analysis results (.fsa files). Its functions fall into three categories: 1) peak scoring, 2) data manipulation, and 3) genetic analysis.

The package was designed for microsatellite markers on pooled parasite samples, but the peak scoring functions are applicable to any fragment analysis. The peak scoring functions were partially adapted from Fragman, a package designed to score microsatellite markers in cranberries (Covarrubias-Pazaran et al. 2016). Although Fragman can read the older .fsa file version, it cannot read newer versions. In addition to revising the outdated file-reading function, we added features including expanded scoring parameter options and export of the resulting scoring plots as a PDF file for review.

The data manipulation functions clean and format the called peaks and transform them into allele frequencies. These frequencies can then be passed to the genetic analysis functions, which calculate diversity and differentiation measures adapted from a range of papers (Long et al. 2022; Jost 2008; Nei 1973; Foulley and Ollivier 2006; Chao et al. 2008). An in-depth walk-through of the analysis pipeline can be found in the vignette.
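The step from called peaks to allele frequencies and then to diversity and differentiation measures can be sketched as follows. This is a minimal illustration in Python, not the pooledpeaks API (the package itself is written in R): the function names, the two example populations, and their peak heights are all hypothetical. It assumes that, in a pooled sample, each allele's frequency is proportional to its peak height, and it uses the textbook forms of gene diversity (Nei 1973), G_ST, and Jost's D (Jost 2008) with equal population weights and no sample-size correction.

```python
def allele_frequencies(peak_heights):
    """Normalize peak heights (allele size -> height) so they sum to 1.

    Assumption: peak height is proportional to allele frequency in the pool.
    """
    total = sum(peak_heights.values())
    if total <= 0:
        raise ValueError("peak heights must have a positive sum")
    return {allele: height / total for allele, height in peak_heights.items()}


def gene_diversity(freqs):
    """Gene diversity (expected heterozygosity): H = 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in freqs.values())


def differentiation(populations):
    """Return (G_ST, Jost's D) for a list of allele-frequency dicts.

    Populations are weighted equally; no small-sample correction is applied.
    """
    n = len(populations)
    if n < 2:
        raise ValueError("need at least two populations")
    alleles = set().union(*populations)
    # H_S: mean within-population gene diversity
    h_s = sum(gene_diversity(f) for f in populations) / n
    # H_T: gene diversity of the mean allele frequencies across populations
    mean_freqs = {a: sum(f.get(a, 0.0) for f in populations) / n for a in alleles}
    h_t = gene_diversity(mean_freqs)
    if h_t == 0.0 or h_s == 1.0:
        raise ValueError("differentiation is undefined for these frequencies")
    g_st = (h_t - h_s) / h_t
    jost_d = ((h_t - h_s) / (1.0 - h_s)) * (n / (n - 1))
    return g_st, jost_d


# Hypothetical peak heights for one marker in two pooled samples
pop_a = allele_frequencies({200: 300, 204: 100})  # frequencies 0.75 and 0.25
pop_b = allele_frequencies({200: 100, 204: 300})  # frequencies 0.25 and 0.75

g_st, jost_d = differentiation([pop_a, pop_b])
print(round(g_st, 3), round(jost_d, 3))  # 0.25 0.4
```

In this toy case both populations have the same within-population diversity (0.375) but opposite allele frequencies, so the two measures differ noticeably, which is the kind of contrast Jost (2008) discusses.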

Statement of Need

While a plethora of methods exist for downstream statistical analysis of allele frequencies, processing of raw fragment data is limited by the available software. Of the few programs that can read the binary .fsa raw data format, nearly all require purchase or registration, are built primarily for Windows, are inefficient for analyzing large batches of files, and depend heavily on the experience of the individual researcher. Additionally, a previous R package for analyzing .fsa files is incompatible with the updated file version. When fragment analysis is used for microsatellite markers on pooled samples, the raw data, once extracted and scored, must be cleaned and transformed into allele frequencies using a second program, such as Excel, which is limited in its capacity for automation and version control. Another platform shift is often required to analyze the resulting allele frequencies. These factors highlight the need for a comprehensive scoring and analysis pipeline that is open-source, offline, reproducible, consistent between researchers, and free of platform switching between steps.

Ongoing Research Projects

This package is currently being used to analyze genetic clustering of Schistosoma mansoni pooled egg samples from four Brazilian communities, as well as the relatedness of Schistosoma haematobium populations around Lac de Guiers in Senegal and from Gabon.

Acknowledgements (Financial)

This work was financially supported by the NIH under grant 1R01AI121330.

References

Chao, Anne, Lou Jost, S. C. Chiang, Y.‐H. Jiang, and Robin L. Chazdon. 2008. “A Two-Stage Probabilistic Approach to Multiple-Community Similarity Indices.” Biometrics 64 (4): 1178–86. https://doi.org/10.1111/j.1541-0420.2008.01010.x.
Covarrubias-Pazaran, Giovanny, Luis Diaz-Garcia, Brandon Schlautman, Walter Salazar, and Juan Zalapa. 2016. “Fragman: An R Package for Fragment Analysis.” BMC Genetics 17 (1): 62. https://doi.org/10.1186/s12863-016-0365-6.
Foulley, Jean-Louis, and Louis Ollivier. 2006. “Estimating Allelic Richness and Its Diversity.” Livestock Science 101 (1-3): 150–58. https://doi.org/10.1016/j.livprodsci.2005.10.021.
Jost, Lou. 2008. “\(G_{ST}\) and Its Relatives Do Not Measure Differentiation.” Molecular Ecology 17 (18): 4015–26. https://doi.org/10.1111/j.1365-294X.2008.03887.x.
Long, Jeffrey C., Sarah E. Taylor, Lucio M. Barbosa, Luciano K. Silva, Mitermayer G. Reis, and Ronald E. Blanton. 2022. “Cryptic Population Structure and Transmission Dynamics Uncovered for Schistosoma Mansoni Populations by Genetic Analyses.” Scientific Reports 12 (1): 1059. https://doi.org/10.1038/s41598-022-04776-0.
Nei, Masatoshi. 1973. “Analysis of Gene Diversity in Subdivided Populations.” Proceedings of the National Academy of Sciences 70 (12): 3321–23. https://doi.org/10.1073/pnas.70.12.3321.
Silva, L. K., S. Liu, and R. E. Blanton. 2006. “Microsatellite Analysis of Pooled Schistosoma Mansoni DNA: An Approach for Studies of Parasite Populations.” Parasitology 132 (3): 331–38. https://doi.org/10.1017/S0031182005009066.
Vieira, Maria Lucia Carneiro, Luciane Santini, Augusto Lima Diniz, and Carla De Freitas Munhoz. 2016. “Microsatellite Markers: What They Mean and Why They Are so Useful.” Genetics and Molecular Biology 39 (3): 312–28. https://doi.org/10.1590/1678-4685-GMB-2016-0027.