Type: Article
Published: 2016-09-12
Page range: 571–580

VARSEDIG: an algorithm for morphometric characters selection and statistical validation in morphological taxonomy

Facultad de Ciencias, Universidad de Vigo, 36310-Vigo, Spain
Department of Vertebrate Zoology, National Museum of Natural History, Smithsonian Institution, Washington, D.C., 20560, USA
Departamento de Informática, Edificio Fundición, 26310-Universidad de Vigo, Vigo, Spain
Departamento de Informática, Edificio Fundición, 26310-Universidad de Vigo, Vigo, Spain
Departamento de Informática, Edificio Fundición, 26310-Universidad de Vigo, Vigo, Spain
Escuela Superior de Ingeniería Informática, Edificio Politécnico s/n, Campus As Lagoas, Universidad de Vigo, 32004-Orense, Spain
Facultad de Ciencias, Universidad de Vigo, 36310-Vigo, Spain
Grupo de Ictiología, Universidad de Antioquia, Medellín, Colombia
Keywords: General, Statistical method, Fishes, Morphological discrimination

Abstract

We present and discuss VARSEDIG, an algorithm that identifies the morphometric features that significantly discriminate two taxa and validates the morphological distinctness between them via a Monte Carlo test. VARSEDIG is freely available as a function of the RWizard application PlotsR (http://www.ipez.es/RWizard) and as an R package on CRAN. The variables selected by VARSEDIG with the overlap method were very similar to those selected by logistic regression and discriminant analysis, but VARSEDIG overcomes some shortcomings of these methods. VARSEDIG is therefore a good alternative to classical classification methods for identifying the morphometric features that significantly discriminate a taxon and for validating its morphological distinctness from other taxa. As a demonstration of this potential, we analyze morphological discrimination among some species of the Neotropical freshwater family Characidae.
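To make the two ideas in the abstract concrete (ranking variables by how little two taxa overlap, then checking distinctness with a Monte Carlo test), the following is a minimal generic sketch in Python. It is not the VARSEDIG implementation and does not reproduce its overlap method or its test statistic. The function names, the range-based overlap measure, the difference-in-means statistic, the variable names and the data values are all assumptions made for illustration only.

```python
import random
from statistics import mean


def range_overlap(a, b):
    """Fraction of the combined range shared by two samples (0 = no overlap).

    Assumption: a simple range-based measure used only for illustration.
    """
    shared = min(max(a), max(b)) - max(min(a), min(b))
    span = max(max(a), max(b)) - min(min(a), min(b))
    if span == 0:
        return 1.0
    return max(0.0, shared) / span


def rank_variables(taxon_a, taxon_b):
    """Return variable names ordered from least to most overlapping."""
    return sorted(taxon_a, key=lambda v: range_overlap(taxon_a[v], taxon_b[v]))


def permutation_p_value(a, b, n_iter=2000, seed=0):
    """Monte Carlo p-value for the absolute difference in means.

    Group labels are shuffled n_iter times; the p-value is the share of
    shuffles whose difference is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_iter):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    # Add one to numerator and denominator so the estimate is never zero.
    return (hits + 1) / (n_iter + 1)


# Hypothetical measurements (proportions of standard length) for two taxa.
taxon_a = {
    "body_depth": [0.31, 0.33, 0.30, 0.32, 0.34],
    "eye_diameter": [0.10, 0.12, 0.11, 0.09, 0.13],
}
taxon_b = {
    "body_depth": [0.41, 0.43, 0.40, 0.44, 0.42],
    "eye_diameter": [0.11, 0.10, 0.12, 0.13, 0.09],
}

ranking = rank_variables(taxon_a, taxon_b)
p_values = {v: permutation_p_value(taxon_a[v], taxon_b[v]) for v in ranking}

if __name__ == "__main__":
    for v in ranking:
        print(v, range_overlap(taxon_a[v], taxon_b[v]), p_values[v])
```

In this toy data set the body depth ranges do not overlap at all, so that variable is ranked first and its permutation p-value is small, whereas eye diameter has identical values in both taxa and provides no evidence of distinctness.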

 
