ARPHA Conference Abstracts : Conference Abstract
VTAM: A robust pipeline for validating metabarcoding data using optimized parameters based on internal controls
Emese Meglécz, Vincent Dubut, Emmanuel Corse§, Aitor González|
‡ Aix Marseille Univ, Avignon Université, CNRS, IRD, IMBE, Marseille, France
§ Centre Universitaire de Mayotte/ MARBEC, CNRS, Ifremer, IRD, University of Montpellier, Dembeni, France
| Aix Marseille Univ, INSERM, TAGC, Turing Center for Living Systems, Marseille, France
Open Access

Abstract

Metabarcoding has become a powerful approach for studying biodiversity from environmental samples, but it is still prone to pitfalls. Several papers have called for good practice in study design, data production and analysis to ensure repeatability and comparability between studies. Notably, the importance of mock community samples, negative controls and replicates is frequently highlighted (Alberdi et al. 2018, O'Rourke et al. 2020). However, their use in bioinformatics pipelines is often limited to post hoc verification of expectations by the user. Indeed, one of the biggest challenges in metabarcoding analyses is to account for the trade-off between false positive (FP) and false negative (FN) occurrences. We therefore developed the VTAM (Validation and Taxonomic Assignation of Metabarcoding data) pipeline, which is the first tool to explicitly use negative control and mock samples to find optimal parameters that minimize FP and FN occurrences. In addition, VTAM addresses all known technical error types, including tag-jumps, and takes repeatability among replicates into account; it can also integrate more than one overlapping marker to further minimize FN occurrences.
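The idea of tuning a filter on internal controls can be illustrated with a minimal sketch. It is not VTAM's actual optimization procedure: the single read-count threshold, the function names and all read counts below are hypothetical, chosen only to show how a mock sample with known composition and a negative control can jointly determine a parameter value.

```python
# Hypothetical sketch: pick a read-count threshold from internal controls.
# This is NOT VTAM's algorithm; names and data are illustrative assumptions.

def count_errors(threshold, mock_reads, expected_variants, negative_reads):
    """Return (false positives, false negatives) for one threshold.

    A variant is kept in a sample if its read count is >= threshold.
    """
    kept_mock = {v for v, n in mock_reads.items() if n >= threshold}
    kept_negative = {v for v, n in negative_reads.items() if n >= threshold}
    # Expected mock variants that were filtered out are false negatives.
    false_negatives = len(expected_variants - kept_mock)
    # Unexpected variants in the mock sample, and anything at all in the
    # negative control, are false positives.
    false_positives = len(kept_mock - expected_variants) + len(kept_negative)
    return false_positives, false_negatives


def best_threshold(candidates, mock_reads, expected_variants, negative_reads):
    """Return the candidate with the fewest FP + FN occurrences.

    Ties go to the lowest threshold, which discards the least data.
    """
    def total_errors(threshold):
        fp, fn = count_errors(threshold, mock_reads, expected_variants,
                              negative_reads)
        return fp + fn

    return min(sorted(candidates), key=total_errors)


# Made-up read counts per variant, for illustration only.
mock_reads = {"ASV_a": 500, "ASV_b": 120, "ASV_c": 30, "ASV_x": 15}
expected_variants = {"ASV_a", "ASV_b", "ASV_c"}
negative_reads = {"ASV_a": 3, "ASV_y": 5}

print(best_threshold([1, 5, 10, 20, 50], mock_reads, expected_variants,
                     negative_reads))
```

In this toy data, low thresholds keep contaminants (FP) while the highest threshold drops a genuine mock variant (FN), which is the trade-off described above.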

To evaluate VTAM, we compared it with two other pipelines: one based on DADA2 (Callahan et al. 2016) and LULU (Frøslev et al. 2017), and one based on OBITools3 (Boyer et al. 2016) and metabaR (Zinger et al. 2020). Two datasets, from fish and bat diet studies, were analysed with the three pipelines. Based on mock samples and negative controls, VTAM showed the best precision for mock samples in both datasets, while specificity in negative controls was comparable among the three pipelines (Fig. 1).

Figure 1.

Precision (true positives / (true positives + false positives)) and specificity (true negatives / (true negatives + false positives)) of the three pipelines, based on mock samples and negative controls, respectively.
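As a minimal sketch, the two metrics defined in the caption can be computed as follows. The function names and the counts in the example are hypothetical and are not results from the study.

```python
# Sketch of the two metrics from the figure caption.
# Function names and example counts are illustrative assumptions.

def precision(true_positives, false_positives):
    """True positives / (true positives + false positives)."""
    denominator = true_positives + false_positives
    if denominator == 0:
        return None  # undefined when nothing was retained
    return true_positives / denominator


def specificity(true_negatives, false_positives):
    """True negatives / (true negatives + false positives)."""
    denominator = true_negatives + false_positives
    if denominator == 0:
        return None  # undefined when there are no negatives to assess
    return true_negatives / denominator


# Made-up counts: 9 expected and 1 unexpected variant in a mock sample,
# 95 correctly absent and 5 spurious occurrences in negative controls.
print(precision(9, 1))
print(specificity(95, 5))
```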

VTAM therefore constitutes a complete pipeline to filter and validate metabarcoding data, from raw FASTQ data to Amplicon Sequence Variant tables with taxonomic assignments. It combines a series of features rarely grouped in a single pipeline and performs a non-arbitrary parameter optimization based on internal control samples to generate conservative but informative metabarcoding datasets. We believe VTAM is a valuable tool for the validation of metabarcoding data, which is essential for conducting robust analyses of biodiversity.

Keywords

metabarcoding, mock sample, negative control, replicates, taxonomic assignation, false positives, false negatives

Presenting author

Emese Meglécz

Presented at

1st DNAQUA International Conference (March 9-11, 2021)

References