(C) PLOS One
This story was originally published by PLOS One and is unaltered.
. . . . . . . . . .



FAVIS: Fast and versatile protocol for non-destructive metabarcoding of bulk insect samples [1]

Elzbieta Iwaszkiewicz-Eggebrecht (Department of Bioinformatics and Genetics, Swedish Museum of Natural History, Stockholm); Piotr Łukasik (Institute of Environmental Sciences, Faculty of Biology, Jagiellonian University, Kraków)

Date: 2023-10

Abstract

Insects are diverse and sustain essential ecosystem functions, yet remain understudied. Recent reports of declines in insect abundance and diversity have highlighted a pressing need for comprehensive large-scale monitoring. Metabarcoding (high-throughput bulk sequencing of marker gene amplicons) offers a cost-effective and relatively fast method for characterizing insect community samples. However, the methodology varies greatly among studies, which complicates the design of large-scale and repeatable monitoring schemes. Here we describe a non-destructive metabarcoding protocol optimized for high-throughput processing of Malaise trap samples and other bulk insect samples. The protocol covers the process from obtaining bulk samples to submitting libraries for sequencing. It is divided into four sections: 1) Laboratory workspace preparation; 2) Sample processing (decanting ethanol, measuring the wet-weight biomass and the concentration of the preservative ethanol, performing non-destructive lysis and preserving the insect material for future work); 3) DNA extraction and purification; and 4) Library preparation and sequencing. The protocol relies on readily available reagents and materials. For steps that require expensive infrastructure, such as DNA purification robots, we suggest low-cost alternatives. Using this protocol yields a comprehensive assessment of the number of species present in a given sample, their relative read abundances and the overall insect biomass. To date, we have successfully applied the protocol to more than 7000 Malaise trap samples obtained from Sweden and Madagascar. We demonstrate the data yield from the protocol using a small subset of these samples.

Citation: Iwaszkiewicz-Eggebrecht E, Łukasik P, Buczek M, Deng J, Hartop EA, Havnås H, et al. (2023) FAVIS: Fast and versatile protocol for non-destructive metabarcoding of bulk insect samples. PLoS ONE 18(7): e0286272. https://doi.org/10.1371/journal.pone.0286272

Editor: Ruslan Kalendar, University of Helsinki: Helsingin Yliopisto, FINLAND

Received: March 21, 2023; Accepted: May 11, 2023; Published: July 19, 2023

Copyright: © 2023 Iwaszkiewicz-Eggebrecht et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: Raw data are available in the Sequence Read Archive (SRA) under accession number PRJNA946790.

Funding: This project was supported by the Knut and Alice Wallenberg Foundation (grant KAW 2017.088 to FR), the Swedish Research Council, URL: https://www.vr.se/english.html (grant 2018-04620 to FR, 2019-04493 to AJMT and 2018-05973 to the Swedish National Infrastructure for Computing (SNIC)), the Polish National Agency for Academic Exchange (grant PPN/PPO/2018/1/00015 to PL) and the Polish National Science Centre (grant 2018/31/B/NZ8/01158 to PL). TR was funded by European Research Council Synergy Grant 856506 (LIFEPLAN) and by a Career Support grant from the Swedish University of Agricultural Sciences. The funders did not and will not have a role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

Introduction

Insects are key players in ecosystems. They are a crucial part of food webs and provide a wealth of ecosystem functions and services, which makes them indispensable for the maintenance of natural systems as well as for food production [1]. Insects are also highly diverse, with estimates ranging from 4 to 7 million species, which makes them one of the most species-rich groups of animals on Earth [2–4]. However, despite insects’ tremendous diversity and ecological importance, our knowledge about them is still fragmentary, and an estimated 75 to 85% of insect species remain undescribed [5]. Worryingly, recent studies of trends in insect abundance and diversity [6–8] have raised alarm about worldwide insect declines and the resulting threats to the stability of terrestrial ecosystems [9]. Thus, there is a pressing need to speed up insect diversity discovery and monitoring. Traditional methods for studying and describing insects involve collecting specimens with a range of different traps, followed by sorting and classifying the samples into taxonomic fractions. Whilst the first part of this process (sampling) is relatively straightforward and can be performed by volunteers [10], the second (taxonomic identification) is complex, demands specialized knowledge that is in short supply, and can be extremely time-consuming. For instance, the Swedish Malaise Trap Project, an ambitious project aiming to characterize the entire insect fauna of Sweden, collected 1919 bulk insect community samples, containing an estimated 20 million individuals, over three years. Sorting those samples into some 350 taxonomic fractions suitable for processing by specialists took 15 years, despite a considerable investment in manpower [11]. Furthermore, the material identified to species level accounted for only 1% of the total number of specimens [12].
In another study, mapping the insect diversity of a single tropical forest took a decade and involved the work of 110 taxonomists [13]. Such delays in obtaining results hamper the timely development of meaningful conservation or protection measures. Adequate insect diversity discovery, insect community monitoring and real-time study of the spatio-temporal dynamics of insect communities all require high-throughput methods for the taxonomic processing of insect samples. DNA-based methods appear particularly well suited to these high-throughput needs [14], and their appeal grows as reference databases expand and sequencing costs fall. Methods such as metagenomics or genome skimming (i.e., filtering out high-copy loci, such as mitochondrial, chloroplast or rRNA sequences, after sequencing) can provide high taxonomic resolution and even promise accurate abundance estimates from bulk samples [15, 16]. However, these techniques remain prohibitively expensive for most large-scale insect monitoring projects. Metabarcoding (i.e., the amplification of large numbers of barcode sequences from bulk samples) is a cost-effective alternative that has gained popularity in recent years [17–19] and has been successfully applied in arthropod community surveys [14, 20, 21]. Metabarcoding builds on the DNA barcoding technique developed by Hebert and colleagues [22, 23], in which a short, standardized DNA fragment (i.e., a barcode) can provide species-level identification of an individual. The standard barcode used in animal diversity studies is the Folmer region [24] of the mitochondrial cytochrome c oxidase subunit 1 (COI) gene, for which vast reference databases exist [25]. In metabarcoding studies, DNA is extracted from bulk, multi-species samples (e.g., derived from a Malaise trap, or from a water or soil sample).
The barcodes are then amplified via PCR, sequenced and compared to a reference database for taxonomic identification. A species can be named by matching its barcode to a reference database, provided that the species is represented in the database. Because of high insect diversity and large knowledge gaps, certain taxonomic groups are poorly represented among the references. This is partly due to a lack of voucher material for described species and partly due to a high proportion of undescribed species, and both factors lower the success rate of species-level identification. Nevertheless, even for poorly represented groups, it is still possible to group sequences into clusters based on their genetic similarity, obtain taxonomic assignments for these clusters at higher levels (i.e., order, family or genus), and compare their presence among samples. This allows efficient characterization of the community composition of the original sample collection. Despite the great potential of metabarcoding, many methodological questions concerning the early stages of sample processing remain open. Perhaps most importantly, the operating procedures for large-scale insect monitoring projects remain heterogeneous and poorly documented. In recent years, many different protocols have emerged. Some advocate destructive DNA extraction methods such as homogenizing specimens into an “insect soup” [18, 26–28]. Others propose non-destructive mild lysis treatments, in which insects soak in a buffer and gradually release their DNA, with minimal damage to the specimens [19, 29–31]. Mild lysis yields smaller amounts of DNA [32] but is less laborious and preserves specimens for future molecular and taxonomic work [33–36]. Furthermore, it was recently shown that mild lysis also decreases the rate of false negatives in metabarcoding, because it improves the detection of small specimens [32]. Each laboratory and institution has to design a workflow that best fits its aims and needs.
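The clustering idea described above (grouping sequences by genetic similarity when species-level references are missing) can be illustrated with a minimal greedy, centroid-based sketch. This is not the procedure used in FAVIS. The sketch assumes pre-aligned sequences of equal length and a 97% identity threshold; that threshold is a convention often used for approximate species-level clusters, but here it is only an assumption. The function names are hypothetical. Real tools handle insertions and deletions with proper alignment and usually process sequences in order of decreasing abundance.

```python
from typing import Dict, List


def identity(a: str, b: str) -> float:
    """Fraction of identical positions in two pre-aligned, equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("this sketch expects non-empty sequences of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_cluster(seqs: Dict[str, str], threshold: float = 0.97) -> Dict[str, List[str]]:
    """Assign each sequence to the first centroid it matches at >= threshold
    identity; otherwise make it the centroid of a new cluster.

    Input order matters (dicts keep insertion order), so callers would
    normally pass sequences sorted by decreasing read abundance.
    """
    clusters: Dict[str, List[str]] = {}
    for seq_id, seq in seqs.items():
        for centroid_id, members in clusters.items():
            if identity(seq, seqs[centroid_id]) >= threshold:
                members.append(seq_id)
                break
        else:
            # No existing centroid is close enough: start a new cluster.
            clusters[seq_id] = [seq_id]
    return clusters


if __name__ == "__main__":
    base = "ACGT" * 25                # toy 100 bp "barcode"
    near = "T" + base[1:99] + "A"     # 2 mismatches -> 98% identity
    far = "G" * 10 + base[10:]        # 8 mismatches -> 92% identity
    print(greedy_cluster({"asv1": base, "asv2": near, "asv3": far}))
    # {'asv1': ['asv1', 'asv2'], 'asv3': ['asv3']}
```

Each resulting cluster can then receive a higher-level taxonomic assignment (order, family or genus), and clusters can be compared across samples.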
To aid those searching for a versatile and scalable solution, we present here a complete metabarcoding protocol, from bulk insect samples to sequencing data. It was initially designed for a large-scale insect monitoring project, the Insect Biome Atlas (www.insectbiomeatlas.org). The project’s field campaign took place in Sweden and Madagascar over 12 months during 2019–2020 and yielded 7398 insect community samples collected with Malaise traps, each sample typically representing one week of trapping. All samples were processed using this protocol within 12 months. Once adapted and optimized, the wet-lab protocol allows one lab technician to take 180 insect community samples from bulk sample to submission for sequencing in one week, which allows timely delivery of results. The protocol, combined with further bioinformatic processing, yields a dataset that can be used to produce comprehensive lists of the species present in a sample, their relative read abundances, and the overall insect biomass. In defining the protocol, we sought to reduce costs and to adopt universal reagents and materials that can easily be obtained worldwide. For steps that remain costly or inaccessible, such as DNA purification robots, we suggest low-cost alternatives where possible. We opted for a non-destructive lysis protocol with a short incubation time (2 h 45 min) in a mild lysis buffer [37]. This allows a large number of samples to be processed efficiently per day, whilst maximizing the power to recover the original species composition of each sample [32]. To enable a correction factor and more accurate estimates of species’ abundances, we added a pre-defined number of biological spike-ins to each sample. These are size-standardized specimens of insect species that do not occur in the sampled area (e.g., for the Swedish Malaise trap collection we selected six tropical species that have never been detected in Sweden or neighboring countries).
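The text does not specify how the spike-in correction factor is computed. One simple possibility is to express each taxon's reads in units of "reads per spike-in individual" within the same sample, and the sketch below shows only that illustrative scheme. The function name, taxon labels and the choice of four spike-in individuals are hypothetical. The sketch also treats read counts as if they scaled linearly with specimen numbers, which real amplicon data often do not.

```python
from typing import Dict, Set


def spike_in_normalize(
    read_counts: Dict[str, int],
    spike_in_taxa: Set[str],
    n_spike_in_individuals: int,
) -> Dict[str, float]:
    """Rescale each taxon's read count by the mean read yield of one
    spike-in individual in the same sample (hypothetical scheme).

    Spike-in taxa are dropped from the output because they were added
    in the lab and are not part of the field sample.
    """
    if n_spike_in_individuals <= 0:
        raise ValueError("n_spike_in_individuals must be positive")
    spike_reads = sum(read_counts.get(taxon, 0) for taxon in spike_in_taxa)
    if spike_reads == 0:
        raise ValueError("no spike-in reads recovered; sample cannot be normalized")
    reads_per_spike_individual = spike_reads / n_spike_in_individuals
    return {
        taxon: count / reads_per_spike_individual
        for taxon, count in read_counts.items()
        if taxon not in spike_in_taxa
    }


if __name__ == "__main__":
    counts = {"spike_A": 250, "spike_B": 350, "taxon_1": 1200, "taxon_2": 150}
    print(spike_in_normalize(counts, {"spike_A", "spike_B"}, n_spike_in_individuals=4))
    # {'taxon_1': 8.0, 'taxon_2': 1.0}
```

Because the spike-ins are added in known numbers to every sample, a ratio like this can be compared across samples even when sequencing depth differs between them.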
Furthermore, we minimize damage to the specimens and preserve the insect material for further taxonomic or molecular work by returning it to ethanol immediately after the lysis step. Another important aspect of the protocol is that insects never leave the collection bottle, which minimizes the risk of cross-sample contamination during sample processing and DNA extraction. The two-step PCR strategy for COI amplicon library preparation produces uniquely dual-indexed libraries, obtained using broad-spectrum BF3-BR2 primers [38] with variable-length inserts (phased). This reduces cross-contamination through index hopping and increases signal complexity within the sequencing lane, which translates into higher-quality results [39].
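Why unique dual indexes limit the impact of index hopping can be shown with a toy demultiplexer. In this sketch, each read is credited to a sample only if its exact (i7, i5) index pair appears in the sample sheet. With unique dual indexes, a hopped read carries a combination that belongs to no sample, so it is discarded rather than assigned to the wrong sample. The index sequences and sample names are arbitrary placeholders. Real demultiplexing software usually tolerates a configurable number of index mismatches, which this sketch does not model.

```python
from typing import Dict, Iterable, Tuple

IndexPair = Tuple[str, str]  # (i7, i5) index sequences read for one cluster

# Hypothetical sample sheet with arbitrary 8-nt index sequences.
SAMPLE_SHEET: Dict[IndexPair, str] = {
    ("ATTACTCG", "TATAGCCT"): "sample_01",
    ("TCCGGAGA", "ATAGAGGC"): "sample_02",
    ("CGCTCATT", "CCTATCCT"): "sample_03",
}


def is_unique_dual(sheet: Dict[IndexPair, str]) -> bool:
    """True if no i7 index and no i5 index is shared between samples."""
    i7s = [i7 for i7, _ in sheet]
    i5s = [i5 for _, i5 in sheet]
    return len(set(i7s)) == len(i7s) and len(set(i5s)) == len(i5s)


def demultiplex(
    reads: Iterable[IndexPair], sheet: Dict[IndexPair, str]
) -> Tuple[Dict[str, int], int]:
    """Count reads per sample; index pairs absent from the sheet are discarded.

    With unique dual indexes, a read whose i7 or i5 has hopped from another
    library carries a combination that is not in the sheet, so it lands in
    the unassigned bin instead of being credited to the wrong sample.
    """
    counts = {name: 0 for name in sheet.values()}
    unassigned = 0
    for pair in reads:
        sample = sheet.get(pair)
        if sample is None:
            unassigned += 1
        else:
            counts[sample] += 1
    return counts, unassigned


if __name__ == "__main__":
    reads = [
        ("ATTACTCG", "TATAGCCT"),
        ("ATTACTCG", "TATAGCCT"),
        ("TCCGGAGA", "ATAGAGGC"),
        ("ATTACTCG", "ATAGAGGC"),  # i7 of sample_01 + i5 of sample_02: a hop
    ]
    print(is_unique_dual(SAMPLE_SHEET))
    print(demultiplex(reads, SAMPLE_SHEET))
```

With combinatorial (non-unique) indexing, the same hopped pair could coincide with another sample's valid combination and would then be silently misassigned.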

Materials and methods

The protocol described in this article is published on protocols.io (https://www.protocols.io/private/C609E2107CD8B7CFF46EFF1461DBE4C3) and is included for printing as S1 File with this article. The protocol is divided into four sections. Section 1 (Preparation) describes how to prepare the workspace and equipment before processing samples. The remaining three sections, Section 2 (Sample Processing), Section 3 (DNA Purification) and Section 4 (Library Preparation and Sequencing), cover the main parts of the protocol (Fig 1).


Fig 1. Schematic workflow of the FAVIS metabarcoding protocol, comprising three main sections. Sample processing consists of decanting ethanol, measuring the ethanol concentration, wet-weighing the sample, adding lysis buffer, incubating the sample, decanting the lysate, taking a lysate aliquot, and refilling the insect community sample with the previously decanted ethanol. DNA is then extracted and purified from the lysate aliquot with magnetic beads and subsequently used as the template for amplification of the target COI fragment via PCR (PCR I). After amplification, the PCR products are cleaned using magnetic beads and used as the template in a second round of PCR (PCR II), in which sample-specific tags are added and the Illumina adapters are completed. The concentration of PCR II products is assessed from band brightness on an agarose gel, and all samples are pooled approximately equimolarly to form the sequencing pool. Finally, the pool is purified with magnetic beads and sequenced on an Illumina NovaSeq platform. https://doi.org/10.1371/journal.pone.0286272.g001

To demonstrate the utility of the protocol, we summarize the sequence data obtained by processing fifteen Malaise trap samples from three different habitats in southern Sweden: a forest, a wetland, and a grassland. For each habitat, we present data from samples collected during five consecutive weeks between April and May 2019. These 15 samples were processed as part of our high-throughput workflow, which handled 180 bulk insect samples per week. All steps of the protocol were completed and the libraries were sequenced on an Illumina NovaSeq 6000 SPrime flow cell. The sequencing data were then processed bioinformatically using the following pipelines: https://github.com/biodiversitydata-se/amplicon-multi-cutadapt (read trimming and filtering) and https://nf-co.re/ampliseq (reconstruction of amplicon sequence variants (ASVs) and taxonomic annotation).
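In FAVIS, PCR II products are pooled approximately equimolarly based on band brightness on an agarose gel, which is a semi-quantitative estimate. If numeric concentration estimates are available instead (for example, from a fluorometer, or from a hypothetical brightness-to-concentration calibration), the volume of each library to pool can be computed as in the sketch below. The sketch assumes that all libraries share one fragment length (500 bp here is a placeholder, not the FAVIS library size). It uses the standard approximation of 660 g/mol per double-stranded base pair. The function names and the 20 fmol target are illustrative.

```python
from typing import Dict

# Approximate average molar mass of one double-stranded DNA base pair (g/mol).
BP_MASS_G_PER_MOL = 660.0


def molarity_nM(conc_ng_per_ul: float, length_bp: int) -> float:
    """Convert a dsDNA mass concentration (ng/uL) into molarity (nM)."""
    return conc_ng_per_ul * 1e6 / (BP_MASS_G_PER_MOL * length_bp)


def pooling_volumes(
    conc_ng_per_ul: Dict[str, float], length_bp: int, target_fmol: float
) -> Dict[str, float]:
    """Volume (uL) of each library that contributes target_fmol to the pool.

    Uses 1 nM == 1 fmol/uL, so volume = target_fmol / molarity.
    """
    volumes = {}
    for library, conc in conc_ng_per_ul.items():
        if conc <= 0:
            raise ValueError(f"{library}: concentration must be positive")
        volumes[library] = target_fmol / molarity_nM(conc, length_bp)
    return volumes


if __name__ == "__main__":
    # Hypothetical concentration estimates for a hypothetical 500 bp library.
    estimates = {"sample_01": 6.6, "sample_02": 13.2, "sample_03": 3.3}
    for library, vol in pooling_volumes(estimates, 500, target_fmol=20).items():
        print(f"{library}: {vol:.2f} uL")
```

In this example, the more concentrated library contributes proportionally less volume, so every sample adds the same number of molecules to the sequencing pool.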
In short, we use cutadapt v.3.2 [40] for primer trimming and the R package DADA2 v.4.2.1 for denoising [41]. We then use SINTAX [42] to assign taxonomy to all ASVs, using a custom-made COI reference database (https://doi.org/10.17044/scilifelab.20514192.v4). Krona plots were prepared with the q2-krona plug-in for QIIME 2 v.2022.2 [43, 44]. Visualizations of the results were made with the ggplot2 v.3.4.1 [45] and ggvenn v.0.1.9 [46] packages in the R environment [47]. Non-metric multidimensional scaling (nMDS) was calculated using the metaMDS function from the vegan v.2.6-4 package [48]. The code used for data manipulation and plotting of the results, as well as the interactive Krona plots, is available on GitHub at https://github.com/ela-iwaszkiewicz/Lab_protocols.git.

Ethics statement

Samples used in this study were covered by Sweden’s right of public access to private land (Allemansrätten) and did not require a collection permit. More information about utilizing Swedish genetic resources can be found on the Swedish Environmental Protection Agency website: https://www.naturvardsverket.se/en/guidance/species-protection/utilizing-genetic-resources
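The taxonomic assignment step in the bioinformatic pipeline relies on SINTAX [42], a k-mer-based classifier that reports a bootstrap confidence for each taxonomic rank. The Python sketch below is a simplified, brute-force re-implementation for illustration only. It is not the code used by SINTAX or nf-core/ampliseq. The parameter values (k = 8, 32 k-mers per iteration, 100 iterations) are meant to resemble those commonly reported for SINTAX but should be treated as assumptions. The toy reference sequences and lineage names are hypothetical.

```python
import random
from collections import Counter
from typing import Dict, List, Set, Tuple

Lineage = Tuple[str, ...]  # e.g. (order, family, genus, species)


def kmers(seq: str, k: int) -> Set[str]:
    """All distinct k-mers (substrings of length k) in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def sintax_like(
    query: str,
    references: Dict[str, Lineage],
    k: int = 8,
    n_kmers: int = 32,
    n_boot: int = 100,
    seed: int = 1,
) -> List[Tuple[str, float]]:
    """Simplified SINTAX-style classifier (illustration only).

    In each bootstrap iteration, n_kmers query k-mers are drawn at random
    (with replacement), and the reference sharing the most of them is the
    top hit. The predicted lineage is the most frequent top-hit lineage.
    The confidence at each rank is the fraction of iterations whose top
    hit carries the same name at that rank.
    """
    query_kmers = sorted(kmers(query, k))  # sorted so a fixed seed is reproducible
    if not query_kmers:
        raise ValueError("query is shorter than k")
    ref_kmers = {ref: kmers(ref, k) for ref in references}
    rng = random.Random(seed)
    hits: List[Lineage] = []
    for _ in range(n_boot):
        subset = [rng.choice(query_kmers) for _ in range(n_kmers)]
        scores = {ref: sum(km in kms for km in subset) for ref, kms in ref_kmers.items()}
        hits.append(references[max(scores, key=scores.get)])
    predicted = Counter(hits).most_common(1)[0][0]
    return [
        (name, sum(hit[rank] == name for hit in hits) / n_boot)
        for rank, name in enumerate(predicted)
    ]


if __name__ == "__main__":
    refs = {
        "ACGT" * 30: ("Order_A", "Family_A", "Genus_A", "Species_A"),
        "TTGCA" * 24: ("Order_B", "Family_B", "Genus_B", "Species_B"),
    }
    print(sintax_like("ACGT" * 30, refs))
```

In practice, a confidence cutoff is applied per rank, so a cluster can be named confidently at order or family level even when the species-level confidence is low. This matches the partial assignments discussed in the Introduction.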

Conclusions

Novel DNA-based methods have the potential to revolutionize biodiversity discovery and monitoring when applied in a high-throughput fashion. Swift processing is crucial for monitoring as well as for informed decision-making in conservation. The metabarcoding protocol described here allows a trained lab technician to process 180 samples (2 x 96-well plates when all negative and positive controls are included), from bulk insect catches to ready-to-sequence libraries, in 7 working days, which translates to over 500 Malaise trap samples per month (for details see S1 Table). When processing thousands of samples, the estimated average per-sample reagent cost is about 5 EUR for DNA extraction and purification using a homemade magnetic bead solution and 3 EUR for library preparation. In addition, generating ~1M paired-end reads (2 x 250 bp) per sample cost about 10 EUR on a NovaSeq 6000 SPrime flow cell. These are average costs for a high-throughput implementation of the protocol, with bulk purchases of reagents and consumables and with homemade magnetic beads used instead of standardized kits for DNA purification (as described in alternative step 17 of the step-by-step protocol on protocols.io). Neither the costs of consumables and services nor the labor involved (and the associated personnel costs) are trivial. However, for large projects addressing grand questions about biodiversity patterns in times of global change, they are not unreasonable. There is also room to improve time- and cost-efficiency through more extensive use of laboratory automation, or by skipping or replacing labor-intensive steps such as agarose gel-based library quality control after both the first and the second PCR.
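The throughput and cost figures above can be combined in a short worked calculation. The per-sample costs, batch size, batch duration and plate count are taken from the text. Two further inputs are assumptions: that a month has about 21 working days with batches run back to back, and that the 12 wells left over on two plates are the ones used for controls. The sketch counts reagents and sequencing only; labor is excluded.

```python
# Per-sample figures quoted in the Conclusions (EUR, high-throughput averages).
EXTRACTION_EUR = 5.0   # DNA extraction and purification with homemade beads
LIBRARY_EUR = 3.0      # library preparation
SEQUENCING_EUR = 10.0  # ~1M paired-end 2 x 250 bp reads per sample

SAMPLES_PER_BATCH = 180
WORKING_DAYS_PER_BATCH = 7
WELLS_PER_BATCH = 2 * 96

# Assumption (not from the article): about 21 working days per month,
# with batches run back to back and no overlap between them.
WORKING_DAYS_PER_MONTH = 21


def per_sample_cost_eur() -> float:
    """Reagent plus sequencing cost for one sample (labor excluded)."""
    return EXTRACTION_EUR + LIBRARY_EUR + SEQUENCING_EUR


def batch_cost_eur() -> float:
    """Reagent plus sequencing cost for one 180-sample batch."""
    return SAMPLES_PER_BATCH * per_sample_cost_eur()


def samples_per_month(working_days: int = WORKING_DAYS_PER_MONTH) -> int:
    """Samples completed in full batches within the given working days."""
    return (working_days // WORKING_DAYS_PER_BATCH) * SAMPLES_PER_BATCH


def control_wells_per_batch() -> int:
    """Wells left on two 96-well plates once the 180 samples are placed."""
    return WELLS_PER_BATCH - SAMPLES_PER_BATCH


if __name__ == "__main__":
    print(per_sample_cost_eur())      # 18.0
    print(batch_cost_eur())           # 3240.0
    print(samples_per_month())        # 540
    print(control_wells_per_batch())  # 12
```

Under these assumptions, three full batches fit into a month, which is consistent with the "over 500 samples per month" quoted above.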
Using a small subset of processed samples, we have shown that FAVIS produces good-quality metabarcoding data that are suitable for biodiversity studies and biological interpretation. Although this is hard to demonstrate quantitatively, we have invested substantial effort in addressing and controlling some of the known methodological challenges. These include cross-contamination during sample processing and through index hopping, both of which had a measurable effect in our early datasets. Because the protocol is non-destructive and the specimens are retained after digestion, they remain available for future individual characterization in sequencing- or morphology-based studies. At the same time, it is important to pinpoint some of the challenges, which are likely to become more significant as sample collection and processing accelerate. Among the most important is sample management and tracking. When processing the more than 7000 bulk insect samples from the Insect Biome Atlas project with this protocol, we simplified and streamlined sample management and data recording by labeling samples and storage locations with QR codes. These codes are read and registered into a database with a handheld barcode scanner. Another important challenge is the long-term storage of samples and lysates. Those processed as part of the current project occupy a substantial portion of a custom-built freezer house, and the availability of infrastructure and long-term storage costs could hamper some projects. The third major consideration is the analysis and biological interpretation of the tremendous amounts of data generated by the project. The bioinformatic workflow presented here is suitable for analyzing much larger datasets, but dedicated statistical, modeling, and visualization solutions are needed before we can understand the patterns.
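The QR-code-based sample tracking described above can be sketched as a minimal scan log using Python's built-in sqlite3 module. The schema, function names and code formats are hypothetical and do not reflect the Insect Biome Atlas database. The sketch assumes that the scanner works in keyboard-emulation mode and often appends a newline to each scan, so surrounding whitespace is stripped.

```python
import sqlite3
from typing import Optional


def open_registry(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a minimal scan log; the schema is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS scans (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               sample_code TEXT NOT NULL,
               location_code TEXT NOT NULL,
               scanned_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
           )"""
    )
    return conn


def register_scan(conn: sqlite3.Connection, sample_code: str, location_code: str) -> None:
    """Log one scan pair: a sample's QR code followed by a storage location's QR code."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO scans (sample_code, location_code) VALUES (?, ?)",
            (sample_code.strip(), location_code.strip()),
        )


def current_location(conn: sqlite3.Connection, sample_code: str) -> Optional[str]:
    """Most recently registered storage location of a sample, if any."""
    row = conn.execute(
        "SELECT location_code FROM scans WHERE sample_code = ? ORDER BY id DESC LIMIT 1",
        (sample_code.strip(),),
    ).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    db = open_registry()
    register_scan(db, "IBA-SE-000123\n", "FREEZER-02/RACK-05/BOX-17\n")
    register_scan(db, "IBA-SE-000123", "FREEZER-03/RACK-01/BOX-02")
    print(current_location(db, "IBA-SE-000123"))  # FREEZER-03/RACK-01/BOX-02
```

Keeping every scan as a new row, rather than overwriting a location field, preserves the full movement history of each bottle or lysate plate.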

Acknowledgments

The authors acknowledge support from NBIS (National Bioinformatics Infrastructure Sweden), as well as from the National Genomics Infrastructure in Stockholm and SNIC/Uppsala Multidisciplinary Center for Advanced Computational Science for assistance with massively parallel sequencing and access to the UPPMAX computational infrastructure.

[END]
---
[1] Url: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0286272

Published and (C) by PLOS One
Content appears here under this condition or license: Creative Commons - Attribution BY 4.0.

via Magical.Fish Gopher News Feeds:
gopher://magical.fish/1/feeds/news/plosone/