Be positive: customised reference databases and new, local barcodes balance false taxonomic assignments in metabarcoding studies
List of commands to produce customized reference databases and for taxonomic assignments of metabarcoding data.
The COInr database was downloaded from www.zenodo.org/record/6555985. It can be used as is for formatting or for selecting a target region, and it comes with an associated taxonomy .tsv file.

    mkdir -p ~/DB_comp
    cd ~/DB_comp
    wget https://zenodo.org/record/6555985/files/COInr_2022_05_06.tar.gz
    tar -zxvf COInr_2022_05_06.tar.gz
    mv COInr_2022_05_06 COInr
    rm COInr_2022_05_06.tar.gz
Remove insect sequences from COInr
    perl ~/mkCOInr/scripts/select_taxa.pl \
        -taxon_list metafiles/taxon_list_insecta.txt \
        -tsv COInr/COInr.tsv \
        -taxonomy COInr/taxonomy.tsv \
        -outdir COInr_WO_Insecta/ \
        -out COInr_WO_Insecta.tsv \
        -negative_list 1
metafiles/taxon_list_insecta.txt:
    taxon_name
    Insecta
COInr-Med is derived from COInr-WO-Insecta by keeping only the marine families recorded in the Mediterranean Sea, as gathered from OBIS.

    perl ~/mkCOInr/scripts/select_taxa.pl \
        -taxon_list metafiles/Data_S4.tsv \
        -tsv COInr_WO_Insecta/COInr_WO_Insecta.tsv \
        -taxonomy COInr/taxonomy.tsv \
        -outdir COInr_Med/ \
        -out COInr_Med.tsv \
        -negative_list 0
Data_S4.tsv is the list of taxonomic families present in the Mediterranean Sea.
Add new barcodes to COInr-Med
    perl ~/mkCOInr/scripts/format_custom.pl \
        -custom metafiles/Data_S2_barcodes.tsv \
        -taxonomy COInr/taxonomy.tsv \
        -outdir COInr_Med_plus/format_custom
Data_S2_barcodes.tsv is a tab separated file with seqID, taxon, sequence as columns. It can be created from Data_S2.tsv by selecting the appropriate columns.
    seqID   taxon               sequence
    Seq1    Orbinia sertulata   ATCAGTAGATATAGCAATC...
    Seq10   Achelia langi       TTTATCATCGAGATTGGC...
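The column selection mentioned above could be sketched as follows. This is only an illustration: `make_barcodes_tsv` is a hypothetical helper, and the source column names are passed as arguments because the actual header of Data_S2.tsv is not shown here.

```shell
# Hypothetical helper (not part of mkCOInr): extract three named columns from
# a tab-separated file and print them under the header expected by
# format_custom.pl (seqID, taxon, sequence).
# $1: input TSV; $2, $3, $4: names of the ID, taxon and sequence columns.
make_barcodes_tsv() {
    infile=$1; id_col=$2; taxon_col=$3; seq_col=$4
    awk -F '\t' -v OFS='\t' -v id="$id_col" -v tx="$taxon_col" -v sq="$seq_col" '
        NR == 1 {
            # Map each header name to its column position
            for (i = 1; i <= NF; i++) pos[$i] = i
            if (!(id in pos) || !(tx in pos) || !(sq in pos)) {
                print "missing column in header" > "/dev/stderr"
                exit 1
            }
            print "seqID", "taxon", "sequence"
            next
        }
        { print $(pos[id]), $(pos[tx]), $(pos[sq]) }
    ' "$infile"
}
```

If the relevant columns of Data_S2.tsv were already named seqID, taxon and sequence (an assumption), the call would be `make_barcodes_tsv metafiles/Data_S2.tsv seqID taxon sequence > metafiles/Data_S2_barcodes.tsv`.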
The output is a lineage file COInr_Med_plus/format_custom/custom_lineages.tsv and a sequence file COInr_Med_plus/format_custom/custom_sequences.tsv.
Revise the output lineage file: complete the lineage when a taxon name is new to taxonomy.tsv, and choose between homonyms if necessary.
Example of metafiles/custom_lineages_verified.tsv (tab-separated; the subfamily field is empty in this row):

    phylum    class     order         family         subfamily  genus       species                    seqIDs
    Cnidaria  Anthozoa  Pennatulacea  Funiculinidae             Funiculina  Funiculina quadrangularis  Seq103;Seq130...
    perl ~/mkCOInr/scripts/add_taxids.pl \
        -lineages metafiles/custom_lineages_verified.tsv \
        -sequences COInr_Med_plus/format_custom/custom_sequences.tsv \
        -outdir COInr_Med_plus/add_taxids \
        -taxonomy COInr/taxonomy.tsv
This command updates the taxonomy.tsv file by adding new taxIDs. Remember to use the generated COInr_Med_plus/add_taxids/taxonomy_updated.tsv file in all further taxonomic assignment steps. The command also produces COInr_Med_plus/add_taxids/sequences_with_taxIDs.tsv, which is used in the next step.
    # Dereplicate the new barcodes
    perl ~/mkCOInr/scripts/dereplicate.pl \
        -tsv COInr_Med_plus/add_taxids/sequences_with_taxIDs.tsv \
        -outdir COInr_Med_plus/dereplicate \
        -out custom_dereplicated_sequences.tsv

    # Pool the new barcodes with COInr_Med and dereplicate the pooled set
    perl ~/mkCOInr/scripts/pool_and_dereplicate.pl \
        -tsv1 COInr_Med/COInr_Med.tsv \
        -tsv2 COInr_Med_plus/dereplicate/custom_dereplicated_sequences.tsv \
        -outdir COInr_Med_plus \
        -out COInr_Med_plus.tsv
Move the updated taxonomy file to the same folder as the COInr_Med_plus.tsv.
    mv COInr_Med_plus/add_taxids/taxonomy_updated.tsv COInr_Med_plus
Select sequences that cover at least 80% of the region amplified by metabarcoding primer pairs and trim sequences to this region.
    perl ~/mkCOInr/scripts/select_region.pl \
        -tsv COInr/COInr.tsv \
        -outdir leray/COInr \
        -e_pcr 1 \
        -fw GGNTGAACNGTNTAYCCNCC \
        -rv TAWACTTCDGGRTGNCCRAARAAYCA \
        -trim_error 0.3 \
        -min_amplicon_length 280 \
        -max_amplicon_length 345 \
        -min_overlap 20 \
        -tcov 0.8 \
        -identity 0.7

    perl ~/mkCOInr/scripts/select_region.pl \
        -tsv COInr_WO_Insecta/COInr_WO_Insecta.tsv \
        -outdir leray/COInr_WO_Insecta \
        -e_pcr 1 \
        -fw GGNTGAACNGTNTAYCCNCC \
        -rv TAWACTTCDGGRTGNCCRAARAAYCA \
        -trim_error 0.3 \
        -min_amplicon_length 280 \
        -max_amplicon_length 345 \
        -min_overlap 20 \
        -tcov 0.8 \
        -identity 0.7

    perl ~/mkCOInr/scripts/select_region.pl \
        -tsv COInr_Med/COInr_Med.tsv \
        -outdir leray/COInr_Med \
        -e_pcr 1 \
        -fw GGNTGAACNGTNTAYCCNCC \
        -rv TAWACTTCDGGRTGNCCRAARAAYCA \
        -trim_error 0.3 \
        -min_amplicon_length 280 \
        -max_amplicon_length 345 \
        -min_overlap 20 \
        -tcov 0.8 \
        -identity 0.7

    perl ~/mkCOInr/scripts/select_region.pl \
        -tsv COInr_Med_plus/COInr_Med_plus.tsv \
        -outdir leray/COInr_Med_plus \
        -e_pcr 1 \
        -fw GGNTGAACNGTNTAYCCNCC \
        -rv TAWACTTCDGGRTGNCCRAARAAYCA \
        -trim_error 0.3 \
        -min_amplicon_length 280 \
        -max_amplicon_length 345 \
        -min_overlap 20 \
        -tcov 0.8 \
        -identity 0.7
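Since the four select_region.pl calls differ only in the database name, they could also be generated in a loop. The sketch below only prints the commands so they can be checked before running; `print_select_region_cmds` is a hypothetical helper name, and the options are copied from the commands above.

```shell
# Print (do not run) one select_region.pl command per database.
# Input and output paths follow the <db>/<db>.tsv and leray/<db> layout
# used in this document.
print_select_region_cmds() {
    opts="-e_pcr 1 -fw GGNTGAACNGTNTAYCCNCC -rv TAWACTTCDGGRTGNCCRAARAAYCA"
    opts="$opts -trim_error 0.3 -min_amplicon_length 280 -max_amplicon_length 345"
    opts="$opts -min_overlap 20 -tcov 0.8 -identity 0.7"
    for db in COInr COInr_WO_Insecta COInr_Med COInr_Med_plus; do
        printf '%s\n' "perl ~/mkCOInr/scripts/select_region.pl -tsv $db/$db.tsv -outdir leray/$db $opts"
    done
}

print_select_region_cmds
```

Once the printed commands look right, piping them to a shell (`print_select_region_cmds | sh`) would execute them one after the other.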
    # Format the trimmed databases for VTAM
    perl ~/mkCOInr/scripts/format_db.pl \
        -tsv leray/COInr/trimmed.tsv \
        -taxonomy COInr/taxonomy.tsv \
        -outfmt vtam \
        -outdir leray/COInr/vtam \
        -out COInr_vtam

    perl ~/mkCOInr/scripts/format_db.pl \
        -tsv leray/COInr_WO_Insecta/trimmed.tsv \
        -taxonomy COInr/taxonomy.tsv \
        -outfmt vtam \
        -outdir leray/COInr_WO_Insecta/vtam \
        -out COInr_WO_Insecta_vtam

    perl ~/mkCOInr/scripts/format_db.pl \
        -tsv leray/COInr_Med/trimmed.tsv \
        -taxonomy COInr/taxonomy.tsv \
        -outfmt vtam \
        -outdir leray/COInr_Med/vtam \
        -out COInr_Med_vtam

    # COInr_Med_plus uses the updated taxonomy file
    perl ~/mkCOInr/scripts/format_db.pl \
        -tsv leray/COInr_Med_plus/trimmed.tsv \
        -taxonomy COInr_Med_plus/taxonomy_updated.tsv \
        -outfmt vtam \
        -outdir leray/COInr_Med_plus/vtam \
        -out COInr_Med_plus_vtam
    # Format the trimmed databases for the RDP classifier
    perl ~/mkCOInr/scripts/format_db.pl \
        -tsv leray/COInr/trimmed.tsv \
        -taxonomy COInr/taxonomy.tsv \
        -outfmt rdp \
        -outdir leray/COInr/rdp \
        -out COInr_rdp

    perl ~/mkCOInr/scripts/format_db.pl \
        -tsv leray/COInr_WO_Insecta/trimmed.tsv \
        -taxonomy COInr/taxonomy.tsv \
        -outfmt rdp \
        -outdir leray/COInr_WO_Insecta/rdp \
        -out COInr_WO_Insecta_rdp

    perl ~/mkCOInr/scripts/format_db.pl \
        -tsv leray/COInr_Med/trimmed.tsv \
        -taxonomy COInr/taxonomy.tsv \
        -outfmt rdp \
        -outdir leray/COInr_Med/rdp \
        -out COInr_Med_rdp

    # COInr_Med_plus uses the updated taxonomy file
    perl ~/mkCOInr/scripts/format_db.pl \
        -tsv leray/COInr_Med_plus/trimmed.tsv \
        -taxonomy COInr_Med_plus/taxonomy_updated.tsv \
        -outfmt rdp \
        -outdir leray/COInr_Med_plus/rdp \
        -out COInr_Med_plus_rdp
    # Format the trimmed databases for QIIME
    perl ~/mkCOInr/scripts/format_db.pl \
        -tsv leray/COInr/trimmed.tsv \
        -taxonomy COInr/taxonomy.tsv \
        -outfmt qiime \
        -outdir leray/COInr/qiime \
        -out COInr_qiime

    perl ~/mkCOInr/scripts/format_db.pl \
        -tsv leray/COInr_WO_Insecta/trimmed.tsv \
        -taxonomy COInr/taxonomy.tsv \
        -outfmt qiime \
        -outdir leray/COInr_WO_Insecta/qiime \
        -out COInr_WO_Insecta_qiime

    perl ~/mkCOInr/scripts/format_db.pl \
        -tsv leray/COInr_Med/trimmed.tsv \
        -taxonomy COInr/taxonomy.tsv \
        -outfmt qiime \
        -outdir leray/COInr_Med/qiime \
        -out COInr_Med_qiime

    # COInr_Med_plus uses the updated taxonomy file
    perl ~/mkCOInr/scripts/format_db.pl \
        -tsv leray/COInr_Med_plus/trimmed.tsv \
        -taxonomy COInr_Med_plus/taxonomy_updated.tsv \
        -outfmt qiime \
        -outdir leray/COInr_Med_plus/qiime \
        -out COInr_Med_plus_qiime
Create output directory
    mkdir -p taxassign/vtam

    # Assign taxonomy with VTAM, once per database
    vtam taxassign \
        --mode reset \
        --db metafiles/db.sqlite \
        --asvtable metafiles/Data_S3.tsv \
        --output taxassign/vtam/COInr_vtam_taxassign.tsv \
        --taxonomy leray/COInr/vtam/COInr_vtam_taxonomy.tsv \
        --blastdbdir leray/COInr/vtam/ \
        --blastdbname COInr_vtam \
        -v

    vtam taxassign \
        --mode reset \
        --db metafiles/db.sqlite \
        --asvtable metafiles/Data_S3.tsv \
        --output taxassign/vtam/COInr_WO_Insecta_vtam_taxassign.tsv \
        --taxonomy leray/COInr_WO_Insecta/vtam/COInr_WO_Insecta_vtam_taxonomy.tsv \
        --blastdbdir leray/COInr_WO_Insecta/vtam/ \
        --blastdbname COInr_WO_Insecta_vtam \
        -v

    vtam taxassign \
        --mode reset \
        --db metafiles/db.sqlite \
        --asvtable metafiles/Data_S3.tsv \
        --output taxassign/vtam/COInr_Med_vtam_taxassign.tsv \
        --taxonomy leray/COInr_Med/vtam/COInr_Med_vtam_taxonomy.tsv \
        --blastdbdir leray/COInr_Med/vtam/ \
        --blastdbname COInr_Med_vtam \
        -v

    vtam taxassign \
        --mode reset \
        --db metafiles/db.sqlite \
        --asvtable metafiles/Data_S3.tsv \
        --output taxassign/vtam/COInr_Med_plus_vtam_taxassign.tsv \
        --taxonomy leray/COInr_Med_plus/vtam/COInr_Med_plus_vtam_taxonomy.tsv \
        --blastdbdir leray/COInr_Med_plus/vtam/ \
        --blastdbname COInr_Med_plus_vtam \
        -v
The "-Xmx216g" option sets the maximum memory available to Java and has to be adjusted to your available RAM (e.g., 216 = 216 GB). Do not allocate all the RAM of your machine, or it will freeze.
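One way to pick a value is to take a fixed share of the total RAM. The sketch below is only an illustration: `suggest_xmx_gb` is a hypothetical helper, the 80% share is an arbitrary assumption rather than a recommendation from the RDP documentation, and reading /proc/meminfo assumes a Linux machine.

```shell
# Suggest a Java heap size in GB as a percentage of total RAM.
# $1: total RAM in kB (the unit of MemTotal in /proc/meminfo)
# $2: percentage of RAM to give to Java (defaults to 80, an arbitrary choice)
suggest_xmx_gb() {
    total_kb=$1
    pct=${2:-80}
    gb=$(( total_kb * pct / 100 / 1048576 ))
    # Never suggest less than 1 GB
    if [ "$gb" -lt 1 ]; then
        gb=1
    fi
    echo "$gb"
}

# On Linux, print the resulting option for this machine.
if [ -r /proc/meminfo ]; then
    mem_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
    echo "-Xmx$(suggest_xmx_gb "$mem_kb" 80)g"
fi
```

The printed option can then replace -Xmx216g in the java commands of this section.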
Create output directories
    mkdir -p leray/COInr/rdp/trained
    mkdir -p leray/COInr_WO_Insecta/rdp/trained
    mkdir -p leray/COInr_Med/rdp/trained
    mkdir -p leray/COInr_Med_plus/rdp/trained

    # Train the RDP classifier on each database
    java -Xmx216g \
        -jar rdp_classifier_2.13/rdp_classifier_2.13/dist/classifier.jar \
        train \
        -o leray/COInr/rdp/trained/ \
        -s leray/COInr/rdp/COInr_rdp_trainseq.fasta \
        -t leray/COInr/rdp/COInr_rdp_taxon.txt

    java -Xmx216g \
        -jar rdp_classifier_2.13/rdp_classifier_2.13/dist/classifier.jar \
        train \
        -o leray/COInr_WO_Insecta/rdp/trained/ \
        -s leray/COInr_WO_Insecta/rdp/COInr_WO_Insecta_rdp_trainseq.fasta \
        -t leray/COInr_WO_Insecta/rdp/COInr_WO_Insecta_rdp_taxon.txt

    java -Xmx216g \
        -jar rdp_classifier_2.13/rdp_classifier_2.13/dist/classifier.jar \
        train \
        -o leray/COInr_Med/rdp/trained/ \
        -s leray/COInr_Med/rdp/COInr_Med_rdp_trainseq.fasta \
        -t leray/COInr_Med/rdp/COInr_Med_rdp_taxon.txt

    java -Xmx216g \
        -jar rdp_classifier_2.13/rdp_classifier_2.13/dist/classifier.jar \
        train \
        -o leray/COInr_Med_plus/rdp/trained/ \
        -s leray/COInr_Med_plus/rdp/COInr_Med_plus_rdp_trainseq.fasta \
        -t leray/COInr_Med_plus/rdp/COInr_Med_plus_rdp_taxon.txt
Create output directory
    mkdir -p taxassign/rdp

    # Classify the ASVs with each trained RDP classifier
    java -Xmx216g \
        -jar rdp_classifier_2.13/rdp_classifier_2.13/dist/classifier.jar \
        classify \
        -t leray/COInr/rdp/trained/rRNAClassifier.properties \
        -o taxassign/rdp/COInr_rdp_taxassign.tsv \
        metafiles/Data_S3.fasta

    java -Xmx216g \
        -jar rdp_classifier_2.13/rdp_classifier_2.13/dist/classifier.jar \
        classify \
        -t leray/COInr_WO_Insecta/rdp/trained/rRNAClassifier.properties \
        -o taxassign/rdp/COInr_WO_Insecta_rdp_taxassign.tsv \
        metafiles/Data_S3.fasta

    java -Xmx216g \
        -jar rdp_classifier_2.13/rdp_classifier_2.13/dist/classifier.jar \
        classify \
        -t leray/COInr_Med/rdp/trained/rRNAClassifier.properties \
        -o taxassign/rdp/COInr_Med_rdp_taxassign.tsv \
        metafiles/Data_S3.fasta

    java -Xmx216g \
        -jar rdp_classifier_2.13/rdp_classifier_2.13/dist/classifier.jar \
        classify \
        -t leray/COInr_Med_plus/rdp/trained/rRNAClassifier.properties \
        -o taxassign/rdp/COInr_Med_plus_rdp_taxassign.tsv \
        metafiles/Data_S3.fasta
    # Import the reference sequences and taxonomy of each database into QIIME
    qiime tools import \
        --type 'FeatureData[Sequence]' \
        --input-path leray/COInr/qiime/COInr_qiime_trainseq.fasta \
        --output-path leray/COInr/qiime/COInr_sequences.qza
    qiime tools import \
        --type 'FeatureData[Taxonomy]' \
        --input-format HeaderlessTSVTaxonomyFormat \
        --input-path leray/COInr/qiime/COInr_qiime_taxon.txt \
        --output-path leray/COInr/qiime/COInr_taxonomy.qza

    qiime tools import \
        --type 'FeatureData[Sequence]' \
        --input-path leray/COInr_WO_Insecta/qiime/COInr_WO_Insecta_qiime_trainseq.fasta \
        --output-path leray/COInr_WO_Insecta/qiime/COInr_WO_Insecta_sequences.qza
    qiime tools import \
        --type 'FeatureData[Taxonomy]' \
        --input-format HeaderlessTSVTaxonomyFormat \
        --input-path leray/COInr_WO_Insecta/qiime/COInr_WO_Insecta_qiime_taxon.txt \
        --output-path leray/COInr_WO_Insecta/qiime/COInr_WO_Insecta_taxonomy.qza

    qiime tools import \
        --type 'FeatureData[Sequence]' \
        --input-path leray/COInr_Med/qiime/COInr_Med_qiime_trainseq.fasta \
        --output-path leray/COInr_Med/qiime/COInr_Med_sequences.qza
    qiime tools import \
        --type 'FeatureData[Taxonomy]' \
        --input-format HeaderlessTSVTaxonomyFormat \
        --input-path leray/COInr_Med/qiime/COInr_Med_qiime_taxon.txt \
        --output-path leray/COInr_Med/qiime/COInr_Med_taxonomy.qza

    qiime tools import \
        --type 'FeatureData[Sequence]' \
        --input-path leray/COInr_Med_plus/qiime/COInr_Med_plus_qiime_trainseq.fasta \
        --output-path leray/COInr_Med_plus/qiime/COInr_Med_plus_sequences.qza
    qiime tools import \
        --type 'FeatureData[Taxonomy]' \
        --input-format HeaderlessTSVTaxonomyFormat \
        --input-path leray/COInr_Med_plus/qiime/COInr_Med_plus_qiime_taxon.txt \
        --output-path leray/COInr_Med_plus/qiime/COInr_Med_plus_taxonomy.qza
Sequences should be in CAPITAL letters
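If the ASV FASTA file contains lower-case bases, it could be converted before the import along these lines. `fasta_to_upper` is a hypothetical helper, not part of QIIME or mkCOInr; it upper-cases sequence lines and leaves header lines untouched.

```shell
# Upper-case the sequence lines of a FASTA file (or of standard input).
# Header lines, which start with ">", are printed unchanged.
fasta_to_upper() {
    awk '/^>/ { print; next } { print toupper($0) }' "$@"
}
```

For example, `fasta_to_upper Data_S3_raw.fasta > metafiles/Data_S3.fasta`, where Data_S3_raw.fasta is a placeholder name for the unconverted file.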
    # Import the ASV sequences
    qiime tools import \
        --type 'FeatureData[Sequence]' \
        --input-path metafiles/Data_S3.fasta \
        --output-path metafiles/Data_S3.qza

    # Train a naive Bayes classifier on each database
    qiime feature-classifier fit-classifier-naive-bayes \
        --i-reference-reads leray/COInr/qiime/COInr_sequences.qza \
        --i-reference-taxonomy leray/COInr/qiime/COInr_taxonomy.qza \
        --o-classifier leray/COInr/qiime/COInr_trained.qza

    qiime feature-classifier fit-classifier-naive-bayes \
        --i-reference-reads leray/COInr_WO_Insecta/qiime/COInr_WO_Insecta_sequences.qza \
        --i-reference-taxonomy leray/COInr_WO_Insecta/qiime/COInr_WO_Insecta_taxonomy.qza \
        --o-classifier leray/COInr_WO_Insecta/qiime/COInr_WO_Insecta_trained.qza

    qiime feature-classifier fit-classifier-naive-bayes \
        --i-reference-reads leray/COInr_Med/qiime/COInr_Med_sequences.qza \
        --i-reference-taxonomy leray/COInr_Med/qiime/COInr_Med_taxonomy.qza \
        --o-classifier leray/COInr_Med/qiime/COInr_Med_trained.qza

    qiime feature-classifier fit-classifier-naive-bayes \
        --i-reference-reads leray/COInr_Med_plus/qiime/COInr_Med_plus_sequences.qza \
        --i-reference-taxonomy leray/COInr_Med_plus/qiime/COInr_Med_plus_taxonomy.qza \
        --o-classifier leray/COInr_Med_plus/qiime/COInr_Med_plus_trained.qza
Create output directory
    mkdir -p taxassign/qiime_sklearn

    # Classify the ASVs with each trained classifier
    qiime feature-classifier classify-sklearn \
        --i-classifier leray/COInr/qiime/COInr_trained.qza \
        --i-reads metafiles/Data_S3.qza \
        --o-classification taxassign/qiime_sklearn/COInr_qiime_sklearn_taxassign.qza

    qiime feature-classifier classify-sklearn \
        --i-classifier leray/COInr_WO_Insecta/qiime/COInr_WO_Insecta_trained.qza \
        --i-reads metafiles/Data_S3.qza \
        --o-classification taxassign/qiime_sklearn/COInr_WO_Insecta_qiime_sklearn_taxassign.qza

    qiime feature-classifier classify-sklearn \
        --i-classifier leray/COInr_Med/qiime/COInr_Med_trained.qza \
        --i-reads metafiles/Data_S3.qza \
        --o-classification taxassign/qiime_sklearn/COInr_Med_qiime_sklearn_taxassign.qza

    qiime feature-classifier classify-sklearn \
        --i-classifier leray/COInr_Med_plus/qiime/COInr_Med_plus_trained.qza \
        --i-reads metafiles/Data_S3.qza \
        --o-classification taxassign/qiime_sklearn/COInr_Med_plus_qiime_sklearn_taxassign.qza
Use three different percentages of identity: 0.97, 0.90 and 0.80.
Create output directory
    mkdir -p taxassign/qiime_blast

    # COInr
    qiime feature-classifier classify-consensus-blast \
        --i-query metafiles/Data_S3.qza \
        --i-reference-reads leray/COInr/qiime/COInr_sequences.qza \
        --i-reference-taxonomy leray/COInr/qiime/COInr_taxonomy.qza \
        --p-perc-identity 0.97 \
        --o-classification taxassign/qiime_blast/COInr_qiime_blast_97_taxassign.qza \
        --verbose
    qiime feature-classifier classify-consensus-blast \
        --i-query metafiles/Data_S3.qza \
        --i-reference-reads leray/COInr/qiime/COInr_sequences.qza \
        --i-reference-taxonomy leray/COInr/qiime/COInr_taxonomy.qza \
        --p-perc-identity 0.90 \
        --o-classification taxassign/qiime_blast/COInr_qiime_blast_90_taxassign.qza \
        --verbose
    qiime feature-classifier classify-consensus-blast \
        --i-query metafiles/Data_S3.qza \
        --i-reference-reads leray/COInr/qiime/COInr_sequences.qza \
        --i-reference-taxonomy leray/COInr/qiime/COInr_taxonomy.qza \
        --p-perc-identity 0.80 \
        --o-classification taxassign/qiime_blast/COInr_qiime_blast_80_taxassign.qza \
        --verbose

    # COInr_WO_Insecta
    qiime feature-classifier classify-consensus-blast \
        --i-query metafiles/Data_S3.qza \
        --i-reference-reads leray/COInr_WO_Insecta/qiime/COInr_WO_Insecta_sequences.qza \
        --i-reference-taxonomy leray/COInr_WO_Insecta/qiime/COInr_WO_Insecta_taxonomy.qza \
        --p-perc-identity 0.97 \
        --o-classification taxassign/qiime_blast/COInr_WO_Insecta_qiime_blast_97_taxassign.qza \
        --verbose
    qiime feature-classifier classify-consensus-blast \
        --i-query metafiles/Data_S3.qza \
        --i-reference-reads leray/COInr_WO_Insecta/qiime/COInr_WO_Insecta_sequences.qza \
        --i-reference-taxonomy leray/COInr_WO_Insecta/qiime/COInr_WO_Insecta_taxonomy.qza \
        --p-perc-identity 0.90 \
        --o-classification taxassign/qiime_blast/COInr_WO_Insecta_qiime_blast_90_taxassign.qza \
        --verbose
    qiime feature-classifier classify-consensus-blast \
        --i-query metafiles/Data_S3.qza \
        --i-reference-reads leray/COInr_WO_Insecta/qiime/COInr_WO_Insecta_sequences.qza \
        --i-reference-taxonomy leray/COInr_WO_Insecta/qiime/COInr_WO_Insecta_taxonomy.qza \
        --p-perc-identity 0.80 \
        --o-classification taxassign/qiime_blast/COInr_WO_Insecta_qiime_blast_80_taxassign.qza \
        --verbose

    # COInr_Med
    qiime feature-classifier classify-consensus-blast \
        --i-query metafiles/Data_S3.qza \
        --i-reference-reads leray/COInr_Med/qiime/COInr_Med_sequences.qza \
        --i-reference-taxonomy leray/COInr_Med/qiime/COInr_Med_taxonomy.qza \
        --p-perc-identity 0.97 \
        --o-classification taxassign/qiime_blast/COInr_Med_qiime_blast_97_taxassign.qza \
        --verbose
    qiime feature-classifier classify-consensus-blast \
        --i-query metafiles/Data_S3.qza \
        --i-reference-reads leray/COInr_Med/qiime/COInr_Med_sequences.qza \
        --i-reference-taxonomy leray/COInr_Med/qiime/COInr_Med_taxonomy.qza \
        --p-perc-identity 0.90 \
        --o-classification taxassign/qiime_blast/COInr_Med_qiime_blast_90_taxassign.qza \
        --verbose
    qiime feature-classifier classify-consensus-blast \
        --i-query metafiles/Data_S3.qza \
        --i-reference-reads leray/COInr_Med/qiime/COInr_Med_sequences.qza \
        --i-reference-taxonomy leray/COInr_Med/qiime/COInr_Med_taxonomy.qza \
        --p-perc-identity 0.80 \
        --o-classification taxassign/qiime_blast/COInr_Med_qiime_blast_80_taxassign.qza \
        --verbose

    # COInr_Med_plus
    qiime feature-classifier classify-consensus-blast \
        --i-query metafiles/Data_S3.qza \
        --i-reference-reads leray/COInr_Med_plus/qiime/COInr_Med_plus_sequences.qza \
        --i-reference-taxonomy leray/COInr_Med_plus/qiime/COInr_Med_plus_taxonomy.qza \
        --p-perc-identity 0.97 \
        --o-classification taxassign/qiime_blast/COInr_Med_plus_qiime_blast_97_taxassign.qza \
        --verbose
    qiime feature-classifier classify-consensus-blast \
        --i-query metafiles/Data_S3.qza \
        --i-reference-reads leray/COInr_Med_plus/qiime/COInr_Med_plus_sequences.qza \
        --i-reference-taxonomy leray/COInr_Med_plus/qiime/COInr_Med_plus_taxonomy.qza \
        --p-perc-identity 0.90 \
        --o-classification taxassign/qiime_blast/COInr_Med_plus_qiime_blast_90_taxassign.qza \
        --verbose
    qiime feature-classifier classify-consensus-blast \
        --i-query metafiles/Data_S3.qza \
        --i-reference-reads leray/COInr_Med_plus/qiime/COInr_Med_plus_sequences.qza \
        --i-reference-taxonomy leray/COInr_Med_plus/qiime/COInr_Med_plus_taxonomy.qza \
        --p-perc-identity 0.80 \
        --o-classification taxassign/qiime_blast/COInr_Med_plus_qiime_blast_80_taxassign.qza \
        --verbose
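The twelve BLAST runs (four databases, three identity thresholds) follow one naming pattern, so they could be generated with a nested loop. The sketch below only prints the commands for review. `print_blast_cmds` is a hypothetical helper name, and it assumes that `--i-query` is the classify-consensus-blast option that takes the query sequences.

```shell
# Print (do not run) one classify-consensus-blast command per database and
# identity threshold. pct is the threshold as a percentage (97, 90, 80) and
# is used both in the option value (0.97, ...) and in the output file name.
print_blast_cmds() {
    for db in COInr COInr_WO_Insecta COInr_Med COInr_Med_plus; do
        for pct in 97 90 80; do
            refs="--i-reference-reads leray/$db/qiime/${db}_sequences.qza"
            refs="$refs --i-reference-taxonomy leray/$db/qiime/${db}_taxonomy.qza"
            out="taxassign/qiime_blast/${db}_qiime_blast_${pct}_taxassign.qza"
            printf '%s\n' "qiime feature-classifier classify-consensus-blast --i-query metafiles/Data_S3.qza $refs --p-perc-identity 0.$pct --o-classification $out --verbose"
        done
    done
}

print_blast_cmds
```

After checking the printed commands, `print_blast_cmds | sh` would run them in sequence.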
Bolyen,E. et al. (2019) Reproducible, interactive, scalable and extensible microbiome data science using QIIME 2. Nature Biotechnology, 37, 852–857. DOI: 10.1038/s41587-019-0209-9.
González,A. et al. (2020) VTAM: A robust pipeline for validating metabarcoding data using internal controls. bioRxiv, 2020.11.06.371187.
Meglécz,E. (2022a) COInr a comprehensive, non-redundant COI database from NCBI-nt and BOLD. DOI: 10.5281/zenodo.6555985.
Meglécz,E. (2022b) COInr and mkCOInr: Building and customizing a non-redundant barcoding reference database from BOLD and NCBI using a lightweight pipeline. bioRxiv, 2022.05.18.492423.
Meglécz,E. (2022c) meglecz/mkCOInr: mkCOInr-v.0.2.0. DOI: 10.5281/zenodo.6961340
Wang,Q. et al. (2007) Naive Bayesian classifier for rapid assignment of rRNA sequences into the new bacterial taxonomy. Appl. Environ. Microbiol., 73, 5261–5267.