nf-core/viralmetagenome
Detect iSNV and construct whole viral genomes from metagenomic samples
Introduction
Viralgenie is a bioinformatics best-practice analysis pipeline for reconstructing consensus genomes and identifying intra-host variants from metagenomic sequencing data or enrichment-based sequencing data such as hybrid capture.
Pipeline summary
- Read QC (FastQC)
- Performs optional read pre-processing
- Metagenomic diversity mapping
- Denovo assembly (SPAdes, TRINITY, megahit), combine contigs.
- [Optional] Extend the contigs with sspace_basic and filter with prinseq++
- [Optional] Map reads to contigs for coverage estimation (BowTie2, BWAmem2 and BWA)
- Contig reference identification (blastn)
  - Identify top 5 blast hits
  - Merge blast hits and all contigs of a sample
- [Optional] Precluster contigs based on taxonomy
- Cluster contigs (or every taxonomic bin) of samples, options are:
- [Optional] Remove clusters with low read coverage. bin/extract_clusters.py
- Scaffolding of contigs to centroid (Minimap2, iVar-consensus)
- [Optional] Annotate 0-depth regions with external reference bin/nocov_to_reference.py.
- [Optional] Select best reference from --mapping_constraints:
- Mapping filtered reads to supercontig and mapping constraints (BowTie2, BWAmem2 and BWA)
- [Optional] Deduplicate reads (Picard or, if UMIs are used, UMI-tools)
- Variant calling and filtering (BCFTools, iVar)
- Create consensus genome (BCFTools, iVar)
- Repeat steps 12-15 multiple times for the denovo contig route
- Consensus evaluation and annotation (QUAST, CheckV, blastn, prokka, mmseqs-search, MAFFT - alignment of contigs vs iterations & consensus)
- Result summary visualisation for raw read, alignment, assembly, variant calling and consensus calling results (MultiQC)
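As an illustration of the "Identify top 5 blast hits" step, the following is a minimal Python sketch that keeps the five best hits per contig from BLAST tabular output. It is not the pipeline's own code: the standard 12-column `-outfmt 6` layout, ranking by bitscore, the function name `top_hits` and the example records are all assumptions made for this sketch.

```python
import csv
import io
from collections import defaultdict

# Standard 12-column BLAST tabular layout (-outfmt 6). Whether the pipeline
# uses exactly these columns is an assumption of this sketch.
COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def top_hits(handle, n=5):
    """Return up to n hits per query, ranked by descending bitscore."""
    hits = defaultdict(list)
    for row in csv.DictReader(handle, fieldnames=COLUMNS, delimiter="\t"):
        hits[row["qseqid"]].append(row)
    return {
        query: sorted(rows, key=lambda r: float(r["bitscore"]), reverse=True)[:n]
        for query, rows in hits.items()
    }


# Hypothetical example data: one contig with six hits of differing bitscore.
example = "\n".join(
    f"contig_1\tref_{i}\t95.0\t500\t25\t0\t1\t500\t1\t500\t1e-50\t{score}"
    for i, score in enumerate([800, 950, 700, 990, 650, 900])
)
best = top_hits(io.StringIO(example))
print([hit["sseqid"] for hit in best["contig_1"]])
```

Ranking by bitscore is only one possible criterion; e-value or percent identity could be substituted by changing the sort key.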
Usage
If you are new to Nextflow and nf-core, please refer to this page on how to set up Nextflow. Make sure to test your setup with -profile test before running the workflow on actual data.
First, prepare a samplesheet with your input data that looks as follows:
samplesheet.csv:
sample,fastq_1,fastq_2
sample1,AEG588A1_S1_L002_R1_001.fastq.gz,AEG588A1_S1_L002_R2_001.fastq.gz
sample2,AEG588A5_S5_L003_R1_001.fastq.gz,
sample3,AEG588A3_S3_L002_R1_001.fastq.gz,AEG588A3_S3_L002_R2_001.fastq.gz
Each row represents a fastq file (single-end) or a pair of fastq files (paired-end).
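To make the row convention concrete, here is a small Python sketch that reads a samplesheet with the three columns shown above and reports whether each row is single-end or paired-end. The helper name `read_samplesheet` is hypothetical, and the check is only an illustration, not the validation the pipeline itself performs.

```python
import csv
import io


def read_samplesheet(handle):
    """Yield (sample, layout, fastq files) for each samplesheet row."""
    for row in csv.DictReader(handle):
        fastq_1 = (row.get("fastq_1") or "").strip()
        fastq_2 = (row.get("fastq_2") or "").strip()
        if not fastq_1:
            raise ValueError(f"sample {row['sample']!r} has no fastq_1")
        if fastq_2:
            # Both columns filled: a pair of fastq files.
            yield row["sample"], "paired-end", [fastq_1, fastq_2]
        else:
            # Empty fastq_2 column: a single fastq file.
            yield row["sample"], "single-end", [fastq_1]


samplesheet = """sample,fastq_1,fastq_2
sample1,AEG588A1_S1_L002_R1_001.fastq.gz,AEG588A1_S1_L002_R2_001.fastq.gz
sample2,AEG588A5_S5_L003_R1_001.fastq.gz,
sample3,AEG588A3_S3_L002_R1_001.fastq.gz,AEG588A3_S3_L002_R2_001.fastq.gz
"""
rows = list(read_samplesheet(io.StringIO(samplesheet)))
for sample, layout, files in rows:
    print(sample, layout, len(files))
```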
Now, you can run the pipeline using:
nextflow run Joon-Klaps/viralgenie \
-profile <docker/singularity/.../institute> \
--input samplesheet.csv \
--outdir <OUTDIR>
Please provide pipeline parameters via the CLI or the Nextflow -params-file option. Custom config files, including those provided by the Nextflow -c option, can be used to provide any configuration except for parameters; see docs.
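As a sketch of the -params-file route, the Python snippet below writes the two parameters from the command above to a JSON file and assembles the matching command line. The file name `params.json`, the `results` output directory and the `docker` profile are placeholders chosen for this example; Nextflow accepts params files in JSON or YAML.

```python
import json
import shlex
from pathlib import Path

# Only --input and --outdir appear in the command above; the values and the
# file name params.json are placeholders chosen for this sketch.
params = {"input": "samplesheet.csv", "outdir": "results"}
params_file = Path("params.json")
params_file.write_text(json.dumps(params, indent=2) + "\n")

# Pipeline parameters now come from the file, so they are dropped from the CLI.
command = [
    "nextflow", "run", "Joon-Klaps/viralgenie",
    "-profile", "docker",
    "-params-file", str(params_file),
]
print(shlex.join(command))
```

Keeping parameters in a file makes a run easier to repeat and to share than a long command line.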
For more details and further functionality, please refer to the usage documentation and the parameter documentation.
Pipeline output
To see the results of an example test run with a full-size dataset, refer to the results tab on the nf-core website pipeline page. For more details about the output files and reports, please refer to the output documentation.
Credits
Viralgenie was originally written by Joon-Klaps.
We thank the following people for their extensive assistance in the development of this pipeline:
Contributions and Support
If you would like to contribute to this pipeline, please see the contributing guidelines.
Citations
Viralgenie has not yet been published. Please cite it as: GitHub, https://github.com/Joon-Klaps/viralgenie
An extensive list of references for the tools used by the pipeline can be found in the CITATIONS.md file.