nf-core/rarevariantburden
Pipeline for performing consistent summary-count-based rare variant burden tests, which is useful when only sequenced case data are available. For example, the cases can be compared against public summary count data such as gnomAD.
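To make the idea of a summary-count burden test concrete, here is a minimal sketch of a one-sided Fisher's exact test on carrier counts for a single gene. It is a generic illustration only and is not the statistic implemented by CoCoRV or this pipeline; the function name and the counts are hypothetical.

```python
from math import comb


def fisher_exact_greater(case_carriers, case_total, control_carriers, control_total):
    """One-sided Fisher's exact test on a 2x2 carrier table.

    Returns the probability of seeing at least `case_carriers` carriers
    among the cases if carriers were distributed at random between the
    case and control groups (hypergeometric tail).
    """
    carriers = case_carriers + control_carriers
    total = case_total + control_total
    denominator = comb(total, case_total)
    upper = min(carriers, case_total)
    numerator = 0
    for k in range(case_carriers, upper + 1):
        # k carriers fall in the case group, the rest in the control group
        numerator += comb(carriers, k) * comb(total - carriers, case_total - k)
    return numerator / denominator


# Hypothetical counts: 6 of 500 cases carry a qualifying variant,
# versus 40 of 50000 controls taken from summary counts.
p_value = fisher_exact_greater(6, 500, 40, 50000)
print(f"one-sided p-value: {p_value:.3g}")
```

Only the four counts are needed, which is why summary data such as gnomAD can stand in for individually sequenced controls.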
Define where the pipeline should find input data and save output data.
The output directory where the results will be saved. You must use absolute paths to storage on cloud infrastructure.
Type: string
Email address for completion summary.
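The pattern listed for this parameter can be tried out before launching a run. The sketch below applies it with Python's `re` module; the pipeline itself validates parameters against its schema, so this is only an approximation of that check, and the helper name and example addresses are hypothetical.

```python
import re

# Pattern copied from the schema entry for the email parameter.
EMAIL_PATTERN = re.compile(
    r"^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$"
)


def is_valid_email(address: str) -> bool:
    """Return True if the address matches the schema pattern."""
    return EMAIL_PATTERN.match(address) is not None


print(is_valid_email("jane.doe@example.org"))
print(is_valid_email("jane.doe@example"))
```

Note that the final group accepts only 2 to 5 letters, so addresses with longer top-level domains are rejected by this pattern.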
Type: string. Pattern: ^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$

Define different pipeline parameters.
Jointly called, VQSR-applied VCF file from the case cohort. If you have already split the jointly called VCF file by chromosome, set this to ‘NA’ and provide the list of split VCF files in the ‘caseVCFFileList’ parameter. By default, caseJointVCF is split into chromosomes 1-22; to use a different set, specify it in the ‘chrSet’ parameter.
Type: string. Default: NA
One-column text file containing the list of samples, one sample ID per line.
Type: string
Input files needed by the pipeline from the control dataset. Three gnomAD datasets are currently supported as controls: gnomADv2exome, gnomADv4.1exome, and gnomADv4.1genome. You need to download these datasets from our Amazon AWS S3 bucket: s3://cocorv-resource-files/
Type: string
Reference genome build version. Allowed values are ‘GRCh37’ and ‘GRCh38’. Default value: ‘GRCh38’.
Type: string. Default: GRCh38
gnomAD version. Allowed values are ‘v2exome’, ‘v4exome’, and ‘v4genome’ (for GRCh37 data use ‘v2exome’; for GRCh38 use ‘v4exome’ or ‘v4genome’). Default value: ‘v4exome’.
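The build and gnomAD version must agree with each other. As a rough sketch, the allowed combinations stated above can be checked before a run; the function name is hypothetical and this check is not part of the pipeline.

```python
# Allowed combinations as stated in the parameter descriptions.
COMPATIBLE_GNOMAD = {
    "GRCh37": {"v2exome"},
    "GRCh38": {"v4exome", "v4genome"},
}


def check_build_and_gnomad(build: str, gnomad_version: str) -> None:
    """Raise ValueError if the build/gnomAD version pair is not allowed."""
    if build not in COMPATIBLE_GNOMAD:
        raise ValueError(f"unknown build {build!r}; expected one of {sorted(COMPATIBLE_GNOMAD)}")
    allowed = COMPATIBLE_GNOMAD[build]
    if gnomad_version not in allowed:
        raise ValueError(
            f"gnomAD version {gnomad_version!r} does not match {build}; use one of {sorted(allowed)}"
        )


check_build_and_gnomad("GRCh38", "v4exome")  # the default pair, passes silently
```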
Type: string. Default: v4exome
BED file containing well-covered positions from the case VCF files, where 90% of samples have coverage >= 10.
Type: string. Default: NA
List of chromosomes into which the joint VCF file is split. For example, to test only chromosomes 21 and 22, use ‘21 22’.
Type: string. Default: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22
An optional R file with functions, or a tab-separated two-column text file, defining the variants of interest.
Type: string. Default: NA
An optional JSON file providing extra parameters to the customized variantGroupCustom file.
Type: string. Default: NA
A one-column file without a header listing all required annotations used in variant filtering. The AC and AN annotations from ACANConfig are added automatically.
Type: string. Default: NA
Additional options for the CoCoRV module (optional).
Type: string. Default: NA
The column header used to pull the gene name from the annotation file. Default: ‘Gene.refGene’; for VEP, use ‘SYMBOL’.
Type: string. Default: Gene.refGene
A one-column file specifying the variants to be included; it will also include variants specified in variantExcludeFile.
Type: string. Default: NA
Batch size for the CoCoRV module.
Type: integer. Default: 10000
The folder path containing the CoCoRV R package.
Type: string. Default: /opt/cocorv/
The maximum alternate allele frequency. For gnomADv2exome we used 0.0001; for gnomADv4.1genome and gnomADv4.1exome we used 0.0005.
Type: number. Default: 0.0005
The maximum missingness allowed for a variant.
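The two thresholds above (maximum alternate allele frequency and maximum missingness) act as per-variant filters. The sketch below shows the idea with hypothetical names; treating both bounds as inclusive is an assumption, and the actual filtering is done inside CoCoRV.

```python
# Values the descriptions above report using for each control dataset.
MAX_AF_USED = {
    "gnomADv2exome": 0.0001,
    "gnomADv4.1exome": 0.0005,
    "gnomADv4.1genome": 0.0005,
}


def passes_filters(alt_af, missingness, max_af=0.0005, max_missingness=0.1):
    """Keep a variant only if it is rare enough and well enough genotyped.

    Assumption: both comparisons are inclusive (<=).
    """
    return alt_af <= max_af and missingness <= max_missingness


# A hypothetical variant with AF 0.0002 and 5% missing genotypes.
print(passes_filters(0.0002, 0.05))
print(passes_filters(0.0002, 0.05, max_af=MAX_AF_USED["gnomADv2exome"]))
```

The same variant passes with the v4.1 threshold but not with the stricter v2 exome threshold.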
Type: number. Default: 0.1
A predefined variant group to use for the test, or a self-defined function that defines the variants of interest.
Type: string. Default: annovar_pathogenic
The minimum REVEL score for pathogenic missense variants.
Type: number. Default: 0.65
The p-value threshold for detecting high-LD variants in controls.
Type: number. Default: 0.05
BED file containing well-covered positions from the gnomAD control files, where 90% of samples have coverage >= 10.
Type: string. Default: null/coverage10x.bed.gz
The reference genome file.
Type: string. Default: null/reference.fasta.gz
If you have split your jointly called VCF file by chromosome, you can supply the VCF file list here. The list must be in CSV (comma-separated) format with the column headers chr,vcf. To run only some chromosomes, e.g. chr 21 and 22, list only the chromosome numbers and VCF files for chr 21 and 22.
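A list in the chr,vcf layout described above can be generated rather than typed by hand. This is a minimal sketch; the helper name and the file paths are hypothetical and should be replaced with your own.

```python
import csv
import io


def build_case_vcf_list(chromosomes, path_template):
    """Return CSV text with the header chr,vcf and one row per chromosome."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["chr", "vcf"])
    for chrom in chromosomes:
        writer.writerow([chrom, path_template.format(chr=chrom)])
    return buffer.getvalue()


# Only chromosomes 21 and 22, mirroring a chrSet value of '21 22'.
csv_text = build_case_vcf_list("21 22".split(), "/data/cases/case.chr{chr}.vcf.gz")
print(csv_text)
```

Writing `csv_text` to a file gives a list that can be passed as the caseVCFFileList value.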
Type: string. Default: NA
Pre-normalized case VCF files. Supplying them skips the normalization steps in the pipeline. The list must be in CSV (comma-separated) format with the column headers chr,vcf,index. List the chromosome number and the corresponding VCF file and .tbi file for each chromosome.
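Before supplying a chr,vcf,index list, it can help to sanity-check it. The sketch below verifies the header and that each index entry ends in .tbi; the function name and paths are hypothetical, and the pipeline may apply different or additional checks.

```python
import csv
import io


def check_indexed_vcf_list(csv_text):
    """Return a list of problems found in a chr,vcf,index list (empty if none)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames != ["chr", "vcf", "index"]:
        return [f"unexpected header: {reader.fieldnames}"]
    problems = []
    for line_no, row in enumerate(reader, start=2):
        # Missing trailing fields are read as None, so normalise to "".
        chrom = row["chr"] or ""
        vcf = row["vcf"] or ""
        index = row["index"] or ""
        if not chrom or not vcf:
            problems.append(f"line {line_no}: chr and vcf must not be empty")
        if not index.endswith(".tbi"):
            problems.append(f"line {line_no}: index should be a .tbi file, got {index!r}")
    return problems


example = (
    "chr,vcf,index\n"
    "21,/data/norm/case.chr21.vcf.gz,/data/norm/case.chr21.vcf.gz.tbi\n"
    "22,/data/norm/case.chr22.vcf.gz,/data/norm/case.chr22.vcf.gz.tbi\n"
)
print(check_indexed_vcf_list(example))
```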
Type: string. Default: NA
Pre-annotated case VCF files. Supplying them skips the annotation steps in the pipeline. The list must be in CSV (comma-separated) format with the column headers chr,vcf,index. List the chromosome number and the corresponding VCF file and .tbi file for each chromosome.
Type: string. Default: NA
GDS-converted genotype case VCF files. Supplying them skips the genotype GDS conversion steps in the pipeline. The list must be in CSV (comma-separated) format with the column headers chr,gds. List the chromosome number and the corresponding GDS file for each chromosome.
Type: string. Default: NA
GDS-converted annotated case VCF files. Supplying them skips the annotation GDS conversion steps in the pipeline. The list must be in CSV (comma-separated) format with the column headers chr,gds. List the chromosome number and the corresponding GDS file for each chromosome.
Type: string. Default: NA
GDS-converted genotype control VCF files from the gnomAD control data.
Type: string. Default: null/controlGenotypeGDS.csv
GDS-converted annotated control VCF files from the gnomAD control data.
Type: string. Default: null/controlAnnotationGDS.csv
The configuration file specifying the ancestry groups for analysis.
Type: string. Default: null/stratified_config_gnomadV4.asj.txt
A one-column file specifying the variants to be excluded.
Type: string. Default: null/gnomAD41WGSExtraExcludeInCodingExcludeTAS2R46.txt.gz

Define different pipeline parameters for annotating with ANNOVAR and VEP.
Resource folder for the annotation tool ANNOVAR. You can download this folder from our Amazon AWS S3 bucket: s3://cocorv-resource-files/annovarFolder/
Type: string
Resource folder for the annotation tool VEP. You can download this folder from our Amazon AWS S3 bucket: s3://cocorv-resource-files/vepFolder/
Type: string. Default: NA
Annotation options for ANNOVAR. Default: ‘refGene,gnomad211_exome,revel’.
Type: string. Default: refGene,gnomad211_exome,revel
Annotation options for ANNOVAR. Default: ‘g,f,f’.
Type: string. Default: g,f,f
To run only VEP annotation, set this parameter to ‘VEP’; to run both ANNOVAR and VEP, set it to ‘ANNOVAR_VEP’. Default: ‘ANNOVAR’, meaning only ANNOVAR is run.
Type: string. Default: ANNOVAR
VEP annotation parameters: the list of VEP plugins to run.
Type: string. Default: AM,SPLICEAI,LOFTEE,CADD
Annotation options for ANNOVAR.
Type: string. Default: NA
Annotation options for ANNOVAR.
Type: string. Default: NA

Define different pipeline parameters for predicting ancestry.
File containing the estimated population/ethnicity of the case samples from the gnomAD classifier. Optional; if not specified, the Nextflow pipeline estimates the population using the gnomAD classifier.
Type: string. Default: NA
The file containing the variant positions needed by the gnomAD ancestry prediction classifier.
Type: string. Default: null/ancestry/hail_positions.chr.pos.tsv
Files needed by the gnomAD ancestry prediction classifier.
Type: string. Default: null/ancestry/gnomad.pca_loadings.ht/
Files needed by the gnomAD ancestry prediction classifier.
Type: string. Default: null/ancestry/gnomad.RF_fit.onnx

Define different pipeline parameters for sex-stratified analysis.
To perform a sex-stratified analysis, set this to ‘true’. Default: ‘false’.
Type: boolean
A file with a header. The first column is the ID, and there must be a column named ‘Sex’. Sex is coded as Male/Female, 1/2, or XY/XX; NA means missing.
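The sex file layout described above can be read and normalised as in the following sketch. Several details are assumptions rather than documented behaviour: the file is taken to be tab-separated, the codes are paired in the order listed (1 and XY with Male, 2 and XX with Female), and matching is case-insensitive. The function name and sample IDs are hypothetical.

```python
import csv
import io

# Assumed pairing of the three coding schemes, in the order they are listed.
SEX_CODES = {
    "male": "Male", "1": "Male", "xy": "Male",
    "female": "Female", "2": "Female", "xx": "Female",
}


def read_sex_file(text, delimiter="\t"):
    """Map each ID (first column) to 'Male', 'Female', or None for NA."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    if reader.fieldnames is None or "Sex" not in reader.fieldnames:
        raise ValueError("the file must have a column named 'Sex'")
    id_column = reader.fieldnames[0]
    sexes = {}
    for row in reader:
        raw = (row["Sex"] or "").strip()
        if raw == "NA":
            sexes[row[id_column]] = None  # missing value
        elif raw.lower() in SEX_CODES:
            sexes[row[id_column]] = SEX_CODES[raw.lower()]
        else:
            raise ValueError(f"unrecognised sex code {raw!r} for {row[id_column]}")
    return sexes


sample_text = "ID\tAge\tSex\nS1\t30\tMale\nS2\t41\t2\nS3\t25\tXX\nS4\t50\tNA\n"
print(read_sex_file(sample_text))
```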
Type: string. Default: NA

Define different pipeline parameters for post-checking the CoCoRV results by generating variant-sample information for the top K genes.
Number of top genes (K) for which variant-sample information is generated.
Type: integer. Default: 1000
Whether the top K genes are examined in cases or controls.
Type: string. Default: case

Parameters used to describe centralised config profiles. These should not be edited.
Git commit id for Institutional configs.
Type: string. Default: master
Base directory for Institutional configs.
Type: string. Default: https://raw.githubusercontent.com/nf-core/configs/master
Institutional config name.
Type: string
Institutional config description.
Type: string
Institutional config contact information.
Type: string
Institutional config URL link.
Type: string

Less common options for the pipeline, typically set in a config file.
Display version and exit.
Type: boolean
Method used to save pipeline results to output directory.
Type: string
Email address for completion summary, only when pipeline fails.
Type: string. Pattern: ^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$
Send plain-text email instead of HTML.
Type: boolean
Do not use coloured log outputs.
Type: boolean
Incoming hook URL for messaging service.
Type: string
Boolean whether to validate parameters against the schema at runtime.
Type: boolean. Default: true
Base URL or local path to location of pipeline test dataset files.
Type: string. Default: https://raw.githubusercontent.com/nf-core/test-datasets/
Suffix to add to the trace report filename. Default is the date and time in the format yyyy-MM-dd_HH-mm-ss.
Type: string