nf-core/differentialabundance
Differential abundance analysis for feature/observation matrices from platforms such as RNA-seq

Define where the pipeline should find input data and save output data.
A string to identify results in the output directory
    type: string; default: study
A string identifying the technology used to produce the data
    type: string
Path to comma-separated file containing information about the samples in the experiment.
    type: string; pattern: ^\S+\.(csv|tsv|txt)$
A CSV file describing sample contrasts
    type: string; pattern: ^\S+\.(csv|tsv|txt)$
The output directory where the results will be saved. You have to use absolute paths to storage on Cloud infrastructure.
    type: string
Type of abundance measure used, platform-dependent
    type: string; default: counts

Ways of providing your abundance values
TSV-format abundance matrix
    type: string; pattern: ^\S+\.(tsv|csv|txt)$
(RNA-seq only): optional transcript length matrix with the same samples and genes as the abundance matrix
    type: string
Alternative to matrix: a compressed archive of CEL files, as often found in GEO
    type: string; default: null
Use SOFT files from GEO by providing the GSE study identifier
    type: string; default: null
Column in the sample sheet to be used as the primary sample identifier
    type: string; default: sample
Type of observation
    type: string; default: sample
Column in the sample sheet to be used as the display identifier for observations. If unset, will use value of --observations_id_col.
    type: string

Options related to features
Feature ID attribute in the abundance table as well as in the GTF file (e.g. the gene_id field)
    type: string; default: gene_id
Feature name attribute in the abundance table as well as in the GTF file (e.g. the gene symbol field)
    type: string; default: gene_name
Type of feature we have, often ‘gene’
    type: string; default: gene
When set, use the control features in scaling/normalisation
    type: boolean
A text file listing technical features (e.g. spikes)
    type: string
Comma-separated string, specifies feature metadata columns to be used for exploratory analysis, platform-specific
    type: string; default: gene_id,gene_name,gene_biotype
This parameter allows you to supply your own feature annotations. These can often be automatically derived from the GTF used upstream for RNA-seq, or from the Bioconductor annotation package (for affy arrays).
    type: string; pattern: ^\S+\.(csv|tsv|txt)$
Where a GTF file is supplied, which feature type to use
    type: string; default: transcript
Where a GTF file is supplied, which field should go first in the converted output table
    type: string; default: gene_id

Options for processing of affy arrays with justRMA()
Column of the sample sheet containing the Affymetrix CEL file name
    type: string; default: file
Logical value. If TRUE, then background correct using RMA background correction.
    type: boolean; default: true
Integer value indicating which RMA background to use
    type: integer; default: 2
Logical value. If TRUE, then works on the PM matrix in place as much as possible, good for large datasets.
    type: boolean
Used to specify the name of an alternative cdf package. If set to NULL, then the usual cdf package based on Affymetrix’ mappings will be used.
    type: string; default: null
Logical value. If TRUE, a matrix of probe annotations will be derived.
    type: boolean; default: true
Should the spots marked as ‘MASKS’ be set to NA?
    type: boolean
Should the spots marked as ‘OUTLIERS’ be set to NA?
    type: boolean
If TRUE, then overrides what is in rm.mask and rm.outliers.
    type: boolean

Options for processing of proteomics MaxQuant tables with the Proteus R package
Prefix of the column names of the MaxQuant proteingroups table in which the intensity values are saved; the prefix has to be followed by the sample names that are also found in the samplesheet. Default: ‘LFQ intensity’; will search for both the prefix as entered and the prefix followed by one whitespace.
    type: string; default: LFQ intensity
Normalization function to use on the MaxQuant intensities.
    type: string
Which method to use for plotting sample distributions of the MaxQuant intensities; one of ‘violin’, ‘dist’, ‘box’.
    type: string
Should a loess line be added to the plot of mean-variance relationship of the conditions? Default: true.
    type: boolean; default: true
Valid R palette name
    type: string; default: Set1

Options related to filtering upstream of differential analysis
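The options in this section amount to a per-feature threshold test: a feature is retained when enough observations reach a minimum abundance, and a minimum proportion takes precedence over an absolute count when both are given. The Python sketch below illustrates that reading only. The function name `keep_feature`, the inclusive `>=` comparison, and rounding the proportion up to a whole number of observations are assumptions made for illustration, not the pipeline's implementation.

```python
import math


def keep_feature(values, min_abundance=1.0, min_samples=1.0, min_proportion=None):
    """Return True if enough observations reach the abundance threshold.

    `values` holds one feature's abundances across observations; None marks
    a missing (NA) value. When `min_proportion` (between 0 and 1) is given,
    it replaces `min_samples`, following the "overrides" wording in the
    option description.
    """
    required = min_samples
    if min_proportion is not None:
        # Assumption: round the proportion up to a whole number of observations.
        required = math.ceil(min_proportion * len(values))
    passing = sum(1 for v in values if v is not None and v >= min_abundance)
    return passing >= required


# Hypothetical toy matrix: feature -> abundances across four observations.
matrix = {
    "geneA": [0, 0, 5, 7],
    "geneB": [0, 0, 0, 0],
    "geneC": [3, 4, 0, 0],
}

kept_default = [g for g, vals in matrix.items() if keep_feature(vals)]
kept_strict = [g for g, vals in matrix.items() if keep_feature(vals, min_proportion=0.75)]
```

With the defaults listed in this section (minimum abundance 1, minimum observations 1), `kept_default` contains geneA and geneC; requiring 75% of observations to pass leaves `kept_strict` empty for this toy matrix.
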
Minimum abundance value
    type: number; default: 1
Minimum observations that must pass the threshold to retain the row/feature (e.g. gene).
    type: number; default: 1
A minimum proportion of observations, given as a number between 0 and 1, that must pass the threshold. Overrides minimum_samples
    type: number
An optional grouping variable to be used to calculate a min_samples value
    type: string
A minimum proportion of observations, given as a number between 0 and 1, that must have a value (not NA) to retain the row/feature (e.g. gene).
    type: number; default: 0.5
Minimum observations that must have a value (not NA) to retain the row/feature (e.g. gene). Overrides filtering_min_proportion_not_na.
    type: number

Options related to data exploration
Clustering method used in dendrogram creation
    type: string; default: ward.D2
Correlation method used in dendrogram creation
    type: string; default: spearman
Number of features selected before certain exploratory analyses. If -1, will use all features.
    type: integer; default: 500
Length of the whiskers in boxplots as a multiple of the IQR. Defaults to 1.5.
    type: number; default: 1.5
Threshold on MAD score for outlier identification
    type: integer; default: -5
How should the main grouping variable be selected? ‘auto_pca’, ‘contrasts’, or a valid column name from the observations table.
    type: string; default: auto_pca
Specifies assay names to be used for matrices, platform-specific.
    type: string; default: raw,normalised,variance_stabilised
Specifies final assay to be used for exploratory analysis, platform-specific
    type: string; default: variance_stabilised
Assays for which to compute log2 during exploratory analysis. Not necessary for maxquant data as this is controlled by the pipeline.
    type: string
Valid R palette name
    type: string; default: Set1

Options related to differential operations
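The fold change, p value and q value thresholds in this section are described as being used to calculate differential feature numbers. The Python sketch below shows one plausible way to apply them to a results table that uses the default column names listed here (log2FoldChange, pvalue, padj). The function name `count_differential`, the inclusive comparisons, and the conversion of the fold change threshold to log2 are assumptions for illustration; the pipeline's own counting may differ.

```python
import math


def count_differential(rows, min_fold_change=2.0, max_p=1.0, max_q=0.05, logged=True):
    """Count (up, down) features passing the fold change, p and q thresholds.

    Each row is a dict with "log2FoldChange", "pvalue" and "padj" keys.
    `logged` says whether the fold change column is already on the log2 scale.
    """
    # Assumption: the threshold is given unlogged (e.g. 2) and compared on the log2 scale.
    cut = math.log2(min_fold_change)
    up = down = 0
    for row in rows:
        fc, p, q = row["log2FoldChange"], row["pvalue"], row["padj"]
        if fc is None or p is None or q is None:
            continue  # skip rows with missing statistics
        if p > max_p or q > max_q:
            continue  # not significant under the chosen limits
        lfc = fc if logged else math.log2(fc)
        if lfc >= cut:
            up += 1
        elif lfc <= -cut:
            down += 1
    return up, down


# Hypothetical results rows for illustration.
rows = [
    {"gene_id": "g1", "log2FoldChange": 1.5, "pvalue": 0.001, "padj": 0.01},
    {"gene_id": "g2", "log2FoldChange": -2.0, "pvalue": 0.0005, "padj": 0.004},
    {"gene_id": "g3", "log2FoldChange": 0.4, "pvalue": 0.01, "padj": 0.03},
    {"gene_id": "g4", "log2FoldChange": 3.0, "pvalue": 0.2, "padj": 0.6},
]

up, down = count_differential(rows)
```

With the defaults listed in this section (minimum fold change 2, maximum p value 1, maximum q value 0.05), g1 counts as up, g2 as down, g3 falls below the fold change threshold and g4 fails the q value limit.
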
Advanced option: the suffix associated with tabular differential results tables. Will by default use the appropriate suffix according to the study_type.
    type: string
The feature identifier column in differential results tables
    type: string; default: gene_id
The fold change column in differential results tables
    type: string; default: log2FoldChange
The p value column in differential results tables
    type: string; default: pvalue
The q value column in differential results tables.
    type: string; default: padj
Minimum fold change used to calculate differential feature numbers
    type: number; default: 2
Maximum p value used to calculate differential feature numbers
    type: number; default: 1
Maximum q value used to calculate differential feature numbers
    type: number; default: 0.05
Where a features file (GTF) has been provided, which attribute to use to name features
    type: string; default: gene_name
Indicate whether or not fold changes are on the log scale (default is to assume they are)
    type: boolean; default: true
Valid R palette name
    type: string; default: Set1
In differential analysis (DESeq2 or Limma), subset to the contrast samples before modelling variance?
    type: boolean
test parameter passed to DESeq()
    type: string
fitType parameter passed to DESeq()
    type: string
sfType parameter passed to DESeq()
    type: string
‘minReplicatesForReplace’ parameter passed to DESeq()
    type: integer; default: 7
useT parameter passed to DESeq2
    type: boolean
independentFiltering parameter passed to results()
    type: boolean; default: true
lfcThreshold parameter passed to results()
    type: integer
altHypothesis parameter passed to results()
    type: string; default: greaterAbs
pAdjustMethod parameter passed to results()
    type: string; default: BH
alpha parameter passed to results()
    type: number; default: 0.1
minmu parameter passed to results()
    type: number; default: 0.5
Variance stabilisation method to use when making a variance stabilised matrix
    type: string
Shrink fold changes in results?
    type: boolean; default: true
Number of cores
    type: integer; default: 1
blind parameter for rlog() and/or vst()
    type: boolean; default: true
nsub parameter passed to vst()
    type: integer; default: 1000
Passed to lmFit(), positive integer giving the number of times each distinct probe is printed on each array.
    type: number
Passed to lmFit(), positive integer giving the spacing between duplicate occurrences of the same probe, spacing=1 for consecutive rows.
    type: string; default: null
Sample sheet column to be used to derive a vector or factor specifying a blocking variable on the arrays
    type: string; default: null
Passed to lmFit(), the inter-duplicate or inter-technical replicate correlation
    type: string; default: null
Passed to lmFit(), the fitting method
    type: string
Passed to eBayes(), a numeric value between 0 and 1, assumed proportion of genes which are differentially expressed
    type: number; default: 0.01
Passed to eBayes(), logical, should an intensity-dependent trend be allowed for the prior variance?
    type: boolean
Passed to eBayes(), logical, should the estimation of df.prior and var.prior be robustified against outlier sample variances?
    type: boolean
Passed to eBayes(), comma-separated string of two values, assumed lower and upper limits for the standard deviation of log2-fold-changes for differentially expressed genes
    type: string; default: 0.1,4
Passed to eBayes(), comma-separated string of length 1 or 2, giving left and right tail proportions of x to Winsorize. Used only when robust=TRUE.
    type: string; default: 0.05,0.1
Passed to topTable(), minimum absolute log2-fold-change required
    type: integer
Passed to topTable(), logical, should 95% confidence intervals be output for logFC? Alternatively, can take a numeric value between zero and one specifying the confidence level required.
    type: boolean
Passed to topTable(), method used to adjust the p-values for multiple testing.
    type: string
Cutoff value for adjusted p-values. Only genes with lower p-values are listed.
    type: number; default: 1
Set to run GSEA to infer differential gene sets in contrasts
    type: boolean
Permutation type
    type: string
Number of permutations
    type: integer; default: 1000
Enrichment statistic
    type: string
Metric for ranking genes
    type: string
Gene list sorting mode
    type: string
Gene list ordering mode
    type: string
Max size: exclude larger sets
    type: integer; default: 500
Min size: exclude smaller sets
    type: integer; default: 15
Normalisation mode
    type: string
Randomization mode
    type: string
Make detailed geneset report?
    type: boolean; default: true
Use median for class metrics
    type: boolean
Number of markers
    type: integer; default: 100
Plot graphs for the top sets of each phenotype
    type: integer; default: 20
Seed for permutation
    type: string; default: timestamp
Save random ranked lists
    type: boolean
Make a zipped file with all reports
    type: boolean
Set to run gprofiler2 and do a pathway enrichment analysis.
    type: boolean
Short name of the organism that is analyzed, e.g. hsapiens for Homo sapiens.
    type: string
Should only significant enrichment results be considered?
    type: boolean; default: true
Should underrepresentation be measured instead of overrepresentation?
    type: boolean
The method that should be used for multiple testing correction.
    type: string
On which source databases to run the gprofiler query
    type: string
Whether to include evcodes in the results.
    type: boolean
Maximum q value used for significance testing.
    type: number; default: 0.05
Token that should be used as a query.
    type: string
Path to CSV/TSV/TXT file that should be used as a background for the query; alternatively, ‘auto’ (default) or ‘false’.
    type: string; pattern: ^\S+\.(csv|tsv|txt)$|auto|false
Which column to use as gene IDs in the background matrix.
    type: string
How to calculate the statistical domain size.
    type: string
How many genes must be differentially expressed in a pathway for it to be considered enriched? Default 1.
    type: integer; default: 1
Valid R palette name
    type: string; default: Blues
Should a Shiny app be built?
    type: boolean; default: true
Should the app be deployed to shinyapps.io?
    type: boolean
Your shinyapps.io account name
    type: string; default: null
The name of the app to push to in your shinyapps.io account
    type: string; default: null
Should we guess the log status of matrices and unlog for the app?
    type: boolean; default: true

Files and options used by gene set analysis modules.
Gene sets in GMT or GMX format; for GSEA: multiple comma-separated input files in either format are possible. For gprofiler2: a single file in GMT format is possible; this has lowest priority and will be overridden by --gprofiler2_token and --gprofiler2_organism.
    type: string; default: null
Rmd report template from which to create the pipeline report
    type: string; default: ${projectDir}/assets/differentialabundance_report.Rmd; pattern: ^\S+\.Rmd$
Email address for completion summary.
    type: string; pattern: ^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$
A logo to display in the report instead of the generic pipeline logo
    type: string; default: ${projectDir}/docs/images/nf-core-differentialabundance_logo_light.png
CSS to use to style the output, in lieu of the default nf-core styling
    type: string; default: ${projectDir}/assets/nf-core_style.css
A markdown file containing citations to include in the final report
    type: string; default: ${projectDir}/CITATIONS.md
A title for reporting outputs
    type: string; default: null
An author for reporting outputs
    type: string; default: null
Semicolon-separated string of contributor info that should be listed in the report.
    type: string
A description for reporting outputs
    type: string; default: null
Whether to generate a scree plot in the report
    type: boolean; default: true
To how many digits should numeric output in different modules be rounded? If -1, will not round.
    type: integer; default: 4

Reference genome related files and options required for the workflow.
Name of iGenomes reference.
    type: string
Genome annotation file in GTF format
    type: string; pattern: ^\S+\.gtf(\.gz)?
Do not load the iGenomes reference config.
    type: boolean

Parameters used to describe centralised config profiles. These should not be edited.
Git commit id for Institutional configs.
    type: string; default: master
Base directory for Institutional configs.
    type: string; default: https://raw.githubusercontent.com/nf-core/configs/master
Institutional config name.
    type: string
Institutional config description.
    type: string
Institutional config contact information.
    type: string
Institutional config URL link.
    type: string

Set the top limit for requested resources for any single job.
Maximum number of CPUs that can be requested for any single job.
    type: integer; default: 16
Maximum amount of memory that can be requested for any single job.
    type: string; default: 128.GB; pattern: ^\d+(\.\d+)?\.?\s*(K|M|G|T)?B$
Maximum amount of time that can be requested for any single job.
    type: string; default: 240.h; pattern: ^(\d+\.?\s*(s|m|h|d|day)\s*)+$

Less common options for the pipeline, typically set in a config file.
Display help text.
    type: boolean
Display version and exit.
    type: boolean
Method used to save pipeline results to output directory.
    type: string
Email address for completion summary, only when pipeline fails.
    type: string; pattern: ^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$
Send plain-text email instead of HTML.
    type: boolean
Do not use coloured log outputs.
    type: boolean
Incoming hook URL for messaging service
    type: string
Boolean whether to validate parameters against the schema at runtime
    type: boolean; default: true
Show all params when using --help
    type: boolean
Validation of parameters fails when an unrecognised parameter is found.
    type: boolean
Validation of parameters in lenient mode.
    type: boolean
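
Several string parameters listed above come with a regular expression that accepted values must match (for example table file paths, the memory limit and the time limit). As a rough illustration of what those patterns accept, the Python sketch below checks a few candidate values against patterns copied from this page. The dictionary keys and the `matches` helper are hypothetical names chosen for this example, and Python's `re` module is used here as a stand-in for whatever validator the pipeline actually runs.

```python
import re

# Patterns copied from the parameter listing above; the keys are
# hypothetical labels, not parameter names.
PATTERNS = {
    "table_file": r"^\S+\.(csv|tsv|txt)$",
    "gtf_file": r"^\S+\.gtf(\.gz)?",
    "memory": r"^\d+(\.\d+)?\.?\s*(K|M|G|T)?B$",
    "time": r"^(\d+\.?\s*(s|m|h|d|day)\s*)+$",
}


def matches(kind, value):
    """Return True if `value` satisfies the pattern registered under `kind`."""
    return re.search(PATTERNS[kind], value) is not None


checks = {
    "samples.csv": matches("table_file", "samples.csv"),
    "my samples.csv": matches("table_file", "my samples.csv"),
    "annotation.gtf.gz": matches("gtf_file", "annotation.gtf.gz"),
    "128.GB": matches("memory", "128.GB"),
    "128 gigabytes": matches("memory", "128 gigabytes"),
    "240.h": matches("time", "240.h"),
}
```

The defaults shown on this page (128.GB and 240.h) satisfy their own patterns, while a path containing whitespace or a free-text memory value such as "128 gigabytes" does not.
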