nf-core/proteinfamilies
Generation and updating of protein families
Define where the pipeline should find input data and save output data.
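The samplesheet path and email address in this group are each constrained by a regular expression, listed with the parameters. The following is a minimal sketch of how those two patterns behave, assuming they are applied to the whole value; the function names are hypothetical and are not part of the pipeline.

```python
import re

# Patterns copied verbatim from the parameter listing in this section.
SAMPLESHEET_PATTERN = re.compile(r"^\S+\.csv$")
EMAIL_PATTERN = re.compile(
    r"^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$"
)


def is_valid_samplesheet_path(path: str) -> bool:
    """Return True if the path has no whitespace and ends in '.csv'."""
    return SAMPLESHEET_PATTERN.fullmatch(path) is not None


def is_valid_email(address: str) -> bool:
    """Return True if the address matches the documented email pattern."""
    return EMAIL_PATTERN.fullmatch(address) is not None


if __name__ == "__main__":
    for candidate in ("samplesheet.csv", "my samples.csv", "samplesheet.tsv"):
        print(candidate, is_valid_samplesheet_path(candidate))
    print("name.surname@example.com", is_valid_email("name.surname@example.com"))
```

Note that the samplesheet pattern rejects any path containing whitespace, so a file in a directory with spaces in its name would fail this check.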
Path to comma-separated file ‘.csv’ containing information about the samples in the experiment.
Type: string; pattern: ^\S+\.csv$
The output directory where the results will be saved. You have to use absolute paths to storage on Cloud infrastructure.
Type: string
Email address for completion summary. Example: name.surname@example.com
Type: string; pattern: ^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$
MultiQC report title. Printed as page header, used for filename if not otherwise specified.
Type: string

Parameters used to describe centralised config profiles. These should not be edited.
Git commit id for Institutional configs.
Type: string; default: master
Base directory for Institutional configs.
Type: string; default: https://raw.githubusercontent.com/nf-core/configs/master
Institutional config name.
Type: string
Institutional config description.
Type: string
Institutional config contact information.
Type: string
Institutional config URL link.
Type: string
Base path / URL for data used in the modules.
Type: string

Less common options for the pipeline, typically set in a config file.
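One option in this group, the file size limit for MultiQC reports attached to summary emails, takes a size string such as the default 25.MB and is constrained by a pattern. The sketch below checks a value against that pattern and converts it to bytes. The conversion factor of 1024 per unit step is an assumption made for illustration, and `parse_size` is a hypothetical helper, not a pipeline function.

```python
import re

# Pattern copied verbatim from the parameter listing.
SIZE_PATTERN = re.compile(r"^\d+(\.\d+)?\.?\s*(K|M|G|T)?B$")

# Same shape as SIZE_PATTERN, but capturing the number and the unit prefix.
_SIZE_PARTS = re.compile(r"^(\d+(?:\.\d+)?)\.?\s*([KMGT]?)B$")

# Assumption: binary multiples (1 KB = 1024 B).
_FACTORS = {"": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_size(text: str) -> int:
    """Validate a size string such as '25.MB' and return its size in bytes."""
    if SIZE_PATTERN.fullmatch(text) is None:
        raise ValueError(f"not a valid size string: {text!r}")
    number, unit = _SIZE_PARTS.fullmatch(text).groups()
    return int(float(number) * _FACTORS[unit])


if __name__ == "__main__":
    for value in ("25.MB", "1.5 GB", "10B"):
        print(value, parse_size(value))
```

The pattern is case-sensitive, so a value such as 25.mb would be rejected.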
Display version and exit.
Type: boolean
Method used to save pipeline results to output directory.
Type: string
Email address for completion summary, only when pipeline fails. Example: name.surname@example.com
Type: string; pattern: ^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$
Send plain-text email instead of HTML.
Type: boolean
File size limit when attaching MultiQC reports to summary emails. Example: 25.MB
Type: string; default: 25.MB; pattern: ^\d+(\.\d+)?\.?\s*(K|M|G|T)?B$
Do not use coloured log outputs.
Type: boolean
Incoming hook URL for messaging service.
Type: string
Custom config file to supply to MultiQC.
Type: string
Custom logo file to supply to MultiQC. The file name must also be set in the MultiQC config file.
Type: string
Custom MultiQC yaml file containing HTML including a methods description.
Type: string
Whether to validate parameters against the schema at runtime.
Type: boolean; default: true
Base URL or local path to location of pipeline test dataset files.
Type: string; default: https://raw.githubusercontent.com/nf-core/test-datasets/proteinfamilies/
Suffix to add to the trace report filename. Default is the date and time in the format yyyy-MM-dd_HH-mm-ss.
Type: string
Display the help message.
Type: boolean or string
Display the full detailed help message.
Type: boolean
Display hidden parameters in the help message (only works when --help or --help_full are provided).
Type: boolean

Use these parameters to control the flow of the quality check subworkflow execution.
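The length limits and duplicate removal in this group can be illustrated with a small filter. This is a sketch under stated assumptions, not the pipeline's implementation: the length bounds are treated as inclusive, the first occurrence of a duplicated sequence is the one kept, and `filter_sequences` is a hypothetical name. The defaults 30 and 5000 are the documented ones.

```python
def filter_sequences(records, min_len=30, max_len=5000, remove_duplicates=True):
    """Filter (name, sequence) pairs by length and, optionally, by uniqueness.

    Assumptions: bounds are inclusive, and duplicates are detected by exact
    sequence identity, keeping the first occurrence.
    """
    seen = set()
    kept = []
    for name, seq in records:
        if not (min_len <= len(seq) <= max_len):
            continue  # too short or too long
        if remove_duplicates:
            if seq in seen:
                continue  # identical sequence already kept
            seen.add(seq)
        kept.append((name, seq))
    return kept


if __name__ == "__main__":
    records = [
        ("short", "MKT"),
        ("ok_1", "M" * 40),
        ("ok_1_copy", "M" * 40),
        ("too_long", "A" * 6000),
    ]
    print([name for name, _ in filter_sequences(records)])
```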
Skip all default QC steps for sequences (gap trimming, length filtering, validation, duplicate removal).
Type: boolean
The minimum allowed sequence length.
Type: integer; default: 30
The maximum allowed sequence length.
Type: integer; default: 5000
Remove duplicate input amino acid sequences, based on the sequence.
Type: boolean

Use these parameters to control the flow of the clustering subworkflow execution.
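One parameter in this group sets a minimum cluster size below which no seed Multiple Sequence Alignment is built (default 25). The sketch below shows the idea on a list of (representative, member) pairs. The two-column input shape, the inclusive comparison, and the name `clusters_above_threshold` are assumptions made for illustration.

```python
from collections import defaultdict


def clusters_above_threshold(pairs, min_size=25):
    """Group (representative, member) pairs and keep sufficiently large clusters.

    Assumption: a cluster qualifies when it has at least `min_size` members.
    """
    clusters = defaultdict(list)
    for representative, member in pairs:
        clusters[representative].append(member)
    return {
        rep: members for rep, members in clusters.items() if len(members) >= min_size
    }


if __name__ == "__main__":
    pairs = [("repA", f"seq{i}") for i in range(30)] + [("repB", "seqX")]
    kept = clusters_above_threshold(pairs)
    print(sorted(kept), len(kept["repA"]))
```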
Save the db output folder of mmseqs createdb.
Type: boolean
Choose clustering algorithm. Either simple ‘cluster’ for medium size inputs, or ‘linclust’ for less sensitive clustering of larger datasets.
Type: string
mmseqs parameter for minimum sequence identity.
Type: number; default: 0.3
mmseqs parameter for minimum sequence coverage ratio.
Type: number; default: 0.5
mmseqs parameter for coverage mode: 0 for both, 1 for target and 2 for query sequence.
Type: integer
Save the clustering output folder of mmseqs cluster or linclust.
Type: boolean
Minimum clustering chunk size threshold to create seed Multiple Sequence Alignments upon.
Type: integer; default: 25
Save membership-filtered initial mmseqs clusters in fasta format.
Type: boolean

Use these parameters to control the Multiple Sequence Alignment subworkflow execution.
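The gap-trimming options in this group (a gappiness threshold, default 0.5, and an option to clip only at the alignment ends) can be illustrated as follows. This is a simplified sketch of the idea, not ClipKIT's implementation: gappiness is taken to be the fraction of '-' characters in a column, and `trim_gappy_columns` is a hypothetical name.

```python
def trim_gappy_columns(alignment, threshold=0.5, ends_only=True):
    """Remove alignment columns whose gap fraction is greater than `threshold`.

    alignment: list of equal-length aligned sequences using '-' for gaps.
    ends_only: if True, only contiguous gappy columns at either end are removed.
    """
    if not alignment:
        return []
    n_rows = len(alignment)
    n_cols = len(alignment[0])
    gappy = [
        sum(row[i] == "-" for row in alignment) / n_rows > threshold
        for i in range(n_cols)
    ]
    if ends_only:
        start = 0
        while start < n_cols and gappy[start]:
            start += 1
        end = n_cols
        while end > start and gappy[end - 1]:
            end -= 1
        keep = [start <= i < end for i in range(n_cols)]
    else:
        keep = [not g for g in gappy]
    return ["".join(c for c, k in zip(row, keep) if k) for row in alignment]


if __name__ == "__main__":
    msa = ["--ACD-EF-", "-MAC--EFG", "--ACD-EF-"]
    print(trim_gappy_columns(msa, ends_only=True))
    print(trim_gappy_columns(msa, ends_only=False))
```

With end-only clipping, an internal all-gap column survives; clipping all columns removes it as well.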
Choose alignment tool. FAMSA is recommended as best time-memory-accuracy combination option.
Type: string
Boolean whether to skip the trimming process of gappy positions from Multiple Sequence Alignments (MSAs).
Type: boolean
Choose the output format of the clipped alignment.
Type: string; default: clipkit
Choose if ClipKIT should only clip gaps at the ends of the MSAs.
Type: boolean; default: true
Multiple Sequence Alignment (MSA) positions with gappiness greater than this threshold will be trimmed.
Type: number; default: 0.5
Skip recruitment of additional sequences from the input FASTA file using the family Hidden Markov Models (HMMs) into the full alignment.
Type: boolean
Boolean whether to generate target results file of hmmsearch.
Type: boolean
Boolean whether to generate domain results file of hmmsearch.
Type: boolean; default: true
hmmsearch e-value cutoff threshold for reported results.
Type: number; default: 0.001
Save the output of hmmsearch (.domtbl.gz and .tbl.gz).
Type: boolean
hmmsearch minimum length percentage filter of hit env vs query length.
Type: number; default: 0.9
Save family fasta files after recruiting sequences with hmmsearch.
Type: boolean

Use these parameters to control the redundancy removal subworkflow execution.
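The length filters in this group compare the length of a hit's envelope with the query length, using thresholds of 1 for redundant family removal and 0.9 for flagging similar families. A minimal sketch of such a filter follows. It assumes the ratio is computed as (env_to - env_from + 1) / query_length with 1-based inclusive coordinates and compared with "greater than or equal"; the `Hit` class and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """Hypothetical container for one hmmsearch hit."""
    family: str
    env_from: int      # envelope start, 1-based inclusive (assumption)
    env_to: int        # envelope end, 1-based inclusive (assumption)
    query_length: int


def env_fraction(hit: Hit) -> float:
    """Fraction of the query length covered by the hit envelope."""
    return (hit.env_to - hit.env_from + 1) / hit.query_length


def passes_length_filter(hit: Hit, min_fraction: float) -> bool:
    """Return True if the envelope covers at least `min_fraction` of the query."""
    return env_fraction(hit) >= min_fraction


if __name__ == "__main__":
    full = Hit("fam1", 1, 100, 100)
    partial = Hit("fam2", 6, 100, 100)
    for hit in (full, partial):
        print(hit.family, passes_length_filter(hit, 1.0), passes_length_filter(hit, 0.9))
```

Under these assumptions, a threshold of 1 keeps only hits whose envelope spans the entire query, while 0.9 also admits hits that miss up to a tenth of it.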
Skip removal of between-family redundancy via hmmsearch sequence to family model matching.
Type: boolean
Flag to skip merging of similar families.
Type: boolean
hmmsearch minimum length percentage filter of hit env vs query length, for redundant family removal.
Type: number; default: 1
hmmsearch minimum length percentage of hit env vs query length, to flag and report similar families (and to optionally merge).
Type: number; default: 0.9
Save only the fasta files of non-redundant families (might still contain redundant sequences).
Type: boolean
Skip removal of inside-family redundancy of sequences via mmseqs clustering.
Type: boolean
mmseqs parameter for minimum sequence identity.
Type: number; default: 0.9
mmseqs parameter for minimum sequence coverage ratio.
Type: number; default: 0.9
mmseqs parameter for coverage mode: 0 for both, 1 for target and 2 for query sequence.
Type: integer
Save the final family fasta files with sequence redundancy removed.
Type: boolean