proteinfamilies: Parameters

Define where the pipeline should find input data and save output data.

Path to a comma-separated (‘.csv’) file containing information about the samples in the experiment.

required

type: string

pattern: ^\S+\.csv$
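As a rough illustration of what the pattern above accepts, the following Python sketch checks candidate paths against it. The function name `is_valid_samplesheet_path` and the example paths are hypothetical, and the pipeline's own schema validator may treat edge cases differently from Python's `re` module.

```python
import re

# Pattern copied verbatim from the schema entry above.
SAMPLESHEET_PATTERN = re.compile(r"^\S+\.csv$")


def is_valid_samplesheet_path(path: str) -> bool:
    """Return True if the path contains no whitespace and ends in '.csv'."""
    # fullmatch avoids Python's quirk of '$' matching before a trailing newline.
    return SAMPLESHEET_PATTERN.fullmatch(path) is not None


print(is_valid_samplesheet_path("data/samplesheet.csv"))   # True
print(is_valid_samplesheet_path("data/sample sheet.csv"))  # False (whitespace)
print(is_valid_samplesheet_path("data/samplesheet.tsv"))   # False (wrong extension)
```

Note that the pattern rejects any path containing whitespace, so directories with spaces in their names would need to be renamed or linked.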

The output directory where the results will be saved. Use absolute paths when writing to storage on cloud infrastructure.

required

type: string

Email address for completion summary. Example: name.surname@example.com

type: string

pattern: ^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$
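The email pattern above is fairly strict. The Python sketch below (with a hypothetical helper name, `is_valid_email`) shows which kinds of addresses it would accept, assuming the regular expression behaves the same way in the pipeline's validator as it does in Python.

```python
import re

# Pattern copied verbatim from the schema entry above.
EMAIL_PATTERN = re.compile(
    r"^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$"
)


def is_valid_email(address: str) -> bool:
    """Return True if the address matches the schema pattern."""
    return EMAIL_PATTERN.fullmatch(address) is not None


print(is_valid_email("name.surname@example.com"))  # True
print(is_valid_email("user@sub.example.org"))      # True
print(is_valid_email("name+tag@example.com"))      # False ('+' is not allowed)
print(is_valid_email("user@example.museum"))       # False (final label longer than 5 letters)
```

Addresses using `+` tags or a final domain label longer than five letters do not match, even though they can be valid in practice.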

MultiQC report title. Printed as page header, used for filename if not otherwise specified.

type: string

Parameters used to describe centralised config profiles. These should not be edited.

Git commit id for Institutional configs.

hidden

type: string

default: master

Base directory for Institutional configs.

hidden

type: string

default: https://raw.githubusercontent.com/nf-core/configs/master

Institutional config name.

hidden

type: string

Institutional config description.

hidden

type: string

Institutional config contact information.

hidden

type: string

Institutional config URL link.

hidden

type: string

Base path / URL for data used in the modules

hidden

type: string

Less common options for the pipeline, typically set in a config file.

Display version and exit.

hidden

type: boolean

Method used to save pipeline results to output directory.

hidden

type: string

Email address for completion summary, only when pipeline fails. Example: name.surname@example.com

hidden

type: string

pattern: ^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$

Send plain-text email instead of HTML.

hidden

type: boolean

File size limit when attaching MultiQC reports to summary emails.

hidden

type: string

default: 25.MB

pattern: ^\d+(\.\d+)?\.?\s*(K|M|G|T)?B$
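To see which size strings the pattern above admits, here is a small Python sketch. The helper name `is_valid_size` and the sample values are illustrative only, and Python's regex semantics are assumed to match the validator's for these cases.

```python
import re

# Pattern copied verbatim from the schema entry above.
SIZE_PATTERN = re.compile(r"^\d+(\.\d+)?\.?\s*(K|M|G|T)?B$")


def is_valid_size(value: str) -> bool:
    """Return True if the value matches the schema's file-size pattern."""
    return SIZE_PATTERN.fullmatch(value) is not None


print(is_valid_size("25.MB"))   # True (the default value)
print(is_valid_size("25MB"))    # True
print(is_valid_size("1.5 GB"))  # True
print(is_valid_size("100B"))    # True
print(is_valid_size("25 mb"))   # False (units are case-sensitive)
print(is_valid_size("25"))      # False (a trailing 'B' is required)
```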

Do not use coloured log outputs.

hidden

type: boolean

Incoming hook URL for messaging service

hidden

type: string

Custom config file to supply to MultiQC.

hidden

type: string

Custom logo file to supply to MultiQC. File name must also be set in the MultiQC config file

hidden

type: string

Custom MultiQC YAML file containing HTML including a methods description.

type: string

Whether to validate parameters against the schema at runtime.

hidden

type: boolean

default: true

Base URL or local path to location of pipeline test dataset files

hidden

type: string

default: https://raw.githubusercontent.com/nf-core/test-datasets/proteinfamilies/

Suffix to add to the trace report filename. Default is the date and time in the format yyyy-MM-dd_HH-mm-ss.

hidden

type: string

Display the help message.

type: boolean,string

Display the full detailed help message.

type: boolean

Display hidden parameters in the help message (only works when --help or --help_full is provided).

type: boolean

Use these parameters to control the flow of the quality check subworkflow execution.

Skip all default QC steps for sequences (gap trimming, length filtering, validation, duplicate removal).

type: boolean

The minimum allowed sequence length

type: integer

default: 30

The maximum allowed sequence length

type: integer

default: 5000
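The two length bounds above can be pictured as a simple filter over the input sequences. The following Python sketch is not the pipeline's implementation: the function name `filter_by_length` is hypothetical, and treating both bounds as inclusive is an assumption.

```python
def filter_by_length(
    sequences: dict[str, str],
    min_len: int = 30,    # default minimum from the parameter above
    max_len: int = 5000,  # default maximum from the parameter above
) -> dict[str, str]:
    """Keep sequences whose length lies within [min_len, max_len].

    Inclusive bounds are an assumption made for this sketch.
    """
    return {
        name: seq
        for name, seq in sequences.items()
        if min_len <= len(seq) <= max_len
    }


example = {
    "too_short": "M" * 10,
    "kept": "M" * 120,
    "too_long": "M" * 6000,
}
print(list(filter_by_length(example)))  # ['kept']
```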

Remove duplicate input amino acid sequences, based on the sequence.

type: boolean

Use these parameters to control the flow of the clustering subworkflow execution.

Save the db output folder of mmseqs createdb

type: boolean

Choose the clustering algorithm: either the simple ‘cluster’ for medium-sized inputs, or ‘linclust’ for less sensitive clustering of larger datasets.

type: string

mmseqs parameter for minimum sequence identity

type: number

default: 0.3

mmseqs parameter for minimum sequence coverage ratio

type: number

default: 0.5

mmseqs parameter for coverage mode: 0 for both, 1 for target and 2 for query sequence

type: integer

Save the clustering output folder of mmseqs cluster or linclust

type: boolean

Minimum clustering chunk size threshold for creating seed Multiple Sequence Alignments.

type: integer

default: 25

Save membership-filtered initial mmseqs clusters in fasta format

type: boolean

Use these parameters to control the Multiple Sequence Alignment subworkflow execution.

Choose the alignment tool. FAMSA is recommended as the best trade-off between time, memory, and accuracy.

type: string

Whether to skip the trimming of gappy positions from Multiple Sequence Alignments (MSAs).

hidden

type: boolean

Choose the output format of the clipped alignment.

type: string

default: clipkit

Choose if ClipKIT should only clip gaps at the ends of the MSAs.

type: boolean

default: true

Multiple Sequence Alignment (MSA) positions with gappiness greater than this threshold will be trimmed

type: number

default: 0.5
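To make the gappiness threshold and the ends-only option concrete, here is a minimal Python sketch. It is not ClipKIT's code: the function names are hypothetical, gappiness is taken to be the fraction of `-` characters in a column, and a column is trimmed only when its gappiness is strictly greater than the threshold, following the wording above.

```python
def column_gappiness(alignment: list[str]) -> list[float]:
    """Fraction of gap characters ('-') in each alignment column."""
    n_rows = len(alignment)
    n_cols = len(alignment[0])
    return [
        sum(row[i] == "-" for row in alignment) / n_rows
        for i in range(n_cols)
    ]


def trim_gappy_columns(
    alignment: list[str], threshold: float = 0.5, ends_only: bool = True
) -> list[str]:
    """Drop columns whose gappiness exceeds the threshold.

    With ends_only=True, only gappy columns at either end are removed;
    gappy columns in the interior are kept.
    """
    gaps = column_gappiness(alignment)
    kept = [i for i, g in enumerate(gaps) if g <= threshold]
    if not kept:
        return ["" for _ in alignment]
    # Ends-only keeps the whole span between the first and last kept column.
    cols = range(kept[0], kept[-1] + 1) if ends_only else kept
    return ["".join(row[i] for i in cols) for row in alignment]


msa = ["--MKT-A", "-AMKTLA", "--MK--A", "---KT--"]
print(trim_gappy_columns(msa, ends_only=True))   # ['MKT-A', 'MKTLA', 'MK--A', '-KT--']
print(trim_gappy_columns(msa, ends_only=False))  # ['MKTA', 'MKTA', 'MK-A', '-KT-']
```

In the ends-only case the interior column with gappiness 0.75 survives, because only the two leading columns lie outside the span of retained positions.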

Skip recruitment of additional sequences from the input FASTA file using the family Hidden Markov Models (HMMs) into the full alignment

hidden

type: boolean

Whether to generate the target results file of hmmsearch.

hidden

type: boolean

Whether to generate the domain results file of hmmsearch.

hidden

type: boolean

default: true

hmmsearch e-value cutoff threshold for reported results

type: number

default: 0.001

Save the output of hmmsearch (.domtbl.gz and .tbl.gz)

type: boolean

Minimum length of the hmmsearch hit envelope (env) relative to the query length, expressed as a fraction, used to filter hits.

type: number

default: 0.9
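One way to read this filter is as a ratio between the envelope length of a hit and the length of the query. The Python sketch below illustrates that reading only. The `Hit` class and its field names are hypothetical (loosely modelled on the env from, env to, and qlen columns of an hmmsearch domain table), and using `>=` for the comparison is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """Hypothetical container for one hmmsearch domain hit."""
    target: str
    env_from: int  # 1-based start of the envelope on the target sequence
    env_to: int    # 1-based end of the envelope (inclusive)
    qlen: int      # length of the query


def passes_length_filter(hit: Hit, threshold: float = 0.9) -> bool:
    """Keep a hit if its envelope covers at least `threshold` of the query length."""
    env_len = hit.env_to - hit.env_from + 1
    return env_len / hit.qlen >= threshold


hits = [
    Hit("seq1", env_from=5, env_to=98, qlen=100),   # envelope 94 -> ratio 0.94
    Hit("seq2", env_from=10, env_to=80, qlen=100),  # envelope 71 -> ratio 0.71
]
print([h.target for h in hits if passes_length_filter(h)])  # ['seq1']
```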

Save family fasta files after recruiting sequences with hmmsearch

type: boolean

Use these parameters to control the redundancy removal subworkflow execution.

Skip removal of between-family redundancy via hmmsearch sequence to family model matching.

hidden

type: boolean

Flag to skip merging of similar families.

hidden

type: boolean

Minimum length of the hmmsearch hit envelope (env) relative to the query length, expressed as a fraction, used for redundant family removal.

type: number

default: 1

Minimum length of the hmmsearch hit envelope (env) relative to the query length, expressed as a fraction, used to flag and report similar families (and optionally merge them).

type: number

default: 0.9

Save only the fasta files of non-redundant families (might still contain redundant sequences)

type: boolean

Skip removal of inside-family redundancy of sequences via mmseqs clustering.

hidden

type: boolean

mmseqs parameter for minimum sequence identity

type: number

default: 0.9

mmseqs parameter for minimum sequence coverage ratio

type: number

default: 0.9

mmseqs parameter for coverage mode: 0 for both, 1 for target and 2 for query sequence

type: integer

Save the final family fasta files with sequence redundancy removed

type: boolean

nf-core/proteinfamilies

Input/output options

Institutional config options

Generic options

Quality check parameters

Clustering parameters

Alignment parameters

Redundancy removal parameters