User-configurable parameters

Analysis parameters

Any of the following {scamp} parameters can be provided in the project's `_defaults` stanza or under a specific dataset's stanza.
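
This section does not say how the two stanzas combine. The following minimal sketch assumes that a tag set in a dataset's stanza replaces the same tag from `_defaults`. It also assumes the merge is shallow, so a map such as `feature types` is replaced whole rather than merged key by key. `resolve_dataset_parameters` and the example values are hypothetical.

```python
from typing import Any


def resolve_dataset_parameters(
    defaults: dict[str, Any], dataset: dict[str, Any]
) -> dict[str, Any]:
    """Combine the `_defaults` stanza with one dataset's stanza.

    Assumption: a tag set in the dataset's stanza replaces the same tag
    from `_defaults`. The merge is shallow, so a map-valued tag such as
    `feature types` is replaced as a whole rather than merged key by key.
    """
    # Copy the defaults so the shared stanza is not modified for later datasets.
    merged = dict(defaults)
    # Dataset-level values win wherever both stanzas set the same tag.
    merged.update(dataset)
    return merged


# Hypothetical values, for illustration only.
defaults = {"feature identifiers": "name", "index path": "/refs/example_index"}
dataset = {"limsid": "ABC123", "feature identifiers": "accession"}
print(resolve_dataset_parameters(defaults, dataset))
# {'feature identifiers': 'accession', 'index path': '/refs/example_index', 'limsid': 'ABC123'}
```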

Tag	Description	Type
`adt set path`	Path to antibody-derived tags reference file.	`path`
`barcode`	A barcode identifier, for example BC001.	`string(s)`
`dataset id`	A directory-safe name for a dataset, taken from the `dataset name` if omitted.	`string`
`dataset name`	Human-readable name for a dataset in an analysis. The YAML key is used if omitted.	`string`
`dataset tag`	A very short name for the dataset. It is appended to cell barcodes, so it should be concise and contain no spaces or special characters. A default is generated if omitted, but it is unlikely to be helpful and should not be relied upon.	`string`
`description`	A short textual description of the dataset, mainly as an aide-memoire. A default value of `dataset name` is used if missing.	`string`
`fastq paths`	Paths to directories containing FastQ files for the project. Directories are not searched recursively.	`paths`
`feature identifiers`	Whether to use `accession` or `name` as the feature index. In the Seurat workflows, the non-selected identifier may be saved into an `RNA_alt` assay in the object. The default is to use feature names.	`string`
`feature types`	A map of assay types in the project and the relevant LIMS IDs.	`map of strings`
`hto set path`	Path to hashtag oligos reference file.	`path`
`index path`	The path to an index for the analysis. If omitted, it is assumed that an index is to be created and will be provided by a {scamp} process.	`path`
`limsid`	Identifier(s) for the sample in the project. This will be used to identify FastQ files for the dataset/sample. No default value can be provided. Some samples provide multiple libraries, so this may be a collection of strings in certain cases.	`string(s)`
`probe set path`	A 10x-provided file linking probes and gene targets.	`path`
`quantification method`	The method used to create the data in `quantification path`. Valid values come from a curated set that depends on the analysis workflows, for example `cell_ranger` and `cell_ranger_arc`. This is provided by {scamp} if a quantification workflow is applied; otherwise it is required.	`string`
`quantification path`	Path to quantified data that can be read and used by an analysis workflow. Can be provided by a {scamp} workflow.	`path`
`vdj index path`	Path to VDJ reference index.	`path`
`workflows`	An unordered collection of workflows to apply in an analysis. Workflows are chosen from the curated list available in {scamp} and are specified as paths; spaces are converted to underscores. Omitting this parameter prevents workflows from launching but does not cause {scamp} to fail.	`strings`
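
Several tags in the table fall back to other values when omitted. The sketch below applies those fallbacks: `dataset name` comes from the YAML key, and `dataset id` and `description` come from `dataset name`. It also applies the space-to-underscore conversion for `workflows`. The `directory_safe` rule, the function names and the workflow path in the usage line are hypothetical. {scamp}'s own conversion may differ.

```python
import re
from typing import Any


def directory_safe(name: str) -> str:
    # Hypothetical rule: replace each run of characters other than letters,
    # digits, '.', '-' and '_' with one underscore, then trim stray underscores.
    return re.sub(r"[^A-Za-z0-9._-]+", "_", name).strip("_")


def fill_dataset_defaults(yaml_key: str, params: dict[str, Any]) -> dict[str, Any]:
    """Apply the documented fallbacks for one dataset's parameters."""
    filled = dict(params)
    # `dataset name` falls back to the dataset's YAML key.
    filled.setdefault("dataset name", yaml_key)
    # `dataset id` is derived from `dataset name` when omitted.
    if "dataset id" not in filled:
        filled["dataset id"] = directory_safe(filled["dataset name"])
    # `description` falls back to `dataset name`.
    filled.setdefault("description", filled["dataset name"])
    # Spaces in workflow paths become underscores; a missing `workflows`
    # becomes an empty list, so no workflows launch.
    filled["workflows"] = [w.replace(" ", "_") for w in filled.get("workflows", [])]
    return filled


# Hypothetical dataset key and workflow path.
print(fill_dataset_defaults("pbmc 10k", {"workflows": ["quantification/cell ranger"]}))
```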

Project parameters

A reserved stanza that defines the project, rather than specific datasets.

Tag	Description	Type
`babs id`	Unique BABS identifier for the project.	`string`
`lab`	The `<last name><first initial>` formatted name of the lab.	`string`
`lims id`	Unique LIMS identifier for the project.	`string`
`scientist`	The `<first name>.<last name>` formatted name of the scientist, which may help find data in the filesystem. Be careful with double-barrelled or multiple last names.	`string`
`type`	Type of project as recorded by ASF, taken from a curated list that includes `10X-3prime` and `10X-multiome`. The default value is `10X-3prime`.	`string`
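
The `lab` and `scientist` name formats can be illustrated with a short sketch. It assumes that `lab` is built from the lab head's name and that capitalisation is kept as given. `project_stanza` and the names in the usage line are hypothetical.

```python
def lab_name(first: str, last: str) -> str:
    # `<last name><first initial>`, e.g. Jane Smith -> SmithJ.
    # Keeping the given capitalisation is an assumption.
    return f"{last}{first[0]}"


def scientist_name(first: str, last: str) -> str:
    # `<first name>.<last name>`. A double-barrelled or multi-part last name
    # is used exactly as given, so check it against the filesystem.
    return f"{first}.{last}"


def project_stanza(
    lab: tuple[str, str], scientist: tuple[str, str], project_type: str = "10X-3prime"
) -> dict[str, str]:
    """Build the `lab`, `scientist` and `type` tags of a project stanza.

    `lab` and `scientist` are (first name, last name) pairs; `project_type`
    defaults to the documented `type` default.
    """
    return {
        "lab": lab_name(*lab),
        "scientist": scientist_name(*scientist),
        "type": project_type,
    }


# Hypothetical names.
print(project_stanza(("Jane", "Smith"), ("Alex", "Jones-Brown")))
# {'lab': 'SmithJ', 'scientist': 'Alex.Jones-Brown', 'type': '10X-3prime'}
```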

Genome parameters

A dictionary of parameters that define a genome. This can be used to define the parameters for a custom genome.

Tag	Description	Type
`assembly`	Name of the genome assembly, such as “mm10”.	`string`
`ensembl release`	Ensembl release number, such as 98.	`string`
`fasta file`	Genomic sequence in FastA format. It can instead be built from the files in `fasta path`; if both are provided, `fasta file` takes precedence.	`file`
`fasta path`	A directory of FastA files that can be used to create a genome index. When provided, the files in the directory are concatenated into a single genome FastA. No default is provided; this is probably only needed to build an index.	`path`
`gtf file`	GTF file of features in the genome. This parameter will be used in preference to the `gtf path`.	`file`
`gtf path`	A directory of GTF files that can be used to quantify the activity of features. The files in this directory are concatenated into a single GTF file, which is then used as `gtf file`. No default is provided; this is probably only needed to build an index.	`path`
`id`	A directory-safe name for the genome, derived from `assembly` if missing.	`string`
`motifs file`	A JASPAR-formatted file of motifs that can be used by Cell Ranger ARC to build an index. No default is provided.	`file`
`non-nuclear contigs`	A collection of chromosomes in the `genome` that may be treated differently, for example by Cell Ranger ARC when creating an index.	`strings`
`organism`	Latin name for the species, such as “mus musculus”.	`string`
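
The following sketch illustrates the precedence rules in this table. An explicit `fasta file` or `gtf file` is used as given. Otherwise, the files in `fasta path` or `gtf path` are concatenated into one file, and `id` is derived from `assembly`. The function names, the name-sorted concatenation order, the output file names and the simplified `id` conversion are all assumptions, not {scamp}'s implementation.

```python
import tempfile
from pathlib import Path
from typing import Any


def concatenate_directory(directory: Path, output: Path) -> Path:
    # Join every regular file directly inside `directory` (not recursively)
    # into `output`, in name order. The ordering is an assumption, and
    # `output` should not live inside `directory`.
    parts = sorted(p for p in directory.iterdir() if p.is_file())
    with output.open("wb") as out:
        for part in parts:
            out.write(part.read_bytes())
    return output


def resolve_genome(genome: dict[str, Any], workdir: Path) -> dict[str, Any]:
    """Fill `id`, `fasta file` and `gtf file` from the other genome tags."""
    resolved = dict(genome)
    if "id" not in resolved:
        # Simplified stand-in for the directory-safe conversion of `assembly`.
        resolved["id"] = str(resolved["assembly"]).replace(" ", "_")
    for kind in ("fasta", "gtf"):
        file_tag, path_tag = f"{kind} file", f"{kind} path"
        # An explicit file wins; otherwise build one from the directory.
        if file_tag not in resolved and path_tag in resolved:
            resolved[file_tag] = concatenate_directory(
                Path(resolved[path_tag]), workdir / f"{resolved['id']}.{kind}"
            )
    return resolved


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        fasta_dir = Path(tmp, "fasta")
        fasta_dir.mkdir()
        Path(fasta_dir, "chr1.fa").write_text(">chr1\nACGT\n")
        Path(fasta_dir, "chr2.fa").write_text(">chr2\nTTGA\n")
        genome = resolve_genome({"assembly": "mm10", "fasta path": str(fasta_dir)}, Path(tmp))
        print(genome["fasta file"].read_text())
```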

Nextflow parameters

Parameters that are used by the pipeline but are not directly part of {scamp}. They are specified on the command line with a `--` prefix, for example `--scamp_file`. Default values are defined in `params.config`.

Tag	Description	Type
`only_validate_parameters`	Do not start the pipeline; only check that the parameters in `--scamp_file` are probably OK to use. The checks compare parameter types against those expected and confirm that sufficient parameters were provided for each of a dataset's `workflows`. Defaults to `false`.	`boolean`
`publish_dir`	The root directory (default: `results`) under which task results will be published.	`path`
`publish_mode`	How task results are output; defaults to `copy`. Other modes may interfere with the pipeline, so the only alternative to `copy` is `symlink`.	`string`
`scamp_file`	YAML file (default: `scamp_file.yaml`) that contains the configuration parameters for the analyses.	`file`
`show_parameter_validation`	Show a summary of the parameters that were checked and validated for each dataset. The summary is not shown by default (`--show_parameter_validation false`). If any parameter fails validation, a summary of the failed parameters is printed and {scamp} stops.	`boolean`
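
The validation described for `only_validate_parameters` and `show_parameter_validation` can be sketched as two kinds of check: each tag is checked against its expected type, and each requested workflow is checked for the tags it needs. `EXPECTED_TYPES`, `REQUIRED_BY_WORKFLOW`, `example_workflow` and both functions are hypothetical. The real type table and workflow requirements are defined inside {scamp}.

```python
from typing import Any

# Expected Python types for a few tags from the tables above. Treating a
# `path` as a plain `str` is a simplification.
EXPECTED_TYPES: dict[str, tuple[type, ...]] = {
    "dataset name": (str,),
    "limsid": (str, list),  # `string(s)`
    "quantification path": (str,),
    "workflows": (list,),  # `strings`
}

# Hypothetical requirements; the real lists are defined by {scamp}.
REQUIRED_BY_WORKFLOW: dict[str, set[str]] = {
    "example_workflow": {"quantification path", "quantification method"},
}


def validate_dataset(params: dict[str, Any]) -> list[str]:
    """Return a description of every failed check for one dataset."""
    failures = []
    # Type checks against the expected types.
    for tag, expected in EXPECTED_TYPES.items():
        if tag in params and not isinstance(params[tag], expected):
            names = " or ".join(t.__name__ for t in expected)
            failures.append(f"{tag}: expected {names}")
    # Each requested workflow must have the tags it needs.
    workflows = params.get("workflows", [])
    if isinstance(workflows, list):
        for workflow in workflows:
            missing = REQUIRED_BY_WORKFLOW.get(workflow, set()) - set(params)
            failures.extend(f"{tag}: required by {workflow}" for tag in sorted(missing))
    return failures


def check_parameters(datasets: dict[str, dict[str, Any]], show_summary: bool = False) -> None:
    """Validate every dataset, mirroring `show_parameter_validation`."""
    results = {name: validate_dataset(params) for name, params in datasets.items()}
    if show_summary:
        for name, failures in results.items():
            print(f"{name}: {'FAILED' if failures else 'OK'}")
    failed = {name: f for name, f in results.items() if f}
    if failed:
        # Print only the failed parameters, then stop.
        for name, failures in failed.items():
            for failure in failures:
                print(f"{name}: {failure}")
        raise SystemExit(1)
```

In this sketch, a caller would start the pipeline only when `check_parameters` returns normally and `only_validate_parameters` is `false`.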