Metadata
The Snakemake framework was designed to let developers both define and execute their workflows. As a result, workflow parameters are often ill-defined and scattered throughout the project as configuration values, static values in the Snakefile, or command-line flags.
To construct a graphical interface from a Snakemake workflow, the workflow's parameters need to be explicitly identified and defined so that they can be presented to scientists to fill out through a web application.
The latch_metadata.py file holds these parameter definitions, along with any styling or cosmetic modifications the developer wishes to make to each parameter.
To generate a latch_metadata.py file, run:
latch generate-metadata <path_to_config.yaml>
The command automatically parses the existing config.yaml file in the Snakemake repository and creates a Python parameters file.
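To give a rough idea of what such a parsing step involves, the sketch below infers a Python type annotation for each parsed config value. It is an illustration only, not the Latch implementation: `infer_type` is a hypothetical helper, lists are assumed to be homogeneous, and how file parameters are detected is not covered here.

```python
from typing import Any


def infer_type(value: Any) -> str:
    """Hypothetical helper: return an annotation string for a parsed config value."""
    # bool must be checked before int, because bool is a subclass of int in Python
    if isinstance(value, bool):
        return "bool"
    if isinstance(value, int):
        return "int"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        return "str"
    if isinstance(value, list):
        # Assumption: lists are homogeneous; empty lists fall back to str
        inner = infer_type(value[0]) if value else "str"
        return f"typing.List[{inner}]"
    if isinstance(value, dict):
        # Nested mappings would become their own dataclass
        return "dataclass"
    raise TypeError(f"unsupported config value: {value!r}")


# A few values as a YAML parser would return them
for key, value in {"release": 100, "activate": False, "build": "GRCh38"}.items():
    print(key, "->", infer_type(value))
```

Checking `bool` before `int` matters: otherwise a flag such as `activate: False` would be typed as an integer.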
Examples
Below is an example config.yaml file from the rna-seq-star-deseq2 workflow in the Snakemake workflow catalog.
config.yaml
# path or URL to sample sheet (TSV format, columns: sample, condition, ...)
samples: config/samples.tsv
# path or URL to sequencing unit sheet (TSV format, columns: sample, unit, fq1, fq2)
# Units are technical replicates (e.g. lanes, or resequencing of the same biological
# sample).
units: config/units.tsv
ref:
  # Ensembl species name
  species: homo_sapiens
  # Ensembl release (make sure to take one where snpeff data is available, check 'snpEff databases' output)
  release: 100
  # Genome build
  build: GRCh38
trimming:
  # If you activate trimming by setting this to `True`, you will have to
  # specify the respective cutadapt adapter trimming flag for each unit
  # in the `units.tsv` file's `adapters` column
  activate: False
pca:
  activate: True
  # Per default, a separate PCA plot is generated for each of the
  # `variables_of_interest` and the `batch_effects`, coloring according to
  # that variables groups.
  # If you want PCA plots for further columns in the samples.tsv sheet, you
  # can request them under labels as a list, for example:
  # - relatively_uninteresting_variable_X
  # - possible_batch_effect_Y
  labels: ""
diffexp:
  # variables for whome you are interested in whether they have an effect on
  # expression levels
  variables_of_interest:
    treatment_1:
      # any fold change will be relative to this factor level
      base_level: B
    treatment_2:
      # any fold change will be relative to this factor level
      base_level: C
  # variables whose effect you want to model to separate them from your
  # variables_of_interest
  batch_effects:
    - jointly_handled
  # contrasts for the deseq2 results method to determine fold changes
  contrasts:
    A-vs-B_treatment_1:
      # must be one of the variables_of_interest, for details see:
      # https://www.bioconductor.org/packages/devel/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#contrasts
      variable_of_interest: treatment_1
      # must be a level present in the variable_of_interest that is not the
      # base_level specified above
      level_of_interest: A
  # The default model includes all interactions among variables_of_interest
  # and batch_effects added on. For the example above this implicitly is:
  # model: ~jointly_handled + treatment_1 * treatment_2
  # For the default model to be used, simply specify an empty `model: ""` below.
  # If you want to introduce different assumptions into your model, you can
  # specify a different model to use, for example skipping the interaction:
  # model: ~jointly_handled + treatment_1 + treatment_2
  model: ""
params:
  cutadapt-pe: ""
  cutadapt-se: ""
  star: ""
The latch_metadata.py file generated by the latch generate-metadata command:
from dataclasses import dataclass
import typing

from latch.types.metadata import SnakemakeParameter, SnakemakeFileParameter
from latch.types.file import LatchFile
from latch.types.directory import LatchDir


@dataclass
class ref:
    species: str
    release: int
    build: str


@dataclass
class trimming:
    activate: bool


@dataclass
class pca:
    activate: bool
    labels: str


@dataclass
class treatment_1:
    base_level: str


@dataclass
class treatment_2:
    base_level: str


@dataclass
class variables_of_interest:
    treatment_1: treatment_1
    treatment_2: treatment_2


@dataclass
class A_vs_B_treatment_1:
    variable_of_interest: str
    level_of_interest: str


@dataclass
class contrasts:
    A_vs_B_treatment_1: A_vs_B_treatment_1


@dataclass
class diffexp:
    variables_of_interest: variables_of_interest
    batch_effects: typing.List[str]
    contrasts: contrasts
    model: str


@dataclass
class params:
    cutadapt_pe: str
    cutadapt_se: str
    star: str


# Import these into your `__init__.py` file:
#
# from .parameters import generated_parameters
#
generated_parameters = {
    'samples': SnakemakeFileParameter(
        display_name='samples',
        type=LatchFile,
        config=True,
    ),
    'units': SnakemakeFileParameter(
        display_name='units',
        type=LatchFile,
        config=True,
    ),
    'ref': SnakemakeParameter(
        display_name='ref',
        type=ref,
        default=ref(species='homo_sapiens', release=100, build='GRCh38'),
    ),
    'trimming': SnakemakeParameter(
        display_name='trimming',
        type=trimming,
        default=trimming(activate=False),
    ),
    'pca': SnakemakeParameter(
        display_name='pca',
        type=pca,
        default=pca(activate=True, labels=''),
    ),
    'diffexp': SnakemakeParameter(
        display_name='diffexp',
        type=diffexp,
        default=diffexp(variables_of_interest=variables_of_interest(treatment_1=treatment_1(base_level='B'), treatment_2=treatment_2(base_level='C')), batch_effects=['jointly_handled'], contrasts=contrasts(A_vs_B_treatment_1=A_vs_B_treatment_1(variable_of_interest='treatment_1', level_of_interest='A')), model=''),
    ),
    'params': SnakemakeParameter(
        display_name='params',
        type=params,
        default=params(cutadapt_pe='', cutadapt_se='', star=''),
    ),
}
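Note that config keys such as cutadapt-pe and A-vs-B_treatment_1 appear as cutadapt_pe and A_vs_B_treatment_1 in the generated file, since hyphens are not valid in Python identifiers. The sketch below shows one way such a renaming could work, using only the standard library. `to_identifier` is a hypothetical helper, and the rule it applies (every non-word character becomes an underscore) is an assumption; the rule Latch actually uses may differ.

```python
import re
from dataclasses import asdict, dataclass


def to_identifier(key: str) -> str:
    """Hypothetical helper: map a config key to a Python-friendly name."""
    # Replace every character that is not a letter, digit, or underscore.
    # Keys starting with a digit are not handled in this sketch.
    return re.sub(r"\W", "_", key)


@dataclass
class params:
    cutadapt_pe: str
    cutadapt_se: str
    star: str


# The `params` section of config.yaml as a YAML parser would return it
config_params = {"cutadapt-pe": "", "cutadapt-se": "", "star": ""}

# Rename the keys, then build the dataclass default from the config values
default = params(**{to_identifier(k): v for k, v in config_params.items()})
print(asdict(default))
```

One consequence of any renaming like this is that the field names in the interface no longer match the original config keys exactly, which is worth keeping in mind when comparing the two files.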
Once the workflow is registered on Latch, it will receive an interface like the one below: