Getting started#
Motivation#
Latch’s Snakemake integration allows developers to build graphical interfaces that expose their workflows to wet lab teams. It also provides managed cloud infrastructure for executing the workflow’s jobs.
A primary design goal for the Snakemake integration is to allow developers to register existing projects with minimal added boilerplate and modifications to code. Here, we outline these changes and why they are needed.
How to Upload a Snakemake Workflow#
Recall that a Snakemake project consists of a Snakefile, which describes workflow rules in an “extension” of Python, and associated Python code imported and called by these rules. To make this project compatible with Latch, we need to do the following:
Identify and construct explicit parameters for each file dependency in latch_metadata.py
Build a container with all runtime dependencies
Ensure your Snakefile is compatible with cloud execution
In this guide, we will walk through how you can upload a simple Snakemake workflow to Latch.
The example being used here comes from the short tutorial in Snakemake’s documentation.
Prerequisites#
pip install latch[snakemake]
Step 1#
First, initialize an example Snakemake workflow:
latch init snakemake-wf --template snakemake
The generated workflow contains the files typically found in a Snakemake project, such as Python scripts and a Snakefile.
snakemake-wf
├── Dockerfile # Latch specific
├── Snakefile
├── data
│   ├── genome.fa
│   ├── genome.fa.amb
│   ├── genome.fa.ann
│   ├── genome.fa.bwt
│   ├── genome.fa.fai
│   ├── genome.fa.pac
│   ├── genome.fa.sa
│   └── samples
│       ├── A.fastq
│       ├── B.fastq
│       └── C.fastq
├── environment.yaml
├── scripts
│   ├── __pycache__
│   │   └── plot-quals.cpython-39.pyc
│   └── plot-quals.py
├── version
└── wf
To make the workflow executable on Latch, two additional files are needed:
Dockerfile to specify the dependencies the workflow needs to run
latch_metadata.py to specify the workflow parameters to expose in the user interface
In this tutorial, we will walk through how these two files can be constructed.
Step 2: Add a metadata file#
The latch_metadata.py file specifies the input parameters that the Snakemake workflow needs to run.
For example, by examining the Snakefile, we determine there are two parameters that the workflow needs: a reference genome and a list of samples to be aligned against the reference genome.
# latch_metadata.py
from pathlib import Path

from latch.types.directory import LatchDir
from latch.types.metadata import LatchAuthor, SnakemakeFileParameter, SnakemakeMetadata

SnakemakeMetadata(
    display_name="snakemake_tutorial_workflow",
    author=LatchAuthor(
        name="latchbio",
    ),
    parameters={
        "samples": SnakemakeFileParameter(
            display_name="Sample Input Directory",
            description="A directory full of FastQ files",
            type=LatchDir,
            # Where the files are copied before the Snakemake workflow runs
            path=Path("data/samples"),
        ),
        "ref_genome": SnakemakeFileParameter(
            display_name="Indexed Reference Genome",
            description="A directory with a reference Fasta file and the 6 index files produced from `bwa index`",
            type=LatchDir,
            # Where the files are copied before the Snakemake workflow runs
            path=Path("genome"),
        ),
    },
)
For each LatchFile/LatchDir parameter, the path keyword specifies where files will be copied before the Snakemake workflow is run. It should match the paths of the inputs for each rule in the Snakefile.
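The requirement that every rule input sits under some parameter's path can be checked mechanically before registering. The following is a minimal sketch using only the standard library: the helper name uncovered_inputs and the example rule inputs are hypothetical and not part of the Latch SDK, and the rule inputs are assumed to be listed by hand rather than parsed from the Snakefile.

```python
from pathlib import PurePosixPath


def uncovered_inputs(parameter_paths, rule_inputs):
    """Return the rule inputs that do not live under any parameter path.

    parameter_paths: mapping of parameter name -> path from latch_metadata.py
    rule_inputs: input path strings copied by hand from the Snakefile rules
    """
    missing = []
    for rule_input in rule_inputs:
        input_path = PurePosixPath(rule_input)
        covered = any(
            input_path == PurePosixPath(p) or PurePosixPath(p) in input_path.parents
            for p in parameter_paths.values()
        )
        if not covered:
            missing.append(rule_input)
    return missing


# Paths taken from the latch_metadata.py example.
parameter_paths = {"samples": "data/samples", "ref_genome": "genome"}

# Hypothetical rule inputs (assumption): replace with those in your Snakefile.
rule_inputs = [
    "genome/genome.fa",
    "data/samples/{sample}.fastq",
    "reference/annotation.gtf",  # deliberately not covered by any parameter
]

print(uncovered_inputs(parameter_paths, rule_inputs))
```

Any path reported by this check points to a rule input that would not be copied into place, which suggests either adding a parameter for it or adjusting an existing path value.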
If your Snakemake project has an existing config.yaml file, you can automatically generate the latch_metadata.py file by running:
latch generate-metadata <path_to_config.yaml>
Step 3: Add dependencies#
Next, create an environment.yaml file to specify the dependencies that the Snakefile needs to run successfully:
# environment.yaml
channels:
  - bioconda
  - conda-forge
dependencies:
  - snakemake=7.25.0
  - jinja2
  - matplotlib
  - graphviz
  - bcftools =1.15
  - samtools =1.15
  - bwa =0.7.17
  - pysam =0.19
  - pygments
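Since the workflow image is built from this file, it can be useful to see at a glance which packages carry a version pin. The sketch below is an illustration only: parse_dependencies is a hypothetical helper, and it is a deliberately narrow line-based reader that assumes a flat dependencies list like the one in this example. It is not a general YAML or conda spec parser.

```python
import re

ENVIRONMENT = """\
channels:
  - bioconda
  - conda-forge
dependencies:
  - snakemake=7.25.0
  - jinja2
  - matplotlib
  - graphviz
  - bcftools =1.15
  - samtools =1.15
  - bwa =0.7.17
  - pysam =0.19
  - pygments
"""


def parse_dependencies(environment_text):
    """Map package name -> pinned version (None if unpinned)."""
    deps = {}
    in_dependencies = False
    for line in environment_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        if not stripped.startswith("-"):
            # A top-level key such as "channels:" or "dependencies:".
            in_dependencies = stripped == "dependencies:"
            continue
        if in_dependencies:
            spec = stripped[1:].strip()
            # Accept both "name=version" and "name =version".
            match = re.match(r"([A-Za-z0-9_.\-]+)\s*=+\s*(\S+)$", spec)
            if match:
                deps[match.group(1)] = match.group(2)
            else:
                deps[spec] = None
    return deps


deps = parse_dependencies(ENVIRONMENT)
unpinned = [name for name, version in deps.items() if version is None]
print(unpinned)
```

For the file in this example, the unpinned list contains jinja2, matplotlib, graphviz, and pygments; whether to pin them is a project decision.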
A Dockerfile can be automatically generated by typing:
latch dockerfile snakemake-wf --snakemake
Step 4: Upload the workflow to Latch#
Then run the following commands to register the workflow with Latch:
cd snakemake-wf &&\
latch register . --snakefile Snakefile
During registration, a workflow image is built from the dependencies specified in the environment.yaml file. Once registration finishes, a link to your workflow on Latch is printed to stdout.
Step 5: Run the workflow#
Snakemake support currently uses JIT (Just-In-Time) registration. This means that the workflow produced by latch register will, when executed, register a second workflow, which runs the actual Snakemake jobs.
Once the workflow finishes running, results are deposited in Latch Data under the Snakemake Outputs folder.
Next Steps#
Learn more about the lifecycle of a Snakemake workflow on Latch by reading our manual.
Learn about how to modify Snakemake workflows to be cloud-compatible here.
Visit troubleshooting to diagnose and find solutions to common issues.
Visit the repository of public examples of Snakemake workflows on Latch.