Setting up a fully reproducible analysis

EOS can run a fully reproducible analysis using snakemake. The analysis file fully describes every step that is needed, including the dependencies between steps, so anyone with that analysis file and the same EOS version can reproduce the analysis.

The Snakefile below can be used with minimal modification: the only change you need to make is to set the base directory for all EOS output, by replacing YOUR_BASEDIR_HERE in the basedir parameter.

from snakemake.utils import min_version
min_version("7")

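# Query the step names once; each step gets its own rule and marker file below.
STEPS = list(shell("eos-analysis list-steps -f analysis.yaml", iterable=True))

# Default target: requires the marker file of every step.
rule all:
    input:
        "analysis.yaml",
        [f"steps/{s}" for s in STEPS]
    output:
        touch("steps/all")

# Generate one rule per step; its inputs are the marker files of its dependencies.
for s in STEPS: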
    rule:
        name: f"{s}"
        params:
            step=s,
            basedir='YOUR_BASEDIR_HERE'
        input:
            "analysis.yaml",
            expand([f"steps/{d}" for d in shell(f"eos-analysis list-step-dependencies -f analysis.yaml {s}", iterable=True)])
        output:
            f"steps/{s}"
        shell:
            "eos-analysis run -f analysis.yaml -b {params.basedir} {params.step} > {output[0]} 2> /dev/null"

If you have snakemake installed, run the snakemake command in the directory that contains both the Snakefile and your analysis file, which must be named analysis.yaml. Recent snakemake versions also expect the number of cores to be given, for example snakemake --cores 1. See the snakemake documentation for more information on how to use snakemake, in particular features such as running steps in parallel locally or as batch jobs on a cluster.
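
To illustrate how the per-step dependency lists in the Snakefile determine the order in which steps run, here is a minimal sketch in plain Python. The step names and their dependencies are hypothetical; in the Snakefile they come from eos-analysis list-steps and eos-analysis list-step-dependencies. The sketch uses the standard library's graphlib rather than snakemake itself, so it only shows the ordering idea and none of snakemake's bookkeeping for existing output files.

```python
from graphlib import TopologicalSorter

# Hypothetical steps, each mapped to the steps it depends on.
# A real analysis file defines its own step names.
dependencies = {
    "sample-prior": [],
    "find-mode": ["sample-prior"],
    "predict": ["find-mode"],
    "plot": ["predict", "sample-prior"],
}


def execution_order(deps):
    """Return the steps in an order that runs every dependency first."""
    return list(TopologicalSorter(deps).static_order())


def marker_file(step):
    """Marker file for a step, mirroring the steps/<name> outputs of the Snakefile."""
    return f"steps/{step}"


order = execution_order(dependencies)
for step in order:
    needed = [marker_file(d) for d in dependencies[step]]
    print(f"{marker_file(step)} <- {needed}")
```

Each printed line pairs a step's marker file with the marker files it needs, which is the same relationship the generated rules express through their input and output entries.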