
Snakemake: Difference between wildcards.wildcard_name and {wildcard_name}?

I’m in the process of learning Snakemake, and I’m confused about the difference between wildcards.wildcard_name and {wildcard_name}. For example, take this rule:

rule get_genome_fasta:
    """
    Retrieve the sequence in fasta format for a genome.
    """
    output:
        "data/raw_external/{genome_id}.fa.gz"
    params:
        fasta_path = lambda wildcards: config["genomes"][wildcards.genome_id]["fasta"]
    log:
        "results/logs/get_genome_fasta/{genome_id}.log"
    shell:
        """
        wget {params.fasta_path} -O {output} -o {log}
        """

What is the difference between wildcards.genome_id and {genome_id}? Thank you so much!

Answer

First, you need to be aware that Snakemake is based on Python. It will help if you are familiar with the syntax of this programming language.

You use the two forms in different contexts: {wildcard_name} inside strings that define file name patterns, and wildcards.wildcard_name inside Python code that computes something from the wildcard’s value.

In input, output and log file names, you write wildcards as {wildcard_name} when defining the pattern that those file names follow.

When resolving dependencies between rules, Snakemake matches the output file name patterns of rules against the concrete file names it already knows the downstream rules need as input. This process starts from the target rule (by default the top-most rule), which should have only concrete file names as input, not patterns. Note that using expand has the effect of generating a list of concrete file names. When this matching succeeds, Snakemake knows several new things:

  • a rule that will be able to provide the required file
  • the values that the wildcards extracted from this rule’s output name pattern will assume during the execution of this rule
  • the concrete names of the log file this rule will generate and of the input files it will need

These values are used to create the wildcards object that you can manipulate in your Python code: the strings representing the shell commands to execute in shell (where you write {wildcards.wildcard_name}), the Python code in run, and the functions that compute params or input file names when those are determined by a function rather than by simple strings or file name patterns.
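
The matching step described above can be imitated in plain Python. This is only a rough sketch, not Snakemake's actual implementation: the helper names pattern_to_regex and match_wildcards are made up for illustration, and it assumes each wildcard appears once in the pattern and may match any non-empty string.

```python
import re


def pattern_to_regex(pattern):
    """Turn a pattern such as "dir/{name}.ext" into a compiled regex
    with one named group per wildcard (hypothetical helper)."""
    # re.split with a capturing group alternates: literal, name, literal, ...
    parts = re.split(r"\{(\w+)\}", pattern)
    regex = ""
    for i, part in enumerate(parts):
        if i % 2 == 0:
            regex += re.escape(part)        # literal text, matched as-is
        else:
            regex += f"(?P<{part}>.+)"      # wildcard: any non-empty string
    return re.compile(regex + "$")


def match_wildcards(pattern, filename):
    """Return a dict of wildcard values, or None if the file name does
    not fit the pattern (hypothetical helper)."""
    m = pattern_to_regex(pattern).match(filename)
    return m.groupdict() if m else None


output_pattern = "data/raw_external/{genome_id}.fa.gz"
log_pattern = "results/logs/get_genome_fasta/{genome_id}.log"

# A downstream rule asks for this concrete file:
values = match_wildcards(output_pattern, "data/raw_external/D_melanogaster.fa.gz")
print(values)

# The extracted value is then plugged into the other patterns of the rule:
log_file = log_pattern.format(**values)
print(log_file)
```

Once the value of genome_id is known, every other {genome_id} placeholder in the rule can be filled in, and the same value can be handed to Python functions as wildcards.genome_id.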

With your example, by matching, say, "data/raw_external/D_melanogaster.fa.gz" with "data/raw_external/{genome_id}.fa.gz", Snakemake determines that your get_genome_fasta rule is expected to be able to generate the file "data/raw_external/D_melanogaster.fa.gz", and that to do this, it needs to set the value "D_melanogaster" for the wildcard genome_id. This value is directly plugged into the log file name pattern. It is also provided as an attribute of a wildcards object, which is passed to the function computing the value of the fasta_path param. In this case, you used a “lambda function”, but you could have done the same by defining a standard Python function:

def set_fasta_path(wildcards):
    return config["genomes"][wildcards.genome_id]["fasta"]

# [...]

    params:
        fasta_path = set_fasta_path
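
To see what such a function receives, you can call it by hand outside Snakemake. The sketch below is only an approximation: it uses SimpleNamespace as a stand-in for the real wildcards object, and the config content and the example.org URL are invented for illustration.

```python
from types import SimpleNamespace

# Hypothetical config; in a real workflow this comes from your config file.
config = {
    "genomes": {
        "D_melanogaster": {"fasta": "https://example.org/D_melanogaster.fa.gz"}
    }
}


def set_fasta_path(wildcards):
    # Attribute access: this is ordinary Python code, so no curly braces.
    return config["genomes"][wildcards.genome_id]["fasta"]


# Stand-in for the object Snakemake builds after matching the output pattern.
wildcards = SimpleNamespace(genome_id="D_melanogaster")

fasta_path = set_fasta_path(wildcards)

# In file name patterns the same value is referred to as {genome_id}.
output = "data/raw_external/{genome_id}.fa.gz".format(genome_id=wildcards.genome_id)
log = "results/logs/get_genome_fasta/{genome_id}.log".format(genome_id=wildcards.genome_id)

# Roughly the command the shell directive would end up running.
command = f"wget {fasta_path} -O {output} -o {log}"
print(command)
```

Both forms refer to the same value; which one you write depends only on whether you are inside a file name pattern or inside Python code.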
User contributions licensed under: CC BY-SA