
Is there a way in Snakemake to make the list of outputs depend on the arguments passed to the shell command?

I have a Snakemake rule that calls a Python program whose output depends on the arguments passed to it. I would like Snakemake to be aware of how the expected output differs when a certain Boolean parameter is passed or not.

My current solution is to build a list of outputs, list_phen_gen_output, that depends on the configuration of the arguments. However, the number of branches grows exponentially once the source program takes 3 arguments that each alter the list of outputs it produces.

Here is my current solution for one of the arguments, covering the cases where extract_genotypes == "T" and where it is not.

if extract_genotypes == "T":
    list_phen_gen_output = [f"{output_dir}phen_{breed}.txt",
        f"{output_dir}non_phenotyped_{breed}.txt",
        f"{output_dir}listcodeall{breed}.txt",
        f"{output_dir}genotypes_{breed}.txt"]
else:
    list_phen_gen_output = [f"{output_dir}phen_{breed}.txt",
        f"{output_dir}non_phenotyped_{breed}.txt",
        f"{output_dir}listcodeall{breed}.txt"]

rule create_phen_gen:
    input:
        f"{output_dir}/ZW.{breed_code}.fwf",
        f"{output_dir}/all_phenotypes.fwf",
    output:
        list_phen_gen_output
    log:
        f"{output_dir}SNAKEMAKE_{breed_code}.log"
    shell:
        f"python {SOURCE}wr_workflow.py {code} {YYMM_S} {extract_genotypes} {run_validation} {post_2000} {val_folder}"

How can I make snakemake outputs dependent on the input parameters of the source program?


Answer

I don’t think there is a way to use functions as output files in Snakemake. At the moment you are specializing the rule to a single breed; if you want to extend it to multiple breeds, you will likely need checkpoints instead. The basic setup is to make create_phen_gen a checkpoint and declare the parent folder as its output with directory(); the “consuming” rule then inspects that folder to decide what to do.
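As a rough sketch of the "check the output" step: the helper below lists the files a checkpoint actually wrote for one breed. The name existing_outputs and the assumption that every relevant file ends in "{breed}.txt" are hypothetical and only based on the file names in the question. In a Snakefile, the folder would typically come from checkpoints.create_phen_gen.get(**wildcards).output[0] inside an input function, as shown in the trailing comment.

```python
from pathlib import Path


def existing_outputs(out_dir, breed):
    """Return the sorted .txt files for `breed` found in `out_dir`."""
    # Hypothetical naming convention: each file of interest ends in "<breed>.txt".
    return sorted(str(p) for p in Path(out_dir).glob(f"*{breed}.txt"))


# Possible use in a Snakefile (not executed here; rule and wildcard names are assumptions):
#
# def phen_gen_files(wildcards):
#     out_dir = checkpoints.create_phen_gen.get(**wildcards).output[0]
#     return existing_outputs(out_dir, wildcards.breed)
```

The consuming rule would then name phen_gen_files as its input, so Snakemake re-evaluates the file list only after the checkpoint has finished.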

For your current setup (which is fine for a single breed), you can reduce the branching and duplication somewhat. I’m assuming a flag equal to "T" means an additional file will be present in the outputs:

list_phen_gen_output = [
    f"{output_dir}phen_{breed}.txt",
    f"{output_dir}non_phenotyped_{breed}.txt",
    f"{output_dir}listcodeall{breed}.txt",
]
if extract_genotypes == "T":
    list_phen_gen_output.append(f"{output_dir}genotypes_{breed}.txt")

if OTHER_THING == "T":
    list_phen_gen_output.append(f"{output_dir}OTHER_{breed}.txt")
else:
    list_phen_gen_output.append(f"{output_dir}ALT_{breed}.txt")

This way the list-building code should grow only linearly with the number of options.
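If more flags get added, the same idea can be written as a small table that maps each flag to the extra files it produces. This is only a sketch: build_outputs is a made-up helper, and the validation_{breed}.txt file name is a hypothetical placeholder, since the question does not say what run_validation writes.

```python
def build_outputs(output_dir, breed, flags):
    """Build the expected output list from a dict of "T"/other flags."""
    # Files that are always produced, as in the question.
    outputs = [
        f"{output_dir}phen_{breed}.txt",
        f"{output_dir}non_phenotyped_{breed}.txt",
        f"{output_dir}listcodeall{breed}.txt",
    ]
    # Each flag contributes its own extra files when set to "T".
    extras = {
        "extract_genotypes": [f"{output_dir}genotypes_{breed}.txt"],
        # Hypothetical file name, for illustration only.
        "run_validation": [f"{output_dir}validation_{breed}.txt"],
    }
    for name, files in extras.items():
        if flags.get(name) == "T":
            outputs.extend(files)
    return outputs


list_phen_gen_output = build_outputs("out/", "HOL", {"extract_genotypes": "T"})
```

Adding another option then means adding one entry to extras rather than another if/else branch, and the resulting list can be passed to the rule's output as before.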

User contributions licensed under: CC BY-SA