
Return minimum “sub-DAG” generated from dictionary

I have input data and some transformation functions t1, t2, t3, t4, t5, t6. Each of them takes some columns as input and outputs some columns.


The DAG associated with these transformations is

[image: DAG of the transformations t1 to t6]

I am given the columns I want to generate, and I should return the sub-DAG required to obtain them (I am interested not only in the DAG nodes; their order also matters). For example:

If the required_columns are ['C1', 'C2', ..., 'C25'], the output should be ['t1', 't2', 't3', 't5', 't4', 't6'] or another valid order such as ['t1', 't4', 't2', 't6', 't3', 't5'].

If the required_columns are ['C8', 'C9', 'C10', 'C11', 'C12', 'C13', 'C14', 'C15', 'C16', 'C20', 'C21', 'C22'], I should output ['t1', 't2', 't3', 't5'].

If the required_columns are ['C1', 'C3', 'C5', 'C8', 'C9', 'C10', 'C11', 'C12', 'C17', 'C18', 'C19', 'C20'], I should output ['t1', 't4'].

Note that the presence or absence of 'C1' ... 'C7' in my required_columns doesn't influence the desired output, as these columns are not generated by t1, t2, t3, t4, t5, t6.

My approach for solving this problem would be:

  1. Create a DAG D based on the details dictionary
  2. For each leaf in D, if none of the output_cols of the current leaf is in required_columns -> delete the node, move upwards to the nodes that point to it, and repeat the step
  3. Print the nodes of the trimmed DAG in topological order
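
The three steps above can be sketched roughly as follows. This is only an illustration under assumptions: each entry of details is taken to have `input_cols` and `output_cols` keys (only `output_cols` is named above), the function name `prune_and_sort` and the dictionary `example_details` are made up for the example and are not the data from the question, and the standard-library `graphlib` module (Python 3.9+) is used for the final ordering.

```python
from graphlib import TopologicalSorter


def prune_and_sort(details, required_columns):
    """Trim leaves that produce nothing required, then sort topologically."""
    required = set(required_columns)
    # Which transformation generates each column (assumed key: output_cols).
    producer = {c: t for t, spec in details.items() for c in spec["output_cols"]}
    # Step 1: parents[t] = transformations whose outputs t consumes
    # (assumed key: input_cols; raw input columns have no producer).
    parents = {
        t: {producer[c] for c in spec["input_cols"] if c in producer}
        for t, spec in details.items()
    }
    alive = set(details)
    # Step 2: repeatedly delete leaves none of whose outputs are required.
    changed = True
    while changed:
        changed = False
        for t in list(alive):
            is_leaf = not any(t in parents[u] for u in alive if u != t)
            if is_leaf and required.isdisjoint(details[t]["output_cols"]):
                alive.remove(t)
                changed = True
    # Step 3: topological order of the remaining nodes (parents first).
    graph = {t: parents[t] & alive for t in alive}
    return list(TopologicalSorter(graph).static_order())


# Hypothetical data: t2 depends on t1, t3 is independent.
example_details = {
    "t1": {"input_cols": ["C1", "C2"], "output_cols": ["C8", "C9"]},
    "t2": {"input_cols": ["C8"], "output_cols": ["C10"]},
    "t3": {"input_cols": ["C3"], "output_cols": ["C11"]},
}
print(prune_and_sort(example_details, ["C10"]))  # ['t1', 't2']
```

The repeated leaf scan makes this quadratic in the number of transformations in the worst case, which is one reason it may feel far from optimal.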

I intuitively think my approach is far from optimal. Given the current input data format (details is a dictionary and required_columns is a list), how can I do better?


Answer

Build the transpose DAG and do depth-first search from the required columns?
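
One possible sketch of that idea, not necessarily the exact code originally posted: it assumes details maps each transformation name to a dict with `input_cols` and `output_cols` lists (the `input_cols` key, the function name `required_transformations`, and the dictionary `example_details` are illustrative assumptions). The search walks from each required column to the transformation that produces it and then to that transformation's own dependencies; emitting nodes in post-order puts dependencies before the transformations that use them.

```python
def required_transformations(details, required_columns):
    """Return the needed transformations, dependencies first."""
    # Map each generated column to the transformation producing it.
    producer = {
        col: name
        for name, spec in details.items()
        for col in spec["output_cols"]
    }

    order = []       # post-order of the DFS: dependencies come first
    visited = set()

    def visit(name):
        if name in visited:
            return
        visited.add(name)
        # Follow edges of the transpose DAG: from a node to what it needs.
        for col in details[name]["input_cols"]:
            if col in producer:  # raw input columns have no producer
                visit(producer[col])
        order.append(name)

    for col in required_columns:
        if col in producer:      # columns not generated by any t are ignored
            visit(producer[col])
    return order


# Hypothetical data: t2 depends on t1, t3 is independent.
example_details = {
    "t1": {"input_cols": ["C1", "C2"], "output_cols": ["C8", "C9"]},
    "t2": {"input_cols": ["C8"], "output_cols": ["C10"]},
    "t3": {"input_cols": ["C3"], "output_cols": ["C11"]},
}
print(required_transformations(example_details, ["C10", "C1"]))  # ['t1', 't2']
```

Only transformations reachable from the required columns are ever visited, so the work is proportional to the size of the returned sub-DAG rather than the whole graph. For very deep dependency chains, an explicit stack could replace the recursion.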
