
Create a dataframe based on 3 linked dataframes using a constraint on cumsum

I have three dataframes like this:

JavaScript

that look as follows:

JavaScript

and I would like to create another dataframe using these 3 dataframes that looks as follows:

JavaScript

Here is the logic for C1:

  1. First, check the first value in column C1 of df3, which is an a.
  2. Second, find the first row in df2 where column C1 contains the letter from step 1 (here: a) and note the value of position1 (here: 11).
  3. Then go to df1 and note the corresponding value for C1 (here: 2).
  4. That gives the first row of the desired outcome: position2 = 1, position1 = 11 and value = 2.
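Steps 1 to 4 can be sketched roughly as follows. This is only an illustration: the frames contain just column C1, filled with the values quoted in the walk-through of this question, and the letter c at position1 = 12 is an assumption, since the text never mentions that position.

```python
import pandas as pd

# Hypothetical single-column versions of the three frames.
df1 = pd.DataFrame({"C1": [2, 7, 3, 6, 5]})                # values
df2 = pd.DataFrame({"position1": [11, 12, 13, 14],
                    "C1": ["a", "c", "b", "a"]})           # "c" is assumed
df3 = pd.DataFrame({"C1": ["a", "a", "b", "b", "a"]})      # letters

# Step 1: first letter in df3.
letter = df3["C1"].iloc[0]
# Step 2: first position1 in df2 that carries this letter.
position1 = df2.loc[df2["C1"] == letter, "position1"].iloc[0]
# Step 3: matching value in df1.
value = df1["C1"].iloc[0]
# Step 4: first row of the desired outcome.
first_row = {"position2": 1, "position1": int(position1), "value": int(value)}
print(first_row)
```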

So far, so good. The issue comes in due to a constraint:

In df2, each position1 can only be used as long as the sum of all corresponding values from df1 does not exceed 10; if it would, the next valid position in df2 should be found.

So, for the example above: if I go to the next row of C1 in df3, I again find an a, so I check df2 again and end up with position1 = 11 once more. In df1 I find a value of 7; the cumulative sum would be 9, which is below 10, so all is good and I have the next row of my desired dataframe: position2 = 2, position1 = 11 and the value = 7.

Now I go to the next row in df3 in column C1 and find a b; checking df2 gives me position1 = 13 and the value from df1 is 3, so I get the row: position2 = 3, position1 = 13 and the value = 3.

Doing it once more gives position2 = 4, position1 = 13 and the value = 6.

Doing it again now gives me the letter a once more, which would point to position1 = 11 in df2. The value from df1 is 5; as the cumulative sum is already 9, adding 5 would exceed 10, so I cannot use this position and have to find the next one in df2, which is position1 = 14. Therefore I can add the row: position2 = 5, position1 = 14 and the value = 5.

And so on…
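To pin down the rule, here is a plain-Python reference sketch of the walk-through for a single column. It is loop-based, so it is not the elegant solution asked for, only a baseline to compare against. The function name `assign_positions` and the `(position1, letter)` input format are made up for illustration, the letter at position 12 is assumed, and the sketch assumes a position is never revisited once it has been left.

```python
def assign_positions(letters, values, positions, threshold=10):
    """Map each row to a position1 while respecting the cumulative-sum limit.

    letters   -- letters of one df3 column, in row order
    values    -- values of the same df1 column, in row order
    positions -- (position1, letter) pairs of the same df2 column, in df2 order
    """
    # Candidate position1 values per letter, in df2 order.
    candidates = {}
    for pos, letter in positions:
        candidates.setdefault(letter, []).append(pos)

    slot = {}   # letter -> index of the candidate currently in use
    total = {}  # letter -> running sum on that candidate
    rows = []
    for position2, (letter, value) in enumerate(zip(letters, values), start=1):
        idx = slot.get(letter, 0)
        running = total.get(letter, 0)
        # Move on only if the current position is already in use and the
        # new value would push its sum above the threshold.
        if running and running + value > threshold:
            idx += 1
            running = 0
        slot[letter] = idx
        total[letter] = running + value
        # Raises IndexError if df2 has no further position for this letter.
        rows.append((position2, candidates[letter][idx], value))
    return rows


rows = assign_positions(
    letters=["a", "a", "b", "b", "a"],
    values=[2, 7, 3, 6, 5],
    positions=[(11, "a"), (12, "c"), (13, "b"), (14, "a")],  # 12/"c" is assumed
)
print(rows)
```

With these inputs the sketch reproduces the five rows described above, ending with position2 = 5, position1 = 14 and value = 5.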

I am struggling with incorporating the check for the cumsum. Does anyone see an elegant solution to create the desired dataframe from the 3 inputs? The only solutions I have contain several loops, and the code is not very readable.

The example might be tricky to follow but I could not design an easier one.


Answer

The answer by @mitoRibo got me on the right track; pd.melt does indeed seem to be the key to solving it. Here is my solution with a few comments:

JavaScript
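For illustration, a minimal sketch of how pd.melt can bring df3 and df1 into one long frame. This is not necessarily the exact code of this answer: the helper `to_long`, the column name `column` and the C2 data are assumptions, while `df3_long` and `mapper` follow the names used in the text.

```python
import pandas as pd

# Hypothetical inputs: C1 follows the question's walk-through, C2 is made up.
df1 = pd.DataFrame({"C1": [2, 7, 3, 6, 5], "C2": [4, 4, 4, 1, 1]})
df3 = pd.DataFrame({"C1": list("aabba"), "C2": list("bbbab")})


def to_long(df, value_name):
    """Turn a wide frame into long format with a 1-based position2 column."""
    out = df.rename_axis("position2").reset_index()
    out["position2"] += 1
    return out.melt(id_vars="position2", var_name="column", value_name=value_name)


# One row per (position2, column) with both the letter and the value.
df3_long = to_long(df3, "mapper").merge(
    to_long(df1, "value"), on=["position2", "column"]
)
print(df3_long)
```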

Now we can assign groups to the individual rows based on threshold: within each (column, mapper) group, a new label is created whenever threshold would be exceeded.

JavaScript
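One possible way to compute such labels is sketched below. The helper `threshold_groups` is hypothetical, and the small `df3_long` is rebuilt by hand from the C1 values of the question so that the snippet runs on its own. The running sum restarts whenever the next value would push it above threshold.

```python
import pandas as pd


def threshold_groups(values: pd.Series, threshold: float) -> pd.Series:
    """Label consecutive values so that each label's sum stays <= threshold."""
    labels = []
    running, label = 0, 0
    for v in values:
        # Start a new label if the current one is in use and would overflow.
        if running and running + v > threshold:
            label += 1
            running = 0
        running += v
        labels.append(label)
    return pd.Series(labels, index=values.index)


threshold = 10
df3_long = pd.DataFrame({
    "position2": [1, 2, 3, 4, 5],
    "column": ["C1"] * 5,
    "mapper": list("aabba"),
    "value": [2, 7, 3, 6, 5],
})

# Labels are computed separately for every (column, mapper) combination.
df3_long["group"] = (
    df3_long.groupby(["column", "mapper"])["value"]
    .transform(lambda s: threshold_groups(s, threshold))
)
print(df3_long)
```

The Python loop inside the helper is hard to avoid here, because the reset makes each label depend on the previous running sum, so a plain cumsum followed by integer division would not give the same result in general.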

Now we can also determine the respective group labels in df2:

JavaScript
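A sketch of how these labels could be derived, assuming df2 holds one position1 column plus the letter columns: after melting, the n-th occurrence of a letter within a column gets label n - 1 via `cumcount`. The name `df2_long` and the letter at position 12 are assumptions.

```python
import pandas as pd

# Hypothetical df2 with a single letter column; "c" at position 12 is assumed.
df2 = pd.DataFrame({"position1": [11, 12, 13, 14],
                    "C1": ["a", "c", "b", "a"]})

df2_long = df2.melt(id_vars="position1", var_name="column", value_name="mapper")
# Number the occurrences of each letter per column in df2 order: 0, 1, 2, ...
df2_long["group"] = df2_long.groupby(["column", "mapper"]).cumcount()
print(df2_long)
```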

and the only thing left to do is to merge df2 and df3_long:

JavaScript
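A minimal sketch of such a merge, assuming both long frames already carry `column`, `mapper` and `group` as described above. The two frames are typed in by hand here (C1 only, with an assumed letter at position 12) so the snippet is self-contained.

```python
import pandas as pd

df3_long = pd.DataFrame({
    "position2": [1, 2, 3, 4, 5],
    "column": ["C1"] * 5,
    "mapper": list("aabba"),
    "value": [2, 7, 3, 6, 5],
    "group": [0, 0, 0, 0, 1],
})
df2_long = pd.DataFrame({
    "position1": [11, 12, 13, 14],
    "column": ["C1"] * 4,
    "mapper": list("acba"),   # "c" is assumed
    "group": [0, 0, 0, 1],
})

# The k-th group of a letter in df3_long is matched to the k-th position
# carrying that letter in df2_long.
result = (
    df3_long.merge(df2_long, on=["column", "mapper", "group"], how="left")
    .sort_values(["column", "position2"])
    .reset_index(drop=True)
    [["column", "position2", "position1", "value"]]
)
print(result)
```

With `how="left"`, a row whose group has no matching position in df2 would show up with a missing position1 instead of being dropped silently, which makes such cases easier to spot.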

Now we can check whether result is equal to desired:

JavaScript
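As a side note on this kind of check, here is a small sketch with hand-made frames (the C1 rows from the question only). `DataFrame.equals` also compares dtypes, so a result whose position1 ended up as float, which can happen after a merge that introduces missing values, would not count as equal; `pd.testing.assert_frame_equal` with `check_dtype=False` compares the values only.

```python
import pandas as pd

desired = pd.DataFrame({
    "position2": [1, 2, 3, 4, 5],
    "position1": [11, 11, 13, 13, 14],
    "value": [2, 7, 3, 6, 5],
})

# Hypothetical result that has the right values but a float position1 column.
result = desired.astype({"position1": "float64"})

# Strict comparison: values and dtypes must both match.
strict = result.equals(desired)
print(strict)

# Value-only comparison: raises AssertionError if the values differ.
pd.testing.assert_frame_equal(result, desired, check_dtype=False)
```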

which is indeed the case.

There might be better options, so please post them! And thanks again to mitoRibo for the inspiration!

User contributions licensed under: CC BY-SA