Iterating through multiple rows using multiple values from nested dictionary to update data frame in python



I created a nested dictionary that keeps multiple values for each combination of keys. Example entries look like this:

dict = {'A': {B: array([1,2,3,4,5,6,7,8,9,10]), C: array([array([1,2,3,4,5,6,7,8,9,10],...}}

There are multiple top-level keys like A, and each one holds several arrays, one per second-level key. Now I want to update a data frame that has the following rows:

Col 1 Col 2 Col 3 Col 4
A B 2 10
A C 3 10

Depending on the value in Col 3, I need to create that many rows: for example, the A and B combination gets two rows. Each new row then multiplies Col 4 by the next value from the matching array in the dictionary: the first row is multiplied by 1, the second by 2, and so on. The output should be as follows:

Col 1 Col 2 Col 3 Col 4 Col 5
A B 1 10 10
A B 1 10 20
A C 1 10 10
A C 1 10 20
A C 1 10 30

I can access all the values in the arrays by iterating over the dictionary as follows:

for i in dict:
    for j in dict[i]:
        dict[i][j]  

But I am not able to iterate through each row of the data frame and multiply the values for each Col 1 and Col 2 combination to create Col 5. Please suggest the most efficient way to loop through the data frame on Col 1 and Col 2 and use the values from the dictionary, taking as many as Col 3 specifies, to multiply Col 4. Note that there are multiple Col 1 and Col 2 combinations, and the dictionary has 10 values for each combination.

EDIT:

Iterating through the dictionary is important because each combination has different values; I used 1, 2, etc. only to keep the explanation simple. The dictionary is created by other code, so each combination will have different values.

For example, it can look like this:

"dict = {'A': {B: array([0.5,0.2,3,4,5,6,7,8,9,10]), C: array([array([0.9,0.6,0.2,4,5,6,7,8,9,10],...}}" 

In this case, Col 4 of the first row for the A and B combination is multiplied by 0.5 and the second by 0.2. For A and C, the first row is multiplied by 0.9, the second by 0.6, and the third by 0.2.

I am looking for help with how to iterate through those values in the dictionary and update the data frame. The dictionary has 10 values for each combination, and in the data frame each combination can have anywhere from 0 to 10 rows, so the values need to be applied accordingly.

Answer

EDIT Ver 2: Reference Dict and pick dict index val

The dictionary you created is a bit confusing. I assume you wanted to reference it the way I have shown (not as an array of arrays, as shown for C). I also assume B and C are string keys, not variables named B and C.

I created the dictionary as dct rather than dict (dict is the name of Python's built-in dictionary type, not a reserved word, but reusing it hides the built-in), with different values to show that it picks the value, not the index.
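As a small illustration of why the name is worth avoiding: assigning to `dict` is legal, but the built-in constructor is then hidden for the rest of that scope. This is only a sketch; the function name `shadowing_demo` is made up for the example.

```python
import builtins

def shadowing_demo():
    # Assigning to `dict` inside the function hides the built-in in this scope.
    dict = {'A': {'B': [1, 2, 3]}}
    try:
        dict(a=1)  # now calls the dictionary object, not the built-in type
    except TypeError as exc:
        return type(exc).__name__
    return None

result = shadowing_demo()
print(result)

# The built-in is still reachable through the builtins module.
restored = builtins.dict(a=1)
print(restored)
```

Using a different name such as dct sidesteps the problem entirely.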

import pandas as pd
import numpy as np

dct = {'A': {'B': np.array([.2,.4,.6,.8,1.0,1.2,1.4,1.6,1.8,2.0]),
              'C': np.array([.3,.6,.9,1.2,1.5,1.8,2.1,2.4,2.7,3.0])
             }
        }

c = ['Col 1','Col 2','Col 3','Col 4']
d = [['A','B',2,10], ['A','C',3,10]]

df = pd.DataFrame(d,columns=c)

# Repeat each row as many times as its Col 3 value. This duplicates the Col 1 / Col 2 pairs.
df = df.loc[df.index.repeat(df['Col 3'])]

# Number the rows within each (Col 1, Col 2) group: 0, 1, 2, ...
# This gives the position to look up in the dictionary's array.
df['Col 5'] = df.groupby(['Col 1','Col 2']).cumcount()

# Using that position, pick the value from dct with keys Col 1 and Col 2
df['Col 5'] = df.apply(lambda x: dct[x['Col 1']][x['Col 2']][x['Col 5']],axis=1)
print(df)

The output of this will be:

  Col 1 Col 2  Col 3  Col 4  Col 5
0     A     B      2     10    0.2
0     A     B      2     10    0.4
1     A     C      3     10    0.3
1     A     C      3     10    0.6
1     A     C      3     10    0.9
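The dictionary lookup in the apply step raises a KeyError if a Col 1 / Col 2 pair is missing from the dictionary, and an IndexError if Col 3 asks for more values than the array holds. A minimal pre-check is sketched below. It assumes each pair appears on only one row of the original data frame; the helper name `find_bad_rows` and the two extra sample rows are hypothetical, added only to show the check firing.

```python
import numpy as np
import pandas as pd

dct = {'A': {'B': np.array([.2, .4, .6, .8, 1.0, 1.2, 1.4, 1.6, 1.8, 2.0]),
             'C': np.array([.3, .6, .9, 1.2, 1.5, 1.8, 2.1, 2.4, 2.7, 3.0])}}

# One valid row plus two deliberately bad ones (hypothetical sample data).
df = pd.DataFrame([['A', 'B', 2, 10],
                   ['A', 'C', 11, 10],   # asks for more values than the array holds
                   ['A', 'D', 1, 10]],   # 'D' is not a key in dct['A']
                  columns=['Col 1', 'Col 2', 'Col 3', 'Col 4'])

def find_bad_rows(frame, lookup):
    """Return index labels whose key pair is missing or whose count is out of range."""
    bad = []
    for idx, c1, c2, n in zip(frame.index, frame['Col 1'],
                              frame['Col 2'], frame['Col 3']):
        values = lookup.get(c1, {}).get(c2)
        if values is None or not 0 <= n <= len(values):
            bad.append(idx)
    return bad

bad_rows = find_bad_rows(df, dct)
print(bad_rows)

# Drop the offending rows before repeating and looking up values.
clean = df.drop(index=bad_rows)
print(clean)
```

Whether to drop, fix, or raise on such rows depends on how the dictionary is produced; dropping is just the simplest choice for the sketch.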

If you want to multiply the dictionary value by the Col 4 value, it's very simple. Replace the apply line above with the following, so that Col 5 still holds the cumcount position when it runs:

df['Col 5'] = df.apply(lambda x: x['Col 4'] * dct[x['Col 1']][x['Col 2']][x['Col 5']],axis=1)

The result of this will be:

  Col 1 Col 2  Col 3  Col 4  Col 5
0     A     B      2     10    2.0
0     A     B      2     10    4.0
1     A     C      3     10    3.0
1     A     C      3     10    6.0
1     A     C      3     10    9.0
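Since the question asks for an efficient approach, here is one possible alternative that avoids the row-wise apply: slice the first Col 3 values from each array, concatenate the slices, and multiply them against the repeated rows in one step. This is a sketch, not a drop-in replacement. It assumes each Col 1 / Col 2 pair appears on a single row of the original data frame, that Col 3 never exceeds the array length, and that the data frame is not empty. It also sets Col 3 to 1 to match the expected output in the question.

```python
import numpy as np
import pandas as pd

# Same sample data as in the answer above.
dct = {'A': {'B': np.array([.2, .4, .6, .8, 1.0, 1.2, 1.4, 1.6, 1.8, 2.0]),
             'C': np.array([.3, .6, .9, 1.2, 1.5, 1.8, 2.1, 2.4, 2.7, 3.0])}}
df = pd.DataFrame([['A', 'B', 2, 10], ['A', 'C', 3, 10]],
                  columns=['Col 1', 'Col 2', 'Col 3', 'Col 4'])

# For each original row, take the first `Col 3` factors for its key pair.
factors = np.concatenate([
    dct[c1][c2][:n]
    for c1, c2, n in zip(df['Col 1'], df['Col 2'], df['Col 3'])
])

# Repeat the rows in the same order in which the factors were collected.
out = df.loc[df.index.repeat(df['Col 3'])].reset_index(drop=True)
out['Col 5'] = out['Col 4'] * factors
out['Col 3'] = 1  # the question's expected output shows 1 in this column
print(out)
```

Rows with Col 3 equal to 0 contribute an empty slice and are dropped by the repeat, so they vanish from the result without special handling.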

EDIT Ver 1: Not referencing dictionary

If you are just looking to have increments of 10 in Col 5 for each group of Col 1 and Col 2, then you can do this.

import pandas as pd

c = ['Col 1','Col 2','Col 3','Col 4']
d = [['A','B',2,10],
     ['A','C',3,10]]

df = pd.DataFrame(d,columns=c)
df = df.loc[df.index.repeat(df['Col 3'])]
df['Col 5'] = (df.groupby(['Col 1','Col 2']).cumcount()+1)*10
print(df)

The output of this will be:

  Col 1 Col 2  Col 3  Col 4  Col 5
0     A     B      2     10     10
0     A     B      2     10     20
1     A     C      3     10     10
1     A     C      3     10     20
1     A     C      3     10     30

If you want Col 3 to have a value of 1, then:

df['Col 3'] = 1

This will then result in:

  Col 1 Col 2  Col 3  Col 4  Col 5
0     A     B      1     10     10
0     A     B      1     10     20
1     A     C      1     10     10
1     A     C      1     10     20
1     A     C      1     10     30

If you need it to reference the dictionary, then I need to change the code.



Source: stackoverflow