I have two DataFrames of 20 rows and 4 columns. The names and value types of the columns are the same.
One of the columns is the title, the other 3 are values.
df1 title col1 col2 col3 apple a d g pear b e h grape c f i df2 title col1 col2 col3 carrot q t w pumpkin r u x sprouts s v y
Now I would like to create 3 separate tables/lists subtracting each value of df1.col1 - df2.col1 | df1.col2 - df2.col2 | df1.col3 - df2.col3. For df1.col1 - df2.col1 I expect an output that looks something among the lines of:
df1.title df2.title score apple carrot (a - q) apple pumpkin (a - r) apple sprouts (a - s) pear carrot (b - t) pear pumpkin (b - u) pear sprouts (b - v) grape carrot (c - w) grape pumpkin (c - x) grape sprouts (c - y)
I tried to create a for loop using the following code:
for i in df1.iterrows():
score_col1 = df1.col1[[i]] - df2.col2[[j]]
score_col2 = df1.col2[[i]] - df2.col2[[j]]
score_col3 = df1.col3[[i]] - df2.col3[[j]]
score_total = score_col1 + score_col2 + score_col3
i = i + 1
In return, I received an output for score_col1 looking like this:
df1.title df2.title score apple carrot (a - q) pear carrot (b - t) grape carrot (c - w)
Can someone help me to obtain the expected output?
Advertisement
Answer
a1 = ['apple','pear', 'banana']
b1 = [56,32,23]
c1 = [12,34,90]
d1 = [87,65,23]
a2 = ['carrot','pumpkin','sprouts']
b2 = [16,12,93]
c2 = [12,32,70]
d2 = [81,55,21]
df1 = pd.DataFrame({'title':a1, 'col1':b1, 'col2':c1, 'col3':d1})
df2 = pd.DataFrame({'title':a2, 'col1':b2, 'col2':c2, 'col3':d2})
res_df = pd.DataFrame([])
cols = ['col1','col2','col3']
for c in cols:
res_df = pd.DataFrame([])
for i,j in df1.iterrows():
for k,l in df2.iterrows():
res_df = res_df.append(pd.DataFrame({'title_df1':j.title, 'title_df2':l.title, 'score':j[str(c)] - l[str(c)]},index=[0]), ignore_index=True)
print(res_df)