Skip to content
Advertisement

Subtracting the values in another DataFrame from every column in a pandas DataFrame

I have two DataFrames of 20 rows and 4 columns. The names and value types of the columns are the same. One of the columns is the title, the other 3 are values.

df1
title  col1 col2 col3
apple    a    d    g
pear     b    e    h
grape    c    f    i

df2
title  col1 col2 col3
carrot   q    t    w
pumpkin  r    u    x
sprouts  s    v    y

Now I would like to create 3 separate tables/lists subtracting each value of df1.col1 - df2.col1 | df1.col2 - df2.col2 | df1.col3 - df2.col3. For df1.col1 - df2.col1 I expect an output that looks something among the lines of:

df1.title  df2.title score
apple      carrot    (a - q)
apple      pumpkin    (a - r)
apple      sprouts   (a - s)
pear       carrot    (b - t)
pear       pumpkin   (b - u)
pear       sprouts   (b - v)
grape      carrot    (c - w)
grape      pumpkin   (c - x)
grape      sprouts   (c - y)

I tried to create a for loop using the following code:

for i in df1.iterrows():
    score_col1 = df1.col1[[i]] - df2.col2[[j]]
    score_col2 = df1.col2[[i]] - df2.col2[[j]]
    score_col3 = df1.col3[[i]] - df2.col3[[j]]
    score_total = score_col1 + score_col2 + score_col3
    i = i + 1

In return, I received an output for score_col1 looking like this:

df1.title  df2.title score
apple      carrot    (a - q)
pear       carrot    (b - t)
grape      carrot    (c - w)

Can someone help me to obtain the expected output?

Advertisement

Answer

a1 = ['apple','pear', 'banana']
b1 = [56,32,23]
c1 = [12,34,90]
d1 = [87,65,23]

a2 = ['carrot','pumpkin','sprouts']
b2 = [16,12,93]
c2 = [12,32,70]
d2 = [81,55,21]

df1 = pd.DataFrame({'title':a1, 'col1':b1, 'col2':c1, 'col3':d1})
df2 = pd.DataFrame({'title':a2, 'col1':b2, 'col2':c2, 'col3':d2})

res_df = pd.DataFrame([])
cols = ['col1','col2','col3']

for c in cols:
    res_df = pd.DataFrame([])
    for i,j in df1.iterrows():
        for k,l in df2.iterrows():
            res_df = res_df.append(pd.DataFrame({'title_df1':j.title, 'title_df2':l.title, 'score':j[str(c)] - l[str(c)]},index=[0]), ignore_index=True)

    print(res_df)
User contributions licensed under: CC BY-SA
7 People found this is helpful
Advertisement