I have two data_frames, as below:
df_name: Student_ID Name DOB 0 1 Raju 1993-02-02 1 2 Indu 1987-01-04 2 3 Laya 2000-06-24
df_marks: Student_ID Subject Int1/40 Int2/40 0 1 Eng 10 35 1 1 Tam 30 38 2 1 Mat 20 30 3 1 Sci 15 20 4 2 Eng 35 25 5 2 Tam 25 15 6 2 Mat 22 30 7 2 Sci 29 23 8 3 Eng 18 17 9 3 Tam 19 16 10 3 Mat 27 26
The task is to create a data_frame(below one), where I need to add df_marks['Int1/40']
& df_marks['Int2/40']
, if df_name['Student_ID'] == df_marks['Student_ID']
Student_id Name DOB Tam/50 0 1 Raju 1993-02-02 NaN 1 2 Indu 1987-01-04 NaN 2 3 Laya 2000-06-24 NaN
I tried
df_out['Tam/50'] = df_marks[['Int1/40','Int2/40']].sum(axis=1).where(df_marks['Subject']==df_out['Student_id'])
But its giving error as,
ValueError: Can only compare identically-labeled Series objects
Do we have any simple way to do this?
Regards, Deepak Dash
Advertisement
Answer
Use DataFrame.join
with aggregated sum
for new column in df_name
:
df_marks['Tam/50'] = df_marks[['Int1/40','Int2/40']].sum(axis=1) df_name = df_name.join(df_marks.groupby('Student_ID')['Tam/50'].sum(), on='Student_ID') print (df_name) Student_ID Name DOB Tam/50 0 1 Raju 1993-02-02 198 1 2 Indu 1987-01-04 204 2 3 Laya 2000-06-24 123
Or solution without helper column:
s = (df_marks[['Int1/40','Int2/40']].sum(axis=1) .groupby(df_marks['Student_ID']) .sum() .rename('Tam/50')) df_name = df_name.join(s, on='Student_ID') print (df_name) Student_ID Name DOB Tam/50 0 1 Raju 1993-02-02 198 1 2 Indu 1987-01-04 204 2 3 Laya 2000-06-24 123