I have a system that lets me export data in a table of this format:
data = {'record_id':[1,1,1,1,2,2,3,3,3],
'name':['AA',"","","",'BB',"",'CC',"",""],
'data':["",'foo1','foo2','foo3',"",'foo4',"",'foo5','foo6']}
df = pd.DataFrame(data)
print(df)
record_id name data
0 1 AA
1 1 foo1
2 1 foo2
3 1 foo3
4 2 BB
5 2 foo4
6 3 CC
7 3 foo5
8 3 foo6
where there are many columns like ‘data’ and they can have any values that don’t necessarily follow a pattern. I need to get the data into this format:
data_out = {'name':['AA','AA','AA','BB','CC','CC'],'data':['foo1','foo2','foo3','foo4','foo5','foo6']}
df_out = pd.DataFrame(data_out)
print(df_out)
name data
0 AA foo1
1 AA foo2
2 AA foo3
3 BB foo4
4 CC foo5
5 CC foo6
I’ve tried reading the documentation on gropuby and searching similar questions, but I can’t find a way to collapse just the 1st 2 rows of each group into 1 row.
Advertisement
Answer
Use df.replace–
df = df.replace('', np.nan)
df['name'] = df['name'].ffill()
df = df.dropna(subset=['data'])
Output
record_id name data 1 1 AA foo1 2 1 AA foo2 3 1 AA foo3 5 2 BB foo4 7 3 CC foo5 8 3 CC foo6
You can optionally drop the record_id column with a .drop() at the end to get the desired output df_out