I have a system that lets me export data in a table of this format:
data = {'record_id':[1,1,1,1,2,2,3,3,3], 'name':['AA',"","","",'BB',"",'CC',"",""], 'data':["",'foo1','foo2','foo3',"",'foo4',"",'foo5','foo6']} df = pd.DataFrame(data) print(df) record_id name data 0 1 AA 1 1 foo1 2 1 foo2 3 1 foo3 4 2 BB 5 2 foo4 6 3 CC 7 3 foo5 8 3 foo6
where there are many columns like ‘data’ and they can have any values that don’t necessarily follow a pattern. I need to get the data into this format:
data_out = {'name':['AA','AA','AA','BB','CC','CC'],'data':['foo1','foo2','foo3','foo4','foo5','foo6']} df_out = pd.DataFrame(data_out) print(df_out) name data 0 AA foo1 1 AA foo2 2 AA foo3 3 BB foo4 4 CC foo5 5 CC foo6
I’ve tried reading the documentation on gropuby
and searching similar questions, but I can’t find a way to collapse just the 1st 2 rows of each group into 1 row.
Advertisement
Answer
Use df.replace
–
df = df.replace('', np.nan) df['name'] = df['name'].ffill() df = df.dropna(subset=['data'])
Output
record_id name data 1 1 AA foo1 2 1 AA foo2 3 1 AA foo3 5 2 BB foo4 7 3 CC foo5 8 3 CC foo6
You can optionally drop the record_id
column with a .drop()
at the end to get the desired output df_out