I have a system that lets me export data in a table of this format:
JavaScript
x
16
16
1
data = {'record_id':[1,1,1,1,2,2,3,3,3],
2
'name':['AA',"","","",'BB',"",'CC',"",""],
3
'data':["",'foo1','foo2','foo3',"",'foo4',"",'foo5','foo6']}
4
df = pd.DataFrame(data)
5
print(df)
6
record_id name data
7
0 1 AA
8
1 1 foo1
9
2 1 foo2
10
3 1 foo3
11
4 2 BB
12
5 2 foo4
13
6 3 CC
14
7 3 foo5
15
8 3 foo6
16
where there are many columns like ‘data’ and they can have any values that don’t necessarily follow a pattern. I need to get the data into this format:
JavaScript
1
11
11
1
data_out = {'name':['AA','AA','AA','BB','CC','CC'],'data':['foo1','foo2','foo3','foo4','foo5','foo6']}
2
df_out = pd.DataFrame(data_out)
3
print(df_out)
4
name data
5
0 AA foo1
6
1 AA foo2
7
2 AA foo3
8
3 BB foo4
9
4 CC foo5
10
5 CC foo6
11
I’ve tried reading the documentation on gropuby
and searching similar questions, but I can’t find a way to collapse just the 1st 2 rows of each group into 1 row.
Advertisement
Answer
Use df.replace
–
JavaScript
1
4
1
df = df.replace('', np.nan)
2
df['name'] = df['name'].ffill()
3
df = df.dropna(subset=['data'])
4
Output
JavaScript
1
8
1
record_id name data
2
1 1 AA foo1
3
2 1 AA foo2
4
3 1 AA foo3
5
5 2 BB foo4
6
7 3 CC foo5
7
8 3 CC foo6
8
You can optionally drop the record_id
column with a .drop()
at the end to get the desired output df_out