Similar to pivot table in Python

Here is a dataframe data_1.

data_1=pd.DataFrame({'id':['1','1','1','1','1','2','2','2','2','2'],
                 'date':['20220325','20220325','20220325','20220327','20220327','20220705','20220705','20220706','20220706','20220706'],
                 'base':["wt","bmi","p","wt","wt","bmi","bmi","wt","p","bmi"],
                 'value':['75','21','25','76','77','19','18','85','23','19']},
                   )
data_1['id'] = pd.to_numeric(data_1['id'], errors='coerce')
data_1['date'] = pd.to_numeric(data_1['date'], errors='coerce')
data_1['value'] = pd.to_numeric(data_1['value'], errors='coerce')

JavaScript
​x
 
data_1=pd.DataFrame({'id':['1','1','1','1','1','2','2','2','2','2'],
                 'date':['20220325','20220325','20220325','20220327','20220327','20220705','20220705','20220706','20220706','20220706'],
                 'base':["wt","bmi","p","wt","wt","bmi","bmi","wt","p","bmi"],
                 'value':['75','21','25','76','77','19','18','85','23','19']},
                   )
data_1['id'] = pd.to_numeric(data_1['id'], errors='coerce')
data_1['date'] = pd.to_numeric(data_1['date'], errors='coerce')
data_1['value'] = pd.to_numeric(data_1['value'], errors='coerce')
​

I want to make this data_1 as follows:

data_1=pd.DataFrame({'id':[1,1,1,2,2,2],
                 'date':[20220325,20220327,20220327,20220705,20220705,20220706],
                 'wt':[75,76,77,"","",85],
                 'bmi':[21,"","",19,18,19],
                 'p':[25,"","","","",23]})

JavaScript
 
data_1=pd.DataFrame({'id':[1,1,1,2,2,2],
                 'date':[20220325,20220327,20220327,20220705,20220705,20220706],
                 'wt':[75,76,77,"","",85],
                 'bmi':[21,"","",19,18,19],
                 'p':[25,"","","","",23]})
​

I tried pivot_table ,but the output is not the same as I expected. Moreover, I need to save the data_1 as csv file, but the there are no columns id and date in the csv file that I made. Is there any method to change the data_1 as my expected output?

Answer

Because you want the duplicates still in your pivot table, this was an interesting question. My solution is to make a pivot table for the non-duplicates and the duplicates and then concatenate them together.

tf = data_1[['id', 'date', 'base']].duplicated()

df1 = data_1[~tf]
df2 = data_1[tf]

df1 = pd.pivot_table(df1, values='value', index=['id', 'date'], columns='base').reset_index()
df2 = pd.pivot_table(df2, values='value', index=['id', 'date'], columns='base').reset_index()

data_1 = pd.concat([df1, df2]).fillna('')

JavaScript
 
tf = data_1[['id', 'date', 'base']].duplicated()
​
df1 = data_1[~tf]
df2 = data_1[tf]
​
df1 = pd.pivot_table(df1, values='value', index=['id', 'date'], columns='base').reset_index()
df2 = pd.pivot_table(df2, values='value', index=['id', 'date'], columns='base').reset_index()
​
data_1 = pd.concat([df1, df2]).fillna('')
​

Output:

>>> data_1
base  id      date   bmi     p    wt
0      1  20220325  21.0  25.0  75.0
1      1  20220327              76.0
2      2  20220705  19.0            
3      2  20220706  19.0  23.0  85.0
0      1  20220327              77.0
1      2  20220705  18.0

JavaScript
 
>>> data_1
base  id      date   bmi     p    wt
0      1  20220325  21.0  25.0  75.0
1      1  20220327              76.0
2      2  20220705  19.0            
3      2  20220706  19.0  23.0  85.0
0      1  20220327              77.0
1      2  20220705  18.0            
​

Then to a csv file use data_1.to_csv(file_path, index=False)

Advertisement

Answer