Here is a dataframe data_1
.
data_1=pd.DataFrame({'id':['1','1','1','1','1','2','2','2','2','2'], 'date':['20220325','20220325','20220325','20220327','20220327','20220705','20220705','20220706','20220706','20220706'], 'base':["wt","bmi","p","wt","wt","bmi","bmi","wt","p","bmi"], 'value':['75','21','25','76','77','19','18','85','23','19']}, ) data_1['id'] = pd.to_numeric(data_1['id'], errors='coerce') data_1['date'] = pd.to_numeric(data_1['date'], errors='coerce') data_1['value'] = pd.to_numeric(data_1['value'], errors='coerce')
I want to make this data_1
as follows:
data_1=pd.DataFrame({'id':[1,1,1,2,2,2], 'date':[20220325,20220327,20220327,20220705,20220705,20220706], 'wt':[75,76,77,"","",85], 'bmi':[21,"","",19,18,19], 'p':[25,"","","","",23]})
I tried pivot_table
,but the output is not the same as I expected.
Moreover, I need to save the data_1
as csv file, but the there are no columns id
and date
in the csv file that I made.
Is there any method to change the data_1
as my expected output?
Advertisement
Answer
Because you want the duplicates still in your pivot table, this was an interesting question. My solution is to make a pivot table for the non-duplicates and the duplicates and then concatenate them together.
tf = data_1[['id', 'date', 'base']].duplicated() df1 = data_1[~tf] df2 = data_1[tf] df1 = pd.pivot_table(df1, values='value', index=['id', 'date'], columns='base').reset_index() df2 = pd.pivot_table(df2, values='value', index=['id', 'date'], columns='base').reset_index() data_1 = pd.concat([df1, df2]).fillna('')
Output:
>>> data_1 base id date bmi p wt 0 1 20220325 21.0 25.0 75.0 1 1 20220327 76.0 2 2 20220705 19.0 3 2 20220706 19.0 23.0 85.0 0 1 20220327 77.0 1 2 20220705 18.0
Then to a csv file use data_1.to_csv(file_path, index=False)