Skip to content
Advertisement

Python pandas – can I sort 1 column dataset into rows of matching data in another dataset

I have half written a code and got stuck at 2nd half.

I have pulled info from a text doc and I have placed the info into pandas dataset column with data like

Job : one
Time :17:00
Date : Tuesday
Cost:200
Job:Two
Time :18:00
Location :England
Job :Three
Time :12:00
Date :Monday
Location :England
Cost:1200

I have pulled all unique categories into another dataset and made them into columns

Job|Time|Date|Location|Cost

Now I want to cycle through the 1st dataset and fill out Each new job per row into the relevant columns, happy for the columns that don’t have data to be blank…

Job|Time|Date|Location|Cost
One|17:00|Tuesday|    |200
Two|18:00|       |England|
Three|12:00|Monday|England|1200

This is example. I actually have 33 unique categories.

I have no idea how to start coding this part (or if it’s possible). New to scripting so been happy to get this far..

I am using pandas as putting info into tables helps my brain because I am pulling info from a text doc and python likes to squash it all up but not sure if using pandas is limiting my options.

Any help /tips appreciated

Not sure if helps but my code so far

Advertisement

Answer

You can split data column by the first occurrence of ; into two columns, and group by the Job delimiter then transpose each group.

df[['key', 'val']] = df['data'].str.split(':', 1, expand=True).apply(lambda x: x.str.strip())

m = (df['key'] == 'Job').cumsum()

df_ = df[['key', 'val']].groupby(m).apply(lambda group: group.set_index('key').T).reset_index(drop=True)
print(df_)

key    Job   Time     Date  Cost Location
0      one  17:00  Tuesday   200      NaN
1      Two  18:00      NaN   NaN  England
2    Three  12:00   Monday  1200  England
User contributions licensed under: CC BY-SA
5 People found this is helpful
Advertisement