I have half written a code and got stuck at 2nd half.
I have pulled info from a text doc and I have placed the info into pandas dataset column with data like
Job : one Time :17:00 Date : Tuesday Cost:200 Job:Two Time :18:00 Location :England Job :Three Time :12:00 Date :Monday Location :England Cost:1200
I have pulled all unique categories into another dataset and made them into columns
Job|Time|Date|Location|Cost
Now I want to cycle through the 1st dataset and fill out Each new job per row into the relevant columns, happy for the columns that don’t have data to be blank…
Job|Time|Date|Location|Cost One|17:00|Tuesday| |200 Two|18:00| |England| Three|12:00|Monday|England|1200
This is example. I actually have 33 unique categories.
I have no idea how to start coding this part (or if it’s possible). New to scripting so been happy to get this far..
I am using pandas as putting info into tables helps my brain because I am pulling info from a text doc and python likes to squash it all up but not sure if using pandas is limiting my options.
Any help /tips appreciated
Not sure if helps but my code so far
Advertisement
Answer
You can split data
column by the first occurrence of ;
into two columns, and group by the Job
delimiter then transpose each group.
df[['key', 'val']] = df['data'].str.split(':', 1, expand=True).apply(lambda x: x.str.strip()) m = (df['key'] == 'Job').cumsum() df_ = df[['key', 'val']].groupby(m).apply(lambda group: group.set_index('key').T).reset_index(drop=True)
print(df_) key Job Time Date Cost Location 0 one 17:00 Tuesday 200 NaN 1 Two 18:00 NaN NaN England 2 Three 12:00 Monday 1200 England