One of the columns I’m importing into my dataframe is structured as a list. I need to pick out certain values from that list, transform each value and add it to one of two new columns in the dataframe.

Before:
Name | Listed_Items |
---|---|
Tom | ["dr_md_coca_cola", "dr_od_water", "potatoes", "grass", "ot_other_stuff"] |
Steve | ["dr_od_orange_juice", "potatoes", "grass", "ot_other_stuff", "dr_md_pepsi"] |
Phil | ["dr_md_dr_pepper", "potatoes", "grass", "dr_od_coffee", "ot_other_stuff"] |
From what I’ve read, I can turn the column into a list:

```python
df["listed_items"] = df["listed_items"].apply(eval)
```
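If the cells are strings that merely look like Python lists, `ast.literal_eval` from the standard library is a safer parser than `eval`: it only accepts literals (strings, numbers, lists, dicts and so on) and raises `ValueError` for expressions such as function calls instead of executing them. A minimal sketch, assuming the raw column holds strings such as `'["dr_md_pepsi", "potatoes"]'` written with plain ASCII quotes (curly quotes are not valid Python string delimiters); the two-row frame is hypothetical sample data:

```python
import ast

import pandas as pd

# Hypothetical sample: cells are strings shaped like Python lists.
df = pd.DataFrame(
    {
        "Name": ["Tom", "Steve"],
        "listed_items": [
            '["dr_md_coca_cola", "dr_od_water", "potatoes"]',
            '["dr_od_orange_juice", "potatoes", "dr_md_pepsi"]',
        ],
    }
)

# Parse each string into a real list without running arbitrary code.
df["listed_items"] = df["listed_items"].apply(ast.literal_eval)

print(type(df.loc[0, "listed_items"]))  # <class 'list'>

# A cell holding an expression rather than a literal is rejected, not executed.
try:
    ast.literal_eval("__import__('os').getcwd()")
except ValueError:
    print("rejected")
```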
But then I cannot see how to find the list item that starts with `dr_md`, extract it, remove the `dr_md` prefix, replace the underscores with spaces, capitalize the first letter of each word, and put the result in a new `MD` column for that row. Then the same again for `dr_od`. Each row has exactly one item starting with `dr_md` and exactly one starting with `dr_od`.

Desired output:
Name | MD | OD |
---|---|---|
Tom | Coca Cola | Water |
Steve | Pepsi | Orange Juice |
Phil | Dr Pepper | Coffee |
Answer
What you need to do is write a function that does the processing for you and pass it to `apply` (or, in this case, `map`). Alternatively, you could expand your list column into multiple columns and process them afterwards, but that only works if your lists are always in the same order, which they are not in your example (see panda expand columns with list into multiple columns). Because you only have one input column, you can use `map` instead of `apply`.
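Expanding the lists into positional columns breaks when the order varies. A variation on that idea that does not depend on order (a different technique from the `map` approach in this answer, swapped in here for illustration) is to `explode` the lists into one row per item, pick out the `dr_md_`/`dr_od_` items with a regex, and `pivot` them back to one row per record. A sketch, assuming pandas 0.25 or later (for `Series.explode`), a column that has already been parsed into real lists, and at most one item per prefix in each row (`pivot` raises `ValueError` on duplicates):

```python
import pandas as pd

# Sample rows from the question, assumed to be already parsed into lists.
df = pd.DataFrame(
    {
        "Name": ["Tom", "Steve", "Phil"],
        "listed_items": [
            ["dr_md_coca_cola", "dr_od_water", "potatoes", "grass", "ot_other_stuff"],
            ["dr_od_orange_juice", "potatoes", "grass", "ot_other_stuff", "dr_md_pepsi"],
            ["dr_md_dr_pepper", "potatoes", "grass", "dr_od_coffee", "ot_other_stuff"],
        ],
    }
)

# One row per list item; each item keeps the index of the row it came from.
items = df["listed_items"].explode()

# Named groups become column names. Items that do not match (e.g. "potatoes")
# come back as NaN and are dropped; matches are tidied for display.
parts = (
    items.str.extract(r"^dr_(?P<kind>md|od)_(?P<value>.+)$")
    .dropna()
    .assign(value=lambda d: d["value"].str.replace("_", " ", regex=False).str.title())
)

# Back to one row per original record, with one column per prefix.
wide = (
    parts.rename_axis("row")
    .reset_index()
    .pivot(index="row", columns="kind", values="value")
    .rename(columns=str.upper)
)

# Align on the original index and attach the MD / OD columns.
df = df.join(wide)
print(df[["Name", "MD", "OD"]])
```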
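```python
import ast


def process_dr_md(l: list):
    for s in l:
        if s.startswith("dr_md_"):
            # Drop the "dr_md_" prefix, turn underscores into spaces and
            # capitalise each word, e.g. "dr_md_coca_cola" -> "Coca Cola"
            return s[6:].replace("_", " ").title()
    # Falls through and returns None if no item matches


def process_dr_od(l: list):
    for s in l:
        if s.startswith("dr_od_"):
            # Same processing for the "dr_od_" prefix
            return s[6:].replace("_", " ").title()


# ast.literal_eval parses the list strings like eval, but won't run arbitrary code
df["listed_items"] = df["listed_items"].map(ast.literal_eval)
df["MD"] = df["listed_items"].map(process_dr_md)
df["OD"] = df["listed_items"].map(process_dr_od)
```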
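The two processing functions differ only in the prefix they look for. A sketch that folds them into one, passing the prefix through `Series.apply`, which forwards extra keyword arguments to the function. The name `extract_prefixed` is hypothetical (not from the original answer), and the column is assumed to be already parsed into lists:

```python
import pandas as pd


def extract_prefixed(items, prefix):
    """Return the first item in `items` that starts with `prefix`, tidied up.

    The prefix is stripped, underscores become spaces and every word is
    capitalised with str.title(). Returns None when no item matches.
    """
    for item in items:
        if item.startswith(prefix):
            return item[len(prefix):].replace("_", " ").title()
    return None


# Sample rows from the question, assumed to be already parsed into lists.
df = pd.DataFrame(
    {
        "Name": ["Tom", "Steve", "Phil"],
        "listed_items": [
            ["dr_md_coca_cola", "dr_od_water", "potatoes", "grass", "ot_other_stuff"],
            ["dr_od_orange_juice", "potatoes", "grass", "ot_other_stuff", "dr_md_pepsi"],
            ["dr_md_dr_pepper", "potatoes", "grass", "dr_od_coffee", "ot_other_stuff"],
        ],
    }
)

# Series.apply passes the extra keyword argument `prefix` on to the function.
df["MD"] = df["listed_items"].apply(extract_prefixed, prefix="dr_md_")
df["OD"] = df["listed_items"].apply(extract_prefixed, prefix="dr_od_")

print(df[["Name", "MD", "OD"]])
```

`str.title()` also capitalises after apostrophes, so a hypothetical item like `dr_od_o'neills` would become `O'Neills`; `string.capwords` splits on whitespace only, if that distinction matters.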
I hope that gets you on your way!