Skip to content
Advertisement

Parsing a Pandas Dataframe

I have a dataframe like this;

Table_Name,Field_Names
t_001_Projects,"['r01_P_Order', 'p_Code', 'r03_S_Name', 'r04_L_Name', 'r05_Employer']"
t_101_Milestones,"['rep_Month', 'p_Code', 'mls_Code', 'mls_Desc', 'mls_Date_Target', 'mls_Date_Current', 'mls_Progress']"
t_201_Imp_Topics,"['rep_Month', 'p_Code', 'tpc_Short', 'tpc_Long']"
t_202_Claims_and_Pnts,"['rep_Month', 'p_Code', 'cap_Doc_Type', 'cap_Doc_Date', 'cap_Name', 'cap_Explaination', 'cap_Status', 'cap_Cost', 'cap_Curr']"
t_203_Pers_Plan,"['rep_Month', 'p_Code', 'tpc_Short', 'p_Code', 'tpc_Short', 'p_Code', 'tpc_Short', 'p_Code', 'tpc_Short', 'p_Code', 'tpc_Short']"
t_204_Mach_Plan,"['rep_Month', 'p_Code', 'mcp_Owner', 'mcp_Type', 'mcp_Count']"

Looking like this in Jupyter notebook output;

enter image description here

I want to parse this table so that table name repeats with each field name and column counts remain the same such as the output dataframe will look like;

Table_Name,     Field_Names
t_001_Projects  'r01_P_Order'
t_001_Projects  'p_Code', 
t_001_Projects  'r03_S_Name'
t_001_Projects  'r04_L_Name'

I tried this code from a stackoverfow solution;

list_temp = [df[x].apply(pd.Series).stack() for x in df.columns]
df_parsed = pd.concat( list_temp, axis=1).reset_index( level=1, drop=True)
df_parsed.columns = df.columns
df_parsed

But it did not work. In the solution;

Unlist multiple columns into rows with pandas

which I also successfully replicated, code is working perfectly. However in my case, the output stays same with input. What am I missing ? Thanks in advance…

Advertisement

Answer

The problem was that I was exported dataframe to csv. During export lists were converted to string. Then when I import dataframe lists were strings instead of their original form.

I found the answer here.

Pandas DataFrame stored list as string: How to convert back to list

User contributions licensed under: CC BY-SA
9 People found this is helpful
Advertisement