I’m trying to clean some data from a csv file. Here’s an example of the data I’m importing. What I’m trying to do is split the cell by the first comma. The data before the comma goes to one column, the data after goes to another.
I’ve set up a function to handle the data:
def soil_description(text_block):
    print(text_block, type(text_block))
    text_list = text_block.split(',')
    print(1)
    major = text_list[0]
    print(2)
    if len(major) == len(text_block):
        print(3)
        major = ''
        minor = text_block
    else:
        print(4)
        minor = text_block[(len(major)+2):]
    print(major, type(major))
    print(minor, type(minor))
    return major, minor
The csv file is loaded into a dataframe fgd_og_df and I’m trying to put it in two columns of the soil_gINT_df dataframe using the following:
soil_gINT_df[['USCS Major Constituent 1'],['Additional Description']] = fgd_og_df.apply(lambda row:pd.Series(soil_description(row['Description'])),axis=1)
This gives me the following error:
Traceback (most recent call last):
  File "C:/Users/main.py", line 128, in <module>
    soil_gINT_df[['USCS Major Constituent 1'],['Additional Description']] = fgd_og_df.apply(lambda row:pd.Series(soil_description(row['Description'])),axis=1)
  File "C:\Users\venv\lib\site-packages\pandas\core\frame.py", line 3645, in __setitem__
    self._set_item_frame_value(key, value)
  File "C:\Users\venv\lib\site-packages\pandas\core\frame.py", line 3770, in _set_item_frame_value
    if key in self.columns:
  File "C:\Users\venv\lib\site-packages\pandas\core\indexes\base.py", line 5008, in __contains__
    hash(key)
TypeError: unhashable type: 'list'
This is the output immediately before the error:
SHALE. <class 'str'>
1
2
3
<class 'str'>
SHALE. <class 'str'>
So it’s getting through the function and giving an error when it tries to take the two strings back into the soil_gINT_df dataframe, but I’m not sure why. There are other text strings that are similar that work, so I’m at a bit of a loss.
Answer
The mistake is in the key on the left-hand side of the assignment. As chepner’s comment points out, the original passes a tuple of two lists instead of a single list of column names:
soil_gINT_df[['USCS Major Constituent 1'],['Additional Description']]
Should be:
soil_gINT_df[['USCS Major Constituent 1', 'Additional Description']]
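To see why the two spellings differ, here is a minimal sketch that uses only plain Python. `KeyRecorder` is a hypothetical stand-in for the dataframe, not a pandas class; it just stores whatever key the square brackets hand to `__setitem__`. Judging by the traceback above, pandas then treats the tuple as a single column label and runs `key in self.columns`, which has to hash it.

```python
class KeyRecorder:
    """Hypothetical stand-in for a DataFrame; it only records the key it is given."""

    def __init__(self):
        self.last_key = None

    def __setitem__(self, key, value):
        self.last_key = key


recorder = KeyRecorder()

# Two separate lists inside the brackets: Python packs them into one tuple.
recorder[['USCS Major Constituent 1'], ['Additional Description']] = None
wrong_key = recorder.last_key

# One list holding both names: the key stays a single list.
recorder[['USCS Major Constituent 1', 'Additional Description']] = None
right_key = recorder.last_key

print(type(wrong_key).__name__, wrong_key)
print(type(right_key).__name__, right_key)

# A tuple is only hashable if everything inside it is hashable.
# The lists inside this one are not, so hashing it raises TypeError.
try:
    hash(wrong_key)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```

With the single-list key, pandas takes the path for selecting several columns at once, so the two values returned by soil_description land in the two named columns.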