I have a third column in my data frame where I want to be able to create a fourth column that looks almost the same, except it has no double quotes and there is a ‘user/’ prefix before each ID in the list. Also, sometimes it is just a single ID vs. list of IDs (as shown in example DF).
original
col1 col2 col3 01 01 "ID278, ID289" 02 02 "ID275"
desired
col1 col2 col3 col4 01 01 "ID278, ID289" user/ID278, user/ID289 02 02 "ID275" user/ID275
Advertisement
Answer
Given:
col1 col2 col3 0 1.0 1.0 "ID278, ID289" 1 2.0 2.0 "ID275" 2 2.0 1.0 NaN
Doing:
df['col4'] = (df.col3.str.strip('"') # Remove " from both ends. .str.split(', ') # Split into lists on ', '. .apply(lambda x: ['user/' + i for i in x if i] # Apply this list comprehension, if isinstance(x, list) # If it's a list. else x) .str.join(', ')) # Join them back together. print(df)
Output:
col1 col2 col3 col4 0 1.0 1.0 "ID278, ID289" user/ID278, user/ID289 1 2.0 2.0 "ID275" user/ID275 2 2.0 1.0 NaN NaN