My data frame has a third column, and I want to create a fourth column that looks almost the same, except that it has no double quotes and each ID in the list gets a 'user/' prefix. Also, the value is sometimes a single ID rather than a list of IDs (as shown in the example DF).
original
col1  col2  col3
01    01    "ID278, ID289"
02    02    "ID275"
desired
col1  col2  col3            col4
01    01    "ID278, ID289"  user/ID278, user/ID289
02    02    "ID275"         user/ID275
Answer
Given:
   col1  col2            col3
0   1.0   1.0  "ID278, ID289"
1   2.0   2.0         "ID275"
2   2.0   1.0             NaN
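To reproduce this, the frame can be built along these lines. This is only a sketch: it assumes the double quotes are literal characters inside the col3 strings and that the missing value is a NaN, which is how the printed frame reads.

```python
import numpy as np
import pandas as pd

# Hypothetical reconstruction of the sample frame shown above.
# Assumption: the double quotes are part of the string values themselves.
df = pd.DataFrame({
    "col1": [1.0, 2.0, 2.0],
    "col2": [1.0, 2.0, 1.0],
    "col3": ['"ID278, ID289"', '"ID275"', np.nan],
})
print(df)
```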
Doing:
df['col4'] = (df.col3.str.strip('"')                            # Remove " from both ends.
                .str.split(', ')                                # Split into lists on ', '.
                .apply(lambda x: ['user/' + i for i in x if i]  # Apply this list comprehension,
                                 if isinstance(x, list)         # If it's a list.
                                 else x)
                .str.join(', '))                                # Join them back together.
print(df)
Output:
   col1  col2            col3                    col4
0   1.0   1.0  "ID278, ID289"  user/ID278, user/ID289
1   2.0   2.0         "ID275"              user/ID275
2   2.0   1.0             NaN                     NaN
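If the IDs never contain commas or whitespace, the same result can probably be had without splitting and re-joining, by prefixing every ID with a regular expression instead. This is a different technique from the split/apply/join chain in this answer, and `add_user_prefix` is a hypothetical helper name used only for illustration.

```python
import numpy as np
import pandas as pd

def add_user_prefix(col: pd.Series) -> pd.Series:
    """Strip surrounding double quotes and prefix each ID with 'user/'.

    Assumption: an ID is any run of characters that are neither commas
    nor whitespace. Missing values (NaN) pass through unchanged.
    """
    return (col.str.strip('"')
               .str.replace(r"([^,\s]+)", r"user/\1", regex=True))

# Small sample mirroring col3 from the question.
df = pd.DataFrame({"col3": ['"ID278, ID289"', '"ID275"', np.nan]})
df["col4"] = add_user_prefix(df["col3"])
print(df)
```

Unlike splitting on ', ', this keeps whatever separator was already between the IDs, so it is only equivalent when the lists are consistently formatted.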