If a cell contains a comma, I would like to split it on the comma and keep only the last item.
The original table looks like this:
| index | x1 | x2 |
|---|---|---|
| 0 | banana | orange |
| 1 | grapes, Citrus | apples |
| 2 | tangerine, tangerine | melons, pears |
and I would like to change it to this:
| index | x1 | x2 |
|---|---|---|
| 0 | banana | orange |
| 1 | Citrus | apples |
| 2 | tangerine | pears |
As you can see, in every cell that contains a comma only the last fruit name is kept, and this is applied to all cells in the DataFrame.
To do that, I was planning to use apply with a function that splits on the comma, but please let me know if there is a better way.
Thanks.
Answer
You can do this with the .str accessor: split each string on ', ' and take the last element of the result with .str[-1]:
>>> df
                         x1             x2
index
0                    banana         orange
1            grapes, Citrus         apples
2      tangerine, tangerine  melons, pears
>>> df.apply(lambda col: col.str.split(', ').str[-1])
              x1      x2
index
0         banana  orange
1         Citrus  apples
2      tangerine   pears
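To try this end to end, here is a minimal, self-contained sketch. The way `df` is built is an assumption (the post does not show how the table was loaded), and `result` is just an illustrative name. It applies the split-and-take-last expression to each column.

```python
import pandas as pd

# Assumed construction of the example table; the original post does not show it.
df = pd.DataFrame(
    {
        "x1": ["banana", "grapes, Citrus", "tangerine, tangerine"],
        "x2": ["orange", "apples", "melons, pears"],
    }
)
df.index.name = "index"

# Split every string column on ", " and keep the last piece.
# A cell without a comma splits into a one-element list, so it comes back unchanged.
result = df.apply(lambda col: col.str.split(", ").str[-1])
print(result)
```

Note that `apply` returns a new DataFrame here; assign it back to `df` if you want to overwrite the original.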
Or, in steps:
>>> df['x1'] = df['x1'].str.split(', ').str[-1]
>>> df['x2'] = df['x2'].str.split(', ').str[-1]
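If the separator is not always exactly a comma followed by one space, a slightly more tolerant variant may help. This is only a sketch: `last_item`, `messy` and `cleaned` are hypothetical names, and the irregular sample data is made up for illustration. It splits once from the right on a bare comma and then trims surrounding whitespace.

```python
import pandas as pd


def last_item(col: pd.Series) -> pd.Series:
    """Return the text after the final comma in each cell, with whitespace trimmed."""
    # rsplit with n=1 splits at most once, starting from the right,
    # so only the last piece is separated from the rest.
    return col.str.rsplit(",", n=1).str[-1].str.strip()


# Made-up sample with inconsistent spacing around the commas.
messy = pd.DataFrame(
    {
        "x1": ["banana", "grapes,Citrus", "tangerine ,  tangerine"],
        "x2": ["orange", " apples", "melons, pears"],
    }
)

cleaned = messy.apply(last_item)
print(cleaned)
```

Be aware that the final `.str.strip()` also trims cells that contain no comma at all, which may or may not be what you want.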