I am creating a small financial management program which imports my transactions from CSV into Python. I want to assign values to a new column ‘category’ based on strings found in the ‘details’ column. I can do it for one, but my question is how do I do it if I had a huge list of possible strings? For example str.contains('RALPHS')
will replace that column value with ‘groceries’, and so on.
For example, below I have a list of strings:
dining = ['CARLS', 'SUBWAY', 'DOMINOS']
and if either of those strings is found in my series, then it will update the respective category series to be ‘dining’.
Here is a small run-able example below.
import pandas as pd import numpy as np data = [ [-68.23 , 'PAYPAL TRANSFER'], [-12.46, 'RALPHS #0079'], [-8.51, 'SAVE AS YOU GO'], [25.34, 'VENMO CASHOUT'], [-2.23 , 'PAYPAL TRANSFER'], [-64.29 , 'PAYPAL TRANSFER'], [-7.06, 'SUBWAY'], [-7.03, 'CARLS JR'], [-2.35, 'SHELL OIL'], [-35.23, 'CHEVRON GAS'] ] df = pd.DataFrame(data, columns=['amount', 'details']) df['category'] = np.nan str_xfer = 'TRANSFER' df['category'] = (df['details'].str.contains(str_xfer)).astype(int) df['category'] = df['category'].replace( to_replace=1, value='transfer') df amount details category 0 -68.23 PAYPAL TRANSFER transfer 1 -12.46 RALPHS 0 2 -8.51 SAVE AS YOU GO 0 3 25.34 VENMO CASHOUT 0 4 -2.23 PAYPAL TRANSFER transfer 5 -64.29 PAYPAL TRANSFER transfer 6 -7.06 SUBWAY 0 7 -7.03 CARLS JR 0 8 -2.35 SHELL OIL 0 9 -35.23 CHEVRON GAS 0
Thanks much.
Advertisement
Answer
If you have one value, we can use str.extract
:
df['category'] = df['details'].str.extract(f'({str_xfer})')
amount details category 0 -68.23 PAYPAL TRANSFER TRANSFER 1 -12.46 RALPHS #0079 NaN 2 -8.51 SAVE AS YOU GO NaN 3 25.34 VENMO CASHOUT NaN 4 -2.23 PAYPAL TRANSFER TRANSFER 5 -64.29 PAYPAL TRANSFER TRANSFER
If you have multiple strings to match, we have to delimit your strings first by |
, which is the or operator in regular expressions.
str_xfer = ['TRANSFER', 'RALPHS', 'CASHOUT'] str_xfer = '|'.join(str_xfer) df['category'] = df['details'].str.extract(f'({str_xfer})')
amount details category 0 -68.23 PAYPAL TRANSFER TRANSFER 1 -12.46 RALPHS #0079 RALPHS 2 -8.51 SAVE AS YOU GO NaN 3 25.34 VENMO CASHOUT CASHOUT 4 -2.23 PAYPAL TRANSFER TRANSFER 5 -64.29 PAYPAL TRANSFER TRANSFER