Remove rows from dataframe based on string list of values

I am trying to remove rows from a dataframe based on multiple conditions, so I defined two lists of keywords to check before a row is deleted. The condition is: delete a row when it matches the first list, unless it also contains one of the keywords in the second list. Sample input and expected output: INPUT AND OUTPUT

import pandas as pd

df = pd.read_csv('/content/file.csv', usecols=['date', 'username', 'name', 'tweet'])

List1=['USA','UK','IQ','KW']
List2=['Eygept','Cairo']
df[df["name"].str.contains('|'.join(List1))==False & df["tweet"] != List2]

TypeError                                 Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/pandas/core/ops/array_ops.py in na_logical_op(x, y, op)
    265         #  (xint or xbool) and (yint or bool)
--> 266         result = op(x, y)
    267     except TypeError:

7 frames
TypeError: unsupported operand type(s) for &: 'bool' and 'str'

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
pandas/_libs/ops.pyx in pandas._libs.ops.scalar_binop()

TypeError: unsupported operand type(s) for &: 'bool' and 'str'

The above exception was the direct cause of the following exception:

TypeError                                 Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/pandas/core/ops/array_ops.py in na_logical_op(x, y, op)
    290                     f"Cannot perform '{op.__name__}' with a dtyped [{x.dtype}] array "
    291                     f"and scalar of type [{typ}]"
--> 292                 ) from err
    293 
    294     return result.reshape(x.shape)

TypeError: Cannot perform 'rand_' with a dtyped [object] array and scalar of type [bool]

Answer

You can try:

df[~df["name"].astype(str).str.contains('|'.join(List1)) | df["tweet"].astype(str).str.contains('|'.join(List2))]
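For context on the error in the question: in Python, `&` binds more tightly than `==` and `!=`, so the original expression evaluates `False & df["tweet"]` first, and that is what raises the TypeError. Building each mask separately avoids the problem. The sketch below applies the line above to a small frame. The sample rows and the names `pat1`, `pat2`, `in_list1`, `in_list2` and `kept` are invented for illustration, and `re.escape` is an optional extra so the keywords are matched literally rather than as regex syntax.

```python
import re
import pandas as pd

# Hypothetical sample rows, invented for illustration only.
df = pd.DataFrame({
    "name": ["SAM_USA", "ALI_EG", "JOHN_UK", "TOM_KW"],
    "tweet": [
        "nice weather in Eygept #WEATHER",
        "hello from Cairo",
        "rainy day again",
        "nice weather in Eygept #WEATHER",
    ],
})

List1 = ["USA", "UK", "IQ", "KW"]
List2 = ["Eygept", "Cairo"]

# Escape each keyword, then join them into a single alternation pattern.
pat1 = "|".join(map(re.escape, List1))
pat2 = "|".join(map(re.escape, List2))

# One boolean mask per condition, so operator precedence cannot interfere.
in_list1 = df["name"].astype(str).str.contains(pat1)
in_list2 = df["tweet"].astype(str).str.contains(pat2)

# Keep a row unless it matches List1 without also matching List2.
kept = df[~in_list1 | in_list2]
print(kept)
```

With this made-up data, only the row whose name matches List1 and whose tweet matches nothing in List2 is dropped.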

Edit (2021-06-30):

Based on the sample input and output, you can get the result with:

df[df["name"].astype(str).str.contains('|'.join(List1)) & df["tweet"].astype(str).str.contains('|'.join(List2))]

Result:

        name                            tweet
0    SAM_USA  nice weather in Eygept #WEATHER
4  TOMAS_USA  nice weather in Eygept #WEATHER
6     TOM_KW  nice weather in Eygept #WEATHER
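
To try this without the original CSV, here is a minimal sketch. Rows 0, 4 and 6 copy the result above; the other rows and the names `in_list1`, `in_list2` and `result` are made up for illustration. It uses `na=False` as an alternative to `astype(str)`, so that missing values count as non-matches.

```python
import pandas as pd

# Hypothetical data: rows 0, 4 and 6 mirror the result shown above,
# the remaining rows are invented so that some rows get filtered out.
df = pd.DataFrame({
    "name": ["SAM_USA", "SARA_EG", "MONA_FR", "OMAR_UK",
             "TOMAS_USA", "LINA_EG", "TOM_KW"],
    "tweet": [
        "nice weather in Eygept #WEATHER",
        "nice weather in Eygept #WEATHER",
        "good morning",
        "good morning",
        "nice weather in Eygept #WEATHER",
        "good morning",
        "nice weather in Eygept #WEATHER",
    ],
})

List1 = ["USA", "UK", "IQ", "KW"]
List2 = ["Eygept", "Cairo"]

# na=False makes missing values evaluate to False instead of NaN.
in_list1 = df["name"].str.contains("|".join(List1), na=False)
in_list2 = df["tweet"].str.contains("|".join(List2), na=False)

# Keep only rows whose name matches List1 AND whose tweet matches List2.
result = df[in_list1 & in_list2]
print(result)
```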
User contributions licensed under: CC BY-SA