I am trying to remove rows from a DataFrame based on multiple conditions, so I defined two lists of keywords to check before a row is deleted. The condition is: delete the row when it matches a keyword in the first list, unless it also contains one of the keywords in the second list.
Sample input and expected output: [image: INPUT AND OUTPUT]
df=pd.read_csv('/content/file.csv',usecols=['date','username','name','tweet'])
List1=['USA','UK','IQ','KW']
List2=['Eygept','Cairo']
df[df["name"].str.contains('|'.join(List1))==False & df["tweet"] != List2]
TypeError                                 Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/pandas/core/ops/array_ops.py in na_logical_op(x, y, op)
    265             #  (xint or xbool) and (yint or bool)
--> 266             result = op(x, y)
    267         except TypeError:

7 frames
TypeError: unsupported operand type(s) for &: 'bool' and 'str'

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
pandas/_libs/ops.pyx in pandas._libs.ops.scalar_binop()

TypeError: unsupported operand type(s) for &: 'bool' and 'str'

The above exception was the direct cause of the following exception:

TypeError                                 Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/pandas/core/ops/array_ops.py in na_logical_op(x, y, op)
    290                     f"Cannot perform '{op.__name__}' with a dtyped [{x.dtype}] array "
    291                     f"and scalar of type [{typ}]"
--> 292                 ) from err
    293
    294     return result.reshape(x.shape)

TypeError: Cannot perform 'rand_' with a dtyped [object] array and scalar of type [bool]
Answer
You can try:
df[~df["name"].astype(str).str.contains('|'.join(List1)) | df["tweet"].astype(str).str.contains('|'.join(List2))]
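For context on why the original expression raises: in Python, `&` binds more tightly than `==` and `!=`, so without parentheses `False & df["tweet"]` is evaluated first, which is the `rand_` operation named in the traceback. Comparing a column to a list with `!=` also does not test for keywords. Below is a minimal sketch of the expression above with each condition built as its own boolean Series. The sample rows are made up for illustration, since the contents of file.csv are not shown.

```python
import pandas as pd

# Hypothetical sample data; the real file.csv is not shown in the question.
df = pd.DataFrame({
    "name": ["SAM_USA", "ALI_EG", "TOM_KW", "LEE_UK"],
    "tweet": [
        "nice weather in Eygept #WEATHER",
        "hello from Cairo",
        "traffic is bad today",
        "good morning",
    ],
})

List1 = ["USA", "UK", "IQ", "KW"]
List2 = ["Eygept", "Cairo"]

# Build each condition separately so operator precedence cannot interfere.
in_list1 = df["name"].astype(str).str.contains("|".join(List1))
in_list2 = df["tweet"].astype(str).str.contains("|".join(List2))

# Keep a row if its name does not match List1,
# or if its tweet mentions a List2 keyword.
kept = df[~in_list1 | in_list2]
print(kept)
```

With this made-up data, the rows for TOM_KW and LEE_UK are dropped, because their names match List1 and their tweets contain no List2 keyword.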
Edit (2021-06-30):
Based on the sample input and output, you can get the result with:
df[df["name"].astype(str).str.contains('|'.join(List1)) & df["tweet"].astype(str).str.contains('|'.join(List2))]
Result:
        name                            tweet
0    SAM_USA  nice weather in Eygept #WEATHER
4  TOMAS_USA  nice weather in Eygept #WEATHER
6     TOM_KW  nice weather in Eygept #WEATHER
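If the keywords might contain regex metacharacters, or the columns might contain missing values, a slightly more defensive variant could look like the sketch below. It is only an illustration under those assumptions: the sample rows are invented, `re.escape` is used to treat each keyword literally, and `na=False` makes missing values count as "no match" instead of casting them to the string "nan".

```python
import re

import pandas as pd

# Hypothetical sample data, including missing values.
df = pd.DataFrame({
    "name": ["SAM_USA", "ALI_EG", None, "TOM_KW"],
    "tweet": [
        "nice weather in Eygept #WEATHER",
        "hello from Cairo",
        "nice weather in Eygept #WEATHER",
        None,
    ],
})

List1 = ["USA", "UK", "IQ", "KW"]
List2 = ["Eygept", "Cairo"]

# Escape each keyword so characters such as "." or "+" are matched literally.
pat1 = "|".join(re.escape(k) for k in List1)
pat2 = "|".join(re.escape(k) for k in List2)

# na=False treats missing names or tweets as non-matching.
mask = df["name"].str.contains(pat1, na=False) & df["tweet"].str.contains(pat2, na=False)
result = df[mask]
print(result)
```

Here only the SAM_USA row is selected: the other rows either have a name outside List1 or are missing a name or tweet.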