Skip to content
Advertisement

Pandas look for substring then write in another

So I’m trying to look down a specific column of my csv file for a partial string. If that meets a certain condition, it’ll write something else in a different column.

For example:

Letter Grade     Percentage
     A        Ninety Five Percent
     C        Seventy Three Percent
     B        Eighty Two Percent

The “Percentage” column will always have the same format of “Ninety Five Percent” that is number in words + Percent.

I want to be able to open up a csv file, have python parse the percentage column for the first word such if it contains “Ninety” then it’ll assign “A” in the letter grade column, if it contains “Eighty” then B in the letter grade and so on and so forth.

This is my current code:

import pandas as pd

df = pd.DataFrame(pd.read_csv(r'file.csv', dtype=str))

A = 'Ninety'
B = 'Eighty'
C = 'Seventy'
D = 'Sixty'

if df[df['Percentage'].isin(A)]:
     df['Letter Grade'] = df['Letter Grade'].str.replace['', 'A', regex=False]
elif df[df['Percentage'].isin(B)]:
     df['Letter Grade'] = df['Letter Grade'].str.replace['', 'B', regex=False]
elif df[df['Percentage'].isin(C)]:
     df['Letter Grade'] = df['Letter Grade'].str.replace['', 'C', regex=False]
elif df[df['Percentage'].isin(D)]:
     df['Letter Grade'] = df['Letter Grade'].str.replace['', 'D', regex=False]
else:
     df['Letter Grade'] = df['Letter Grade'].str.replace['', 'F', regex=False]


df.to_csv(r'file.csv', index=False)

It is currently giving me this error: The truth value of a DataFrame is ambiguous.

Advertisement

Answer

Try with numpy.select:

import numpy as np

df["Letter Grade"] = np.select([df["Percentage"].str.contains(A),
                                df["Percentage"].str.contains(B), 
                                df["Percentage"].str.contains(C),
                                df["Percentage"].str.contains(D)], 
                               ["A","B","C","D"])

Alternatively with findall and map:

mapper = {"Ninety": "A", "Eighty": "B", "Seventy": "C", "Sixty": "D"}
df["Letter Grade"] = df["Percentage"].str.findall("|".join(mapper.keys())).str[0].map(mapper)
User contributions licensed under: CC BY-SA
2 People found this is helpful
Advertisement