Get index of row where pandas column contains regex

I am checking whether a pandas column matches a predefined regex, using .any() to detect whether at least one row matches. However, I need the index of the row where the match occurred so that I can get the value of another column in that row.

I use the following to check whether the reg_ex pattern occurs in df['id_org']:

if df['id_org'].str.contains(pat=reg_ex, regex=True).any():

Once the above evaluates to true, how do I get the index/row that caused the expression to evaluate to true? I would like to use this index so that I can access another column for that same row using pandas df.at[index, 'desired_col'] or .iloc functions.

In the past I have done: df.at[df['id_org'][df['id_org'] == key].index[0], 'desired_col']. However, I can't use this line any more, because I am no longer checking for an exact match against the string "key" but for a regex match in that column.

Answer

You can use idxmax combined with any:

reg_ex = 'xxx'

s = df['id_org'].str.contains(pat=reg_ex, regex=True)
out = s.idxmax() if s.any() else None
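
The any() guard matters because idxmax on a boolean Series with no True values still returns a label (the first one), which would look like a match. A minimal sketch of that behaviour, using a hand-built mask with made-up index labels rather than real str.contains output:

```python
import pandas as pd

# Boolean mask with no True values, as str.contains would give
# for a regex that never matches (index labels are illustrative).
no_match = pd.Series([False, False, False], index=[10, 11, 12])

# idxmax returns the label of the first maximum; when every value
# is False, that is simply the first label.
unguarded = no_match.idxmax()

# Guarding with any() yields None instead of a misleading label.
guarded = no_match.idxmax() if no_match.any() else None

print(unguarded, guarded)
```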

Or use first_valid_index on the filtered Series, which returns None when nothing matches:

s = df['id_org'].str.contains(pat=reg_ex, regex=True)
out = s[s].first_valid_index()

Example of outputs:

# reg_ex = 'e'
1

# reg_ex = 'z'
None

Used input:

  id_org
0    abc
1    def
2    ghi
3    cde
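
To tie this back to the question, here is a rough end-to-end sketch that looks up another column in the first matching row. The helper name first_match_value and the values in 'desired_col' are assumptions made up for illustration; only 'id_org' and its values come from the input above. Passing na=False is an extra precaution so that missing values count as non-matches.

```python
import pandas as pd

# Sample frame from the answer; the 'desired_col' values are invented.
df = pd.DataFrame({
    "id_org": ["abc", "def", "ghi", "cde"],
    "desired_col": ["w", "x", "y", "z"],
})

def first_match_value(frame, reg_ex, search_col="id_org", target_col="desired_col"):
    """Return target_col from the first row whose search_col matches reg_ex, else None."""
    # na=False treats missing values as non-matches, keeping the mask boolean.
    mask = frame[search_col].str.contains(pat=reg_ex, regex=True, na=False)
    if not mask.any():
        return None
    # idxmax gives the label of the first True; .at does a scalar lookup by label.
    return frame.at[mask.idxmax(), target_col]

print(first_match_value(df, "e"))  # 'def' at index 1 is the first match
print(first_match_value(df, "z"))  # no row matches
```

Note that df.at returns a scalar only when the index labels are unique; with duplicate labels a different lookup would be needed.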

To get the index of all matches:

s = df['id_org'].str.contains(pat=reg_ex, regex=True)
out = s.index[s]

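Example for the regex 'e': Int64Index([1, 3], dtype='int64') (in pandas 2.0 and later, where Int64Index was removed, this displays as Index([1, 3], dtype='int64'))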
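
The same boolean mask can also pull the other column for every matching row in one step. A small sketch under assumed data: the 'desired_col' values and the missing entry in 'id_org' are invented to show how na=False keeps the mask usable for indexing.

```python
import pandas as pd

# Illustrative frame; one id_org value is missing on purpose.
df = pd.DataFrame({
    "id_org": ["abc", "def", None, "cde"],
    "desired_col": [10, 20, 30, 40],
})

# na=False turns the missing entry into False instead of NaN,
# so the mask can be used directly for boolean indexing.
mask = df["id_org"].str.contains(pat="e", regex=True, na=False)

matching_labels = mask.index[mask]      # index labels of all matching rows
values = df.loc[mask, "desired_col"]    # the other column for those rows

print(list(matching_labels))
print(values.tolist())
```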

User contributions licensed under: CC BY-SA