Need help in repeating rows if found in list.
- If found value in list increment count
- If more than one instance found then repeat the row by incrementing count
Dataframe:
data = {'First_Name':['Tom', 'Nick', 'Daniel', 'Jack'], 'oter_col':['other_value1', 'other_value2', 'other_value3', 'other_value4']} df = pd.DataFrame(data) df["repeation"] = 0
Df looks like:
First_Name oter_col count 0 Tom other_value1 0 1 Nick other_value2 0 2 Daniel other_value3 0 3 Jack other_value4 0
Input list:
Full_Name = ["Tom Cruise", "Tom Holland", "Tom Hardy", "Jack black", "Chris Hemsworth"]
In need output like:
First_Name Full_Name oter_col count 0 Tom Tom Cruise other_value1 1 1 Tom Tom Holland other_value1 2 2 Tom Tom Hardy other_value1 3 3 Jack Jack black other_value4 1 4 Nick other_value2 1 5 Daniel other_value3 0
Tried something like this to get first matching index so I can repeat the row but not sure how to do it.
for name in Full_Name: m = df.First_Name.str.contains(name.split()[0]) first_index_found = m.idxmax() if m.any() else None if type(first_index_found) == int: print(first_index_found)
Advertisement
Answer
You can use:
# Create a regex pattern to extract First_Name from Full_Name pat = fr"b({'|'.join(df['First_Name'])})b" # Create a dataframe from Full_Name df1 = pd.DataFrame({'Full_Name': Full_Name}) df1['First_Name'] = df1['Full_Name'].str.extract(pat) # Merge them on First_Name column out = df.merge(df1, on='First_Name', how='left') # Count (choose one) out['repeation'] = out.groupby('First_Name').cumcount().add(1) # OR out['repeation2'] = (out.dropna().groupby('First_Name').cumcount().add(1) .reindex(out.index, fill_value=0))
Output:
>>> out First_Name oter_col repeation Full_Name repeation2 0 Tom other_value1 1 Tom Cruise 1 1 Tom other_value1 2 Tom Holland 2 2 Tom other_value1 3 Tom Hardy 3 3 Nick other_value2 1 NaN 0 4 Daniel other_value3 1 NaN 0 5 Jack other_value4 1 Jack black 1