A minimal example of my data looks as follows:
import pandas as pd data = {'address': ['STREET ADDRESS', 'C/O NAME LASTNAMEnOTHER STREET ADDRESS'], 'coaddress':['', '']} df = pd.DataFrame(data) df
I am looking for a way (using pandas
, preferably) to:
- identify rows in which the
address
column contains “C/O”, and - split the string at the newline (
n
) command and output the part of the string before the newline command to the corresponding row in thecoaddress
column and keep the part of the string after the newline command in theaddress
column.
The df I want to achieve looks as follows:
desired_data = {'address': ['STREET ADDRESS', 'OTHER STREET ADDRESS'], 'coaddress':['', 'C/O NAME LASTNAME']} desired_df = pd.DataFrame(desired_data) desired_df
Any suggestions on how to achieve this? Thanks!
Advertisement
Answer
We can do it in full Pandas using loc
, contains
and split
like so :
df.loc[df["address"].str.contains('C/O'), 'coaddress'] = df["address"].str.split('n').str[0] df.loc[df["address"].str.contains('C/O'), 'address'] = df["address"].str.split('n').str[1]
Output :
address coaddress 0 STREET ADDRESS 1 OTHER STREET ADDRESS C/O NAME LASTNAME
UPDATE :
As commented by @Neither, we can avoid some repetition by setting a function here :
def reassign(df, col, elt): condition = df["address"].str.contains('C/O') element = df["address"].str.split('n') df.loc[condition, col] = element.str[elt] return df col_elt = {'coaddress': 0, 'address': 1} for key in col_elt: df = reassign(df, key, col_elt[key])