How to extract a specific substring from a long URL-like string in Python using pandas

How can I use a regex in pandas to extract the field below? This is one of the column values in my pandas DataFrame, and I want to extract only 'eastus' and keep it as the value for this field. How do I filter this? The position of this segment is always fixed.

Sample df:

                          correlationId                                                 id                                                                level   ...      status.value status.localizedValue            tag
0    xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx  /subscriptions/xxxxxxxxxxxxxxxxxxxxx/resourcegroups/xxxxxxxxxxxx/providers/Microsoft.RecoveryServices/locations/eastus/events/xxxxxxxxxxxx/ticks/xxxxxxxx  Informational  ...    Succeeded             Succeeded  Managed by IT


Code I tried:

if not df.empty:
        columns = ["correlationId","eventName.value","id","resourceGroupName","resourceProviderName.value","operationName.value","status.value","eventTimestamp","submissionTimestamp"]        
        df.columns = df.columns.to_series().apply(lambda x: x.strip())
        #print(df.columns)    
        df.fillna('Missing', inplace=True)
        drop_these = ['correlationId']
        df['Location'] = df.id.str.split("/")[8]


But it's not working.

Error:

 df['Location'] = df.id.split("/")[8]
  File "C:\Python37\lib\site-packages\pandas\core\generic.py", line 5274, in __getattr__
    return object.__getattribute__(self, name)
AttributeError: 'Series' object has no attribute 'split'


Any suggestions, please?

Answer

import pandas as pd

# Adjacent string literals inside parentheses are joined into one string
id = ('/subscriptions/xxxxxxxx/resourcegroups/xxxxxxxx/providers/Microsoft.RecoveryServices/'
      'locations/eastus/events/xxxxxxx/ticks/xxxxx')
df = pd.DataFrame({
    'sample': [id]
})
# expand=True returns one column per piece; column 8 holds the location
df['Location'] = df['sample'].str.split("/", expand=True)[8]

print(df)

Output:

    sample                                                            Location
0  /subscriptions/xxxxxxxx/resourcegroups/xxxxxxxx/providers/Microsoft.RecoveryServices/locations/eastus/events/xxxxxxx/ticks/xxxxx   eastus
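
Since the question asked about regex, here is a rough sketch of two other ways to get the same value. The traceback shows `df.id.split(...)`, which fails because a Series has no `split` method; string methods live under the `.str` accessor. Also, a bare `[8]` after `.str.split("/")` looks up the row labelled 8 rather than the ninth piece of each string, which is why `expand=True` or `.str[8]` is needed. The sample rows and the `LocationByPosition` column name below are made up for illustration, and the pattern assumes the region always follows `/locations/`.

```python
import pandas as pd

# Hypothetical sample data shaped like the `id` column in the question.
df = pd.DataFrame({
    "id": [
        "/subscriptions/xxxx/resourcegroups/xxxx/providers/"
        "Microsoft.RecoveryServices/locations/eastus/events/xxxx/ticks/xxxx",
        "/subscriptions/yyyy/resourcegroups/yyyy/providers/"
        "Microsoft.RecoveryServices/locations/westus2/events/yyyy/ticks/yyyy",
    ]
})

# Regex: capture everything between "/locations/" and the next "/".
# expand=False returns a Series instead of a one-column DataFrame.
df["Location"] = df["id"].str.extract(r"/locations/([^/]+)", expand=False)

# Positional alternative: .str[8] picks element 8 from each row's list.
df["LocationByPosition"] = df["id"].str.split("/").str[8]

print(df[["Location", "LocationByPosition"]])
```

The regex version does not depend on the segment count, so it should keep working if the path gains or loses a segment before `/locations/`. The positional version relies on the position staying fixed, as stated in the question.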



Answer