how to split a column based on a character and append the rest of columns with each split

Consider I have a dataframe:

import pandas as pd

a = [['A','def',2,3],['B|C','xyz|abc',56,3],['X|Y|Z','uiu|oi|kji',65,34],['K','rsq',98,12]]
df1 = pd.DataFrame(a, columns=['1', '2','3','4'])
df1
       1           2   3   4
0      A         def   2   3
1    B|C     xyz|abc  56   3
2  X|Y|Z  uiu|oi|kji  65  34
3      K         rsq  98  12

First, how do I print all the rows that have “|” in column 1? I am trying the following, but it prints all rows of the frame:

df1[df1['1'].str.contains("|")]

Second, how do I split column 1 and column 2 on “|”, so that each piece of column 1 gets its corresponding piece from column 2 and the rest of the row is repeated for each piece? For example, I want something like this from df1:

   1    2   3   4
0  A  def   2   3
1  B  xyz  56   3
2  C  abc  56   3
3  X  uiu  65  34
4  Y   oi  65  34
5  Z  kji  65  34
6  K  rsq  98  12

Answer

You can apply a custom lambda function that combines Series.str.split and Series.explode to the columns listed in splitter, and then add all the other columns back with DataFrame.join:

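# columns to split on '|'
splitter = ['1','2']
# all remaining columns, which are repeated for every split
cols = df1.columns.difference(splitter)
# split each string into a list, then give every list element its own row
f = lambda x: x.str.split('|').explode()
# join aligns on the (repeated) original index; reset_index renumbers the rows
df1 = df1[splitter].apply(f).join(df1[cols]).reset_index(drop=True)
print(df1)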
   1    2   3   4
0  A  def   2   3
1  B  xyz  56   3
2  C  abc  56   3
3  X  uiu  65  34
4  Y   oi  65  34
5  Z  kji  65  34
6  K  rsq  98  12
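
As an alternative sketch, assuming pandas 1.3 or newer (where DataFrame.explode accepts a list of columns), the same result can be obtained without the lambda and the join. It also assumes that columns 1 and 2 hold the same number of |-separated pieces in every row; if they do not, explode raises an error. The name out is only illustrative and is not part of the answer above.

```python
import pandas as pd

a = [['A', 'def', 2, 3], ['B|C', 'xyz|abc', 56, 3],
     ['X|Y|Z', 'uiu|oi|kji', 65, 34], ['K', 'rsq', 98, 12]]
df1 = pd.DataFrame(a, columns=['1', '2', '3', '4'])

splitter = ['1', '2']

# Work on a copy so the original frame stays available for filtering
out = df1.copy()
for c in splitter:
    # Turn each "a|b|c" string into a list of its pieces
    out[c] = out[c].str.split('|')

# Explode both list columns together so the i-th pieces stay paired;
# the untouched columns are repeated for every new row
out = out.explode(splitter).reset_index(drop=True)
print(out)
```

Because the result is stored under a new name, df1 still contains the unsplit strings afterwards.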

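To filter by |, keep in mind that it is a special regex character (alternation), so the unescaped pattern matches every row. Run the filter on the original df1, before it is overwritten by the split above. Either pass regex=False to Series.str.contains:

print(df1[df1['1'].str.contains("|", regex=False)])

Or escape it as \|:

print(df1[df1['1'].str.contains(r"\|")])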
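
To illustrate why the unescaped pattern selects every row, here is a small sketch that compares the three calls on the original, unsplit frame. As a regex, "|" is an alternation between two empty patterns, and an empty pattern matches any string. The variable names (col, all_rows, literal, escaped) are illustrative only.

```python
import pandas as pd

a = [['A', 'def', 2, 3], ['B|C', 'xyz|abc', 56, 3],
     ['X|Y|Z', 'uiu|oi|kji', 65, 34], ['K', 'rsq', 98, 12]]
df1 = pd.DataFrame(a, columns=['1', '2', '3', '4'])

col = df1['1']

# Unescaped: treated as a regex alternation of empty patterns
all_rows = col.str.contains("|")
# Literal substring search, no regex involved
literal = col.str.contains("|", regex=False)
# Regex search with the pipe escaped
escaped = col.str.contains(r"\|")

# Only the rows whose column 1 really contains a pipe
print(df1[literal])
```

Both literal and escaped give the same boolean mask, so either can be used to index df1.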
User contributions licensed under: CC BY-SA