Consider I have a dataframe:
a = [['A','def',2,3],['B|C','xyz|abc',56,3],['X|Y|Z','uiu|oi|kji',65,34],['K','rsq',98,12]] df1 = pd.DataFrame(a, columns=['1', '2','3','4']) df1 1 2 3 4 0 A def 2 3 1 B|C xyz|abc 56 3 2 X|Y|Z uiu|oi|kji 65 34 3 K rsq 98 12
First, how do I print all the rows that has “|” in column 1? I am trying the following but it prints all rows of the frame:
df1[df1[1].str.contains("|")]
Second, how do I split the column 1 and column 2 on “|”, so that each split in column 1 gets its corresponding split from column 2 and the rest of the data is appended to each split. For example, I want something like this from df1:
1 2 3 4 0 A def 2 3 1 B xyz 56 3 2 C abc 56 3 3 X uiu 65 34 4 Y oi 65 34 5 Z kji 65 34 6 K rsq 98 12
Advertisement
Answer
You can use custom lambda function with Series.str.split
and Series.explode
for columns specified in list and then add all another columns in DataFrame.join
:
splitter = ['1','2'] cols = df1.columns.difference(splitter) f = lambda x: x.str.split('|').explode() df1 = df1[splitter].apply(f).join(df1[cols]).reset_index(drop=True) print (df1) 1 2 3 4 0 A def 2 3 1 B xyz 56 3 2 C abc 56 3 3 X uiu 65 34 4 Y oi 65 34 5 Z kji 65 34 6 K rsq 98 12
For filter by |
what is special regex character or
add regex=False
to Series.str.contains
:
print(df1[df1[1].str.contains("|" regex=False)])
Or escape it by |
:
print(df1[df1[1].str.contains("|")])