Suppose I have the following dataframe:
import pandas as pd

a = [['A','def',2,3],['B|C','xyz|abc',56,3],['X|Y|Z','uiu|oi|kji',65,34],['K','rsq',98,12]]
df1 = pd.DataFrame(a, columns=['1', '2','3','4'])
df1
       1           2   3   4
0      A         def   2   3
1    B|C     xyz|abc  56   3
2  X|Y|Z  uiu|oi|kji  65  34
3      K         rsq  98  12
First, how do I print all the rows that have "|" in column 1? I am trying the following, but it prints all rows of the frame:
df1[df1['1'].str.contains("|")]
Second, how do I split column 1 and column 2 on "|", so that each piece of column 1 is paired with the corresponding piece of column 2, and the rest of the row's data is repeated for each piece? For example, I want something like this from df1:
   1    2   3   4
0  A  def   2   3
1  B  xyz  56   3
2  C  abc  56   3
3  X  uiu  65  34
4  Y   oi  65  34
5  Z  kji  65  34
6  K  rsq  98  12
Answer
You can use a custom lambda function with Series.str.split and Series.explode on the columns listed in splitter, and then add all the other columns back with DataFrame.join:
splitter = ['1','2']
cols = df1.columns.difference(splitter)
f = lambda x: x.str.split('|').explode()
df1 = df1[splitter].apply(f).join(df1[cols]).reset_index(drop=True)
print(df1)
   1    2   3   4
0  A  def   2   3
1  B  xyz  56   3
2  C  abc  56   3
3  X  uiu  65  34
4  Y   oi  65  34
5  Z  kji  65  34
6  K  rsq  98  12
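As an alternative sketch, assuming pandas 1.3 or newer (where DataFrame.explode accepts a list of columns), the same result can be obtained without the lambda and the join. The name out is illustrative only; the data is the df1 from the question. This relies on every row having the same number of pieces in both split columns; if the counts differ, explode raises an error.

```python
import pandas as pd

a = [['A', 'def', 2, 3],
     ['B|C', 'xyz|abc', 56, 3],
     ['X|Y|Z', 'uiu|oi|kji', 65, 34],
     ['K', 'rsq', 98, 12]]
df1 = pd.DataFrame(a, columns=['1', '2', '3', '4'])

splitter = ['1', '2']

# Work on a copy so the original frame stays available for filtering.
out = df1.copy()

# Turn each delimited string into a list of pieces.
for col in splitter:
    out[col] = out[col].str.split('|')

# Explode both list columns together; pieces are paired by position
# and the remaining columns are repeated for each piece.
out = out.explode(splitter).reset_index(drop=True)
print(out)
```

Keeping the result in a separate variable also leaves the unsplit df1 intact, which matters if you still want to filter it on "|" afterwards.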
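To filter by |, keep in mind that it is a special regex character (alternation), which is why the unescaped pattern matches every row. Run the filter on the original df1, before the split, and either add regex=False to Series.str.contains: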
print(df1[df1['1'].str.contains("|", regex=False)])
Or escape it as \|:
print(df1[df1['1'].str.contains(r"\|")])
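To see why the unescaped pattern selects everything, here is a small sketch comparing the three variants on a standalone Series. The Series s and the mask_* names are made up for illustration; they stand in for column 1 of the original df1.

```python
import re
import pandas as pd

# Stand-in for column '1' of the original (unsplit) frame.
s = pd.Series(['A', 'B|C', 'X|Y|Z', 'K'])

# As a regex, "|" means "empty pattern OR empty pattern",
# and the empty pattern matches at the start of any string.
print(bool(re.search('|', 'A')))  # True

mask_regex = s.str.contains('|')                 # regex alternation
mask_literal = s.str.contains('|', regex=False)  # plain substring test
mask_escaped = s.str.contains(r'\|')             # escaped literal pipe

print(mask_regex.tolist())    # [True, True, True, True]
print(mask_literal.tolist())  # [False, True, True, False]
print(mask_escaped.tolist())  # [False, True, True, False]
```

Both fixes give the same mask here; regex=False is probably the simpler choice when the search text contains no pattern at all.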