I was wondering how I could find all values that start with 'orange' across all the columns and pull them out into a new column.
import pandas as pd

data = pd.DataFrame({'a': ["mango 2", "mango 3", 'apple 3', 'orange 1345', 'orange 2456', 'banana 1', "watermelon 2", "mango 2", "mango 3"],
                     'b': ["mango 2", "mango 3", 'apple 3', 'banana 1', "watermelon 2", 'orange 41134', 'orange 22145', "mango 2", "mango 3"],
                     'c': ['apple 3', "mango 2", "mango 3", "mango 2", "mango 3", 'banana 1', "watermelon 2", 'orange 2222', 'orange 2341'],
                     'd': ["mango 2", "mango 3", "mango 2", "mango 3", 'apple 3', 'banana 1', "watermelon 2", "mango 2", "mango 3", 'orange 9087', 'orange 0021'],
                     'e': ['apple 3', 'orange 1', 'orange 2', 'banana 1', "watermelon 2"]})
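As posted, the column lists have different lengths (9, 9, 9, 11 and 5 items), so pd.DataFrame raises a ValueError instead of building the frame. A minimal sketch, assuming it is acceptable to pad the shorter columns with NaN: wrapping each list in pd.Series lets pandas align them on a shared index. The names raw and plain_lists_work are illustrative only.

```python
import pandas as pd

# The question's lists, unchanged (9, 9, 9, 11 and 5 items).
raw = {
    'a': ["mango 2", "mango 3", 'apple 3', 'orange 1345', 'orange 2456',
          'banana 1', "watermelon 2", "mango 2", "mango 3"],
    'b': ["mango 2", "mango 3", 'apple 3', 'banana 1', "watermelon 2",
          'orange 41134', 'orange 22145', "mango 2", "mango 3"],
    'c': ['apple 3', "mango 2", "mango 3", "mango 2", "mango 3",
          'banana 1', "watermelon 2", 'orange 2222', 'orange 2341'],
    'd': ["mango 2", "mango 3", "mango 2", "mango 3", 'apple 3', 'banana 1',
          "watermelon 2", "mango 2", "mango 3", 'orange 9087', 'orange 0021'],
    'e': ['apple 3', 'orange 1', 'orange 2', 'banana 1', "watermelon 2"],
}

# Plain lists of different lengths are rejected by the constructor.
try:
    pd.DataFrame(raw)
    plain_lists_work = True
except ValueError:
    plain_lists_work = False

# pd.Series objects are aligned on their index (0..10 here);
# missing positions in the shorter columns become NaN.
data = pd.DataFrame({col: pd.Series(vals) for col, vals in raw.items()})

print(plain_lists_work)           # False
print(data.shape)                 # (11, 5)
print(int(data.isna().sum().sum()))  # 12
```

With this padding, any string filter on the values should pass na=False to str.contains (or str.startswith) so the NaN cells count as non-matches.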
Expected output:
df1 = pd.DataFrame({'category': ['orange 1345', 'orange 2456', 'orange 41134', 'orange 22145', 'orange 2222', 'orange 2341', 'orange 9087', 'orange 0021', 'orange 1', 'orange 2']})
Answer
Let's try stack, then filter with str.contains:
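# Stack every column into one long Series (walks the frame row by row)
df1 = data.stack()
# Keep values starting with 'orange', renumber from 0, and name the column
df1 = (
    df1[df1.str.contains('^orange', regex=True)]
    .reset_index(drop=True)
    .to_frame('category')
)
df1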
Output:
       category
0   orange 1345
1  orange 41134
2   orange 2222
3   orange 9087
4      orange 1
5   orange 2456
6  orange 22145
7   orange 2341
8   orange 0021
9      orange 2
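The interleaved order above comes from stack() walking the frame row by row, so matches sitting in the same row of different columns end up next to each other. A small sketch on a hypothetical frame named toy compares this with DataFrame.melt(), which walks column by column:

```python
import pandas as pd

# Hypothetical toy frame: both columns hold their matches in rows 0 and 1.
toy = pd.DataFrame({
    'a': ['orange 1', 'orange 2', 'kiwi'],
    'b': ['orange 3', 'orange 4', 'kiwi'],
})

# stack(): (row 0, a), (row 0, b), (row 1, a), (row 1, b), ...
stacked = toy.stack()
by_row = stacked[stacked.str.contains('^orange', regex=True)].tolist()

# melt(): every row of 'a' first, then every row of 'b'
melted = toy.melt()['value']
by_col = melted[melted.str.contains('^orange', regex=True)].tolist()

print(by_row)  # ['orange 1', 'orange 3', 'orange 2', 'orange 4']
print(by_col)  # ['orange 1', 'orange 2', 'orange 3', 'orange 4']
```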
Or use melt to get the same order as the OP:
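# Melt every column into one long 'value' Series (walks column by column)
df1 = data.melt()['value']
# Keep values starting with 'orange', renumber from 0, and name the column
df1 = (
    df1[df1.str.contains('^orange', regex=True)]
    .reset_index(drop=True)
    .to_frame('category')
)
df1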
Output:
       category
0   orange 1345
1   orange 2456
2  orange 41134
3  orange 22145
4   orange 2222
5   orange 2341
6   orange 9087
7   orange 0021
8      orange 1
9      orange 2
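The melt code above assumes every cell holds a string. If the question's unequal-length lists are padded with NaN (an assumption, since the posted constructor fails as written), str.contains returns NaN for the padded cells in many pandas versions, and indexing with that mask raises a ValueError. A minimal sketch that passes na=False so the padding simply counts as "no match":

```python
import pandas as pd

# Assumption: shorter columns are padded with NaN by wrapping them in pd.Series.
data = pd.DataFrame({
    'a': pd.Series(["mango 2", "mango 3", 'apple 3', 'orange 1345', 'orange 2456',
                    'banana 1', "watermelon 2", "mango 2", "mango 3"]),
    'b': pd.Series(["mango 2", "mango 3", 'apple 3', 'banana 1', "watermelon 2",
                    'orange 41134', 'orange 22145', "mango 2", "mango 3"]),
    'c': pd.Series(['apple 3', "mango 2", "mango 3", "mango 2", "mango 3",
                    'banana 1', "watermelon 2", 'orange 2222', 'orange 2341']),
    'd': pd.Series(["mango 2", "mango 3", "mango 2", "mango 3", 'apple 3', 'banana 1',
                    "watermelon 2", "mango 2", "mango 3", 'orange 9087', 'orange 0021']),
    'e': pd.Series(['apple 3', 'orange 1', 'orange 2', 'banana 1', "watermelon 2"]),
})

values = data.melt()['value']
# na=False turns NaN cells into False, so the mask is purely boolean.
mask = values.str.contains('^orange', regex=True, na=False)
df1 = values[mask].reset_index(drop=True).to_frame('category')

print(df1['category'].tolist())
```

Because melt reads column a fully before column b, and so on, the result follows the same order as the expected output in the question.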
Regex ^orange:
^ asserts position at the start of the string
orange matches the characters orange literally (case sensitive)
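To see what the ^ anchor and case sensitivity do in practice, here is a small sketch on a made-up Series s. It also shows str.startswith, which (swapped in here, not used in the answer) does the same plain prefix test without a regex, and the case=False option of str.contains:

```python
import pandas as pd

# Hypothetical sample values
s = pd.Series(['orange 1', 'blood orange 2', 'Orange 3', 'mango 2'])

anchored = s.str.contains('^orange', regex=True)        # only at the start
unanchored = s.str.contains('orange', regex=True)       # anywhere in the text
prefix = s.str.startswith('orange')                     # prefix test, no regex
any_case = s.str.contains('^orange', regex=True, case=False)  # ignore case

print(anchored.tolist())    # [True, False, False, False]
print(unanchored.tolist())  # [True, True, False, False]
print(prefix.tolist())      # [True, False, False, False]
print(any_case.tolist())    # [True, False, True, False]
```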