Skip to content
Advertisement

How to find all columns contains string and put in a new columns?

I was wondering how could I find all values that start with ‘orange’ from all the columns and parse it into new columns.

data = pd.DataFrame({'a':["mango 2","mango 3",'apple 3', 'orange 1345','orange 2456','banana 1', "watermelon 2","mango 2","mango 3"],
                    'b':["mango 2","mango 3",'apple 3','banana 1', "watermelon 2", 'orange 41134','orange 22145',"mango 2","mango 3"],
                    'c':['apple 3',"mango 2","mango 3" ,"mango 2","mango 3",'banana 1', "watermelon 2",'orange 2222','orange 2341'],
                    'd':["mango 2","mango 3","mango 2","mango 3",'apple 3','banana 1', "watermelon 2","mango 2","mango 3",'orange 9087','orange 0021'],
                    'e':['apple 3', 'orange 1','orange 2','banana 1', "watermelon 2"]})

expected output :

df1 = pd.DataFrame({'category':['orange 1345','orange 2456','orange 41134','orange 22145','orange 2222','orange 2341','orange 9087','orange 0021','orange 1','orange 2'}]

Advertisement

Answer

Let’s try stack then filter by str.contains:

df1 = data.stack()
df1 = (
    df1[df1.str.contains('^orange', regex=True)]
        .reset_index(drop=True)
        .to_frame('category')
)

df1:

       category
0   orange 1345
1  orange 41134
2   orange 2222
3   orange 9087
4      orange 1
5   orange 2456
6  orange 22145
7   orange 2341
8   orange 0021
9      orange 2

Or melt for same order as OP:

df1 = data.melt()['value']
df1 = (
    df1[df1.str.contains('^orange', regex=True)]
        .reset_index(drop=True)
        .to_frame('category')
)

df1:

       category
0   orange 1345
1   orange 2456
2  orange 41134
3  orange 22145
4   orange 2222
5   orange 2341
6   orange 9087
7   orange 0021
8      orange 1
9      orange 2

regex ^orange:

  1. ^ asserts position at start of a line
  2. orange matches the characters orange literally (case sensitive)
User contributions licensed under: CC BY-SA
10 People found this is helpful
Advertisement