
In Pandas, how to group by column name and condition met, while joining the cells that met the condition in a single cell

I am having a hard time knowing how to even formulate this question, but this is what I am trying to accomplish:

I have a pandas DataFrame with thousands of rows that look like this:

import pandas as pd

df = pd.read_excel("data.xlsx")

id  text                        value1  value2
1   These are the               True    False
2   Values of “value1”          True    False
3   While these others          False   True
4   are the Values of “value2”  False   True

For each boolean column, how can I join the text of every row where that column is True into a single cell, so that I get a table that looks like this?

values  merge_text
value1  These are the Values of “value1”
value2  While these others are the Values of “value2”

I was thinking that to solve this, first I need to split the table into multiple tables containing the values that met the condition of a single column and then merge all the tables together.

v1 = df[['id', 'text', 'value1']]
v1 = v1[v1["value1"] == True]

id  text                value1
1   These are the       True
2   Values of “value1”  True

v2 = df[['id', 'text', 'value2']]
v2 = v2[v2["value2"] == True]

id  text                        value2
3   While these others          True
4   are the Values of “value2”  True

What I do not know, and have failed to find online, is how to merge the cells like this:

values  merge_text
value1  These are the Values of “value1”


Answer

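You could set_index with “id” and “text”, then stack df so that the column names (“value1”, “value2”) become the innermost index level of a boolean Series. Then (i) filter the Series by itself to keep only the True entries; (ii) groupby that column-name level and join “text”: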

s = df.set_index(['id','text']).stack()
out = s[s].reset_index(level=1).groupby(level=1)['text'].apply(' '.join).reset_index()

Output:

    index                                           text
0  value1               These are the Values of "value1"
1  value2  While these others are the Values of "value2"
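
For anyone who wants to try this without the Excel file, here is a self-contained sketch. The sample frame is rebuilt by hand from the rows shown in the question (an assumption standing in for data.xlsx). The rename_axis / reset_index(name=...) step is an addition that yields the “values” and “merge_text” column names the question asked for. The second variant, alt, is not part of the answer above; it is just one way to finish the per-column filtering idea from the question.

```python
import pandas as pd

# Sample data typed in from the question; stands in for data.xlsx.
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "text": [
        "These are the",
        'Values of "value1"',
        "While these others",
        'are the Values of "value2"',
    ],
    "value1": [True, True, False, False],
    "value2": [False, False, True, True],
})

# Stack the flag columns into one boolean Series indexed by (id, text, column name).
s = df.set_index(["id", "text"]).stack()

out = (
    s[s]                          # keep only the entries that are True
    .reset_index(level="text")    # move "text" back out of the index
    .groupby(level=1)["text"]     # remaining level 1 holds the former column names
    .agg(" ".join)                # concatenate the text fragments in row order
    .rename_axis("values")        # name the index so reset_index labels it "values"
    .reset_index(name="merge_text")
)

# Alternative in the spirit of the question: filter per column, then join.
flag_cols = ["value1", "value2"]  # assumed list of the boolean columns
alt = pd.DataFrame({
    "values": flag_cols,
    "merge_text": [" ".join(df.loc[df[c], "text"]) for c in flag_cols],
})

print(out)
```

Both out and alt should hold one row per flag column. The stacked version avoids listing the flag columns by hand, while the per-column version may be easier to read when there are only a few of them.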
User contributions licensed under: CC BY-SA