
Splitting a dataframe with many labels

I’m trying to split my data by different labels, like this:

dfa = df_a[((df_a['label'] == 0) | (df_a['label'] == 15) | (df_a['label'] == 16))]

This works fine for a small number of values. However, I want to do this for many values, for example:

to_train = (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14,17, 18, 19, 20) # this can change
dfb = [i for i in to_train if df_b['label']==i] # ValueError

This spits out an error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

I’ve read the other questions about this error, but I am already using bitwise operators, and from what I understand those answers don’t address filtering on many conditions.

How do I split the dataframe based on what’s in the tuple/list/etc?


Answer

to_train = (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 17, 18, 19, 20)
# isin() returns a boolean mask that is True where 'label' is one of the values in to_train
dfb = df_b[df_b['label'].isin(to_train)]
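
The list comprehension in the question fails because df_b['label'] == i compares the whole column at once and yields a Series of booleans, which a plain `if` cannot reduce to a single True or False. Series.isin does the membership test element-wise instead. Here is a minimal sketch of splitting a frame both ways with one mask; the sample data and the names mask, train_part, and other_part are made up for illustration, and only the 'label' column and to_train come from the question.

```python
import pandas as pd

# Hypothetical sample data; only the 'label' column name comes from the question.
df_b = pd.DataFrame({
    "label": [0, 1, 2, 15, 16, 17, 20],
    "value": [10, 11, 12, 13, 14, 15, 16],
})

to_train = (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 17, 18, 19, 20)

# Boolean mask: True where 'label' is one of the values in to_train.
mask = df_b["label"].isin(to_train)

train_part = df_b[mask]    # rows whose label is in to_train
other_part = df_b[~mask]   # all remaining rows (labels 0, 15, 16 here)

print(train_part["label"].tolist())  # [1, 2, 17, 20]
print(other_part["label"].tolist())  # [0, 15, 16]
```

Because to_train is an ordinary tuple, it can change at runtime without touching the filtering line, and ~mask gives the complementary split without listing the remaining labels by hand.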
User contributions licensed under: CC BY-SA