I have a dataset that looks like this:
my_dict = {'my_var': {0: u'000000', 1: u'100000', 2: u'000000', 3: u'000000', 4: u'000000', 5: u'010000', 6: u'000100', 7: u'110100', 8: u'110101', 9: u'000000', 10: u'010000', 11: u'000100', 12: u'000100', 13: u'000100', 14: u'011101', 15: u'000000', 16: u'111110', 17: u'001010', 18: u'011111', 19: u'111111'}, 'id': {0: u'115', 1: u'143', 2: u'155', 3: u'129', 4: u'003', 5: u'100', 6: u'102', 7: u'004', 8: u'889', 9: u'349', 10: u'772', 11: u'759', 12: u'346', 13: u'340', 14: u'541', 15: u'924', 16: u'234', 17: u'661', 18: u'965', 19: u'779'}}
I’d like to count the occurrences of 1 in each string, so that the final output is the number of 1’s per id. For example, id 115 has zero 1’s while id 143 has one.
My idea was to split the string into six columns and then sum them, but that seemed very inefficient, and in any case I failed to implement it.
Help is always appreciated.
Answer
Using DataFrame.assign, Series.str.count and groupby.sum:
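The answer below refers to df; it assumes a DataFrame built from the dictionary in the question, which is a one-liner:

import pandas as pd

df = pd.DataFrame(my_dict)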
df.assign(ones=df['my_var'].str.count('1')).groupby('id', sort=False).sum().reset_index()

     id  ones
0   115     0
1   143     1
2   155     0
3   129     0
4   003     0
5   100     1
6   102     1
7   004     3
8   889     4
9   349     0
10  772     1
11  759     1
12  346     1
13  340     1
14  541     4
15  924     0
16  234     5
17  661     2
18  965     5
19  779     6
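If the chained expression is hard to follow, here is a sketch of the same logic split into named steps (the intermediate names counts and out are just for illustration; selecting the ones column explicitly before summing avoids any ambiguity about the string column):

# number of '1' characters in each my_var string
counts = df['my_var'].str.count('1')

out = (
    df.assign(ones=counts)                 # add the per-row counts as a new column
      .groupby('id', sort=False)['ones']   # group by id, keeping the original order
      .sum()
      .reset_index()
)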
I assumed that the my_var column is of string type; if it is not, replace

df['my_var'].str.count('1')

with

df['my_var'].astype(str).str.count('1')
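The split-into-six-columns idea from the question would also work, just with more steps; a rough sketch (assuming every my_var value is exactly six characters and df is built as above) could look like this:

# one column per character position, cast to int
bits = df['my_var'].apply(lambda s: pd.Series(list(s))).astype(int)

# total number of 1's per row, then per id
out = (
    df.assign(ones=bits.sum(axis=1))
      .groupby('id', sort=False)['ones']
      .sum()
      .reset_index()
)

That said, str.count('1') is the simpler and faster route, which is why the answer uses it.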