
Counting the number of 1’s that appear in a pandas string

I have a dataset that looks like this:

my_dict = {'my_var': {0: u'000000',
  1: u'100000',
  2: u'000000',
  3: u'000000',
  4: u'000000',
  5: u'010000',
  6: u'000100',
  7: u'110100',
  8: u'110101',
  9: u'000000',
  10: u'010000',
  11: u'000100',
  12: u'000100',
  13: u'000100',
  14: u'011101',
  15: u'000000',
  16: u'111110',
  17: u'001010',
  18: u'011111',
  19: u'111111'},
 'id': {0: u'115',
  1: u'143',
  2: u'155',
  3: u'129',
  4: u'003',
  5: u'100',
  6: u'102',
  7: u'004',
  8: u'889',
  9: u'349',
  10: u'772',
  11: u'759',
  12: u'346',
  13: u'340',
  14: u'541',
  15: u'924',
  16: u'234',
  17: u'661',
  18: u'965',
  19: u'779'}}

I’d like to count the number of 1’s in each string, so that the final output is the number of 1’s per id. For example, id 115 has zero 1’s, while id 143 has one.

I thought of splitting each string into six columns and then summing them, but that seemed very inefficient, and I couldn’t get it to work anyway.

Help is always appreciated.

Answer

Using DataFrame.assign, Series.str.count and groupby.sum:

df = pd.DataFrame(my_dict)  # assumes pandas is imported as pd
df.assign(ones=df['my_var'].str.count('1')).groupby('id', sort=False)['ones'].sum().reset_index()

     id  ones
0   115     0
1   143     1
2   155     0
3   129     0
4   003     0
5   100     1
6   102     1
7   004     3
8   889     4
9   349     0
10  772     1
11  759     1
12  346     1
13  340     1
14  541     4
15  924     0
16  234     5
17  661     2
18  965     5
19  779     6
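
For reference, here is a minimal self-contained sketch of the same approach on a five-row subset of the question's data. The variable name ones_per_id is only illustrative, and the 'ones' column is selected explicitly on the assumption that only the counts should be summed; whether a plain .sum() also aggregates the string column depends on the pandas version.

```python
import pandas as pd

# A small subset of the question's data; ids stay strings so '004' keeps its leading zeros.
df = pd.DataFrame({
    'id': ['115', '143', '004', '889', '779'],
    'my_var': ['000000', '100000', '110100', '110101', '111111'],
})

# Count the character '1' in each string, then total the counts per id.
# sort=False keeps the ids in their original order of appearance.
ones_per_id = (
    df.assign(ones=df['my_var'].str.count('1'))
      .groupby('id', sort=False)['ones']
      .sum()
      .reset_index()
)
print(ones_per_id)
```

If every id appears only once, as in the sample data, the groupby step is not strictly needed and the assign call alone already gives the per-id count.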

I assumed that the my_var column is of string type. If it is not, replace:

df['my_var'].str.count('1')

with

df['my_var'].astype(str).str.count('1')
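
As a small illustration of the non-string case, the sketch below assumes a hypothetical frame df_int in which my_var was read as integers, so that '000100' became 100. The leading zeros are lost, but since they contain no 1's the count should be unaffected.

```python
import pandas as pd

# Hypothetical case: my_var holds integers rather than strings.
df_int = pd.DataFrame({'id': ['102', '889'], 'my_var': [100, 110101]})

# Convert to string first; the .str accessor is not available on an integer column.
ones = df_int['my_var'].astype(str).str.count('1')
print(ones.tolist())
```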
User contributions licensed under: CC BY-SA