I have a dataset that looks like this:
my_dict = {'my_var': {0: u'000000', 1: u'100000', 2: u'000000', 3: u'000000', 4: u'000000', 5: u'010000', 6: u'000100', 7: u'110100', 8: u'110101', 9: u'000000', 10: u'010000', 11: u'000100', 12: u'000100', 13: u'000100', 14: u'011101', 15: u'000000', 16: u'111110', 17: u'001010', 18: u'011111', 19: u'111111'}, 'id': {0: u'115', 1: u'143', 2: u'155', 3: u'129', 4: u'003', 5: u'100', 6: u'102', 7: u'004', 8: u'889', 9: u'349', 10: u'772', 11: u'759', 12: u'346', 13: u'340', 14: u'541', 15: u'924', 16: u'234', 17: u'661', 18: u'965', 19: u'779'}}
I’d like to count the occurrences of 1 in each string, so that the final output is the number of 1’s per id. For example, id 115 has zero 1’s while id 143 has one.
My idea was to split the string into six columns and then sum them, but that seemed very inefficient, and in any case I failed to implement it.
Help is always appreciated.
Answer
Using DataFrame.assign, Series.str.count and groupby.sum:
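The answer below refers to df; it assumes a DataFrame built from the dictionary in the question, which is a one-liner:

import pandas as pd

df = pd.DataFrame(my_dict)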
df.assign(ones=df['my_var'].str.count('1')).groupby('id', sort=False).sum().reset_index()

     id  ones
0   115     0
1   143     1
2   155     0
3   129     0
4   003     0
5   100     1
6   102     1
7   004     3
8   889     4
9   349     0
10  772     1
11  759     1
12  346     1
13  340     1
14  541     4
15  924     0
16  234     5
17  661     2
18  965     5
19  779     6
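If the chained expression is hard to follow, here is a sketch of the same logic split into named steps (the intermediate names counts and out are just for illustration; selecting the ones column explicitly before summing avoids any ambiguity about the string column):

# number of '1' characters in each my_var string
counts = df['my_var'].str.count('1')

out = (
    df.assign(ones=counts)                 # add the per-row counts as a new column
      .groupby('id', sort=False)['ones']   # group by id, keeping the original order
      .sum()
      .reset_index()
)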
I assumed that the my_var column is of string type; if it is not, replace

df['my_var'].str.count('1')

with

df['my_var'].astype(str).str.count('1')
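The split-into-six-columns idea from the question would also work, just with more steps; a rough sketch (assuming every my_var value is exactly six characters and df is built as above) could look like this:

# one column per character position, cast to int
bits = df['my_var'].apply(lambda s: pd.Series(list(s))).astype(int)

# total number of 1's per row, then per id
out = (
    df.assign(ones=bits.sum(axis=1))
      .groupby('id', sort=False)['ones']
      .sum()
      .reset_index()
)

That said, str.count('1') is the simpler and faster route, which is why the answer uses it.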