I am using Pandas as a database substitute as I have multiple databases (Oracle, SQL Server, etc.), and I am unable to make a sequence of commands to a SQL equivalent.
I have a table loaded in a DataFrame with some columns:
JavaScript
x
2
1
YEARMONTH, CLIENTCODE, SIZE, etc., etc.
2
In SQL, to count the amount of different clients per year would be:
JavaScript
1
2
1
SELECT count(distinct CLIENTCODE) FROM table GROUP BY YEARMONTH;
2
And the result would be
JavaScript
1
3
1
201301 5000
2
201302 13245
3
How can I do that in Pandas?
Advertisement
Answer
I believe this is what you want:
JavaScript
1
2
1
table.groupby('YEARMONTH').CLIENTCODE.nunique()
2
Example:
JavaScript
1
17
17
1
In [2]: table
2
Out[2]:
3
CLIENTCODE YEARMONTH
4
0 1 201301
5
1 1 201301
6
2 2 201301
7
3 1 201302
8
4 2 201302
9
5 2 201302
10
6 3 201302
11
12
In [3]: table.groupby('YEARMONTH').CLIENTCODE.nunique()
13
Out[3]:
14
YEARMONTH
15
201301 2
16
201302 3
17