I am using Pandas as a database substitute as I have multiple databases (Oracle, SQLÂ Server, etc.), and I am unable to make a sequence of commands to a SQL equivalent.
I have a table loaded in a DataFrame with some columns:
YEARMONTH, CLIENTCODE, SIZE, etc., etc.
In SQL, to count the amount of different clients per year would be:
SELECT count(distinct CLIENTCODE) FROM table GROUP BY YEARMONTH;
And the result would be
201301 5000 201302 13245
How can I do that in Pandas?
Advertisement
Answer
I believe this is what you want:
table.groupby('YEARMONTH').CLIENTCODE.nunique()
Example:
In [2]: table Out[2]: CLIENTCODE YEARMONTH 0 1 201301 1 1 201301 2 2 201301 3 1 201302 4 2 201302 5 2 201302 6 3 201302 In [3]: table.groupby('YEARMONTH').CLIENTCODE.nunique() Out[3]: YEARMONTH 201301 2 201302 3