Pandas ‘count(distinct)’ equivalent

Question

I am using Pandas as a database substitute because I have multiple databases (Oracle, SQL Server, etc.), and I am unable to translate a SQL query into an equivalent sequence of Pandas commands. I have a table loaded in a DataFrame with some columns: In SQL, counting the number of distinct clients per year would be: And the result wou…

Accepted Answer

I believe this is what you want:

    table.groupby('YEARMONTH').CLIENTCODE.nunique()

Example:

    In [2]: table
    Out[2]:
       CLIENTCODE  YEARMONTH
    0           1     201301
    1           1     201301
    2           2     201301
    3           1     201302
    4           2     201302
    5           2     201302
    6           3     201302

    In [3]: table.groupby('YEARMONTH').CLIENTCODE.nunique()
    Out[3]:
    YEARMONTH
    201301       2
    201302       3
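For anyone who wants to try this without an existing DataFrame, here is a minimal, self-contained sketch. It rebuilds the sample data from the example and applies the same groupby / nunique call. The variable names distinct_clients and result, and the column name NUM_CLIENTS, are assumptions made for illustration and are not part of the original answer.

```python
import pandas as pd

# Sample data copied from the example in the answer.
table = pd.DataFrame({
    "CLIENTCODE": [1, 1, 2, 1, 2, 2, 3],
    "YEARMONTH": [201301, 201301, 201301, 201302, 201302, 201302, 201302],
})

# Count the distinct CLIENTCODE values within each YEARMONTH group.
# The result is a Series indexed by YEARMONTH.
distinct_clients = table.groupby("YEARMONTH")["CLIENTCODE"].nunique()

# Optional: turn the Series into a flat two-column DataFrame.
# NUM_CLIENTS is a hypothetical column name chosen for this sketch.
result = distinct_clients.reset_index(name="NUM_CLIENTS")
print(result)
```

The bracket form table.groupby("YEARMONTH")["CLIENTCODE"] selects the same column as the attribute form .CLIENTCODE used in the answer. It may be the safer choice when a column name contains spaces or clashes with a DataFrame method.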

Answer