How do I plot the frequency of an event over time with pandas?

Question

I was trying to plot some data from a pandas DataFrame. My table contains roughly 10,000 films, and for each of them two pieces of information: the year it was published and a rating from 0 to 3. I am having a hard time trying to plot a graph with the pandas library that shows the number of films that received a particular

Accepted Answer

You can filter by rating and use Series.value_counts:

    s = data.loc[data['rating'].eq(3), 'year'].value_counts()

But there are many years of films:

    print(len(s))
    108

So for the plot I keep only counts greater than 30, which leaves 40 years here:

    print(s.gt(30).sum())
    40

Then filter again and plot:

    s[s.gt(30)].plot.bar()

EDIT: Solution with percentages:

    s = (data.loc[data['rating'].eq(3), 'year']
             .value_counts(normalize=True)
             .sort_index()
             .mul(100))
    print(s)
    1899    0.018218
    1910    0.018218
    1916    0.054655
    1917    0.054655
    1918    0.054655
              ...
    2018    3.169976
    2019    3.188195
    2020    2.040445
    2021    1.840044
    2022    0.765167
    Name: year, Length: 108, dtype: float64

    print(s[s.gt(3)])
    2007    3.042449
    2009    3.588996
    2010    3.825833
    2011    4.299508
    2012    4.153762
    2013    4.937147
    2014    4.335945
    2015    3.771179
    2016    3.752960
    2017    3.388595
    2018    3.169976
    2019    3.188195
    Name: year, dtype: float64

    s[s.gt(3)].plot.bar()

EDIT1: Here is a solution for counts of years vs ratings:

    df = pd.crosstab(data['year'], data.rating)
    print(df)
    rating   0   1   2    3
    year
    1874     1   0   0    0
    1877     1   0   0    0
    1878     2   0   0    0
    1881     1   0   0    0
    1883     1   0   0    0
    ...     ..  ..  ..  ...
    2018    19  44  24  174
    2019    16  47  18  175
    2020    10  17  11  112
    2021    11  22  13  101
    2022     3  14   5   42

    [141 rows x 4 columns]

EDIT2: The same table as percentages within each year:

    df = pd.crosstab(data['year'], data.rating, normalize='index').mul(100)
    print(df)
    rating           0          1         2          3
    year
    1874    100.000000   0.000000  0.000000   0.000000
    1877    100.000000   0.000000  0.000000   0.000000
    1878    100.000000   0.000000  0.000000   0.000000
    1881    100.000000   0.000000  0.000000   0.000000
    1883    100.000000   0.000000  0.000000   0.000000
    ...            ...        ...       ...        ...
    2018      7.279693  16.858238  9.195402  66.666667
    2019      6.250000  18.359375  7.031250  68.359375
    2020      6.666667  11.333333  7.333333  74.666667
    2021      7.482993  14.965986  8.843537  68.707483
    2022      4.687500  21.875000  7.812500  65.625000

    [141 rows x 4 columns]

There are a lot of values, so here is e.g. a filter for the years where column 3 is greater than 60%:

    print(df[3].gt(60).sum())
    26

    df[df[3].gt(60)].plot.bar()
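If you want to try the approach without the original file, here is a minimal sketch on a tiny made-up table. The sample rows, the names `counts` and `shares`, and the line chart for the percentage view are assumptions for illustration only; the `year` and `rating` column names follow the snippets above.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend (assumption); drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the real table: one row per film.
data = pd.DataFrame({
    "year":   [2018, 2018, 2018, 2019, 2019, 2020, 2020, 2020, 2020],
    "rating": [3,    3,    1,    3,    0,    3,    3,    2,    3],
})

# Number of films rated 3 per year, sorted chronologically by year.
counts = data.loc[data["rating"].eq(3), "year"].value_counts().sort_index()

# Percentage of each rating within a year (each row sums to 100).
shares = pd.crosstab(data["year"], data["rating"], normalize="index").mul(100)

# Draw both views side by side on explicit axes so they do not overlap.
fig, (ax_counts, ax_shares) = plt.subplots(1, 2, figsize=(10, 4))
counts.plot.bar(ax=ax_counts, title="Films rated 3 per year")
shares[3].plot.line(ax=ax_shares, marker="o", title="Share of rating 3 (%)")
fig.tight_layout()
```

Call `plt.show()` (with an interactive backend) or `fig.savefig(...)` to see the result. Sorting by year with `sort_index()` before plotting keeps the x axis in chronological order, which `value_counts()` alone does not guarantee because it sorts by count.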

Answer