I would like to know, for a given movie (Ocean's Eleven (2001)), how many 0.5/1/1.5/2/2.5/3/3.5/4/4.5/5 ratings the users in a DataFrame gave it, so that I can calculate the Pearson correlation using the formula.
Below is the code:
import numpy as np
import pandas as pd

ratings_data = pd.read_csv(r"D:\ratings.csv")   # raw string so "\r" is not treated as an escape
movies_name = pd.read_csv(r"D:\movies.csv")
movies_data = pd.merge(ratings_data, movies_name, on='movieId')

movies_data.groupby('title')['rating'].mean()
movies_data.groupby('title')['rating'].count()
average_ratings_count = pd.DataFrame(movies_data.groupby('title')['rating'].count()).rename(columns={'rating': 'rating_counts'})

(screenshot of the output: https://i.stack.imgur.com/1eFLV.png)

matrix_user_ratings = movies_data.pivot_table(index='userId', columns='title', values='rating')
oceanRatings = matrix_user_ratings["Ocean's Eleven (2001)"]
oceanRatings.head(20)

Output:

userId
1     NaN
2     NaN
3     NaN
4     NaN
5     NaN
6     NaN
7     4.0
8     NaN
9     NaN
10    NaN
11    NaN
12    NaN
13    NaN
14    NaN
15    NaN
16    NaN
17    NaN
18    4.0
19    NaN
20    NaN
Name: Ocean's Eleven (2001), dtype: float64
From this output I can only see that there are two 4.0 ratings, but I have around 600+ users, since I am using the MovieLens dataset.
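To make the question reproducible without the MovieLens CSV files, here is a minimal sketch of the merge-and-pivot steps above on a small synthetic DataFrame (the data and titles are made up; only the column names match the real files). It shows why the pivoted column contains NaN for every user who did not rate the movie:

```python
import pandas as pd

# Synthetic stand-ins for ratings.csv and movies.csv (assumed shapes, not real data)
ratings_data = pd.DataFrame({
    'userId':  [1, 2, 3, 4],
    'movieId': [10, 10, 20, 10],
    'rating':  [4.0, 4.0, 3.5, 5.0],
})
movies_name = pd.DataFrame({
    'movieId': [10, 20],
    'title':   ["Ocean's Eleven (2001)", 'Other Movie (1999)'],
})

# Same steps as in the question: merge on movieId, then pivot users x titles
movies_data = pd.merge(ratings_data, movies_name, on='movieId')
matrix_user_ratings = movies_data.pivot_table(index='userId',
                                              columns='title',
                                              values='rating')
print(matrix_user_ratings)
# User 3 rated only 'Other Movie (1999)', so their Ocean's Eleven cell is NaN
```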
Answer
You can use groupby, grouping the Series by its own values (grouping by the string 'rating' would fail here, since the Series' index is userId):

oceanRatings = matrix_user_ratings["Ocean's Eleven (2001)"]
rating_counts = oceanRatings.groupby(oceanRatings).count()
Or value_counts():

rating_counts = matrix_user_ratings["Ocean's Eleven (2001)"].value_counts()

Note that value_counts() drops NaN (users who did not rate the movie) and sorts by count; append .sort_index() if you want the result ordered by rating value instead.
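Here is a small self-contained sketch of the value_counts() approach on a synthetic stand-in for the movie's rating column (the Series values are made up for illustration):

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for matrix_user_ratings["Ocean's Eleven (2001)"]:
# most users have NaN because they never rated the movie
oceanRatings = pd.Series([4.0, np.nan, 3.5, 4.0, np.nan, 5.0],
                         index=pd.Index([1, 2, 3, 4, 5, 6], name='userId'))

# Count how many users gave each rating; NaN rows are dropped automatically,
# and sort_index() orders the result by rating value rather than by count
counts = oceanRatings.value_counts().sort_index()
print(counts)
# 3.5    1
# 4.0    2
# 5.0    1
```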