I would like to know how many ratings of each value (0.5, 1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5, 5) users gave to a certain movie, Ocean's Eleven (2001), in a data frame, so that I can calculate the Pearson correlation using the formula.
Below is my code:
import numpy as np
import pandas as pd
ratings_data = pd.read_csv(r"D:\ratings.csv")  # raw string so the backslash is not treated as an escape
movies_name = pd.read_csv(r"D:\movies.csv")
movies_data = pd.merge(ratings_data, movies_name, on='movieId')
average_ratings = pd.DataFrame(movies_data.groupby('title')['rating'].mean())
average_ratings['rating_counts'] = movies_data.groupby('title')['rating'].count()
(Screenshot of the resulting data frame: https://i.stack.imgur.com/1eFLV.png)
matrix_user_ratings = movies_data.pivot_table(index='userId', columns='title', values='rating')
oceanRatings = matrix_user_ratings["Ocean's Eleven (2001)"]
oceanRatings.head(20)
userId
1 NaN
2 NaN
3 NaN
4 NaN
5 NaN
6 NaN
7 4.0
8 NaN
9 NaN
10 NaN
11 NaN
12 NaN
13 NaN
14 NaN
15 NaN
16 NaN
17 NaN
18 4.0
19 NaN
20 NaN
Name: Ocean's Eleven (2001), dtype: float64
In this case, I can only see that there are two 4.0 ratings among the first 20 users, but I have around 600+ users in total, because I am using the MovieLens dataset.
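The merge-and-pivot pipeline above can be sketched end to end on a tiny in-memory frame (the data below is made up purely for illustration; the real files come from MovieLens):

```python
import pandas as pd

# Made-up stand-ins for ratings.csv and movies.csv.
ratings_data = pd.DataFrame({
    'userId':  [1, 1, 2, 2, 3],
    'movieId': [10, 20, 10, 20, 20],
    'rating':  [4.0, 3.5, 5.0, 4.0, 4.0],
})
movies_name = pd.DataFrame({
    'movieId': [10, 20],
    'title':   ['Some Movie (1999)', "Ocean's Eleven (2001)"],
})

# Attach titles to ratings, then pivot to one column per title.
movies_data = pd.merge(ratings_data, movies_name, on='movieId')
matrix_user_ratings = movies_data.pivot_table(
    index='userId', columns='title', values='rating')

# Selecting one title gives a Series indexed by userId,
# with NaN for users who did not rate that movie.
ocean = matrix_user_ratings["Ocean's Eleven (2001)"]
```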
Answer
You can use groupby, grouping the Series by its own values (a plain Series has no 'rating' column to group on):
oceanRatings = matrix_user_ratings["Ocean's Eleven (2001)"]
oceanCounts = oceanRatings.groupby(oceanRatings).count()
Or value_counts(), which ignores the NaN entries by default:
oceanCounts = matrix_user_ratings["Ocean's Eleven (2001)"].value_counts()
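As a sketch on a small hand-made Series (the values are invented only for illustration), both approaches count how many users gave each rating, skipping the NaN entries for users who did not rate the movie:

```python
import numpy as np
import pandas as pd

# Hypothetical ratings for one movie; NaN means the user did not rate it.
oceanRatings = pd.Series(
    [4.0, np.nan, 3.5, 4.0, np.nan, 5.0, 3.5, 4.0],
    index=pd.Index(range(1, 9), name='userId'),
    name="Ocean's Eleven (2001)",
)

# value_counts() drops NaN by default; sort_index() orders by rating value.
counts = oceanRatings.value_counts().sort_index()
# 3.5 -> 2, 4.0 -> 3, 5.0 -> 1

# The groupby form needs the Series itself as the grouping key;
# NaN keys are excluded from the groups automatically.
counts_gb = oceanRatings.groupby(oceanRatings).count()
```

Both give the same result, so value_counts() is simply the shorter spelling of the same grouping.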