
How do I get the subreddit submission with the highest score with Pushshift?

I'm a beginner, so I'm sorry if this is completely wrong. So far, I've been able to extract the required fields (author, subreddit, date created, number of comments, score, submission title, submission description) and save them into a dataframe. But I get lost once the questions become more complicated, such as this one, or which day of the week has the most submissions. This is what I have right now for getting the submission with the highest score:

subreddit = pd.read_csv('subreddit.csv', delimiter = ',')
subreddit.count()

score = "score"
h_score = subreddit.score.max()
best_submission = subreddit.score(h_score) #it comes out as TypeError: 'Series' object is not callable here
bsubmission_title = title[best_submission]
print("Submission with the highest score:", bsubmission_title)

Answer

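The TypeError occurs because subreddit.score is a Series, and subreddit.score(h_score) tries to call it like a function. subreddit.score.max() returns the highest value in the score column, but you want the title on the same row as that score. For that you need the index of the row that holds the highest score, not the score value itself. You can get this with idxmax, which returns the index label of the first row containing the maximum. You can then use that index to look up the matching title: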

h_score_index = subreddit.score.idxmax()
bsubmission_title = subreddit.title[h_score_index]
print("Submission with the highest score:", bsubmission_title)
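
For a version you can run without the CSV file, here is a minimal sketch on a small made-up dataframe. The sample rows are invented, and the column names title, score and created_utc are assumptions about what subreddit.csv contains (created_utc is assumed to hold Unix timestamps in seconds, which is how Pushshift usually exposes the creation date). It also shows one possible way to approach the day-of-the-week question, using the same idxmax idea on the per-day counts:

```python
import pandas as pd

# Hypothetical stand-in for subreddit.csv; rows and column names are assumptions.
subreddit = pd.DataFrame({
    "title": ["First post", "Big news", "Quiet day"],
    "score": [12, 340, 7],
    "created_utc": [1609459200, 1609545600, 1610150400],
})

# idxmax() returns the index label of the first row holding the maximum score.
h_score_index = subreddit["score"].idxmax()

# .loc looks the row up by that label, so every column of that row is available.
best_row = subreddit.loc[h_score_index]
print("Submission with the highest score:", best_row["title"], best_row["score"])

# Day of the week with the most submissions
# (assumes created_utc holds Unix timestamps in seconds).
weekdays = pd.to_datetime(subreddit["created_utc"], unit="s").dt.day_name()
busiest_day = weekdays.value_counts().idxmax()
print("Day with the most submissions:", busiest_day)
```

Using subreddit.loc[h_score_index] instead of subreddit.title[h_score_index] gives you the whole row, which is handy if you also want the author or the number of comments of the best submission. If your date column has a different name or is already a date string, adjust the pd.to_datetime call accordingly.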

User contributions licensed under: CC BY-SA