How to use multistep mrjob with a JSON file

I'm trying to use Hadoop to compute some statistics from a JSON file, such as the average number of stars per category or the language with the most reviews. To do this I am using mrjob, and I found this code:

It finds the most frequently used word, but I'm not sure how to do the same thing with JSON attributes instead of words.
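Below is a minimal sketch of one way to adapt the most-used-word pattern to a JSON attribute, here for "language with the most reviews". It assumes each line of the input file is one JSON object shaped like the sample reviews in this question.

Instead of splitting a line into words, the first-step mapper parses the line and emits its language. The rest follows the same two-step shape: first sum the counts per key, then yield every (count, language) pair under the single key None so that one reducer sees them all and picks the maximum.

The functions use mrjob's generator-style mapper/reducer signatures but are written as plain functions. run_step is a hypothetical local stand-in for Hadoop's shuffle, not part of mrjob, so the logic can be tried without a cluster.

```python
import json
from collections import defaultdict
from itertools import chain


def mapper_get_language(_, line):
    # Step 1 mapper: parse one JSON review and emit (language, 1),
    # where a word-count job would emit (word, 1).
    review = json.loads(line)
    yield review["language"], 1


def reducer_count_reviews(language, counts):
    # Step 1 reducer: total reviews per language. Yielding under the single
    # key None sends every (count, language) pair to one step-2 reducer.
    yield None, (sum(counts), language)


def reducer_find_max(_, count_language_pairs):
    # Step 2 reducer: tuples compare by count first (then by language name),
    # so max() yields the (count, language) pair with the most reviews.
    yield max(count_language_pairs)


def run_step(pairs, mapper=None, reducer=None):
    # Hypothetical local driver, not part of mrjob: apply the mapper (if any),
    # group values by key as Hadoop's shuffle would, then apply the reducer.
    if mapper is not None:
        pairs = chain.from_iterable(mapper(k, v) for k, v in pairs)
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return list(chain.from_iterable(reducer(k, vs) for k, vs in groups.items()))


# The two sample reviews from the question, trimmed to a few fields.
lines = [
    '{"review_id": "en_0690095", "stars": "1", "language": "en", '
    '"product_category": "home_improvement"}',
    '{"review_id": "en_0311558", "stars": "1", "language": "en", '
    '"product_category": "home"}',
]

step1 = run_step(((None, line) for line in lines),
                 mapper_get_language, reducer_count_reviews)
result = run_step(step1, reducer=reducer_find_max)
print(result)  # [(2, 'en')]
```

In an actual mrjob script these functions would become methods of an MRJob subclass. Its steps() method would return [MRStep(mapper=self.mapper_get_language, reducer=self.reducer_count_reviews), MRStep(reducer=self.reducer_find_max)]. A combiner that sums the counts per language could also be added to the first step, as in the word-count version.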

A sample of the JSON:

{"review_id": "en_0690095", "product_id": "product_en_0440378", "reviewer_id": "reviewer_en_0133349", "stars": "1", "review_body": "the cabinet dot were all detached from backing… got me", "review_title": "Not use able", "language": "en", "product_category": "home_improvement"}

{"review_id": "en_0311558", "product_id": "product_en_0399702", "reviewer_id": "reviewer_en_0152034", "stars": "1", "review_body": "I received my first order of this product and it was broke so I ordered it again. The second one was broke in more places than the first. I can’t blame the shipping process as it’s shrink wrapped and boxed.", "review_title": "The product is junk.", "language": "en", "product_category": "home"}

Answer

What worked for me was simply parsing each line with json.loads, like this:
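Here is a minimal sketch applying that idea to the other statistic in the question, the average number of stars per product_category. It assumes one JSON object per line. It also assumes "stars" always holds a numeric string, as in the sample ("1"), which is why the mapper converts it with int().

As above, the functions only mirror mrjob's mapper/reducer shape. group_by_key is a hypothetical helper standing in for Hadoop's shuffle.

```python
import json
from collections import defaultdict


def mapper_stars_by_category(_, line):
    # json.loads turns one line into a dict. "stars" is a string in the
    # sample data ("1"), so it is converted to int before averaging.
    review = json.loads(line)
    yield review["product_category"], int(review["stars"])


def reducer_average_stars(category, stars):
    # Single pass with a running total and count: the values arrive as an
    # iterator that should not be assumed to support a second traversal.
    total = count = 0
    for value in stars:
        total += value
        count += 1
    yield category, total / count


def group_by_key(pairs):
    # Hypothetical stand-in for Hadoop's shuffle: collect values per key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups.items()


# The two sample reviews from the question, trimmed to a few fields.
lines = [
    '{"review_id": "en_0690095", "stars": "1", "language": "en", '
    '"product_category": "home_improvement"}',
    '{"review_id": "en_0311558", "stars": "1", "language": "en", '
    '"product_category": "home"}',
]

mapped = [pair for line in lines
          for pair in mapper_stars_by_category(None, line)]
averages = {}
for category, stars in group_by_key(mapped):
    for key, avg in reducer_average_stars(category, stars):
        averages[key] = avg
print(averages)  # {'home_improvement': 1.0, 'home': 1.0}
```

Unlike the count, this reducer cannot be reused as a combiner: an average of partial averages is wrong when the groups differ in size. A combiner for this job would need to emit (sum, count) pairs for the reducer to add up.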

User contributions licensed under: CC BY-SA