I’m new to Python and programming in general and I am having trouble with a website parsing project.
This is the code I managed to write:
```python
import requests
from bs4 import BeautifulSoup
import pandas as pd
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)
pd.set_option('display.max_colwidth', -1)
import json

#necessary lists
url_list = [
    "https://warframe.market/items/melee_riven_mod_(veiled)",
    "https://warframe.market/items/zaw_riven_mod_(veiled)"
]
item_list = []
items_name = []
combined_data = []
iteration = 0


#looping for every url found in url_list
for url in url_list:
    #requesting data
    r = requests.get(url)
    soup = BeautifulSoup(r.content, "html.parser")

    #splitting the last part of the url which has the name of the item that I want to insert in the dataframe
    name = url.split("/")[4]
    items_name.append(name)

    #Finding in the parsed HTML code where the JSON file starts ( it start from <script> n°2)
    results = soup.find_all('script')[2].text.strip()
    data = json.loads(results)
    combined_data.append(data) #combining all the data into one list


    #filtering only the users who sell the items and are either "ingame" or "online"
    for payload in combined_data[iteration]["payload"]["orders"]:
        if payload["order_type"] == "sell" and (payload["user"]["status"] == "online" or payload["user"]["status"] == "ingame"):
            p = payload
            item_list.append(p)
            #adding the items names to the item list ???? PROBLEM ?????
            item_list = [dict(item, **{'name':items_name[iteration]}) for item in item_list]
    #trying to change the list from where the data gets taken from and the items name ????? PROBLEM ????
    iteration += 1


#creating a dataframe with all the values
df = pd.DataFrame(item_list).sort_values(by=["platinum"])
```
What I’m trying to do, and can’t find a solution for, is to add to each entry in `item_list` the name of the item that its URL refers to.
e.g.
index | platinum | quantity | … | items name (problematic column) |
---|---|---|---|---|
1 | 10 | 1 | … | melee_riven_mod_(veiled) |
2 | 11 | 1 | … | melee_riven_mod_(veiled) |
3 | 12 | 2 | … | zaw_riven_mod_(veiled) |
4 | … | … | … | zaw_riven_mod_(veiled) |
But the items name column shows the same name in every row, like this:
index | platinum | quantity | … | items name (problematic column) |
---|---|---|---|---|
1 | 10 | 1 | … | melee_riven_mod_(veiled) |
2 | 11 | 1 | … | melee_riven_mod_(veiled) |
3 | 12 | 2 | … | melee_riven_mod_(veiled) |
4 | … | … | … | melee_riven_mod_(veiled) |
So what am I doing wrong in the for loop? It iterates twice, once for each URL in `url_list`, but it doesn’t change the name of the item.
What am I not seeing?
Answer
Change
```python
if payload["order_type"] == "sell" and (payload["user"]["status"] == "online" or payload["user"]["status"] == "ingame"):
    p = payload
    item_list.append(p)
    #adding the items names to the item list ???? PROBLEM ?????
    item_list = [dict(item, **{'name':items_name[iteration]}) for item in item_list]
```
To this:
```python
if payload["order_type"] == "sell" and (payload["user"]["status"] == "online" or payload["user"]["status"] == "ingame"):
    payload['name'] = items_name[iteration]
    item_list.append(payload)
```
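Why the original version misbehaves can be seen in a small offline sketch. The list comprehension rebuilds every dict already stored in `item_list`, including those collected from earlier URLs, so their `name` is overwritten each time it runs. The sample orders and the helper names `tag_all` and `tag_each` are made up for illustration and are not taken from warframe.market.

```python
# Hypothetical sample data standing in for the orders parsed from two pages.
pages = {
    "melee_riven_mod_(veiled)": [{"platinum": 10}, {"platinum": 11}],
    "zaw_riven_mod_(veiled)": [{"platinum": 12}],
}


def tag_all(pages):
    """Mimic the original approach: re-tag the whole list after every append."""
    item_list = []
    for name, orders in pages.items():
        for order in orders:
            item_list.append(order)
            # Rebuilds every dict in the list, including ones from earlier
            # pages, so their 'name' is replaced with the current name.
            item_list = [dict(item, **{"name": name}) for item in item_list]
    return item_list


def tag_each(pages):
    """Mimic the fix: tag only the order that is being appended."""
    item_list = []
    for name, orders in pages.items():
        for order in orders:
            tagged = dict(order, name=name)  # copy, so the input stays untouched
            item_list.append(tagged)
    return item_list


names_all = [item["name"] for item in tag_all(pages)]
names_each = [item["name"] for item in tag_each(pages)]
print(names_all)   # one name repeated for every row
print(names_each)  # each row keeps the name of the page it came from
```

In this sketch every row from `tag_all` ends up with the name of the last page that contributed an order, while `tag_each` keeps one name per source page.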
Note that instead of keeping a separate variable `iteration` and incrementing it, you can loop over `url_list` using `enumerate`, which provides both the index and the item at each iteration:
```python
for iteration, url in enumerate(url_list):
    ...
```
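Putting both suggestions together, the loop could look roughly like the sketch below. To keep it runnable offline, the network step is replaced by a hypothetical `pages` list that stands in for the JSON parsed from each URL; the function name `collect_sell_orders` and the sample orders are assumptions for illustration only.

```python
url_list = [
    "https://warframe.market/items/melee_riven_mod_(veiled)",
    "https://warframe.market/items/zaw_riven_mod_(veiled)",
]

# Hypothetical stand-in for the parsed JSON of each URL, in url_list order.
# In the real script this would come from requests, BeautifulSoup and json.loads.
pages = [
    {"payload": {"orders": [
        {"order_type": "sell", "platinum": 10, "user": {"status": "ingame"}},
        {"order_type": "buy", "platinum": 9, "user": {"status": "online"}},
    ]}},
    {"payload": {"orders": [
        {"order_type": "sell", "platinum": 12, "user": {"status": "online"}},
        {"order_type": "sell", "platinum": 8, "user": {"status": "offline"}},
    ]}},
]


def collect_sell_orders(url_list, pages):
    """Return sell orders from online/ingame users, tagged with the item name."""
    item_list = []
    for iteration, url in enumerate(url_list):
        name = url.split("/")[4]  # last path segment, as in the question
        for payload in pages[iteration]["payload"]["orders"]:
            is_sell = payload["order_type"] == "sell"
            is_active = payload["user"]["status"] in ("online", "ingame")
            if is_sell and is_active:
                payload["name"] = name  # tag only this order
                item_list.append(payload)
    return item_list


orders = collect_sell_orders(url_list, pages)
for order in orders:
    print(order["name"], order["platinum"])
```

The resulting list can be passed to `pd.DataFrame(...)` as in the question; the membership test with `in` is just a shorter way to write the two status comparisons.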