Convert website table to pandas DataFrame (BeautifulSoup doesn't recognize the table)

I want to convert a table on a website into a pandas DataFrame, but BeautifulSoup doesn't recognize the table. Below is the code I tried, with no luck.

from bs4 import BeautifulSoup
import requests
import pandas as pd

url = 'https://www.ndbc.noaa.gov/ship_obs.php'
headers = {'User-Agent': 'Mozilla/5.0'}
response = requests.get(url, headers=headers)

soup = BeautifulSoup(response.content, 'html.parser')
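tables = soup.find_all('table', rules='all')
#tables = soup.find_all("table", {"style": "color:#333399;"})  # tried instead of the line above to target the table, with no luck!
# pd.read_html expects HTML text (or a URL) and returns a *list* of DataFrames
df = pd.read_html(str(tables), skiprows=2, flavor='bs4')[0]
print(df.head())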

I also tried the code below, again with no luck:

df = pd.read_html('https://www.ndbc.noaa.gov/ship_obs.php')
print(df)
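One way to see why both attempts fail is to check which containers the page actually uses. The sketch below runs offline against a hypothetical HTML fragment (`SAMPLE_HTML`) modeled on the structure described in the answer (rows as `<span>` elements inside a `<pre>` block); `describe_structure` is a hypothetical helper, not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment mimicking the page structure: the observations are
# plain text inside a <pre> element, one <span> per row, with no <table>.
SAMPLE_HTML = """
<html><body>
<pre>
<span>SHIP HOUR LAT LON WDIR WSPD</span>
<span>SHIP 19 46.5 -72.3 260 5.1</span>
<span>SHIP 19 46.8 -71.2 110 2.9</span>
</pre>
</body></html>
"""


def describe_structure(html):
    """Count <table> elements and <span> rows inside the first <pre> block."""
    soup = BeautifulSoup(html, "html.parser")
    pre = soup.find("pre")
    return {
        "tables": len(soup.find_all("table")),
        "pre": pre is not None,
        "spans_in_pre": len(pre.find_all("span")) if pre else 0,
    }


print(describe_structure(SAMPLE_HTML))
# {'tables': 0, 'pre': True, 'spans_in_pre': 3}
```

Against the live page you would pass `requests.get(url, headers=headers).text` instead of `SAMPLE_HTML`; a `tables` count of 0 would explain why both `find_all('table', ...)` and `pd.read_html` come up empty.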

Answer

The data on that page is not in a <table> element at all: it is preformatted text inside a <pre> element, with each row in its own <span> tag.

You can parse these spans into a DataFrame like so:

import pandas as pd
import requests
import bs4

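url = "https://www.ndbc.noaa.gov/ship_obs.php"
# The observations live in a single <pre> block, one <span> per line
spans = bs4.BeautifulSoup(requests.get(url).text, 'html.parser').find('pre').find_all("span")
# Split each line on whitespace; rows shorter than the longest one are padded with None
print(pd.DataFrame([r.getText().split() for r in spans]))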

Output:

      0     1     2      3     4     5   ...    40    41    42    43    44    45
0    SHIP  HOUR   LAT    LON  WDIR  WSPD  ...    °T    ft   sec    °T   Acc   Ice
1    SHIP    19  46.5  -72.3   260   5.1  ...  None  None  None  None  None  None
2    SHIP    19  46.8  -71.2   110   2.9  ...  None  None  None  None  None  None
3    SHIP    19  47.4  -61.8    40  18.1  ...  None  None  None  None  None  None
4    SHIP    19  47.7  -53.2    40   8.0  ...  None  None  None  None  None  None
..    ...   ...   ...    ...   ...   ...  ...   ...   ...   ...   ...   ...   ...
170  SHIP    19  17.6  -62.4   100  20.0  ...  None  None  None  None  None  None
171  SHIP    19  25.8  -78.0    40  24.1  ...  None  None  None  None  None  None
172  SHIP    19   1.5  104.8    20  22.0  ...  None  None  None  None  None  None
173  SHIP    19  57.9    1.2   180     -  ...  None  None  None  None  None  None
174  SHIP    19  35.1  -10.0   310  24.1  ...  None  None  None  None  None  None

[175 rows x 46 columns]
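In the output above, row 0 holds the header text (plus what look like unit labels), short rows are padded with `None`, and `-` appears where a value is missing. A minimal sketch of a follow-up step, assuming the first `<span>` carries the column names on its first line and any units on later lines: it uses those names as column labels and converts every column after the ship identifier to numbers. `pre_spans_to_dataframe` and `SAMPLE_PRE_HTML` are hypothetical, and the live page's real column layout may differ from this sample.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical fragment mimicking the page: the first <span> holds the column
# names, followed by a (made-up) units line; "-" marks a missing value and
# some rows are shorter than the header.
SAMPLE_PRE_HTML = """
<pre>
<span>SHIP HOUR LAT LON WDIR WSPD
      hr   deg deg  degT kts</span>
<span>SHIP 19 46.5 -72.3 260 5.1</span>
<span>SHIP 19 57.9 1.2 180 -</span>
<span>SHIP 19 35.1 -10.0</span>
</pre>
"""


def pre_spans_to_dataframe(html):
    """Parse <pre>/<span> rows into a DataFrame with named, numeric columns."""
    spans = BeautifulSoup(html, "html.parser").find("pre").find_all("span")
    # Column names: first line of the first span only (any units line is skipped).
    names = spans[0].get_text().strip().splitlines()[0].split()
    width = len(names)
    rows = []
    for span in spans[1:]:
        tokens = span.get_text().split()
        # Pad short rows with None; tokens beyond the header width are dropped.
        rows.append(tokens[:width] + [None] * (width - len(tokens)))
    df = pd.DataFrame(rows, columns=names)
    # Every column after the first (the ship identifier) is treated as numeric;
    # "-" and None both become NaN.
    for col in names[1:]:
        df[col] = pd.to_numeric(df[col], errors="coerce")
    return df


df = pre_spans_to_dataframe(SAMPLE_PRE_HTML)
print(df)
```

Coercing with `errors="coerce"` keeps the parse from failing on placeholders such as `-`, at the cost of silently turning any other unexpected text into NaN, so it is worth spot-checking the result against the page.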
User contributions licensed under: CC BY-SA