I am writing a program working on weather station’s data, and this is the CSV I get from my station:
The issue is that pandas has troubles opening it. First, I had an error message that I managed to bypass by writing:
test = pd.read_csv("PAYA(3).csv",error_bad_lines=False,skiprows = [0,7])
Now the other issue is that the pandas file only displays the first 4 lines:
The CSV can be downloaded at: https://mesowest.utah.edu/cgi-bin/droman/download_api2_handler.cgi?output=csv&product=&stn=PAYA&unit=0&daycalendar=1&hours=1&day1=05&month1=01&year1=2020&time=LOCAL&hour1=0&var_0=air_temp&var_8=precip_accum_one_hour.
How could I properly read the file ? What I would like is to divide it in 4 columns to have date, hour, precipitation, temperature. My final goal is to automatically download the data for a given date-window, and stack the variables into a giant array.
Advertisement
Answer
When the skiprows
argument to pandas.read_csv
is passed a list, according to the docs you are asking it to skip exactly the rows in that list, not a range of rows.
If you want to skip the first 8 rows as it appears just pass skiprows=8
.
Update: I found the following worked best for this dataset:
>>> pd.read_csv(url, header=6, skiprows=[7])
This uses row 6
for the column names, skipping the row 7
which is giving some units. Using header=6
implicitly skips to row 7
as the start of the data.
Result:
Station_ID Date_Time air_temp_set_1 precip_accum_one_hour_set_1 0 PAYA 01/03/2020 22:00 AKST 24.80 NaN 1 PAYA 01/03/2020 22:05 AKST 24.80 NaN 2 PAYA 01/03/2020 22:10 AKST 24.80 NaN 3 PAYA 01/03/2020 22:15 AKST 24.80 NaN 4 PAYA 01/03/2020 22:20 AKST 24.80 NaN .. ... ... ... ... 287 PAYA 01/04/2020 21:45 AKST 8.60 NaN 288 PAYA 01/04/2020 21:50 AKST 8.60 NaN 289 PAYA 01/04/2020 21:53 AKST 10.04 NaN 290 PAYA 01/04/2020 21:55 AKST 8.60 NaN 291 PAYA 01/04/2020 22:00 AKST 8.60 NaN