
Reading a CSV from a particular line

I am writing a program that works with weather station data, using the CSV file I get from my station.

The issue is that pandas has trouble opening it. At first I got an error message, which I managed to bypass by writing:

test = pd.read_csv("PAYA(3).csv",error_bad_lines=False,skiprows = [0,7])

The remaining issue is that the resulting DataFrame only shows the first 4 lines.

The CSV can be downloaded at: https://mesowest.utah.edu/cgi-bin/droman/download_api2_handler.cgi?output=csv&product=&stn=PAYA&unit=0&daycalendar=1&hours=1&day1=05&month1=01&year1=2020&time=LOCAL&hour1=0&var_0=air_temp&var_8=precip_accum_one_hour.

How can I read the file properly? I would like to split it into 4 columns: date, hour, precipitation, and temperature. My final goal is to automatically download the data for a given date window and stack the variables into one large array.


Answer

When the skiprows argument to pandas.read_csv is passed a list, according to the docs you are asking it to skip exactly the rows in that list, not a range of rows. So skiprows=[0,7] skips only row 0 and row 7 (counting from 0).

If you want to skip the first 8 rows, as appears to be the case, just pass skiprows=8.
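
The difference between the two forms can be seen in a small sketch. The sample text below is made up for illustration and is not the weather station file; the names sample, by_list, and by_count are only placeholders.

```python
import io
import pandas as pd

# Made-up sample: three metadata lines, the real header, then two data rows.
sample = "\n".join(["meta0,x", "meta1,x", "meta2,x", "a,b", "1,2", "3,4"])

# A list skips only the listed line numbers (0-indexed): lines 0 and 2 here.
# Line 1 ("meta1,x") is kept and ends up being used as the header.
by_list = pd.read_csv(io.StringIO(sample), skiprows=[0, 2])

# An integer skips that many lines from the top of the file,
# so the header is read from "a,b".
by_count = pd.read_csv(io.StringIO(sample), skiprows=3)

print(list(by_list.columns))
print(list(by_count.columns))
```

With the list form, a leftover metadata line is silently treated as the header, which is the kind of surprise described in the question.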

Update: I found the following worked best for this dataset:

>>> pd.read_csv(url, header=6, skiprows=[7])

This uses row 6 (counting from 0) for the column names and skips row 7, which only contains the units. Passing header=6 also implicitly discards the rows above it, so the data starts right after the header.

Result:

    Station_ID              Date_Time  air_temp_set_1  precip_accum_one_hour_set_1
0         PAYA  01/03/2020 22:00 AKST           24.80                          NaN
1         PAYA  01/03/2020 22:05 AKST           24.80                          NaN
2         PAYA  01/03/2020 22:10 AKST           24.80                          NaN
3         PAYA  01/03/2020 22:15 AKST           24.80                          NaN
4         PAYA  01/03/2020 22:20 AKST           24.80                          NaN
..         ...                    ...             ...                          ...
287       PAYA  01/04/2020 21:45 AKST            8.60                          NaN
288       PAYA  01/04/2020 21:50 AKST            8.60                          NaN
289       PAYA  01/04/2020 21:53 AKST           10.04                          NaN
290       PAYA  01/04/2020 21:55 AKST            8.60                          NaN
291       PAYA  01/04/2020 22:00 AKST            8.60                          NaN
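
To get the four columns asked for in the question (date, hour, precipitation, temperature), one possible follow-up is sketched below. It runs on a small in-memory sample rather than the real download: the six metadata lines and the units line are placeholders that only imitate the assumed layout, and the names raw, stamp, and tidy are hypothetical. It also assumes every Date_Time value ends with a space followed by a time-zone label, as in the result above.

```python
import io
import pandas as pd

# Assumed layout with placeholder content: six metadata lines (0-5),
# column names on line 6, a units line on line 7, then the observations.
raw = """# metadata line 0
# metadata line 1
# metadata line 2
# metadata line 3
# metadata line 4
# metadata line 5
Station_ID,Date_Time,air_temp_set_1,precip_accum_one_hour_set_1
,,unit,unit
PAYA,01/03/2020 22:00 AKST,24.80,
PAYA,01/03/2020 22:05 AKST,24.80,
PAYA,01/04/2020 21:53 AKST,10.04,
"""

df = pd.read_csv(io.StringIO(raw), header=6, skiprows=[7])

# Drop the trailing time-zone label, then parse the remaining timestamp.
stamp = pd.to_datetime(
    df["Date_Time"].str.rsplit(" ", n=1).str[0],
    format="%m/%d/%Y %H:%M",
)

# Build the four requested columns from the parsed timestamp and the
# two measurement columns.
tidy = pd.DataFrame({
    "date": stamp.dt.date,
    "hour": stamp.dt.strftime("%H:%M"),
    "precipitation": df["precip_accum_one_hour_set_1"],
    "temperature": df["air_temp_set_1"],
})
print(tidy)
```

Stripping the zone label before parsing avoids relying on pandas to recognize the abbreviation; if the zone matters, it could be kept in a separate column instead.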
User contributions licensed under: CC BY-SA