Skip to content
Advertisement

Extract first available day for each month from a csv

Stock quotes obtained from yfinance don’t start the 1st of each month, so I need to extract the the Date and Close for the first available qupte for each month in a csv.

Date,Open,High,Low,Close,Adj Close,Volume
2021-01-04 00:00:00-05:00,88.0,88.12449645996094,85.35700225830078,86.30650329589844,86.30650329589844,37324000
2021-01-05 00:00:00-05:00,86.25450134277344,87.34149932861328,85.84500122070312,87.00250244140625,87.00250244140625,20360000

[...]

2021-02-01 00:00:00-05:00,92.22949981689453,95.7770004272461,92.22949981689453,94.65350341796875,94.65350341796875,40252000
2021-02-02 00:00:00-05:00,95.65650177001953,97.46849822998047,95.31849670410156,95.95600128173828,95.95600128173828,66332000
2021-02-03 00:00:00-05:00,103.28050231933594,105.33100128173828,100.67749786376953,102.94400024414062,102.94400024414062,97882000
2021-02-04 00:00:00-05:00,103.03099822998047,103.46499633789062,101.75499725341797,102.68150329589844,102.68150329589844,48596000

[...]

2021-04-29 00:00:00-04:00,119.46150207519531,120.20800018310547,118.69249725341797,119.63800048828125,119.63800048828125,41234000
2021-04-30 00:00:00-04:00,118.4010009765625,119.09249877929688,117.3280029296875,117.67500305175781,117.67500305175781,44856000
2021-05-03 00:00:00-04:00,118.24549865722656,119.07749938964844,116.7750015258789,117.15399932861328,117.15399932861328,28242000
2021-05-04 00:00:00-04:00,115.67749786376953,116.24949645996094,112.83399963378906,115.34149932861328,115.34149932861328,44818000

So far I managed to only get the days starting with 0, but this is still not what I want:

import csv

with open("quotes.csv", "r") as file:
    data = csv.reader(file)

    for line in data:
        if len(line[0]) > 4 and int(line[0][8]) < 1:
            print([line[0], line[4]])

The following is an example of what I want to obtain from the example csv above (I skipped some months only for convenience, I still need one day for each month):

2021-01-04 00:00:00-05:00, 86.30650329589844
2021-02-01 00:00:00-05:00, 94.65350341796875
2021-05-03 00:00:00-04:00, 117.15399932861328

Advertisement

Answer

I would recommend you convert the first value into a Python datetime object. You can then easily test the date as needed. For example, the following approach detects whenever the month changes:

from datetime import datetime
import csv

with open("quotes.csv", "r") as file:
    data = csv.reader(file)
    header = next(data)  # skip the header row
    cur_month = None

    for line in data:
        try:
            date = datetime.strptime(line[0][:19], "%Y-%m-%d %H:%M:%S")
            
            if date.month != cur_month:
                cur_month = date.month
                print(line[0], line[4])
        except:
            pass  # skip over badly formatted dates or empty rows

For the example given it would give:

2021-01-04 00:00:00-05:00 86.30650329589844
2021-02-01 00:00:00-05:00 94.65350341796875
2021-04-29 00:00:00-04:00 119.63800048828125
2021-05-03 00:00:00-04:00 117.15399932861328

(as there are many missing rows)

User contributions licensed under: CC BY-SA
9 People found this is helpful
Advertisement