How to Access Private Github Repo File (.csv) in Python using Pandas or Requests

I had to switch my public GitHub repository to private, and now I cannot access its files, even with the access tokens that worked while the repo was public.

I can access my private repo's CSV with curl:

    curl -s https://{token}@raw.githubusercontent.com/username/repo/master/file.csv

However, I want to access this information in my Python file. When the repo was public I could simply use:

    url = 'https://raw.githubusercontent.com/username/repo/master/file.csv'
    df = pd.read_csv(url, error_bad_lines=False)

This no longer works now that the repo is private, and I cannot find a workaround to download this CSV in Python instead of pulling it from the terminal.

If I try:

    requests.get('https://{token}@raw.githubusercontent.com/username/repo/master/file.csv')

I get a 404 response, which is basically the same thing that happens with pd.read_csv(). If I click on the raw file, I see that a temporary token is created and the URL is:

    https://raw.githubusercontent.com/username/repo/master/file.csv?token=TEMPTOKEN

Is there a way to attach my permanent private access token so that I can always pull this data from GitHub?

Answer

This is what ended up working for me – leaving it here if anyone runs into the same issue. Thanks for the help!

    import io

    import pandas
    import requests

    user = 'my_github_username'
    pao = 'my_pao'  # your GitHub access token

    # Session that sends the username and token as HTTP basic auth
    github_session = requests.Session()
    github_session.auth = (user, pao)

    # providing raw url to download csv from github
    csv_url = 'https://raw.githubusercontent.com/user/repo/master/csv_name.csv'

    download = github_session.get(csv_url).content
    # on_bad_lines='skip' replaces error_bad_lines=False, which was removed in pandas 2.0
    downloaded_csv = pandas.read_csv(io.StringIO(download.decode('utf-8')), on_bad_lines='skip')
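
Since a failed request shows up as a 404 (as in the question), it may help to wrap the same basic-auth approach in a small function that checks the HTTP status before handing anything to pandas. The following is only a sketch: the function name `read_private_csv`, the optional `session` parameter, and the 30-second timeout are assumptions made for illustration, not part of the original answer.

```python
import io

import pandas as pd
import requests


def read_private_csv(csv_url, user, token, session=None):
    """Fetch a CSV over HTTP basic auth and return it as a DataFrame.

    Hypothetical helper: the name and the injectable `session` argument
    are illustrative assumptions, not an established API.
    """
    # Allow a session to be passed in so the function can be tried offline.
    session = session if session is not None else requests.Session()
    session.auth = (user, token)

    response = session.get(csv_url, timeout=30)
    # A bad token or wrong path comes back as an HTTP error (for example 404);
    # raise here instead of passing an error page to pandas.
    response.raise_for_status()

    return pd.read_csv(io.StringIO(response.content.decode("utf-8")))
```

It would be called as `df = read_private_csv(csv_url, user, pao)`, and a wrong token or path then surfaces as a `requests.HTTPError` rather than a confusing parse result.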
User contributions licensed under: CC BY-SA