
How can I extract the contents of a file stored in GitLab repos?

Using the python-gitlab package, I'd like to extract lines from all Dockerfiles. With my code below, I can get the project names and the URL of each repo I want, but how can I check that a Dockerfile exists and read its contents?

import gitlab
import json
from pprint import pprint
import requests
import urllib.request


# private token authentication
gl = gitlab.Gitlab('<path_to_gitlab_repo>', private_token='<token_here>')

gl.auth()

# list all projects
projects = gl.projects.list()
for project in projects:
    # print(project) # prints all the meta data for the project
    print("Project: ", project.name)
    print("Gitlab URL: ", project.http_url_to_repo)
    # print("Branches: ", project.repo_branches)
    pprint(project.repository_tree(all=True))
    f = urllib.request.urlopen(project.http_url_to_repo)
    myfile = f.read()
    print(myfile)
    print("\n\n")

The output I get now is:

Gitlab URL:  <path_to_gitlab_repo>
[{'id': '0c4a64925f5c129d33557',
  'mode': '1044',
  'name': 'README.md',
  'path': 'README.md',
  'type': 'blob'}]


Answer

You can use the project.files.get() method (see documentation) to get the Dockerfile of the project.

The API returns the file content base64-encoded, so decode it first. You can then print the content of the Dockerfile, or process it however you like:

import gitlab
import base64


# private token authentication
gl = gitlab.Gitlab('<gitlab-url>', private_token='<private-token>')

gl.auth()

# list all projects
projects = gl.projects.list(all=True)
for project in projects:
    # print(project) # prints all the meta data for the project
    # print("Project: ", project.name)
    # print("Gitlab URL: ", project.http_url_to_repo)

    # Skip projects without branches
    branches = project.branches.list()
    if not branches:
        continue

    # Use the first branch returned by the API
    branch = branches[0].name

    try:
        f = project.files.get(file_path='Dockerfile', ref=branch)
    except gitlab.exceptions.GitlabGetError:
        # Skip projects without Dockerfile
        continue

    # The content is base64-encoded; decode it to text before printing
    file_content = base64.b64decode(f.content).decode("utf-8")
    print(file_content)

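Note that this reads the Dockerfile from the first branch the API returns, which is not necessarily the one you want. If a project has multiple branches, you might have to adjust the branch name, for example by using the project's default_branch attribute instead.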
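The answer above only looks for a Dockerfile at the repository root. As a rough sketch of how the question's other two goals might be handled (checking that a Dockerfile exists anywhere in the tree, and getting its lines), the two helpers below work on plain data. The names find_dockerfile_paths and dockerfile_lines are made up for this illustration and are not part of python-gitlab; the sample tree simply imitates the shape of the repository_tree() output shown in the question.

```python
import base64


def find_dockerfile_paths(tree):
    """Return the paths of all files named exactly 'Dockerfile'.

    `tree` is assumed to be a list of dicts with 'name', 'path' and
    'type' keys, like the repository_tree() output in the question.
    """
    return [
        entry["path"]
        for entry in tree
        if entry.get("type") == "blob" and entry.get("name") == "Dockerfile"
    ]


def dockerfile_lines(encoded_content):
    """Decode base64-encoded file content and return its non-empty lines."""
    text = base64.b64decode(encoded_content).decode("utf-8")
    return [line for line in text.splitlines() if line.strip()]


if __name__ == "__main__":
    # Hypothetical sample data, not taken from a real project.
    sample_tree = [
        {"name": "README.md", "path": "README.md", "type": "blob"},
        {"name": "Dockerfile", "path": "Dockerfile", "type": "blob"},
        {"name": "Dockerfile", "path": "services/api/Dockerfile", "type": "blob"},
        {"name": "docs", "path": "docs", "type": "tree"},
    ]
    print(find_dockerfile_paths(sample_tree))

    sample_content = base64.b64encode(b"FROM python:3.11\n\nRUN pip install python-gitlab\n")
    print(dockerfile_lines(sample_content))
```

In the loop from the answer, the tree would presumably come from project.repository_tree(recursive=True, all=True), and each returned path could be passed to project.files.get(file_path=path, ref=branch), with f.content then handed to dockerfile_lines. Treat that wiring as an assumption to check against your python-gitlab version.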

User contributions licensed under: CC BY-SA