I’m trying to wrap a scraping project in a Docker container to run it on a droplet. The spider scrapes a website and then writes the data to a Postgres database. The Postgres database is already running and managed by DigitalOcean.
When I run the command locally to test, everything is fine:
```bash
docker compose up
```
I can visualize the spider writing on the database.
Then I use a GitHub Action to build and push my Docker image to a registry each time I push the code, with this workflow:
```yaml
name: CI

# 1
# Controls when the workflow will run.
on:
  # Triggers the workflow on push events but only for the master branch
  push:
    branches: [ master ]
  # Allows you to run this workflow manually from the Actions tab
  workflow_dispatch:
    inputs:
      version:
        description: 'Image version'
        required: true

# 2
env:
  REGISTRY: "registry.digitalocean.com/*****-registery"
  IMAGE_NAME: "******-scraper"
  POSTGRES_USERNAME: ${{ secrets.POSTGRES_USERNAME }}
  POSTGRES_PASSWORD: ${{ secrets.POSTGRES_PASSWORD }}
  POSTGRES_HOSTNAME: ${{ secrets.POSTGRES_HOSTNAME }}
  POSTGRES_PORT: ${{ secrets.POSTGRES_PORT }}
  POSTGRES_DATABASE: ${{ secrets.POSTGRES_DATABASE }}
  SPLASH_URL: ${{ secrets.SPLASH_URL }}

# 3
jobs:
  build-compose:
    name: Build docker-compose
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2

      - name: Install doctl
        uses: digitalocean/action-doctl@v2
        with:
          token: ${{ secrets.DIGITALOCEAN_ACCESS_TOKEN }}

      - name: Login to DO Container Registry with short-lived creds
        run: doctl registry login --expiry-seconds 1200

      - name: Remove all old images
        run: if [ ! -z "$(doctl registry repository list | grep "****-scraper")" ]; then doctl registry repository delete-manifest ****-scraper $(doctl registry repository list-tags ****-scraper | grep -o "sha.*") --force; else echo "No repository"; fi

      - name: Build compose
        run: docker compose -f docker-compose.yaml up -d

      - name: Push to Digital Ocean registry
        run: docker compose push

  deploy:
    name: Deploy from registry to droplet
    runs-on: ubuntu-latest
    needs: build-compose
```
Then I ssh root@ipv4 into my droplet manually to install docker and docker compose, and run the image from the registry with:
```bash
# Login to registry
docker login -u DO_TOKEN -p DO_TOKEN registry.digitalocean.com

# Stop running container
docker stop ****-scraper

# Remove old container
docker rm ****-scraper

# Run a new container from a new image
docker run -d --restart always --name ****-scraper registry.digitalocean.com/****-registery/****-scraper
```
As soon as the python script starts on the droplet I have the error:
```text
psycopg2.OperationalError: could not connect to server: No such file or directory
    Is the server running locally and accepting
    connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"?
```
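For context, that socket path is what psycopg2 (via libpq) falls back to when it is given no hostname at all. Below is a minimal sketch of such a connection, assuming the spider reads the POSTGRES_* variables defined in the workflow; the variable names come from the CI config above, not from the actual code in launch_spiders.py.

```python
import os
import psycopg2

# If POSTGRES_HOSTNAME is unset inside the container, host is None and
# psycopg2/libpq falls back to the local Unix socket
# (/var/run/postgresql/.s.PGSQL.5432), which is the path in the error above.
conn = psycopg2.connect(
    host=os.environ.get("POSTGRES_HOSTNAME"),
    port=os.environ.get("POSTGRES_PORT", "5432"),
    user=os.environ.get("POSTGRES_USERNAME"),
    password=os.environ.get("POSTGRES_PASSWORD"),
    dbname=os.environ.get("POSTGRES_DATABASE"),
)
```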
It seems like I’m doing something wrong, and I haven’t been able to figure out how to fix it so far. I would appreciate some help and an explanation.
Thanks,
My Dockerfile:
```dockerfile
# As Scrapy runs on Python, I use the official Python 3 Docker image.
FROM python:3.9.7-slim

# Set the working directory to /usr/src/app.
WORKDIR /usr/src/app

# Install libpq-dev for the psycopg2 python package
RUN apt-get update && apt-get -y install libpq-dev gcc

# Copy the requirements file from the local host to the filesystem of the container at the working directory.
COPY requirements.txt ./

# Install the dependencies (including Scrapy) specified in requirements.txt.
RUN pip3 install --no-cache-dir -r requirements.txt

# Copy the project source code from the local host to the filesystem of the container at the working directory.
COPY . .

# For Splash
EXPOSE 8050

# Run the crawler when the container launches.
CMD [ "python3", "./****/launch_spiders.py" ]
```
My docker-compose.yaml
version: "3" services: splash: image: scrapinghub/splash restart: always command: --maxrss 2048 --max-timeout 3600 --disable-lua-sandbox --verbosity 1 ports: - "8050:8050" launch_spiders: restart: always build: . volumes: - .:/usr/src/app image: registry.digitalocean.com/****-registery/****-scraper depends_on: - splash
Answer
Problem solved!
The .env file with all my credentials was listed in .dockerignore, so it could not be found when building the image.
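In other words, because .env was excluded by .dockerignore, the COPY . . step never put it into the image, the POSTGRES_* variables were empty at runtime, and psycopg2 fell back to the local Unix socket. As a hedged sketch (the variable names mirror the workflow; adapt them to the actual settings module), a fail-fast guard at startup makes this kind of misconfiguration obvious immediately:

```python
import os
import sys

REQUIRED_VARS = (
    "POSTGRES_USERNAME",
    "POSTGRES_PASSWORD",
    "POSTGRES_HOSTNAME",
    "POSTGRES_PORT",
    "POSTGRES_DATABASE",
)

# Abort with a readable message instead of letting psycopg2 fall back to the
# Unix socket when the credentials were never made available to the container.
missing = [name for name in REQUIRED_VARS if not os.environ.get(name)]
if missing:
    sys.exit(f"Missing database settings: {', '.join(missing)}")
```

An alternative to copying .env into the image is to keep it out of the build context entirely and supply it at runtime with docker run --env-file .env (or an env_file: entry in docker-compose.yaml), so the credentials never end up baked into a registry layer.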