
Tag: apache-beam

Libraries cannot be found on Dataflow/Apache-beam job launched from CircleCI

I am having serious issues running a Python Apache Beam pipeline on the GCP Dataflow runner, launched from CircleCI. I would really appreciate it if someone could give me any hints on how to tackle this; I've tried everything, but nothing seems to work. Basically, I'm running this Python Apache Beam pipeline, which runs on Dataflow and uses google-api-python-client-1.12.3. If I
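Without the full error message the cause is only an assumption, but one common way to make a third-party package available on Dataflow workers is to list it in a requirements file and pass that file with the `--requirements_file` pipeline option. The sketch below builds such options; the project, region and bucket values, and the helper names `write_requirements`, `dataflow_args` and `make_options`, are hypothetical placeholders.

```python
from pathlib import Path
from typing import List

# Hypothetical pin: the client library version named in the question.
REQUIREMENTS = "google-api-python-client==1.12.3\n"


def write_requirements(path: str) -> str:
    """Write the requirements file that will be shipped to Dataflow."""
    Path(path).write_text(REQUIREMENTS)
    return path


def dataflow_args(project: str, region: str, temp_location: str,
                  requirements_path: str) -> List[str]:
    """Build command-line style options for launching on Dataflow."""
    return [
        "--runner=DataflowRunner",
        f"--project={project}",
        f"--region={region}",
        f"--temp_location={temp_location}",
        # Beam stages the packages listed in this file, and the Dataflow
        # workers install them before running the pipeline code.
        f"--requirements_file={requirements_path}",
    ]


def make_options(argv: List[str]):
    # Imported lazily so the helpers above work without Beam installed.
    from apache_beam.options.pipeline_options import PipelineOptions
    return PipelineOptions(argv)
```

In a CircleCI step, one might call `write_requirements("requirements.txt")` and pass the result of `dataflow_args(...)` to `make_options` when constructing the pipeline. If the pipeline code spans several local modules, the `--setup_file` option serves a similar purpose.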

Read whole file in Apache Beam

Is it possible to read a whole file (not line by line) in Apache Beam? For example, I want to read multiline JSONs, and my idea is to read file by file, extract the data from each file and create a PCollection from the lists. Is this a good idea, or is it better to preprocess the source JSONs into one JSON file where each line is
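A minimal sketch of the file-by-file approach, using the Python SDK's `apache_beam.io.fileio` transforms: `MatchFiles` expands a glob, `ReadMatches` turns each match into a `ReadableFile`, and `read_utf8()` returns that file's entire contents, so a multiline JSON document can be parsed in one piece. It assumes each file holds either a single JSON object or a JSON array of objects; `records_from_whole_file` and `run` are illustrative names, not part of Beam.

```python
import json
from typing import Iterator, Tuple


def records_from_whole_file(path: str, contents: str) -> Iterator[Tuple[str, dict]]:
    """Parse one file's full text as JSON and yield (path, record) pairs.

    Assumes the file holds a single JSON object or a JSON array of
    objects, possibly spread over many lines.
    """
    data = json.loads(contents)
    items = data if isinstance(data, list) else [data]
    for item in items:
        yield path, item


def run(file_pattern: str) -> None:
    # Imported here so the parsing helper above can be used (and tested)
    # without Apache Beam installed.
    import apache_beam as beam
    from apache_beam.io import fileio

    with beam.Pipeline() as p:
        (
            p
            | "Match" >> fileio.MatchFiles(file_pattern)
            | "Read" >> fileio.ReadMatches()
            # Each element is a ReadableFile; read_utf8() returns the whole file.
            | "Parse" >> beam.FlatMap(
                lambda f: records_from_whole_file(f.metadata.path, f.read_utf8())
            )
            | "Print" >> beam.Map(print)
        )
```

For example, `run("gs://your-bucket/input/*.json")`, where the bucket path is a placeholder. Because each file is read into memory by a single worker, this suits many small-to-moderate files; with newline-delimited JSON (one object per line), `beam.io.ReadFromText` can instead split large files across workers.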
