I’m trying to run a simple Beam pipeline to extract data from a BigQuery table using SQL and push it to a GCS bucket. My requirement is to pass the SQL from a file (a simple .sql file) rather than as a string, so I can modularize the SQL. So far I’ve tried the following option, and it did not work:
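For reference, a minimal sketch of one way to do this with the Beam Python SDK: read the .sql file at pipeline-construction time and hand the resulting string to ReadFromBigQuery. The file path, bucket, and CSV formatting below are placeholder assumptions, not a verified solution.

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    # Read the query text before the pipeline is constructed, so the
    # BigQuery source only ever sees a plain SQL string.
    with open("extract.sql") as f:  # placeholder path to the .sql file
        query = f.read()

    options = PipelineOptions()  # pass --project, --runner, etc. as usual
    with beam.Pipeline(options=options) as p:
        (
            p
            | "ReadFromBQ" >> beam.io.ReadFromBigQuery(
                query=query, use_standard_sql=True)
            | "FormatRows" >> beam.Map(
                lambda row: ",".join(str(v) for v in row.values()))
            | "WriteToGCS" >> beam.io.WriteToText("gs://my-bucket/extract/out")
        )

Reading the file on the machine that builds the pipeline keeps the .sql file out of the workers entirely; if the file itself lives on GCS, it can be fetched the same way via beam.io.filesystems.FileSystems.open before construction.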
Tag: dataflow
BigQuery: how to change the mode of columns?
I have a Dataflow pipeline that fetches data from Pub/Sub, prepares it for insertion into BigQuery, and then writes it to the database. It works fine: it can generate the schema automatically and is able to recognise which datatype to use. However, the data we are using with it can vary vastly in format. Ex:
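On the mode question itself: the only in-place mode change BigQuery allows is relaxing REQUIRED to NULLABLE. A minimal sketch of doing that with the google-cloud-bigquery client, using placeholder table and column names:

    from google.cloud import bigquery

    client = bigquery.Client()
    table = client.get_table("my-project.my_dataset.my_table")  # placeholder

    # Rebuild the schema, relaxing one column's mode. BigQuery permits
    # REQUIRED -> NULLABLE in place; other mode changes need a table rewrite.
    new_schema = []
    for field in table.schema:
        if field.name == "my_column" and field.mode == "REQUIRED":  # placeholder
            new_schema.append(bigquery.SchemaField(
                field.name, field.field_type, mode="NULLABLE"))
        else:
            new_schema.append(field)

    table.schema = new_schema
    client.update_table(table, ["schema"])  # patch only the schema

Any other mode change (e.g. NULLABLE to REQUIRED) requires rewriting the table, for example by querying into a new table with the desired schema.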
Dataflow BigQuery-to-BigQuery pipeline executes on smaller data, but not the large production dataset
A little bit of a newbie to Dataflow here, but I have successfully created a pipeline that works well. The pipeline reads in a query from BigQuery, applies a ParDo (an NLP function), and then writes the data to a new BigQuery table. The dataset I am trying to process is roughly 500 GB with 46M records. When I try this with a
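For reference, the read-ParDo-write shape described above looks roughly like the sketch below; the query, destination table, schema, and the NLP DoFn body are all placeholder stand-ins:

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    class NlpFn(beam.DoFn):
        """Placeholder for the real NLP step: emits one output row per input."""
        def process(self, row):
            yield {"text": row.get("text", ""), "sentiment": 0.0}  # stand-in values

    options = PipelineOptions()  # --runner=DataflowRunner etc. as usual
    with beam.Pipeline(options=options) as p:
        (
            p
            | "ReadFromBQ" >> beam.io.ReadFromBigQuery(
                query="SELECT text FROM `my-project.my_dataset.source`",  # placeholder
                use_standard_sql=True)
            | "ApplyNLP" >> beam.ParDo(NlpFn())
            | "WriteToBQ" >> beam.io.WriteToBigQuery(
                "my-project:my_dataset.destination",  # placeholder
                schema="text:STRING,sentiment:FLOAT",
                create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
                write_disposition=beam.io.BigQueryDisposition.WRITE_TRUNCATE)
        )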