I’m trying to run a simple Beam pipeline to extract data from a BigQuery table using SQL and push it to a GCS bucket. My requirement is to pass the SQL from a file (a simple .sql file) rather than as a string, so I can modularize the SQL. So far I’ve tried the following option, and it did not work:
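For reference, a minimal sketch of one way to do this with the Beam Python SDK: read the .sql file at pipeline-construction time and hand the resulting string to ReadFromBigQuery. The file path, bucket, and CSV formatting below are placeholder assumptions, not a verified solution.

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    # Read the query text before the pipeline is constructed, so the
    # BigQuery source only ever sees a plain SQL string.
    with open("extract.sql") as f:  # placeholder path to the .sql file
        query = f.read()

    options = PipelineOptions()  # pass --project, --runner, etc. as usual
    with beam.Pipeline(options=options) as p:
        (
            p
            | "ReadFromBQ" >> beam.io.ReadFromBigQuery(
                query=query, use_standard_sql=True)
            | "FormatRows" >> beam.Map(
                lambda row: ",".join(str(v) for v in row.values()))
            | "WriteToGCS" >> beam.io.WriteToText("gs://my-bucket/extract/out")
        )

Reading the file on the machine that builds the pipeline keeps the .sql file out of the workers entirely; if the file itself lives on GCS, it can be fetched the same way via beam.io.filesystems.FileSystems.open before construction.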
Tag: dataflow
BigQuery: how to change the mode of columns?
I have a Dataflow pipeline that fetches data from Pub/Sub, prepares it for insertion into BigQuery, and then writes it to the database. It works fine: it can generate the schema automatically and is able to recognise which datatype to use. However, the data we are using with it can vary vastly in format. Ex:
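On the mode question itself: the only in-place mode change BigQuery allows is relaxing REQUIRED to NULLABLE. A minimal sketch of doing that with the google-cloud-bigquery client, using placeholder table and column names:

    from google.cloud import bigquery

    client = bigquery.Client()
    table = client.get_table("my-project.my_dataset.my_table")  # placeholder

    # Rebuild the schema, relaxing one column's mode. BigQuery permits
    # REQUIRED -> NULLABLE in place; other mode changes need a table rewrite.
    new_schema = []
    for field in table.schema:
        if field.name == "my_column" and field.mode == "REQUIRED":  # placeholder
            new_schema.append(bigquery.SchemaField(
                field.name, field.field_type, mode="NULLABLE"))
        else:
            new_schema.append(field)

    table.schema = new_schema
    client.update_table(table, ["schema"])  # patch only the schema

Any other mode change (e.g. NULLABLE to REQUIRED) requires rewriting the table, for example by querying into a new table with the desired schema.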
Dataflow BigQuery-to-BigQuery pipeline executes on smaller data, but not the large production dataset
A little bit of a newbie to Dataflow here, but I have successfully created a pipeline that works well. The pipeline reads in a query from BigQuery, applies a ParDo (an NLP function), and then writes the data to a new BigQuery table. The dataset I am trying to process is roughly 500 GB with 46M records. When I try this with a
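For reference, the read-ParDo-write shape described above looks roughly like the sketch below; the query, destination table, schema, and the NLP DoFn body are all placeholder stand-ins:

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    class NlpFn(beam.DoFn):
        """Placeholder for the real NLP step: emits one output row per input."""
        def process(self, row):
            yield {"text": row.get("text", ""), "sentiment": 0.0}  # stand-in values

    options = PipelineOptions()  # --runner=DataflowRunner etc. as usual
    with beam.Pipeline(options=options) as p:
        (
            p
            | "ReadFromBQ" >> beam.io.ReadFromBigQuery(
                query="SELECT text FROM `my-project.my_dataset.source`",  # placeholder
                use_standard_sql=True)
            | "ApplyNLP" >> beam.ParDo(NlpFn())
            | "WriteToBQ" >> beam.io.WriteToBigQuery(
                "my-project:my_dataset.destination",  # placeholder
                schema="text:STRING,sentiment:FLOAT",
                create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
                write_disposition=beam.io.BigQueryDisposition.WRITE_TRUNCATE)
        )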