I am currently working on an ETL Dataflow job (using the Apache Beam Python SDK) that queries data from Cloud SQL (with psycopg2 and a custom ParDo) and writes it to BigQuery. My goal is to create a Dataflow template that I can launch from App Engine using a cron job. I have a version that works locally with the DirectRunner.
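Here is a minimal sketch of the pipeline described above. It assumes a PostgreSQL Cloud SQL instance that the workers can reach. In place of the question's custom ParDo, it uses a plain function with `beam.FlatMap`. This is a swapped-in simplification, not the original code. The connection string, query, output table and schema are hypothetical placeholders. For a classic template, literal values like these are fixed when the template is created. Values meant to change per launch (for example, from the App Engine cron job) would need `ValueProvider` arguments, which this sketch does not use.

```python
"""Sketch: query Cloud SQL (PostgreSQL) with psycopg2 in a Beam step and write to BigQuery."""
import functools
from typing import Any, Callable, Dict, Iterator, List, Optional


def read_rows(query: str, connect: Callable[[], Any]) -> Iterator[Dict[str, Any]]:
    """Execute `query` on a fresh DB-API connection and yield each row as a dict."""
    conn = connect()
    try:
        cur = conn.cursor()
        cur.execute(query)
        # cursor.description has one tuple per result column; item 0 is the column name.
        columns = [col[0] for col in cur.description]
        for row in cur:
            yield dict(zip(columns, row))
    finally:
        conn.close()


def run(argv: Optional[List[str]] = None) -> None:
    """Build and run the pipeline; runner, project and temp_location come from argv."""
    # Imported inside the function so read_rows can be used and tested
    # without Beam or psycopg2 installed.
    import apache_beam as beam
    import psycopg2
    from apache_beam.options.pipeline_options import PipelineOptions

    # Hypothetical placeholders: replace with your own DSN, query, table and schema.
    dsn = "host=127.0.0.1 dbname=shop user=etl password=change-me"
    query = "SELECT id, paid FROM orders"
    table = "my-project:my_dataset.orders_copy"
    schema = "id:INTEGER,paid:BOOLEAN"

    with beam.Pipeline(options=PipelineOptions(argv)) as p:
        (
            p
            # A single element (the query) fans out into one dict per result row.
            | "Seed" >> beam.Create([query])
            | "ReadCloudSQL" >> beam.FlatMap(
                read_rows, connect=functools.partial(psycopg2.connect, dsn)
            )
            | "WriteBigQuery" >> beam.io.WriteToBigQuery(
                table,
                schema=schema,
                create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
                write_disposition=beam.io.BigQueryDisposition.WRITE_TRUNCATE,
            )
        )


if __name__ == "__main__":
    run()
```

`read_rows` takes a connection factory rather than calling psycopg2 directly. That lets it run against any DB-API driver, such as `sqlite3` in a unit test. Locally the pipeline runs on the DirectRunner. On Dataflow, pass `--runner=DataflowRunner`, `--project` and a `gs://` `--temp_location`, which `WriteToBigQuery` uses for its load files in batch mode.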
Dataflow BigQuery to BigQuery
I am trying to create a Dataflow pipeline that reads from BigQuery and writes back to BigQuery. Our main table is too large for our extraction process to handle, so I'd like to create a smaller table (the result of a query) containing only the relevant information. The SQL query is a simple one: "SELECT * FROM table.orders WHERE paid = false LIMIT 10".
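Here is a minimal sketch of the BigQuery-to-BigQuery version. It assumes the Beam Python SDK's `ReadFromBigQuery` and `WriteToBigQuery` transforms. `table.orders` comes from the question. The output table name and schema string are hypothetical placeholders, and the schema has to describe every column that `SELECT *` returns.

```python
"""Sketch: write the result of a BigQuery query into another BigQuery table with Beam."""
from typing import List, Optional


def build_query(source_table: str, limit: int) -> str:
    """Return the question's query in BigQuery standard SQL for `source_table`."""
    # Backticks quote the dataset.table path in standard SQL.
    return f"SELECT * FROM `{source_table}` WHERE paid = false LIMIT {int(limit)}"


def run(argv: Optional[List[str]] = None) -> None:
    """Copy the query result into a new table.

    Pass --project and a gs:// --temp_location in argv: the query result is
    exported to GCS before reading, and the write uses GCS load files.
    """
    # Imported inside the function so build_query can be used without Beam installed.
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    # Hypothetical placeholders: the schema must list every column SELECT * returns.
    output_table = "my-project:my_dataset.unpaid_orders"
    schema = "order_id:INTEGER,paid:BOOLEAN"

    with beam.Pipeline(options=PipelineOptions(argv)) as p:
        (
            p
            | "ReadQuery" >> beam.io.ReadFromBigQuery(
                query=build_query("table.orders", 10), use_standard_sql=True
            )
            # Each element is a dict keyed by column name, which WriteToBigQuery accepts.
            | "WriteTable" >> beam.io.WriteToBigQuery(
                output_table,
                schema=schema,
                create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
                write_disposition=beam.io.BigQueryDisposition.WRITE_TRUNCATE,
            )
        )


if __name__ == "__main__":
    run()
```

With a `query`, `ReadFromBigQuery` runs the query as a BigQuery job and exports only its result. As a result, only the matching rows pass through the pipeline, not the whole main table. If no transformation happens between the read and the write, a `CREATE TABLE ... AS SELECT` statement in BigQuery itself would produce the same table without a Dataflow job.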