I am trying to write a pipeline which will read data from JDBC (Oracle, MS SQL), do some processing, and write to BigQuery. I am struggling with the ReadFromJdbc step, where it is not able to convert to the correct schema types. My code: My data has three columns, two of which are VARCHAR and one a timestamp. The error which I am facing while…
Tag: google-cloud-dataflow
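A minimal sketch of the kind of pipeline this question describes, assuming Beam's cross-language `apache_beam.io.jdbc.ReadFromJdbc` transform. All connection details (table name, host, credentials, BigQuery table) are placeholders, not values from the question:

```python
# Hedged sketch: JDBC -> BigQuery with the Beam Python SDK.
# Every connection detail below is a placeholder.

def jdbc_url(vendor, host, port, database):
    """Build a JDBC URL for the two vendors mentioned in the question.
    Illustration helper; formats follow the standard JDBC drivers."""
    if vendor == "oracle":
        return f"jdbc:oracle:thin:@//{host}:{port}/{database}"
    if vendor == "mssql":
        return f"jdbc:sqlserver://{host}:{port};databaseName={database}"
    raise ValueError(f"unsupported vendor: {vendor}")


def build_pipeline(argv=None):
    # apache_beam is imported lazily so the URL helper above stays
    # importable even without the Beam SDK installed.
    import apache_beam as beam
    from apache_beam.io.jdbc import ReadFromJdbc

    with beam.Pipeline(argv=argv) as p:
        rows = p | "ReadFromOracle" >> ReadFromJdbc(
            table_name="my_table",                      # placeholder
            driver_class_name="oracle.jdbc.driver.OracleDriver",
            jdbc_url=jdbc_url("oracle", "db-host", 1521, "ORCL"),
            username="user",                            # placeholder
            password="secret",                          # placeholder
        )
        # TIMESTAMP columns arrive as Beam logical types; converting
        # them to datetime/str in a Map step here is one way to avoid
        # schema-conversion errors before WriteToBigQuery.
        rows | "WriteToBQ" >> beam.io.WriteToBigQuery(
            "project:dataset.table",                    # placeholder
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
        )
```

Note that `ReadFromJdbc` is an external (Java) transform, so running it requires a Java runtime alongside the Python SDK.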
Libraries cannot be found on Dataflow/Apache-beam job launched from CircleCI
I am having serious issues running a Python Apache Beam pipeline on the GCP Dataflow runner, launched from CircleCI. I would really appreciate it if someone could give any hint on how to tackle this. I'…
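A common cause of "libraries cannot be found" in this setup is that packages installed in the CI container never reach the Dataflow workers. A hedged sketch of the usual fix, passing `--setup_file` (or `--requirements_file`) so Dataflow stages the dependencies itself; project, region, and bucket names are placeholders:

```python
# Hedged sketch: making worker dependencies visible to Dataflow when
# the job is launched from CI (e.g. CircleCI). Paths are placeholders.

def dataflow_args(project, region, bucket, setup_file="./setup.py"):
    """Assemble pipeline options that tell Dataflow to install the
    job's dependencies on its workers (rather than only in the CI
    container that launches the job)."""
    return [
        "--runner=DataflowRunner",
        f"--project={project}",
        f"--region={region}",
        f"--temp_location=gs://{bucket}/temp",
        f"--staging_location=gs://{bucket}/staging",
        # Key line: without --setup_file (or --requirements_file),
        # packages installed in the CI image never reach the Dataflow
        # workers, producing ModuleNotFoundError at runtime.
        f"--setup_file={setup_file}",
    ]


def run():
    # Beam is imported lazily so dataflow_args stays importable
    # without the SDK installed.
    import apache_beam as beam
    args = dataflow_args("my-project", "us-central1", "my-bucket")
    with beam.Pipeline(argv=args) as p:
        p | beam.Create([1, 2, 3]) | beam.Map(print)
```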
Start CloudSQL Proxy on Python Dataflow / Apache Beam
I am currently working on an ETL Dataflow job (using the Apache Beam Python SDK) which queries data from Cloud SQL (with psycopg2 and a custom ParDo) and writes it to BigQuery. My goal is to create a Dataflow template which I can start from App Engine using a cron job. I have a version which works locally using the DirectRunner.
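One pattern for this question is to launch the Cloud SQL Auth Proxy once per worker from `DoFn.setup()` and connect to it through psycopg2. A hedged sketch under that assumption; the proxy binary path, instance string, and database credentials are all placeholders, and a production version would also wait for the proxy socket before connecting:

```python
import subprocess


def proxy_command(instance, port=5432,
                  binary="/usr/local/bin/cloud_sql_proxy"):
    """Command line for the (legacy) Cloud SQL Auth Proxy.
    Binary path and instance string are placeholders."""
    return [binary, f"-instances={instance}=tcp:{port}"]


def make_query_dofn(instance, query):
    """Factory returning a DoFn that starts the proxy in setup() and
    queries Postgres through it. Beam and psycopg2 are imported lazily
    so this module stays importable without them."""
    import apache_beam as beam

    class QueryCloudSql(beam.DoFn):
        def setup(self):
            import psycopg2
            # Start the proxy as a sidecar process on the worker.
            self._proxy = subprocess.Popen(proxy_command(instance))
            self._conn = psycopg2.connect(
                host="127.0.0.1", port=5432,
                dbname="mydb", user="user", password="secret",  # placeholders
            )

        def process(self, element):
            with self._conn.cursor() as cur:
                cur.execute(query)
                yield from cur.fetchall()

        def teardown(self):
            self._conn.close()
            self._proxy.terminate()

    return QueryCloudSql()
```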