
BigQuery: how to change the mode of columns?

I have a Dataflow pipeline that fetches messages from Pub/Sub, prepares them for insertion into BigQuery, and then writes them to the table.

It works fine: it generates the schema automatically and recognises which data type to use for each field.

However, the data we feed it can vary widely in format. For example, we can get both A and B for a single column:

JavaScript

If the first message is added first, then adding the second one does not work.

If I do it the other way around, however, it does work.

In the failing case I always get the following error:

JavaScript

Below is the code

JavaScript

Is there a way to change the restrictions of the table to make this work?


Answer

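BigQuery isn’t a document database; it is a column-oriented database with a fixed schema per table. In addition, you can’t freely change the schema of existing columns: you can add or remove columns, and relax a REQUIRED column to NULLABLE, but you can’t turn an existing column into a different shape (for example, from a scalar into a record or a repeated field).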

For your use case, because you can’t know or predict the most generic schema for each of your fields, the safest approach is to store the raw JSON as a string and then use BigQuery’s JSON functions to post-process your data in SQL.
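As a minimal sketch of that approach, assuming a Python pipeline: the step below turns each Pub/Sub payload into a row with a single STRING column, so the table schema never depends on the shape of the message. The column name `raw_json`, the table name `my_project.my_dataset.raw_events`, and the JSON paths `$.id` and `$.value` are hypothetical and only serve the illustration; the original pipeline code is not shown here, so this is not a drop-in replacement for it.

```python
import json

# Hypothetical column name; the table would have just this one STRING column.
RAW_COLUMN = "raw_json"


def to_raw_row(payload: bytes) -> dict:
    """Wrap a Pub/Sub message payload into a single-column row.

    The JSON is parsed only to reject malformed messages early; the text
    itself is stored unchanged, so any message shape fits the same column.
    """
    text = payload.decode("utf-8")
    json.loads(text)  # raises ValueError on invalid JSON
    return {RAW_COLUMN: text}


# Post-processing happens later in SQL. JSON_VALUE returns a scalar as a
# string, JSON_QUERY returns a nested object or array as JSON text.
# The table name and the paths are assumptions for illustration.
EXAMPLE_QUERY = """
SELECT
  JSON_VALUE(raw_json, '$.id') AS id,
  JSON_QUERY(raw_json, '$.value') AS value_json
FROM `my_project.my_dataset.raw_events`
"""

if __name__ == "__main__":
    # Two messages whose "value" field has different shapes both fit.
    print(to_raw_row(b'{"id": 1, "value": "x"}'))
    print(to_raw_row(b'{"id": 2, "value": {"nested": [1, 2]}}'))
```

In an Apache Beam pipeline, a function like this could be applied with `beam.Map(to_raw_row)` before the BigQuery write, with the table schema given as `raw_json:STRING`. The trade-off is that type checking moves from load time to query time, so the SQL has to cope with missing or differently shaped fields.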
