How to read multiple JSON files from a GCS bucket in Google Dataflow (Apache Beam, Python)

I have a bucket in GCS that contains a number of JSON files. I managed to extract the list of file names using:


Now I want to pass this list of file names to Apache Beam so that it reads them. I wrote this code, but it doesn't seem like a good pattern:


Have you faced the same issue before?

Answer

In the end I used the google-cloud-storage client library as the reading API for this.

Listing all the objects in the bucket:

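A minimal sketch of what the listing step could look like, assuming the google-cloud-storage Python client. The helper name `list_json_blob_names`, the `.json` suffix filter, and the bucket name in the comment are illustrative assumptions, not the code from the original answer.

```python
def list_json_blob_names(client, bucket_name, prefix=None):
    """Return the names of all objects in the bucket that end in .json.

    `client` is expected to be a google.cloud.storage.Client, or any object
    that exposes a compatible list_blobs(bucket_name, prefix=...) method.
    """
    blobs = client.list_blobs(bucket_name, prefix=prefix)
    # Keep only JSON files; the suffix filter is an assumption of this sketch.
    return [blob.name for blob in blobs if blob.name.endswith(".json")]


# Typical use (needs the google-cloud-storage package and valid credentials):
#   from google.cloud import storage
#   names = list_json_blob_names(storage.Client(), "my-bucket")  # hypothetical bucket
```

Taking the client as a parameter keeps the helper easy to try out with a stand-in object before pointing it at a real bucket.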

I then created this ParDo to read the content of each file:


And my pipeline looked like this:

User contributions licensed under: CC BY-SA