How to combine multiple files in GCS bucket with Cloud Function trigger

I have 3 files per date per name in the format ‘nameXX_date’, for example: ‘nameXX_01-01-20’, ‘nameXY_01-01-20’, ‘nameXZ_01-01-20’,

where ‘name’ can be anything, and the date is whatever day the file was uploaded (almost every day).

I need to write a Cloud Function that triggers whenever a new file lands in the bucket and combines the 3 XX, XY, XZ files into one file with filename = “name_date”.

Here’s what I’ve got so far:

import logging
from google.cloud import storage as gcs

bucket_id = 'bucketname'
client = gcs.Client()
bucket = client.get_bucket(bucket_id)

name = 
date =
outfile = f'bucketname/{name}_{date}.CSV'

blobs = []
for shard in ('XX', 'XY', 'XZ'):
    sfile = f'{name}{shard}_{date}'
    blob = bucket.blob(sfile)
    if not blob.exists():
        # this causes a retry in 60s
        raise ValueError(f'branch {sfile} not present')
    blobs.append(blob)
bucket.blob(outfile).compose(blobs)
logging.info(f'Successfully created {outfile}')
for blob in blobs:
    blob.delete()
logging.info('Deleted {} blobs'.format(len(blobs)))

The issue I’m facing is that I’m not sure how to get the name and date of the new file that landed in the bucket, so that I can find the other 2 matching files and combine them.

Btw, I got this code from this article and I’m trying to adapt it here: https://medium.com/google-cloud/how-to-write-to-a-single-shard-on-google-cloud-storage-efficiently-using-cloud-dataflow-and-cloud-3aeef1732325

Answer

As I understand it, the Cloud Function is triggered by a google.storage.object.finalize event on an object in the specific GCS bucket.

In that case your cloud function “signature” looks like (taken from the “medium” article you mentioned):

def compose_shards(data, context):

The data argument is a dictionary with plenty of details about the object (file) that has been finalized. See some details here: Google Cloud Storage Triggers

For example, data["name"] is the name of the object that triggered the function.
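As a rough illustration, a handler can read the bucket and the object name straight from the event payload. The function name describe_event and the sample payload below are made up for this sketch; only the "bucket" and "name" keys are taken from the storage event itself.

```python
def describe_event(data, context):
    """Return the bucket and object name carried by a finalize event."""
    bucket_name = data["bucket"]  # bucket that received the upload
    object_name = data["name"]    # object name, e.g. 'nameXX_01-01-20'
    return bucket_name, object_name


# Hypothetical payload, trimmed to the two fields used here.
sample = {"bucket": "bucketname", "name": "nameXX_01-01-20"}
print(describe_event(sample, None))
```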

If you know the pattern/template/rule according to which those objects/shards are named, you can extract the relevant elements from an object/shard name, and use it to compose the target object/file name.
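For the naming scheme in the question, one possible way to do this is a regular expression. This is only a sketch: the pattern assumes names always look like <name><XX|XY|XZ>_<dd-mm-yy>, and plan_compose, SHARDS and SHARD_RE are hypothetical helper names, not part of any library.

```python
import re

SHARDS = ("XX", "XY", "XZ")
# Assumed layout: <name><shard>_<dd-mm-yy>, with shard being one of SHARDS.
SHARD_RE = re.compile(
    r"^(?P<name>.+)(?P<shard>XX|XY|XZ)_(?P<date>\d{2}-\d{2}-\d{2})$"
)


def plan_compose(object_name):
    """Return (source object names, target object name), or None if no match."""
    match = SHARD_RE.match(object_name)
    if match is None:
        # Not a shard (for example an already composed 'name_date' object).
        return None
    name, date = match.group("name"), match.group("date")
    sources = [f"{name}{shard}_{date}" for shard in SHARDS]
    # Target follows the "name_date" form asked for in the question.
    return sources, f"{name}_{date}"
```

Inside the Cloud Function, plan_compose(data["name"]) would give the three shard names to check with bucket.blob(...).exists() and the target name to pass to compose, as in the question's code. Returning None for non-matching names lets the function exit early when the composed file itself triggers the event; note that a name which itself ends in XX, XY or XZ would still match, so that case would need extra handling.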

User contributions licensed under: CC BY-SA