Currently I have a Spark job that reads a file, creates a dataframe, does some transformations, and then moves those records into a "year/month/date" layout. I am achieving this by:
df.write.option("delimiter", "\t").option("header", False).mode(
    "append"
).partitionBy("year", "month", "day").option("compression", "gzip").csv(
    config["destination"]
)
I want to achieve the same in a Pythonic way. In the end, it should look like:
data/2022/04/14
data/2022/04/15
Answer
Based on your question, instead of using partitionBy you can also modify your config['destination'] directly, as S3 will take care of creating the necessary folders underneath that path:
>>> from datetime import datetime
>>> s3_dump_path = config["destination"]  ### 's3:/test-path/'
>>> curr_date = datetime.now().date()
>>> year, month, day = curr_date.strftime('%Y'), curr_date.strftime('%m'), curr_date.strftime('%d')
>>> s3_new_path = '/'.join([s3_dump_path, year, month, day])
>>> s3_new_path
's3:/test-path//2022/04/14'
>>> config["destination"] = s3_new_path

df.write.option("delimiter", "\t").option("header", False).mode(
    "append"
).option("compression", "gzip").csv(
    config["destination"]
)
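Note that because the example destination already ends with a slash, the plain '/'.join produces a double slash in the output path ('s3:/test-path//2022/04/14'). If you want to avoid that, the same idea can be wrapped in a small helper; the sketch below is only illustrative (the function name, parameters, and the way the base path is stripped are my own, not part of the original answer):

from datetime import datetime

from pyspark.sql import DataFrame


def write_dated_csv(df: DataFrame, base_path: str, run_date=None) -> str:
    """Write df as gzipped, tab-delimited CSV under <base_path>/YYYY/MM/DD.

    rstrip('/') keeps the joined path free of double slashes whether or not
    base_path ends with a slash.
    """
    run_date = run_date or datetime.now().date()
    dated_path = "/".join(
        [
            base_path.rstrip("/"),
            run_date.strftime("%Y"),
            run_date.strftime("%m"),
            run_date.strftime("%d"),
        ]
    )
    (
        df.write.option("delimiter", "\t")
        .option("header", False)
        .mode("append")
        .option("compression", "gzip")
        .csv(dated_path)
    )
    return dated_path


# Hypothetical usage:
# dest = write_dated_csv(df, config["destination"])  # e.g. 's3://test-path/2022/04/14'

Either way, the key point is the same as in the answer: build the dated path in Python and pass it to csv(), rather than relying on partitionBy to create the directory structure.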