I have an AWS S3 bucket filled with data parameterized by date. I’d like to extract that data one date at a time using the AWS CLI (reference), specifically the aws s3 sync command.
The following command does what I expect it to do:
aws s3 sync s3://my-bucket-1 . --exclude "*" --include "*2018-01-17*" --dryrun
Running this command from my command line generates a (dryrun) download for every file in my bucket containing the substring 2018-01-17.
Great! To simplify the necessary file operations, I’ve written a small CLI wrapper around this command. This wrapper is in Python, and uses the subprocess.run facility to do its work. The entire operation boils down to the following call:
subprocess.run(['aws', 's3', 'sync', 's3://my-bucket-1', '.', '--exclude', '"*"', '--include', '"*2018-01-17*"', '--dryrun'])
The problem is that when I run this statement, I get a (dryrun) download back for every file in the bucket. That is, data is returned that corresponds with bucket entries from 01-18, 01-19, and so on. The --exclude/--include rules fail to apply, and the result is the same as if I had simply run aws s3 sync s3://my-bucket-1 .
Why does this occur?
Answer
When using the list form of invocation, you should not use those additional double quotes. Normally, when your command is given as a single string to a shell, the quotes identify the text between them as a single argument, and the shell strips them before the program sees that argument.
If you use double quotes like that inside a list item, no shell is involved to interpret them, so each quote is passed to the program literally as part of the argument. Consequently, your include and exclude patterns match nothing, because each pattern contains literal " characters.
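To see the difference concretely, here is a small sketch that uses the standard-library shlex.split to apply POSIX shell quoting rules to the command string from the question, then compares the result with the list that was passed to subprocess.run. The variable names are only for illustration.

```python
import shlex

# The command exactly as typed at the terminal.
command = 'aws s3 sync s3://my-bucket-1 . --exclude "*" --include "*2018-01-17*" --dryrun'

# shlex.split follows POSIX shell quoting rules: quotes group text into
# one argument and are then removed.
shell_args = shlex.split(command)

# The list from the question, where the quotes are part of each string.
list_args = ['aws', 's3', 'sync', 's3://my-bucket-1', '.',
             '--exclude', '"*"', '--include', '"*2018-01-17*"', '--dryrun']

# Index 6 is the pattern that follows --exclude.
print(repr(shell_args[6]))  # one character: the asterisk
print(repr(list_args[6]))   # three characters: quote, asterisk, quote
```

The shell version hands aws a one-character pattern, while the list version hands it a three-character pattern that begins and ends with a double quote.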
So, the following should be the corrected arguments.
subprocess.run(['aws', 's3', 'sync', 's3://my-bucket-1', '.', '--exclude', '*', '--include', '*2018-01-17*', '--dryrun'])
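As a rough illustration of why the quoted patterns select every file, the sketch below imitates the filtering with Python's fnmatch. This is an approximation, not the AWS CLI's actual code: it assumes glob-style matching and that later filters take precedence over earlier ones. The object keys and the selected helper are hypothetical.

```python
from fnmatch import fnmatch

# Hypothetical object keys, for illustration only.
keys = ['data-2018-01-17.csv', 'data-2018-01-18.csv']

def selected(key, exclude, include):
    """Apply one exclude filter, then one include filter; the later filter wins."""
    keep = True                # files are included by default
    if fnmatch(key, exclude):
        keep = False
    if fnmatch(key, include):
        keep = True
    return keep

# Patterns with literal quotes, as in the original list: the exclude
# pattern matches no key, so nothing is ever excluded.
quoted = [k for k in keys if selected(k, '"*"', '"*2018-01-17*"')]

# Patterns without quotes, as in the corrected list.
unquoted = [k for k in keys if selected(k, '*', '*2018-01-17*')]

print(quoted)
print(unquoted)
```

With the quoted patterns both keys survive, which mirrors the behaviour described in the question; with the unquoted patterns only the 2018-01-17 key is kept.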