Python Multithreaded S3 Bucket-to-Bucket Copy (on Amazon Web Services)

For both backup and staging purposes, we regularly need to copy an entire S3 bucket to another bucket. AWS has no built-in function to do this, nor does the boto Python library.

We started off with a simple loop, for key in bucket.list(), and copied the files one by one in sequence with key.copy(dest_bucket, key_name). This is imperfect for a few reasons:

  • There are many files, and the files are very large. Processing them one by one takes a long time, and sometimes we need a copy ASAP.
  • AWS is designed with the expectation that things fail, so applications built on AWS should be developed to handle failures. With the sequential design, if any one of the key copy requests fails, for any reason, it interrupts the rest of the process.
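To make the second point concrete, here is a minimal sketch of the sequential loop described above. It assumes src_bucket behaves like a boto bucket (list() yields keys that have a name attribute and a copy(dest_bucket, key_name) method); the function name and the returned count are only for illustration.

```python
def copy_bucket_sequentially(src_bucket, dest_bucket_name):
    """Copy every key in src_bucket to dest_bucket_name, one at a time.

    Assumes boto-style objects: src_bucket.list() yields keys, and each
    key has .name and .copy(dest_bucket, key_name).
    """
    copied = 0
    for key in src_bucket.list():
        # Any exception raised here propagates out of the loop, so every
        # key after the failing one is left uncopied.
        key.copy(dest_bucket_name, key.name)
        copied += 1
    return copied
```

A single failed request anywhere in the listing ends the whole run, which is the weakness the threaded version tries to address.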

This seems like a perfect problem for threading, and I have been looking for an excuse to play with Python’s built-in threading features. This also seems like a perfect chance to try hosting an open source project on GitHub, also a first for me.
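For readers who want the general shape of the threaded approach, here is a rough sketch using only Python 3's standard library (queue and threading). It is not the code from the project itself. copy_one is a hypothetical callable that copies a single key by name, for example a small wrapper around key.copy(dest_bucket, key_name), and num_workers=10 is an arbitrary default.

```python
import queue
import threading


def copy_keys_threaded(key_names, copy_one, num_workers=10):
    """Call copy_one(name) for every key name using a pool of threads.

    Returns (copied, failed): a list of key names that succeeded and a
    list of (key name, exception) pairs for the ones that did not.
    """
    work = queue.Queue()
    for name in key_names:
        work.put(name)  # queue everything before any worker starts

    copied, failed = [], []
    lock = threading.Lock()

    def worker():
        while True:
            try:
                name = work.get_nowait()
            except queue.Empty:
                return  # nothing left to copy
            try:
                copy_one(name)
            except Exception as exc:
                # Record the failure and keep going with the next key.
                with lock:
                    failed.append((name, exc))
            else:
                with lock:
                    copied.append(name)

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return copied, failed
```

Because each worker catches its own exceptions, one bad key no longer stops the others. Note that this sketch loads every key name into memory before starting, which may not suit very large buckets.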

Issues:
– Does not set the ACL on copied keys. I assume they end up with the bucket default.
– Timeout handling is clumsy and results in multiple 30-second delays. Instead, it should log errors and timeouts and retry a fixed number of times.
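The "log and retry x times" idea from the second issue could look something like the sketch below. This is an illustration, not what the script currently does. copy_one is again a hypothetical callable that copies one key by name, and max_attempts, delay_seconds, and the injectable sleep parameter are all assumptions made for the example.

```python
import logging
import time


def copy_with_retry(copy_one, key_name, max_attempts=3, delay_seconds=1.0,
                    sleep=time.sleep):
    """Try copy_one(key_name) up to max_attempts times.

    Returns the attempt number that succeeded. Re-raises the last error
    if every attempt fails.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            copy_one(key_name)
            return attempt
        except Exception as exc:
            last_error = exc
            logging.warning("copy of %s failed (attempt %d of %d): %s",
                            key_name, attempt, max_attempts, exc)
            if attempt < max_attempts:
                sleep(delay_seconds)  # short pause before the next try
    raise last_error
```

Keeping the delay short and the number of attempts bounded avoids stacking up long waits on a single stubborn key.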

Performance:
– 52 GB / 7 minutes = 52,000 MB / 420 seconds ≈ 123.8 MB/sec

With tweaks to the timeout and error handling, this could probably be improved significantly. I'm curious to hear other people's experiences too.

Try it out!


One Response to Python Multithreaded S3 Bucket-to-Bucket Copy (on Amazon Web Services)

  1. This is terrific, and something that I’ve been seeking. Are you considering extending this? In particular, I’d love to be able to tweak the behavior for existing keys. Right now, it’s “skip,” but would be good to also have “overwrite” and “replace if updated.” In any case – thank you for this !!!
