I’m working on a project that writes using the same data a ton of times, and I’ve been using Ray to scale this up in a cluster setting. However, the files are too large to keep sending back and forth or to hold in the Ray object store the whole time. Is there a way to keep the Python objects on the local nodes between calls to the remote functions?
Writing to files tends to be tricky in distributed systems, since regular file systems aren’t shared between machines. Ray generally doesn’t interfere with the file system, but I think you have a few options here.
1. Expand the object store size: You can increase the plasma store’s size and control where it’s backed by setting the `--object-store-memory` and `--plasma-directory` flags when starting Ray.
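For example, option (1) might look like the following when starting the head node. This is a sketch: the size and directory are placeholders, and the exact flags may vary with your Ray version, so check `ray start --help`.

```shell
# Start the head node with a ~50 GB object store, backed by a directory
# on a large (ideally fast) disk instead of the default tmpfs location.
ray start --head \
  --object-store-memory=50000000000 \
  --plasma-directory=/mnt/big_disk/plasma
```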
2. Use a distributed file system: Distributed file systems like NFS let you share part of your file system across machines. If you manually set up an NFS share, you can point your Ray tasks at a path inside it.
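A minimal sketch of option (2), assuming an NFS share mounted at the same path on every node (`NFS_ROOT` below is a hypothetical mount point, and `write_shard`/`process` are illustrative names): each task writes its output under the shared mount and returns only the small path string, so the large data never goes through the object store.

```python
import os

# Hypothetical NFS mount point; must be the same path on every node.
NFS_ROOT = "/mnt/nfs/ray_output"

def write_shard(shard_id: int, data: bytes, root: str = NFS_ROOT) -> str:
    """Write one task's output under the shared mount and return its path."""
    os.makedirs(root, exist_ok=True)
    path = os.path.join(root, f"shard_{shard_id}.bin")
    with open(path, "wb") as f:
        f.write(data)
    return path

def demo_with_ray():
    """Run on a cluster (requires `pip install ray` and a running cluster)."""
    import ray  # imported lazily so write_shard stays usable without Ray
    ray.init(address="auto")

    @ray.remote
    def process(shard_id: int) -> str:
        # ... compute this shard's bytes, then persist them to NFS ...
        return write_shard(shard_id, b"result bytes")

    # Only small path strings travel through the object store.
    return ray.get([process.remote(i) for i in range(4)])
```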
3. Don’t use a file system: While this is technically a non-answer, it’s arguably the most typical approach in distributed systems. Instead of writing to your file system, consider writing to S3 or a similar key-value or blob store.
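A sketch of option (3), assuming `boto3` is installed and AWS credentials are configured (the bucket, job name, and helper names here are illustrative): each Ray task uploads its output and returns only the object key.

```python
def result_key(job: str, shard_id: int) -> str:
    """Deterministic object key so concurrent tasks never collide."""
    return f"{job}/shard_{shard_id}.bin"

def upload_result(bucket: str, job: str, shard_id: int, body: bytes) -> str:
    """Upload one task's output to S3 and return its key.

    Call this at the end of a Ray task instead of returning the data itself.
    """
    import boto3  # imported lazily so result_key works without boto3 installed
    s3 = boto3.client("s3")
    key = result_key(job, shard_id)
    s3.put_object(Bucket=bucket, Key=key, Body=body)
    return key
```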
Downsides of these approaches:
The biggest downside of (1) is that, if you aren’t careful, you can badly hurt performance: backing the object store with a slow disk instead of shared memory makes every object transfer pay that disk’s cost.
The biggest downside of (2) is that it can be slow, particularly if you need to read and write data from multiple nodes at once. A secondary downside is that you’ll have to set up NFS yourself.
The biggest downside of (3) is that you’re now relying on an external service, and it arguably isn’t a direct answer to your question.