Read/Write NetworkX Graph Object

I am trying to work with a massive NetworkX Graph object with hundreds of millions of nodes. I'd like to be able to write it to a file so that it doesn't consume all of my computer's memory. However, I need to constantly search across existing nodes, update edges, etc.

Is there a good solution for this? I’m not sure how it would work with any of the file formats provided on http://networkx.lanl.gov/reference/readwrite.html

The only solution I can think of is to store each node as a separate file with references to other nodes in the filesystem, so that opening one node for examination doesn't overload the memory. Is there an existing storage system for large amounts of data (e.g. PyTables) that does this without my having to write my own boilerplate code?

Answer

If you've built this as a NetworkX graph, then it is already in memory. For a graph this large, my guess is you'll have to do something similar to what you suggested with separate files. But instead of separate files, I'd use a database to store each node, with many-to-many connections between nodes. In other words, you'd have a table of nodes and a table of edges. To find the neighbors of a particular node, you query for any edges that have that node on either end. This should be fast as long as the edge columns are indexed, though I'm not sure you'll be able to take advantage of NetworkX's analysis functions without first building the whole network in memory.
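As a rough illustration of the nodes-table and edges-table idea, here is a minimal sketch using Python's built-in sqlite3 module. The choice of SQLite, the table and column names (nodes, edges, src, dst, weight), and the helper functions are all assumptions made for this example, not something prescribed by NetworkX. It stores each undirected edge once and indexes both endpoint columns so the neighbor lookup does not scan the whole table.

```python
import sqlite3


def open_graph_db(path=":memory:"):
    """Open (or create) a SQLite database holding a node table and an edge table.

    Pass a file path instead of ":memory:" to keep the graph on disk.
    """
    conn = sqlite3.connect(path)
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS nodes (
            id    INTEGER PRIMARY KEY,
            label TEXT
        );
        CREATE TABLE IF NOT EXISTS edges (
            src    INTEGER NOT NULL REFERENCES nodes(id),
            dst    INTEGER NOT NULL REFERENCES nodes(id),
            weight REAL DEFAULT 1.0,
            PRIMARY KEY (src, dst)          -- also serves as the index on src
        );
        CREATE INDEX IF NOT EXISTS idx_edges_dst ON edges(dst);
        """
    )
    return conn


def add_node(conn, node_id, label=None):
    # Inserting an existing node is a no-op.
    conn.execute(
        "INSERT OR IGNORE INTO nodes (id, label) VALUES (?, ?)", (node_id, label)
    )


def add_edge(conn, u, v, weight=1.0):
    # Store each undirected edge once, smaller id first, so (u, v) and (v, u)
    # map to the same row. Re-adding an edge overwrites its weight.
    a, b = (u, v) if u <= v else (v, u)
    add_node(conn, a)
    add_node(conn, b)
    conn.execute(
        "INSERT OR REPLACE INTO edges (src, dst, weight) VALUES (?, ?, ?)",
        (a, b, weight),
    )


def neighbors(conn, node_id):
    # A node's neighbors are the far ends of every edge touching it,
    # whichever column the node appears in.
    rows = conn.execute(
        "SELECT dst FROM edges WHERE src = ? "
        "UNION "
        "SELECT src FROM edges WHERE dst = ?",
        (node_id, node_id),
    )
    return sorted(row[0] for row in rows)


if __name__ == "__main__":
    conn = open_graph_db()
    add_edge(conn, 1, 2)
    add_edge(conn, 3, 1)
    add_edge(conn, 2, 3, weight=0.5)
    conn.commit()
    print(neighbors(conn, 1))
```

With a layout like this, only the rows touched by a query are read from disk. If you need a NetworkX algorithm on part of the graph, one option is to pull the edges for a limited neighborhood out of the database and build a small in-memory graph from just those rows.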

User contributions licensed under: CC BY-SA