I have a file with about a thousand lines of code and I’d like to break it into several files. However, I found that the functions depend on each other, so I have no idea how to decouple them… Here is a simplified example:
import numpy as np

def tensor(data):
    return Tensor(data)

class Tensor:
    def __init__(self, data):
        self.data = data

    def __repr__(self):
        return f'Tensor({str(self.data)})'

    def mean(self):
        return mean(self.data)

def mean(data):
    value = np.mean(data)
    return tensor(value)
What is the best way to separate tensor, Tensor, and mean (put them into 3 different files)? Thanks for your help!!
Answer
Having a module that is thousands of lines long isn’t that bad. You may not actually need to break it up into different modules. It is common to have a module that has a function alongside a class, like your tensor and Tensor, in the same module, and there is no reason for mean to be split out into a separate function, as that code can just be placed directly in Tensor.mean.
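As a rough sketch of that single-module layout, the body of mean can move straight into the method. To keep the sketch runnable with only the standard library, statistics.fmean stands in for np.mean here; that swap is an assumption of this example, not something from the question.

```python
from statistics import fmean  # stand-in for np.mean (assumption, stdlib only)


def tensor(data):
    """Factory function: the public way to build a Tensor."""
    return Tensor(data)


class Tensor:
    def __init__(self, data):
        self.data = data

    def __repr__(self):
        return f'Tensor({self.data})'

    def mean(self):
        # The body of the old module-level mean() now lives in the method.
        return tensor(fmean(self.data))


print(tensor([1, 2, 3]).mean())  # prints Tensor(2.0)
```

With everything in one module there is nothing to import between files, so the coupling between tensor, Tensor, and mean stops being a problem.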
A module should have a specific purpose and be a self-contained unit around that purpose. If you are splitting things up just to have smaller files, that is only going to make your codebase worse. However, large modules are a sign that things may need to be refactored. If you can find good ways of refactoring ideas in the code into smaller ideas, then those smaller units could be given their own modules. Otherwise, keep everything as one bigger module.
As for how you can split up code that is coupled together, here is one way of splitting the code into the modules you indicated. Since you have a function, tensor, that you would like people to use to get an instance of your Tensor class, creating a Python package seemed sensible, since packages come with an __init__.py file that is used for establishing the API, i.e. your tensor function. I put the tensor function directly in the __init__.py file, but if the function is pretty large, it can be broken out into a separate module, since the __init__.py file is just supposed to give you an overview of the API being created.
# --- main.py ----
from tensor import tensor

print(tensor([1,2,3]).mean())

# --- tensor/__init__.py ----
'''
Add some documentation here
'''

def tensor(data):
    return Tensor(data)

from tensor.Tensor import Tensor

# --- tensor/Tensor.py ----
from tensor import helper

class Tensor:
    def __init__(self, data):
        self.data = data

    def __repr__(self):
        return f'Tensor({str(self.data)})'

    def mean(self):
        return helper.mean(self.data)

# --- tensor/helper.py ----
import numpy as np
from . import tensor

def mean(data):
    value = np.mean(data)
    return tensor(value)
About circular dependencies
The modules import each other in a cycle (__init__ imports Tensor, Tensor imports helper, and helper imports from __init__ again), and this is OK. When an import reaches a module that is already partway through loading, Python hands back the partially loaded module instead of starting over, so the importing module continues loading normally, and then the earlier modules finish loading in turn. You only run into problems with circular dependencies if you have code at module level (outside of your functions and classes) that is executed when the module is first loaded and that depends on functionality in another module that is only partially loaded.
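To make the distinction concrete, here is a small sketch that writes two pairs of throwaway modules to a temporary directory and imports them. The module names (safe_a, safe_b, eager_a, eager_b) are hypothetical and exist only for this demonstration. The first pair touches each other only inside function bodies; the second pair reads an attribute of a partially loaded module at module level.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical demo modules, keyed by file name.
MODULES = {
    # Safe cycle: the other module is only used inside a function body.
    "safe_a.py": "import safe_b\n\ndef greet():\n    return 'a+' + safe_b.name()\n",
    "safe_b.py": "import safe_a\n\ndef name():\n    return 'b'\n",
    # Unsafe cycle: eager_b reads eager_a.VALUE at module level, while
    # eager_a is still stuck on its first line (the import of eager_b).
    "eager_a.py": "import eager_b\n\nVALUE = 1\n",
    "eager_b.py": "import eager_a\n\nCOPY = eager_a.VALUE\n",
}

with tempfile.TemporaryDirectory() as root:
    for filename, source in MODULES.items():
        Path(root, filename).write_text(source)
    sys.path.insert(0, root)
    importlib.invalidate_caches()
    try:
        # The function-level cycle imports and runs without trouble.
        safe_a = importlib.import_module("safe_a")
        safe_result = safe_a.greet()

        # The module-level cycle fails while eager_b is being loaded.
        try:
            importlib.import_module("eager_a")
            eager_error = None
        except AttributeError as exc:
            eager_error = str(exc)
    finally:
        sys.path.remove(root)

print(safe_result)
print(eager_error)
```

The cycle itself is the same in both pairs; only the moment at which the other module's contents are needed differs.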
Using classes that don’t exist yet
I can add the following to the __init__ file

def some_function():
    return DoesntExist()

and your code would still run. Python doesn’t look for a class named Tensor until it is actually running the tensor function. If we did the following, then we would get an error about Tensor not existing,

def tensor(data):
    return Tensor(data)

tensor([1, 2, 3])

from tensor.Tensor import Tensor

because now we are running the tensor function before the import, and it can’t find anything named Tensor.
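The same late lookup can be seen in a single file, without any imports involved. In this sketch the names make and LaterClass are hypothetical; the point is only that the global name is resolved when the function runs, not when it is defined.

```python
def make():
    # LaterClass is not defined yet. Python only looks the name up
    # when make() is actually called.
    return LaterClass()


try:
    make()  # called too early: the global name does not exist yet
    early_error = None
except NameError as exc:
    early_error = str(exc)


class LaterClass:
    pass


late_result = make()  # the same function now finds the class

print(early_error)
print(type(late_result).__name__)  # prints LaterClass
```

Defining make before LaterClass is harmless; only calling it before LaterClass exists raises the error.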
The order of stuff in __init__
If you switch the order around (import first, then the function definition), you will have __init__ importing Tensor, which imports helper, which imports __init__ again as it tries to grab the tensor function. But it can’t, as the __init__ module can’t proceed past the line that imports Tensor until that import has been completed, so tensor has not been defined yet.
Now with the current order we have:
1. __init__ defines tensor, sees the import statement, and saves its current progress as a partial import.
2. The same imports happen (__init__ imports Tensor, which imports helper, which imports __init__ looking for a tensor function).
3. This time we look at the partial import for the tensor function, find it, and are able to continue on using that.
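Both orderings can be tried side by side with a sketch like the one below. It builds two throwaway packages in a temporary directory that mirror the layout above; the package names (order_ok, order_bad) and the names make, Box, core, and total are hypothetical stand-ins for tensor, Tensor, Tensor.py, and mean, and sum replaces np.mean so that only the standard library is needed.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Source templates for the hypothetical demo packages.
FUNC = "def make(data):\n    return Box(data)\n"
IMPORT = "from {pkg}.core import Box\n"
CORE = (
    "from {pkg} import helper\n"
    "\n"
    "class Box:\n"
    "    def __init__(self, data):\n"
    "        self.data = data\n"
    "    def total(self):\n"
    "        return helper.total(self.data)\n"
)
HELPER = (
    "from . import make\n"  # needs make to exist already at import time
    "\n"
    "def total(data):\n"
    "    return make(sum(data))\n"
)


def write_package(root, pkg, init_source):
    """Write __init__.py, core.py and helper.py for one demo package."""
    pkg_dir = Path(root, pkg)
    pkg_dir.mkdir()
    (pkg_dir / "__init__.py").write_text(init_source.format(pkg=pkg))
    (pkg_dir / "core.py").write_text(CORE.format(pkg=pkg))
    (pkg_dir / "helper.py").write_text(HELPER)


with tempfile.TemporaryDirectory() as root:
    write_package(root, "order_ok", FUNC + IMPORT)   # function first, then import
    write_package(root, "order_bad", IMPORT + FUNC)  # import first, then function
    sys.path.insert(0, root)
    importlib.invalidate_caches()
    try:
        order_ok = importlib.import_module("order_ok")
        ok_value = order_ok.make([1, 2, 3]).total().data
        try:
            importlib.import_module("order_bad")
            bad_error = None
        except ImportError as exc:
            bad_error = str(exc)
    finally:
        sys.path.remove(root)

print(ok_value)  # prints 6
print(bad_error)
```

The two packages differ only in the order of the two statements in __init__.py, which is enough to decide whether helper finds make on the partially loaded package.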
I didn’t think about any of that when I put things in that order. I just wrote out the code, got the circular import error, switched the order around, and didn’t think about what was going on until you asked about it.
And now that I think about it, the following would have worked too.
The order of things in the __init__ file will no longer matter.
from tensor.Tensor import Tensor

def tensor(data):
    return Tensor(data)
And then in helper.py
import numpy as np
import tensor

def mean(data):
    value = np.mean(data)
    return tensor.tensor(value)
The difference is that instead of requiring the tensor function to exist at the moment the module is imported, which is what from . import tensor does, we are doing import tensor (which imports the tensor package and not the function). Then, whenever the mean function gets run, we do tensor.tensor(value) to look up the tensor function inside our tensor package.
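A sketch of this variant, again using a throwaway package in a temporary directory: the names lazy_pkg, make, Box, core, and total are hypothetical stand-ins, and sum replaces np.mean. The helper module also records whether make existed on the package at the moment helper was imported, to show that the attribute lookup really is deferred until the function is called.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical demo package, keyed by file name.
SOURCES = {
    "__init__.py": (
        "from lazy_pkg.core import Box\n"  # import first this time
        "\n"
        "def make(data):\n"
        "    return Box(data)\n"
    ),
    "core.py": (
        "from lazy_pkg import helper\n"
        "\n"
        "class Box:\n"
        "    def __init__(self, data):\n"
        "        self.data = data\n"
        "    def total(self):\n"
        "        return helper.total(self.data)\n"
    ),
    "helper.py": (
        "import lazy_pkg\n"  # binds the package, not the function
        "\n"
        "HAD_MAKE_AT_IMPORT = hasattr(lazy_pkg, 'make')\n"
        "\n"
        "def total(data):\n"
        "    # make is looked up on the package only when total() runs\n"
        "    return lazy_pkg.make(sum(data))\n"
    ),
}

with tempfile.TemporaryDirectory() as root:
    pkg_dir = Path(root, "lazy_pkg")
    pkg_dir.mkdir()
    for filename, source in SOURCES.items():
        (pkg_dir / filename).write_text(source)
    sys.path.insert(0, root)
    importlib.invalidate_caches()
    try:
        lazy_pkg = importlib.import_module("lazy_pkg")
        lazy_value = lazy_pkg.make([1, 2, 3]).total().data
        had_make = lazy_pkg.helper.HAD_MAKE_AT_IMPORT
    finally:
        sys.path.remove(root)

print(lazy_value)  # prints 6
print(had_make)    # prints False
```

Even though make did not exist yet when helper was imported, the call succeeds later because lazy_pkg.make is resolved at call time, after __init__ has finished loading.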