I use the Modin library for multiprocessing.
While the library speeds up processing, it fails at merge, so I would like to switch back to regular pandas partway through the code.
I understand that PEP 8 (pycodestyle error E402) says imports should be declared once, at the top of the module; however, my case seems to need otherwise.
import os

import pandas as pd
import ray

ray.init()
os.environ["MODIN_ENGINE"] = "ray"  # set before importing modin.pandas so Modin picks it up
import modin.pandas as mpd

df = mpd.read_csv()  # supply the path to your CSV file here
# do stuff
Then I would like to switch back to regular pandas within the same script.
How would I run the lines below in pandas? There does not seem to be a clear way to switch between pd and mpd in these lines, and unfortunately Modin seems to take precedence over pandas.
df = df.loc[:, df.columns.intersection(['col1', 'col2'])]
df = df.drop_duplicates()
df = df.sort_values(['col1', 'col2'], ascending=[True, True])
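These three lines only call DataFrame methods, so which library executes them depends on the type of `df`, not on whether `pd` or `mpd` is mentioned nearby. A minimal sketch of the same steps on a plain pandas DataFrame; the sample values (and the extra `col3`) are hypothetical, only `col1` and `col2` come from the question.

```python
import pandas as pd

# Hypothetical sample data; only the col1/col2 names come from the question.
df = pd.DataFrame({
    "col1": [2, 1, 2, 1],
    "col2": ["b", "a", "b", "c"],
    "col3": [0.5, 0.1, 0.5, 0.9],
})

# Keep only the requested columns that actually exist in df.
df = df.loc[:, df.columns.intersection(["col1", "col2"])]
# Drop rows that are exact duplicates across the remaining columns.
df = df.drop_duplicates()
# Sort ascending by col1, then by col2.
df = df.sort_values(["col1", "col2"], ascending=[True, True])

print(type(df))  # a pandas DataFrame, so pandas (not Modin) ran every step
print(df)
```

If `df` were a Modin DataFrame, the identical lines would run through Modin instead; converting the object is what switches the engine.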
Is this possible? If so, how?
Answer
Several answers have been posted, but in this particular case, as pointed out by @Nin17 and in this comment on the Modin GitHub, you can convert a Modin DataFrame to pandas to run operations such as df.merge on a single core:
import os

import pandas as pd
import ray

ray.init()
os.environ["MODIN_ENGINE"] = "ray"  # set before importing modin.pandas
import modin.pandas as mpd

df_modin = mpd.read_csv()  # read the DataFrame into Modin for parallel processing (supply your path)
# convert the Modin DataFrame into pandas for single-core processing;
# _to_pandas() is private, modin.utils.to_pandas(df_modin) is the public equivalent
df_pandas = df_modin._to_pandas()
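When some frames may already be pandas and others Modin, a small helper can normalise both sides before the merge. A minimal sketch, assuming (as in this answer) that Modin DataFrames expose `_to_pandas()`; the helper names `to_pandas_if_needed` and `merge_in_pandas` and the sample data are hypothetical.

```python
import pandas as pd


def to_pandas_if_needed(df):
    """Return a pandas DataFrame, converting a Modin one if necessary.

    Assumption: a Modin DataFrame is not a pandas.DataFrame subclass and
    provides the private _to_pandas() method used in the answer.
    """
    if isinstance(df, pd.DataFrame):
        return df  # already pandas, nothing to convert
    return df._to_pandas()


def merge_in_pandas(left, right, **merge_kwargs):
    """Run the merge with plain pandas on a single core."""
    return pd.merge(to_pandas_if_needed(left), to_pandas_if_needed(right), **merge_kwargs)


# Hypothetical usage with two small pandas frames.
left = pd.DataFrame({"key": [1, 2, 3], "a": ["x", "y", "z"]})
right = pd.DataFrame({"key": [2, 3, 4], "b": [True, False, True]})
merged = merge_in_pandas(left, right, on="key", how="inner")
print(merged)
```

Passing Modin frames works the same way; the conversion happens inside the helper, so the merge itself always runs in pandas.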
To convert the DataFrame back to a Modin DataFrame for parallel processing:
df_modin = mpd.DataFrame(df_pandas)
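The round trip (Modin to pandas, run the step, back to Modin) can be wrapped in one function. A minimal sketch under stated assumptions: `run_in_pandas` and its `wrap` parameter are hypothetical names, `wrap` is meant to be a constructor such as `mpd.DataFrame` (mirroring `mpd.DataFrame(df_pandas)` above), and Modin frames are assumed to expose `_to_pandas()`. The example below uses plain pandas so it runs without Modin installed.

```python
import pandas as pd


def run_in_pandas(func, df, wrap=None):
    """Apply func to a pandas version of df and optionally re-wrap the result.

    wrap: hypothetical hook, e.g. mpd.DataFrame to hand the result back to
    Modin; with wrap=None the pandas result is returned unchanged.
    """
    pdf = df if isinstance(df, pd.DataFrame) else df._to_pandas()
    result = func(pdf)  # this step runs in pandas on a single core
    return wrap(result) if wrap is not None else result


# Hypothetical usage: de-duplicate and sort in pandas.
frame = pd.DataFrame({"col1": [2, 1, 1], "col2": ["b", "a", "a"]})
out = run_in_pandas(lambda d: d.drop_duplicates().sort_values(["col1", "col2"]), frame)
print(out)
# With Modin installed, wrap=mpd.DataFrame would return a Modin DataFrame instead.
```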