PyInstaller with Pandas creates over 500 MB exe

I am trying to create an exe file using PyInstaller 3.2.1. As a test, I tried to build an exe from the following code:

    import pandas as pd
    print('hello world')

After a considerable amount of time (15+ minutes), I ended up with a dist folder of 620 MB and a build folder of 150 MB. I work on Windows with Python 3.5.2 |Anaconda custom (64-bit). It may be worth noting that the mkl files in the dist folder account for almost 300 MB. I run PyInstaller with 'pyinstaller.exe foo.py'. I tried --exclude-module to exclude some dependencies but still ended up with huge files. Using onefile or onedir makes no difference.
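One way to confirm which files dominate the dist folder is to list the largest files under it. The following is a minimal sketch, not part of the original post: the function names `largest_files` and `report` are made up for illustration, and the folder to scan is assumed to be the `dist` directory that a onedir build produces.

```python
import os


def largest_files(root, top=10):
    """Return the `top` largest files under `root` as (size_in_bytes, path) pairs."""
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                sizes.append((os.path.getsize(path), path))
            except OSError:
                # Skip files that vanish or cannot be read while scanning.
                continue
    # Largest first; ties are broken by path, which is fine for a report.
    sizes.sort(reverse=True)
    return sizes[:top]


def report(root, top=10):
    """Print the largest files under `root`, with sizes in MB."""
    for size, path in largest_files(root, top):
        print("{:8.1f} MB  {}".format(size / 2**20, path))
```

Calling `report("dist")` after a build should show whether a handful of libraries (for example the mkl files mentioned above) account for most of the size.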

I am aware that the exe must contain some important files, but is it normal for it to be almost 1 GB? I can provide the warning log or anything else that could help solve this.

P.S. In parallel, my coworker created an exe from the same sample script and ended up with less than 100 MB; the difference is that he is not using Anaconda. Could that be the cause?

Any help will be appreciated.

Answer

PyInstaller creates a big executable from conda packages and a small executable from pip packages. With this simple Python code:

    from pandas import DataFrame as df
    print('h')

I get a 203 MB executable using conda packages and a 30 MB executable using pip packages. Still, conda is a nice replacement for plain virtualenv: I can develop with conda and Jupyter and then create a script 'mycode.py' (a Jupyter notebook can be downloaded as a .py file into myfolder).

My final solution is as follows. If you do not have it, install Miniconda, then open Anaconda Prompt from the Windows Start Menu:

    cd myfolder
    conda create -n exe python=3
    activate exe
    pip install pandas pyinstaller pypiwin32
    echo hiddenimports = ['pandas._libs.tslibs.timedeltas'] > %CONDA_PREFIX%\Lib\site-packages\PyInstaller\hooks\hook-pandas.py
    pyinstaller -F mycode.py

The commands above create a new environment 'exe'. pypiwin32 is needed by PyInstaller but is not installed automatically, and hook-pandas.py is needed to build with pandas. Also, importing only a submodule does not reduce the size of the executable file, so I do not need this:

    from pandas import DataFrame as df

but I can just use the usual code:

    import pandas as pd
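As a possible alternative to writing hook-pandas.py into the PyInstaller installation, the same one-line hook can be kept in a project-local folder and passed to PyInstaller with its `--additional-hooks-dir` option. This is only a sketch under assumptions: the folder name `hooks` and the helper names `write_pandas_hook` and `pyinstaller_args` are made up for illustration, and whether a local hook is sufficient may depend on the PyInstaller version.

```python
import os

# Same content as the echo command writes: a single hiddenimports line.
HOOK_SOURCE = "hiddenimports = ['pandas._libs.tslibs.timedeltas']\n"


def write_pandas_hook(hooks_dir="hooks"):
    """Write hook-pandas.py into `hooks_dir` (hypothetical local folder) and return its path."""
    os.makedirs(hooks_dir, exist_ok=True)
    hook_path = os.path.join(hooks_dir, "hook-pandas.py")
    with open(hook_path, "w") as handle:
        handle.write(HOOK_SOURCE)
    return hook_path


def pyinstaller_args(script, hooks_dir="hooks"):
    """Build the command line for a one-file build that also reads hooks from `hooks_dir`."""
    return ["pyinstaller", "-F", "--additional-hooks-dir", hooks_dir, script]
```

With the 'exe' environment active, the build could then be started with `write_pandas_hook()` followed by `subprocess.run(pyinstaller_args("mycode.py"), check=True)`. Keeping the hook next to the script means it survives reinstalling PyInstaller.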

Also, errors are possible when paths contain national (non-ASCII) letters, so it is best to use a user account with an English name for the development tools.
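A quick check for such paths before building might look like the sketch below. The helper name `non_ascii_parts` is made up for illustration; it only reports path components that contain non-ASCII characters and does not say whether a given PyInstaller version would actually fail on them.

```python
import os


def non_ascii_parts(path):
    """Return the components of `path` that contain non-ASCII characters."""
    bad = []
    for part in os.path.normpath(path).split(os.sep):
        try:
            part.encode("ascii")
        except UnicodeEncodeError:
            bad.append(part)
    return bad
```

For example, `non_ascii_parts(os.path.abspath("mycode.py"))` and `non_ascii_parts(sys.executable)` (after `import sys`) would show whether the project folder or the Python installation sits under a user name with national letters.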

User contributions licensed under: CC BY-SA