
Using Pyinstaller with NLTK results in error: can’t find nltk_data

I am attempting to export a simple GUI that uses NLTK as an exe with Python 3.6 on Windows 10.

When I run PyInstaller to freeze my simple program as an exe, I get the error: Unable to find “c:\users\usr\nltk_data” when adding binary and data files.

I even copied the nltk_data folder to that location, but then I get the same error for a different nltk.data.path entry: “c:\users\usr\appdata\local\programs\python\python36\nltk_data”

import tkinter as tk
from nltk.corpus import stopwords
sw = stopwords.words('english')

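counter = 0  # index of the stop word currently shown

def counter_label(label):
  def count():
    global counter
    counter += 1
    # wrap around so the index never runs past the end of the list
    label.config(text=sw[counter % len(sw)])
    # schedule the next update in one second
    label.after(1000, count)
  count()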


root = tk.Tk()
root.title("Counting Seconds")
label = tk.Label(root, fg="dark green")
label.pack()
counter_label(label)
button = tk.Button(root, text='Stop', width=25, command=root.destroy)
button.pack()
root.mainloop()

For PyInstaller I run:

pyinstaller --onefile --windowed test_tkinter.py


Answer

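This appears to be a known bug in PyInstaller's hook for nltk: the hook adds every directory listed in nltk.data.path as a data folder, including ones that do not exist on disk. An easy way to fix it is to edit this file: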

<PythonPath>/Lib/site-packages/PyInstaller/hooks/hook-nltk.py

Then comment out the lines that iterate over nltk.data.path and append your own nltk_data path instead:

#-----------------------------------------------------------------------------
# Copyright (c) 2005-2018, PyInstaller Development Team.
#
# Distributed under the terms of the GNU General Public License with exception
# for distributing bootloader.
#
# The full license is in the file COPYING.txt, distributed with this software.
#-----------------------------------------------------------------------------


# hook for nltk
import nltk
from PyInstaller.utils.hooks import collect_data_files

# add datas for nltk
datas = collect_data_files('nltk', False)

# loop through the data directories and add them
# for p in nltk.data.path:
#     datas.append((p, "nltk_data"))

datas.append(("<path_to_nltk_data>", "nltk_data"))

# nltk.chunk.named_entity should be included
hiddenimports = ["nltk.chunk.named_entity"]

Remember to replace <path_to_nltk_data> with the actual path to your nltk_data folder.
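If you would rather not hard-code the path, one possible alternative is to keep the loop but skip directories that do not exist. The sketch below is only an illustration of that idea, not part of the PyInstaller hook itself: the helper name existing_nltk_data_entries is hypothetical, and it assumes that every existing directory should be bundled under the same "nltk_data" destination.

```python
import os


def existing_nltk_data_entries(search_paths, dest="nltk_data"):
    """Return (source, dest) tuples for the search paths that exist on disk.

    Hypothetical helper: `search_paths` is meant to be nltk.data.path,
    and `dest` is the folder name used inside the frozen bundle.
    """
    return [(p, dest) for p in search_paths if os.path.isdir(p)]


# Assumed usage inside hook-nltk.py, in place of the hard-coded append:
#     import nltk
#     datas += existing_nltk_data_entries(nltk.data.path)
```

With this, a missing entry such as the python36\nltk_data folder from the question would simply be skipped rather than passed to PyInstaller as a data folder.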

User contributions licensed under: CC BY-SA