
Tag: spacy

Sentence segmentation within a dictionary using spaCy dependency parse

I have a TMX file containing source and target segments. Some of these segments are made up of several sentences. My goal is to segment these multi-sentence segments so that the entire TMX file consists of single-sentence segments. I intend to use spaCy’s dependency parser to segment these multi-sentence segments. To achieve this, I have extracted the source and target …
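A minimal sketch of the segmentation step, assuming spaCy v3 and the en_core_web_sm model (the TMX extraction itself is not shown): the parser-based sentence boundaries exposed through doc.sents split a multi-sentence segment into single sentences.

```python
import spacy

# The sentence boundaries come from the dependency parse, so the "parser"
# component must stay enabled (it is by default in en_core_web_sm).
nlp = spacy.load("en_core_web_sm")

segment = "This segment has two sentences. Here is the second one."
doc = nlp(segment)

# doc.sents yields one Span per detected sentence.
sentences = [sent.text.strip() for sent in doc.sents]
print(sentences)
# ['This segment has two sentences.', 'Here is the second one.']
```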

SpaCy NLP - Detect the verb form

As far as I know, we can get the v1 (base) form of a verb using … I wanted to know whether there is a way to get the form of the verb; for example, for “swims” it should output v4. Is there a way to do that using SpaCy or any other lib? If there is, then please give a …
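A small sketch of how spaCy exposes this information, assuming the en_core_web_sm model; the mapping from Penn Treebank tags to the v1–v5 labels used above is a hand-written assumption, not part of spaCy.

```python
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("She swims every morning.")

# Assumed mapping: v1=base, v2=past, v3=past participle,
# v4=3rd person singular present, v5=gerund/present participle.
TAG_TO_FORM = {"VB": "v1", "VBP": "v1", "VBD": "v2",
               "VBN": "v3", "VBZ": "v4", "VBG": "v5"}

for token in doc:
    if token.pos_ == "VERB":
        # token.tag_ is the fine-grained Penn Treebank tag (e.g. VBZ for "swims"),
        # token.lemma_ gives the base (v1) form, token.morph the morphological features.
        print(token.text, token.lemma_, token.tag_,
              TAG_TO_FORM.get(token.tag_, "?"), token.morph)
```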

Meaningless Spacy Nouns

I am using spaCy for extracting nouns from sentences. These sentences are grammatically poor and may contain some spelling mistakes as well. Here is the code that I am using: … Output: … Similarly, for the sentence “fast foward2”, I get the spaCy noun as …, which shows that these nouns contain some meaningless words like: sfx, foward2, ms, 64x, bit, pwm, r, brailledisplayfastmovement, …
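One common way to prune such tokens is to keep only alphabetic, in-vocabulary nouns. A sketch, assuming a model with word vectors such as en_core_web_md (with the small model, token.is_oov is true for almost every token, so that filter would remove too much); the input sentence is made up from the words listed above.

```python
import spacy

nlp = spacy.load("en_core_web_md")  # md/lg models ship vectors, needed for is_oov
doc = nlp("the sfx is loud and fast foward2 mode uses 64x pwm")

nouns = [tok for tok in doc if tok.pos_ == "NOUN"]

# Keep only alphabetic tokens of a minimum length that the model's vocabulary knows.
clean_nouns = [tok.text for tok in nouns
               if tok.is_alpha and len(tok) > 2 and not tok.is_oov]
print(clean_nouns)
```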

How to use LanguageDetector() from spacy_langdetect package?

I’m trying to use the spacy_langdetect package, and the only example code I can find is at https://spacy.io/universe/project/spacy-langdetect. It’s throwing this error: “nlp.add_pipe now takes the string name of the registered component factory, not a callable component.” So I tried using the below to add it to my nlp pipeline, but this gives the error: “Can’t find factory for ‘language_detector’ for language English (en).”
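A commonly suggested workaround for spaCy v3 is to wrap LanguageDetector in a registered factory before adding it to the pipeline by name; a sketch, assuming spaCy >= 3.0 and spacy_langdetect installed.

```python
import spacy
from spacy.language import Language
from spacy_langdetect import LanguageDetector

# In spaCy v3, nlp.add_pipe() takes the string name of a registered factory,
# so the callable component has to be wrapped in one first.
@Language.factory("language_detector")
def create_language_detector(nlp, name):
    return LanguageDetector()

nlp = spacy.load("en_core_web_sm")
nlp.add_pipe("language_detector", last=True)

doc = nlp("This is an English sentence.")
print(doc._.language)  # e.g. {'language': 'en', 'score': 0.99...}
```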

Can’t find SpaCy model when packaging with PyInstaller

I am using PyInstaller to package a Python script into an .exe. This script uses spaCy to load the following model: en_core_web_sm. I have already run python -m spacy download en_core_web_sm to download the model locally. The issue is that when PyInstaller tries to package up my script, it can’t find the model. I get the following error: Can’t find …
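One common fix is to tell PyInstaller to bundle the model package explicitly, either with the --collect-all en_core_web_sm command-line option or in the generated .spec file. A sketch of the .spec approach, assuming PyInstaller >= 4.x and a hypothetical entry script my_script.py:

```python
# my_script.spec (excerpt; the PYZ/EXE sections stay as generated)
from PyInstaller.utils.hooks import collect_all

# Gather the model's data files, binaries and submodules so they end up in the build.
datas, binaries, hiddenimports = collect_all("en_core_web_sm")

a = Analysis(
    ["my_script.py"],          # hypothetical entry script
    datas=datas,
    binaries=binaries,
    hiddenimports=hiddenimports,
)
```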

Warning: [W108] The rule-based lemmatizer did not find POS annotation for the token ‘This’

What is this message about? How do I remove this warning message? Warning: [W108] The rule-based lemmatizer did not find POS annotation for the token ‘This’. Check that your pipeline includes components that assign token.pos, typically ‘tagger’+’attribute_ruler’ or ‘morphologizer’. [W108] The rule-based lemmatizer did not find POS annotation for the token ‘is’. Check that your pipeline includes components that assign …
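The warning means the lemmatizer ran on tokens that never received POS tags. A sketch of the two usual remedies, assuming en_core_web_sm: keep the components that assign token.pos enabled, or explicitly silence W108 if POS tags are genuinely not needed.

```python
import spacy
import warnings

# Load the full pipeline: do not exclude/disable "tagger" or "attribute_ruler",
# which are the components that assign token.pos for the rule-based lemmatizer.
nlp = spacy.load("en_core_web_sm")
doc = nlp("This is a test.")
print([(t.text, t.pos_, t.lemma_) for t in doc])

# Alternatively, if POS tags are not needed, suppress just this warning:
warnings.filterwarnings("ignore", message=r"\[W108\]")
```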
