I have a TMX file containing source and target segments. Some of these segments are made up of several sentences. My goal is to segment these multi-sentence segments so that the entire TMX file consists of single-sentence segments. I intend to use spacy’s dependency parser to segment these multi-sentence segments. To achieve this, I have extracted the source and target
Tag: spacy
Get for each word the number of the sentences in which appears in a given text [closed]
I’m using Spacy and I am looking for a program that counts the frequencies of each word in a text, and output each word with
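One way to read the question title is: for every word, count how many sentences contain it at least once. The sketch below illustrates that idea only. It assumes a naive regex sentence splitter and a lowercase word pattern (both stand-ins chosen for illustration, not anything from the original post); with spaCy, the sentences would more likely come from `doc.sents` and the words from the tokens of each sentence. The function name `sentence_counts` is hypothetical.

```python
import re
from collections import Counter


def sentence_counts(text):
    """Map each word to the number of sentences it occurs in.

    Assumption: sentences end with '.', '!' or '?' followed by whitespace.
    This is a crude stand-in for a real sentence segmenter.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    counts = Counter()
    for sentence in sentences:
        # A set ensures a word repeated inside one sentence is counted once.
        words = set(re.findall(r"[a-z']+", sentence.lower()))
        counts.update(words)
    return counts


counts = sentence_counts("The cat sat. The cat ran! A dog barked.")
for word, n in sorted(counts.items()):
    print(word, n)
```

Using a set per sentence is the design choice that separates "number of sentences containing the word" from plain word frequency.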
Given a word can we get all possible lemmas for it using Spacy?
The input word is standalone and not part of a sentence but I would like to get all of its possible lemmas as if the input word were in different sentences with all possible POS tags. I would also like to get the lookup version of the word’s lemma. Why am I doing this? I have extracted lemmas from all
Name Entity Recognition (NER) for multiple languages
I am writing some code to perform Named Entity Recognition (NER), which is coming along quite nicely for English texts. However, I would like to be able to apply NER to any language. To do this, I would like to 1) identify the language of a text, and then 2) apply the NER for the identified language. For step 2,
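The two-step idea described above (detect the language, then run a pipeline for that language) could be organised roughly as follows. This is a sketch under assumptions: the language code is taken as already detected by some other tool, the table of pipeline package names is only an illustrative subset, and the helper names `pick_model` and `get_pipeline` are hypothetical. The loader is passed in as a parameter so that the example runs without any model installed; in real use it would presumably be `spacy.load`.

```python
# Illustrative subset of spaCy pipeline package names, keyed by language code.
MODEL_BY_LANG = {
    "en": "en_core_web_sm",
    "de": "de_core_news_sm",
    "fr": "fr_core_news_sm",
    "es": "es_core_news_sm",
}

# Assumption: fall back to spaCy's multi-language NER package for other codes.
FALLBACK_MODEL = "xx_ent_wiki_sm"


def pick_model(lang_code):
    """Return the pipeline package name to use for a detected language code."""
    return MODEL_BY_LANG.get(lang_code.lower(), FALLBACK_MODEL)


def get_pipeline(lang_code, loader, cache):
    """Load the pipeline for lang_code once and reuse it afterwards."""
    name = pick_model(lang_code)
    if name not in cache:
        cache[name] = loader(name)
    return cache[name]


# Demonstration with a stand-in loader that just records what was requested.
loaded = []


def fake_loader(name):
    loaded.append(name)
    return "pipeline:" + name


cache = {}
first = get_pipeline("EN", fake_loader, cache)
second = get_pipeline("en", fake_loader, cache)
other = get_pipeline("ja", fake_loader, cache)
```

Caching matters here because loading a pipeline is slow compared with processing a single text, so each one is loaded at most once.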
SpaCy NLP- Detect the verb form
As far as I know that we can get the v1 form of a verb using I wanted to know is there a way in which we can get the form of the verb like: swims it should output v4 Is there a way to do that using SpaCy or any other lib and if there is then please give a
Meaningless Spacy Nouns
I am using Spacy for extracting nouns from sentences. These sentences are grammatically poor and may contain some spelling mistakes as well. Here is the code that I am using: Code Output: Similarly for sentence “fast foward2”, I get Spacy noun as Which shows that these nouns have some meaningless words like: sfx, foward2, ms, 64x, bit, pwm, r, brailledisplayfastmovement,
How to use LanguageDetector() from spacy_langdetect package?
I’m trying to use the spacy_langdetect package and the only example code I can find is (https://spacy.io/universe/project/spacy-langdetect): It’s throwing error: nlp.add_pipe now takes the string name of the registered component factory, not a callable component. So I tried using the below for adding to my nlp pipeline But this gives error: Can’t find factory for ‘language_detector’ for language English (en).
Can’t find SpaCy model when packaging with PyInstaller
I am using PyInstaller to package a Python script into an .exe. This script is using spacy to load up the following model: en_core_web_sm. I have already run python -m spacy download en_core_web_sm to download the model locally. The issue is when PyInstaller tries to package up my script it can’t find the model. I get the following error: Can’t find
Warning: [W108] The rule-based lemmatizer did not find POS annotation for the token ‘This’
What is this message about? How do I remove this warning message? Warning: [W108] The rule-based lemmatizer did not find POS annotation for the token ‘This’. Check that your pipeline includes components that assign token.pos, typically ‘tagger’+’attribute_ruler’ or ‘morphologizer’. [W108] The rule-based lemmatizer did not find POS annotation for the token ‘is’. Check that your pipeline includes components that assign
spacy matcher returns right answer when two words are set as separate ‘TEXT’ conditional object only. Why is it?
I’m trying to set up a matcher to find the phrase ‘iPhone X’. The sample code says I should do it as below. I tried another approach, as below. Why is the second approach not working? I assumed that if I put the two words ‘iPhone’ and ‘X’ together, it might work the same way because it regards the word with a space in
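The behaviour asked about can be reproduced with a short sketch. It assumes spaCy v3 (where `Matcher.add` takes a list of patterns) and uses a blank English pipeline, so no trained model is needed; the example sentence and the helper `find` are made up for illustration. In a `Matcher` pattern, each dictionary describes exactly one token, and the tokenizer splits "iPhone X" into two tokens, so a single dictionary whose `TEXT` contains a space has no token it could ever equal.

```python
import spacy
from spacy.matcher import Matcher

# Blank pipeline: tokenizer only, no trained components required.
nlp = spacy.blank("en")
doc = nlp("The iPhone X is coming soon")


def find(pattern):
    """Return the text of every span in doc that matches the pattern."""
    matcher = Matcher(nlp.vocab)
    matcher.add("IPHONE_X", [pattern])
    return [doc[start:end].text for _, start, end in matcher(doc)]


# One dictionary per token: matches the two consecutive tokens.
two_tokens = find([{"TEXT": "iPhone"}, {"TEXT": "X"}])

# One dictionary for both words: no single token has the text "iPhone X".
one_token = find([{"TEXT": "iPhone X"}])

print([token.text for token in doc])
print(two_tokens, one_token)
```

If matching on a whole phrase string is preferred, spaCy's `PhraseMatcher` is designed for that case, since it tokenizes the phrase before comparing.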