Tag: nlp

Load a model as DPRQuestionEncoder in HuggingFace

bert-language-model huggingface-transformers nlp python transformer-model

I would like to load BERT’s weights (or those of any other transformer) into a DPRQuestionEncoder architecture, so that I can use the HuggingFace save_pretrained method and plug the saved model into the RAG architecture for end-to-end fine-tuning. But I got the following error. I am using the latest version of Transformers. Answer: As already mentioned in the comments, DPRQuestionEncoder does

How to detect protected cells in Excel file using Python?

excel nlp openpyxl pandas python

Given that an Excel file contains some cells protected with passwords, I want to detect these protected cells so I can choose whether to include them in the inputs or skip them. I have tried pandas and openpyxl. However, the protected cells are read normally, like unprotected cells, and could easily be changed. So the question is, how could I detect

Problem converting data from CoNLL format to spaCy format

dataset nlp python spacy

How can I convert data from CoNLL format to spaCy format? I’ve executed the current code following a similar Q&A on Stack Overflow: How to convert from CoNLL format to spacy format. CoNLL spacyformat However, I cannot fix the error. Code Error Message I’ve read the documentation for spacy convert, but have no idea how to fix the error. Environment Python 3.9.1 spaCy version

Python, NLP: How to find all trigrams from text files with adjectives as the middle term

nlp nltk python

I think the question is self-explanatory, but here is what I mean in detail. I want to extract all trigrams that have an adjective as the middle term from text files, using the nltk library. Example Text – A red ball was with the good boy. Example of output – and so on Answer: This code should do it:
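The answer's code is not shown in this excerpt. As a minimal sketch of the idea only, the function below scans (word, tag) pairs, the shape that nltk.pos_tag returns, and keeps every three-word window whose middle tag is a Penn Treebank adjective tag (JJ, JJR, JJS). The function name and the hand-tagged input are assumptions made for illustration; a real tagger may label some words differently.

```python
def adjective_centered_trigrams(tagged_tokens):
    """Return word trigrams whose middle token carries an adjective tag (JJ*)."""
    trigrams = []
    for i in range(len(tagged_tokens) - 2):
        window = tagged_tokens[i:i + 3]
        # Penn Treebank adjective tags all start with "JJ".
        if window[1][1].startswith("JJ"):
            trigrams.append(tuple(word for word, _tag in window))
    return trigrams


# Hand-tagged version of the example sentence (assumed tags, not tagger output).
tagged = [
    ("A", "DT"), ("red", "JJ"), ("ball", "NN"), ("was", "VBD"),
    ("with", "IN"), ("the", "DT"), ("good", "JJ"), ("boy", "NN"), (".", "."),
]
print(adjective_centered_trigrams(tagged))
```

With real text, the tagged list would come from tokenizing and tagging each file first, for example with nltk.word_tokenize followed by nltk.pos_tag.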

How to parse a lisp-readable file of property lists in Python

lisp nlp nltk pyparsing python

I am trying to parse an English verb lexicon in order to build an NLP application using Python, so I have to merge it with my NLTK scripts. The lexicon is a Lisp-readable file of property lists, but I need it in an easier format, such as a JSON file or a pandas dataframe. An example from that Lexicon database is:
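The lexicon sample itself is not included in this excerpt, so the following is only a sketch under stated assumptions: entries are parenthesized lists, property names are keyword symbols such as :word, and strings are double-quoted. The sample entry and the names parse_sexpr and plist_to_dict are hypothetical. The reader does not unescape backslashes inside strings and will raise IndexError on unbalanced parentheses.

```python
import json
import re

# One token is "(", ")", a double-quoted string, or a bare atom.
TOKEN = re.compile(r'\(|\)|"(?:\\.|[^"\\])*"|[^\s()"]+')


def parse_sexpr(text):
    """Parse Lisp-style text into nested Python lists of strings."""
    tokens = TOKEN.findall(text)
    pos = 0

    def read():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            items = []
            while tokens[pos] != ")":
                items.append(read())
            pos += 1  # consume the closing ")"
            return items
        if tok.startswith('"'):
            return tok[1:-1]  # strip the surrounding quotes
        return tok

    forms = []
    while pos < len(tokens):
        forms.append(read())
    return forms


def plist_to_dict(node):
    """Convert [':key', value, ...] lists into dicts, recursively."""
    if not isinstance(node, list):
        return node
    keys = node[::2]
    is_plist = bool(node) and len(node) % 2 == 0 and all(
        isinstance(k, str) and k.startswith(":") for k in keys
    )
    if is_plist:
        return {k[1:]: plist_to_dict(v) for k, v in zip(keys, node[1::2])}
    return [plist_to_dict(item) for item in node]


# Hypothetical entry; the real lexicon layout may differ.
sample = '(:word "run" :class (motion) :forms (:past "ran" :participle "run"))'
entries = [plist_to_dict(form) for form in parse_sexpr(sample)]
print(json.dumps(entries, indent=2))
```

The resulting list of dicts can be written out with json.dump or passed to pandas.DataFrame if the entries share the same keys.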

Applying abbreviation to the column of a dataframe based on another column of the same dataframe

nlp pandas pandas-groupby python text-classification

I have two columns in the dataframe: one is a class and the other is a description. The description contains some abbreviations, and I want to expand these abbreviations based on the class value. I have a dictionary with the class as key, whose value is another dictionary mapping abbreviations to their full forms. Since these
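A minimal sketch of class-dependent expansion, assuming abbreviations are whole words and matching is case-sensitive. The dictionary contents, class names, and the function name expand_abbreviations are hypothetical, chosen only to show the nested-dictionary lookup described above.

```python
import re


def expand_abbreviations(description, class_label, abbreviations_by_class):
    """Replace whole-word abbreviations using the mapping for this class."""
    mapping = abbreviations_by_class.get(class_label, {})
    if not mapping:
        return description  # unknown class: leave the text untouched
    # Longest abbreviations first so that overlapping keys match greedily.
    alternatives = "|".join(
        re.escape(abbr) for abbr in sorted(mapping, key=len, reverse=True)
    )
    pattern = re.compile(r"\b(" + alternatives + r")\b")
    return pattern.sub(lambda m: mapping[m.group(1)], description)


# Hypothetical data: the same abbreviation expands differently per class.
abbreviations = {
    "medical": {"pt": "patient", "bp": "blood pressure"},
    "finance": {"pt": "point"},
}
print(expand_abbreviations("pt has high bp", "medical", abbreviations))
print(expand_abbreviations("rate up 1 pt", "finance", abbreviations))
```

On a dataframe, the function could be applied row by row, for example df.apply(lambda row: expand_abbreviations(row["description"], row["class"], abbreviations), axis=1), where the column names are again assumed.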

How to save Farsi text in csv file using python?

farsi nlp pandas python

I was trying to save my dataset in a CSV file with the following script: but the result is confusing; it writes unknown characters to the CSV file instead of Farsi characters: Can anyone help me? I want to write all these files to a CSV: example of what I have in one of them and want to write: but result:
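The script and its output are not shown in this excerpt, so the cause cannot be confirmed. One common cause of this symptom is that the file is written as UTF-8 without a byte order mark and then opened in a spreadsheet program that guesses a different encoding. The sketch below, with a hypothetical file name and sample rows, writes the CSV with the utf-8-sig codec, which adds the mark.

```python
import csv
import os
import tempfile

# Hypothetical rows; the second column holds Farsi text.
rows = [["id", "text"], ["1", "سلام دنیا"]]
path = os.path.join(tempfile.gettempdir(), "farsi_sample.csv")

# "utf-8-sig" writes a UTF-8 byte order mark at the start of the file.
with open(path, "w", encoding="utf-8-sig", newline="") as f:
    csv.writer(f).writerows(rows)

# Reading back with the same codec strips the mark again.
with open(path, encoding="utf-8-sig", newline="") as f:
    read_back = list(csv.reader(f))

print(read_back)
```

With pandas, the equivalent is to pass encoding="utf-8-sig" to DataFrame.to_csv.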

Get for each word the number of sentences in which it appears in a given text [closed]

nlp python spacy

I’m using Spacy and I am looking for a program that counts the frequencies of each word in a text, and output each word with
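As a rough sketch of the count named in the title, the function below records, for each word, how many sentences contain it at least once. It uses a naive regular-expression sentence splitter and lowercased \w+ tokens as stand-ins; with spaCy, the sentences would come from doc.sents and the words from the tokens instead. The function name and sample text are assumptions.

```python
import re
from collections import Counter


def sentence_counts(text):
    """Map each lowercased word to the number of sentences containing it."""
    # Naive split after ., ! or ? followed by whitespace (an assumption).
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    counts = Counter()
    for sentence in sentences:
        # A set makes a word count once per sentence, however often it repeats.
        counts.update(set(re.findall(r"\w+", sentence.lower())))
    return counts


counts = sentence_counts("The cat sat. The cat ran. A dog ran!")
print(counts["cat"], counts["ran"], counts["dog"])
```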

Create a NER dictionary from a given text

dictionary named-entity-recognition nlp python

I have the following variable data[1]['entities'][0] = (48, 54, 'Category 1'), which stands for (start_offset, end_offset, entity). I want to read each word of data[0] and tag it according to data[1] entities. I am expecting to have as final output, Here, ‘O’ stands for ‘OutOfEntity’, ‘S’ stands for ‘Start’, ‘B’ stands for ‘Between’, and ‘E’ stands for ‘End’ and are unique
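The expected output is not shown in this excerpt, so the sketch below only illustrates the O/S/B/E scheme described above. It assumes whitespace tokenization, entity offsets that fall on word boundaries, an exclusive end_offset, and that a one-word entity is tagged ‘S’. The function name and sample sentence are hypothetical.

```python
import re


def tag_words(text, entities):
    """Tag each whitespace-separated word as O, S, B or E."""
    tagged = []
    for match in re.finditer(r"\S+", text):
        tag = "O"
        for start, end, _label in entities:
            # The word must lie entirely inside the entity span.
            if match.start() >= start and match.end() <= end:
                if match.start() == start:
                    tag = "S"  # first word of the entity
                elif match.end() == end:
                    tag = "E"  # last word of the entity
                else:
                    tag = "B"  # any word in between
                break
        tagged.append((match.group(), tag))
    return tagged


# Hypothetical sample: characters 10 to 23 cover "New York City".
text = "I visited New York City today"
print(tag_words(text, [(10, 23, "Category 1")]))
```

If punctuation is attached to a word, for example a trailing comma, the word extends past the entity span and is tagged ‘O’; a finer tokenizer would be needed in that case.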

Given a word can we get all possible lemmas for it using Spacy?

lemmatization nlp python spacy spacy-3

The input word is standalone and not part of a sentence but I would like to get all of its possible lemmas as if the input word were in different sentences with all possible POS tags. I would also like to get the lookup version of the word’s lemma. Why am I doing this? I have extracted lemmas from all