
Tag: dataset

Problem converting data from CoNLL format to spaCy format

How can I convert data from CoNLL format to spaCy format? I've run my current code, following a similar Q&A on Stack Overflow: How to convert from CoNLL format to spacy format. CoNLL: … spaCy format: … However, I cannot fix the error. Code: … Error message: … I've read the documentation for spacy convert, but have no idea how to fix the error. Environment: Python 3.9.1, spaCy version
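For reference, here is a minimal pure-Python sketch of what the conversion involves. It assumes CoNLL-2003-style input: one token per line, the BIO entity tag in the last whitespace-separated column, and blank lines between sentences. It produces the `(text, {"entities": [(start, end, label)]})` tuples used as training data in spaCy v2. The function name `conll_to_spacy` is hypothetical, and because CoNLL does not record the original spacing, tokens are simply joined with single spaces. In spaCy v3 these tuples would still need to be turned into `Doc` objects (for example with `Doc.char_span`) and saved with `DocBin`; the `spacy convert` CLI remains the official route.

```python
from typing import Dict, List, Optional, Tuple

Entity = Tuple[int, int, str]
Example = Tuple[str, Dict[str, List[Entity]]]


def conll_to_spacy(lines: List[str]) -> List[Example]:
    """Convert CoNLL lines (token ... BIO-tag) into spaCy v2-style training tuples."""
    examples: List[Example] = []
    tokens: List[str] = []
    tags: List[str] = []

    def flush() -> None:
        """Turn the buffered sentence into one (text, annotations) tuple."""
        if not tokens:
            return
        text = ""
        entities: List[Entity] = []
        start: Optional[int] = None
        label: Optional[str] = None
        end = 0
        for token, tag in zip(tokens, tags):
            offset = len(text) + (1 if text else 0)  # assumed: single-space joins
            text = f"{text} {token}" if text else token
            # Close the open entity unless this tag continues it (I- with same label).
            if start is not None and not (tag.startswith("I-") and tag[2:] == label):
                entities.append((start, end, label))
                start = None
            # Open a new entity on B-, or on a stray I- with nothing open.
            if tag.startswith("B-") or (tag.startswith("I-") and start is None):
                start, label = offset, tag[2:]
            if start is not None:
                end = len(text)
        if start is not None:
            entities.append((start, end, label))
        examples.append((text, {"entities": entities}))
        tokens.clear()
        tags.clear()

    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("-DOCSTART-"):
            flush()  # sentence or document boundary
            continue
        parts = line.split()
        tokens.append(parts[0])   # token in the first column
        tags.append(parts[-1])    # NER tag in the last column
    flush()
    return examples


if __name__ == "__main__":
    sample = ["EU B-ORG", "rejects O", "German B-MISC", "call O", "",
              "Peter B-PER", "Blackburn I-PER"]
    for text, annotations in conll_to_spacy(sample):
        print(text, annotations)
```

Reading the tag from the last column means the same function also accepts the four-column CoNLL-2003 layout (token, POS, chunk, NER).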

How to split parallel corpora while keeping alignment?

I have two text files containing parallel text in two languages (potentially millions of lines). I am trying to generate random train/validate/test files from them, as train_test_split does in sklearn. However, when I try to import them into pandas using read_csv, I get errors on many of the lines because of erroneous data in there and it would
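One way to sidestep read_csv entirely is to treat each file as plain lines, shuffle the line *indices* once, and apply that same permutation to both files, so line i of the source always stays next to line i of the target. The sketch below assumes UTF-8 files with one sentence per line and an 80/10/10 split by default; `split_parallel`, `read_lines`, `write_split` and the output naming are hypothetical.

```python
import random
from typing import Dict, List, Tuple

Split = Dict[str, Tuple[List[str], List[str]]]


def split_parallel(src_lines: List[str], tgt_lines: List[str],
                   ratios: Tuple[float, float, float] = (0.8, 0.1, 0.1),
                   seed: int = 0) -> Split:
    """Shuffle line indices once and apply the same permutation to both sides."""
    if len(src_lines) != len(tgt_lines):
        raise ValueError("source and target must have the same number of lines")
    indices = list(range(len(src_lines)))
    random.Random(seed).shuffle(indices)  # reproducible shuffle
    n_train = int(len(indices) * ratios[0])
    n_valid = int(len(indices) * ratios[1])
    groups = {
        "train": indices[:n_train],
        "valid": indices[n_train:n_train + n_valid],
        "test": indices[n_train + n_valid:],  # remainder goes to test
    }
    # Line i of the source always stays paired with line i of the target.
    return {name: ([src_lines[i] for i in idx], [tgt_lines[i] for i in idx])
            for name, idx in groups.items()}


def read_lines(path: str) -> List[str]:
    """Read a plain-text file line by line, with no CSV parsing."""
    # newline="\n" makes only "\n" end a line, so a stray "\r" cannot shift alignment.
    with open(path, encoding="utf-8", newline="\n") as f:
        return [line.rstrip("\r\n") for line in f]


def write_split(parts: Split, prefix: str, src_ext: str, tgt_ext: str) -> None:
    """Write prefix.<split>.<ext> files (hypothetical naming scheme)."""
    for name, (src, tgt) in parts.items():
        for lines, ext in ((src, src_ext), (tgt, tgt_ext)):
            with open(f"{prefix}.{name}.{ext}", "w", encoding="utf-8", newline="\n") as f:
                f.writelines(line + "\n" for line in lines)
```

Usage would look like `write_split(split_parallel(read_lines("corpus.en"), read_lines("corpus.de")), "data", "en", "de")`, with hypothetical file names. Keeping both files in memory is usually fine for a few million short lines; for much larger corpora, the same idea works by shuffling only the index list and streaming the files.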

Getting min and max datetime for each date in CSV

I'm kind of new to data science and Python. First of all, would you suggest a library other than pandas for dealing with a huge dataset (100K+ rows)? Second, let me explain my current problem. I have a dataset with a Datetime column; to make it easy to understand, let's say I only
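In pandas the usual idiom is to parse the column with `pd.to_datetime` and then call `df.groupby(df["Datetime"].dt.date)["Datetime"].agg(["min", "max"])`. To show the logic without any extra library, here is a standard-library sketch that streams the CSV once and keeps only the earliest and latest timestamp per calendar day. It assumes a column named `Datetime` formatted as `%Y-%m-%d %H:%M:%S`; the function name `min_max_per_date` and the sample rows are hypothetical.

```python
import csv
import io
from datetime import date, datetime
from typing import Dict, Iterable, Mapping, Tuple


def min_max_per_date(rows: Iterable[Mapping[str, str]], column: str = "Datetime",
                     fmt: str = "%Y-%m-%d %H:%M:%S") -> Dict[date, Tuple[datetime, datetime]]:
    """Return {day: (earliest, latest)} in a single pass over the rows."""
    result: Dict[date, Tuple[datetime, datetime]] = {}
    for row in rows:
        ts = datetime.strptime(row[column], fmt)  # fmt is an assumed timestamp layout
        day = ts.date()
        if day in result:
            earliest, latest = result[day]
            result[day] = (min(earliest, ts), max(latest, ts))
        else:
            result[day] = (ts, ts)
    return result


if __name__ == "__main__":
    sample = (
        "Datetime,value\n"
        "2021-01-01 08:00:00,1\n"
        "2021-01-01 17:30:00,2\n"
        "2021-01-02 09:15:00,3\n"
    )
    # For a real file: with open(path, newline="") as f: min_max_per_date(csv.DictReader(f))
    stats = min_max_per_date(csv.DictReader(io.StringIO(sample)))
    for day, (first, last) in sorted(stats.items()):
        print(day, first.time(), last.time())
```

Memory use grows with the number of distinct days rather than the number of rows, since only two timestamps are kept per date.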

Data Augmentation in PyTorch

I am a little confused about the data augmentation performed in PyTorch. As far as I know, when we perform data augmentation, we KEEP our original dataset and then add other versions of it (flipping, cropping, etc.). But that doesn't seem to be what happens in PyTorch. As far as I understood from the references, when we use data.transforms
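The behaviour being described can be illustrated with a plain-Python imitation of a map-style dataset whose transform runs inside `__getitem__`, which is how torchvision-style transforms are typically wired in. This is a sketch, not the PyTorch API: `RandomHorizontalFlip` here is a toy stand-in for `torchvision.transforms.RandomHorizontalFlip` (default `p=0.5`), images are assumed to be nested lists, and `AugmentedDataset` is hypothetical. The point it shows is that augmentation does not add samples; each fetch draws a fresh random version, so across epochs the model sees both flipped and unflipped copies while `len()` stays the same.

```python
import random
from typing import Callable, List, Optional

Image = List[List[int]]  # toy "image": rows of pixel values


class RandomHorizontalFlip:
    """Toy stand-in for torchvision.transforms.RandomHorizontalFlip(p)."""

    def __init__(self, p: float = 0.5, rng: Optional[random.Random] = None) -> None:
        self.p = p
        self.rng = rng or random.Random()

    def __call__(self, image: Image) -> Image:
        if self.rng.random() < self.p:
            return [row[::-1] for row in image]  # new list; original untouched
        return image


class AugmentedDataset:
    """Mimics a map-style Dataset: the transform runs inside __getitem__."""

    def __init__(self, images: List[Image], transform: Callable[[Image], Image]) -> None:
        self.images = images
        self.transform = transform

    def __len__(self) -> int:
        return len(self.images)  # augmentation does not grow the dataset

    def __getitem__(self, idx: int) -> Image:
        # A new random transform is drawn every time a sample is fetched,
        # so each epoch can see a different version of the same image.
        return self.transform(self.images[idx])


if __name__ == "__main__":
    dataset = AugmentedDataset([[[1, 2, 3]]], RandomHorizontalFlip(p=0.5, rng=random.Random(0)))
    for epoch in range(3):
        print(f"epoch {epoch}: len={len(dataset)}, sample={dataset[0]}")
```

If the goal really is to keep every original image and add augmented copies alongside it, one option in PyTorch is to build one dataset without random transforms and one with them and combine the two with `torch.utils.data.ConcatDataset`, which doubles the epoch length.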
