Tag: dataset

How can I get the difference between a defined date and the dates from a CSV file in Python?

I have a list of dates and I want to get the difference in days between each of them and a defined date, then append the calculated days in a new column. I get TypeError: unsupported operand type(s) for -: 'DatetimeArray' and 'datetime.date'. How can I read the dates in the CSV file in the same format as the defined date? Is there a way
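One way to avoid that TypeError, as a minimal sketch: parse the CSV column as datetimes and wrap the defined date in a pandas Timestamp before subtracting. The column name `date`, the sample rows, and the reference date below are assumptions for illustration, since the excerpt doesn't show the real file.

```python
import io
from datetime import date

import pandas as pd

# Hypothetical CSV content standing in for the asker's file.
csv_text = "date\n2021-01-01\n2021-01-15\n2021-02-01\n"

# parse_dates makes pandas read the column as datetime64 values.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])

defined_date = date(2021, 1, 10)  # assumed reference date

# Wrapping the datetime.date in a Timestamp gives both operands a
# datetime-like type, so the subtraction yields timedeltas.
df["days_diff"] = (df["date"] - pd.Timestamp(defined_date)).dt.days

print(df)
```

The `.dt.days` accessor turns each timedelta into a whole number of days, negative for dates before the defined one.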

Problem converting data from CoNLL format to spaCy format

dataset nlp python spacy

How can I convert data from CoNLL format to spaCy format? I’ve executed the current code following a similar Q&A on Stack Overflow: How to convert from CoNLL format to spacy format. CoNLL spacyformat However, I cannot fix the error. Code Error Message I’ve read the documentation for spacy convert, but have no idea how to fix the error. Environment Python 3.9.1 spaCy version

I’m trying to import a CSV file using pandas, but I’m getting an error. (look at pic)

csv dataset jupyter pandas python

What am I doing wrong? I’m trying to import a CSV file using pandas, and I either get an error stating the file can’t be found or a UnicodeError message. Answer: You should escape your backslashes on Windows, because \U is interpreted as the start of a Unicode escape in the string. Try:
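The answer's code is not included in this excerpt. As a rough sketch of the usual fixes, here are three equivalent ways to write a Windows path so that the backslashes are not read as escapes. The path itself is hypothetical, because the asker's real path isn't shown.

```python
# In a normal string literal, a backslash followed by a capital U starts an
# 8-digit Unicode escape, so an unescaped C:\Users\... path fails to compile
# before pandas is ever called.

escaped = "C:\\Users\\me\\data.csv"   # double every backslash
raw = r"C:\Users\me\data.csv"         # or mark the literal as a raw string
forward = "C:/Users/me/data.csv"      # forward slashes also work on Windows

# The first two spellings produce exactly the same string.
print(escaped == raw)
print(raw)

# Any of the three can then be passed to pandas, for example:
# df = pd.read_csv(raw)
```

If the file still can't be found after this, the path is probably wrong rather than mis-escaped.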

Extracting images and their labels one by one from ImageDataGenerator().flow_from_directory

dataset keras python tensorflow

I imported my dataset (38 classes) for validation using ImageDataGenerator().flow_from_directory, and I wanted to pick each image and its label one by one. For example, I want to pick the first image and its label. I tried this and I get the image, but for the label I just get an array of shape (32, 38) with 0s and 1s. Is there
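The (32, 38) array is most likely a whole batch of one-hot labels: 32 images per batch, 38 classes per label. A minimal sketch of picking out a single image and its class follows, using NumPy arrays as a stand-in for one batch so it runs without a dataset. The 64x64x3 image size and the random contents are assumptions.

```python
import numpy as np

# Stand-in for one batch from the generator: 32 images and 32 one-hot labels.
rng = np.random.default_rng(0)
images = rng.random((32, 64, 64, 3))
class_ids = rng.integers(0, 38, size=32)
labels = np.eye(38)[class_ids]            # shape (32, 38), rows of 0s and 1s

# Index into the batch to get one sample.
first_image = images[0]                   # shape (64, 64, 3)
first_label = labels[0]                   # shape (38,), a single one-hot row

# argmax converts the one-hot row back to an integer class index.
first_class = int(np.argmax(first_label))

print(first_image.shape, first_label.shape, first_class)
```

With the real generator, the batch would come from something like `images, labels = next(generator)`, and `generator.class_indices` maps folder names to these integer indices.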

Loading a large dataset from CSV files in TensorFlow

csv dataset python tensorflow-datasets tensorflow2.0

I use the following code to load a bunch of images in my dataset in TensorFlow, which works well: I am wondering how I can use similar code to load a bunch of CSV files. Each CSV file has a shape of 256 x 256 and can be treated as a grayscale image. I don’t know what I should

How to split parallel corpora while keeping alignment?

dataset pandas python scikit-learn unix

I have two text files containing parallel text in two languages (potentially millions of lines). I am trying to generate random train/validate/test files from that single file, as train_test_split does in sklearn. However, when I try to import it into pandas using read_csv, I get errors from many of the lines because of erroneous data in there and it would
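One way to sidestep read_csv entirely, as a sketch: pair the lines up first, shuffle the pairs, then slice. The function name `split_parallel`, the 80/10/10 ratios, and the sample sentences are all made up for illustration.

```python
import random


def split_parallel(src_lines, tgt_lines, ratios=(0.8, 0.1, 0.1), seed=42):
    """Shuffle aligned line pairs together and cut them into three splits."""
    if len(src_lines) != len(tgt_lines):
        raise ValueError("parallel files must have the same number of lines")

    # Zipping first means each source line travels with its translation.
    pairs = list(zip(src_lines, tgt_lines))
    random.Random(seed).shuffle(pairs)

    n_train = int(len(pairs) * ratios[0])
    n_valid = int(len(pairs) * ratios[1])
    train = pairs[:n_train]
    valid = pairs[n_train:n_train + n_valid]
    held_out = pairs[n_train + n_valid:]
    return train, valid, held_out


# Tiny hypothetical corpus; real input would come from reading both files.
src = [f"en sentence {i}" for i in range(10)]
tgt = [f"de sentence {i}" for i in range(10)]

train, valid, held_out = split_parallel(src, tgt)
print(len(train), len(valid), len(held_out))
```

Because the lines are treated as opaque strings, malformed rows that trip up a CSV parser don't matter here. Each split can then be written back out as two files by unzipping the pairs.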

Getting min and max datetime for each date in CSV

data-science dataset pandas python

I’m kind of new to data science and Python. First of all, do you suggest using any library other than pandas when dealing with a huge dataset (100K+ rows)? Second, let me explain my current problem. I have a dataset in which I have a Datetime column; to make it easy to understand, let’s say I only
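For the min/max part, a minimal sketch with pandas: group the rows by the calendar date of each timestamp and aggregate. The column name `timestamp` and the sample values are assumptions, since the excerpt cuts off before showing the data. A table of 100K rows is generally well within what pandas handles comfortably.

```python
import io

import pandas as pd

# Hypothetical data standing in for the asker's CSV.
csv_text = (
    "timestamp\n"
    "2021-03-01 08:15:00\n"
    "2021-03-01 17:40:00\n"
    "2021-03-01 12:00:00\n"
    "2021-03-02 09:05:00\n"
    "2021-03-02 18:30:00\n"
)
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["timestamp"])

# Group by the date part only, then take the earliest and latest timestamp.
per_day = df.groupby(df["timestamp"].dt.date)["timestamp"].agg(["min", "max"])

print(per_day)
```

The result has one row per date, with `min` and `max` columns holding the full timestamps.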

Data Augmentation in PyTorch

data-augmentation dataset image-processing python pytorch

I am a little bit confused about the data augmentation performed in PyTorch. As far as I know, when we perform data augmentation, we keep our original dataset and then add other versions of it (flipping, cropping, etc.). But that doesn’t seem to be what happens in PyTorch. As far as I understood from the references, when we use data.transforms
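The usual explanation is that random transforms are applied each time a sample is fetched, so the dataset's length never changes, but each epoch can see a different variant of the same item. Here is a plain-Python sketch of that idea. The class `FlippedOnAccess` is a hypothetical stand-in, not a PyTorch API, and lists of numbers stand in for images.

```python
import random


class FlippedOnAccess:
    """Hypothetical dataset that randomly flips a sample when it is read."""

    def __init__(self, samples, flip_prob=0.5, seed=0):
        self.samples = samples
        self.flip_prob = flip_prob
        self.rng = random.Random(seed)

    def __len__(self):
        # The transform adds no items, so the length is the original size.
        return len(self.samples)

    def __getitem__(self, idx):
        row = self.samples[idx]
        # The random decision is made on every access, not once up front.
        if self.rng.random() < self.flip_prob:
            return row[::-1]
        return row


data = FlippedOnAccess([[1, 2, 3], [4, 5, 6]])

# Reading the dataset three times simulates three epochs.
epochs = [[data[i] for i in range(len(data))] for _ in range(3)]
for epoch in epochs:
    print(epoch)
```

Over many epochs the model sees both the original and the flipped versions, which has a similar effect to enlarging the dataset without storing the extra copies.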