Tag: csv

Reading csv file with partially variable name

I want to read a csv file from a certain folder into a data frame with pandas. This folder contains several csv files, each with different information. The first part of the filename (1 – 8) is variable. I want to read in the file which ends with ‘_Reference.csv’, but I have no clue ho…
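
A minimal sketch of one way to locate such a file, assuming the folder holds exactly one file whose name ends in `_Reference.csv`. The helper name `find_reference_csv` is hypothetical and not taken from the question; the returned path could then be handed to `pandas.read_csv`.

```python
from pathlib import Path


def find_reference_csv(folder, suffix="_Reference.csv"):
    """Return the single file in `folder` whose name ends with `suffix`.

    Hypothetical helper: the variable prefix (e.g. 1 to 8) is matched by
    the `*` wildcard, so it does not need to be known in advance.
    """
    matches = sorted(Path(folder).glob(f"*{suffix}"))
    if len(matches) != 1:
        raise FileNotFoundError(
            f"expected exactly one '*{suffix}' file in {folder}, "
            f"found {len(matches)}"
        )
    return matches[0]
```

Raising when zero or several files match is a design choice for this sketch; it avoids silently reading the wrong file.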

Fastest way to filter csv using pandas and create a matrix

csv dask dataframe pandas python

input dict I have large csv files in the below format basename_AM1.csv Now I need to create a similarity dict like below for the given input_dict by searching/filtering the csv files. I have come up with the below logic, but for an input_dict of 100 sampl…

Parse multiple line CSV using PySpark , Python or Shell

awk csv pyspark python shell

Input (2 columns): Note: Harry and Prof. do not have starting quotes. Output (2 columns): What I tried (PySpark): Issue: The above code worked fine where the multiline value had both start and end double quotes (e.g. the row starting with Ronald), but it didn’t work with rows where we only have end quotes but no start qu…

Optimal way to use multiprocessing for many files

csv io multiprocessing python

So I have a large list of files that need to be processed into CSVs. Each file itself is quite large, and each line is a string. Each line of the files could represent one of three types of data, each of which is processed a bit differently. My current solution looks like the following: I iterate through the …

Function failing to update spacing after comma

csv python readlines with-statement writelines

I have a csv file that has inconsistent spacing after the comma, like this: 534323, 93495443,34234234, 3523423423, 2342342,236555, 6564354344 I have written a function that tries to read in the file and makes the spacing consistent, but it doesn’t appear to update anything. After opening the new file cr…
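
A minimal sketch of one way to make the spacing consistent, assuming the fields are plain numbers as in the sample line and contain no quoted commas. The function names `normalize_comma_spacing` and `rewrite_file` are hypothetical, and this is not the asker's function. Python strings are immutable, so the cleaned line has to be captured and written out explicitly.

```python
import re


def normalize_comma_spacing(line):
    """Replace every comma, plus any spaces or tabs around it, with ', '."""
    return re.sub(r"[ \t]*,[ \t]*", ", ", line)


def rewrite_file(src_path, dst_path):
    """Copy src_path to dst_path with consistent spacing after each comma."""
    with open(src_path, newline="") as src, \
            open(dst_path, "w", newline="") as dst:
        for line in src:
            # The substitution returns a new string; write that result.
            dst.write(normalize_comma_spacing(line))
```

For files with quoted fields that may themselves contain commas, the `csv` module would be a safer choice than a regular expression.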

How to write an object typed array into csv file with NumPy?

csv numpy python

I have two numpy arrays (A, B) and 2 scalar values (C, D) that I want to store in a csv file. I know how to write a single numpy array in it: I want the first two columns of my csv-file to contain the 2 arrays A and B and then have the 2 scalar values C and D as the first

Pandas: how select row based on string in previous row – should be a simple solution

csv pandas python

I have a csv file. How do I print the row that follows a row that has a particular string? I need to print all rows that contain “ixation” in them and then the row that follows this row. Here is my current code Here is my current output… But I want… How do I only print out the
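
A minimal sketch of the selection logic in plain Python rather than pandas, assuming the rows are available as a list of strings. The function name `rows_with_following` is hypothetical. It keeps every row containing the search string plus the row directly after it, without duplicating a row that qualifies both ways.

```python
def rows_with_following(lines, needle="ixation"):
    """Return each line containing `needle` and the line right after it."""
    selected = []
    take_next = False  # True when the previous line contained `needle`
    for line in lines:
        hit = needle in line
        if hit or take_next:
            selected.append(line)
        take_next = hit
    return selected
```

With a DataFrame, the same idea could be expressed as a boolean mask combined with its one-row shift, but that variant is not shown here.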

Convert Hour Minute Second Millisecond to total second.millisecond in latest pandas version

csv pandas python time

I have a csv file with the format HH:MM:SS.Millisecond. I want to convert that to total second.millisecond. How do I do that in the latest pandas version (1.4)? I’ve tried some answers from this forum, but they don’t work on csv and the latest version of pandas. Answer Assuming this input: You can use pandas.Timedelta.total_seconds: output:
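
A minimal sketch of the approach the answer names, assuming the time column has already been read as strings in HH:MM:SS.mmm form. The function name `to_total_seconds` and the sample values are hypothetical; the original input and output are not reproduced here.

```python
import pandas as pd


def to_total_seconds(times):
    """Convert 'HH:MM:SS.mmm' strings to float seconds."""
    # pd.to_timedelta parses the strings into timedeltas;
    # .dt.total_seconds() then gives seconds with the fractional part.
    return pd.to_timedelta(times).dt.total_seconds()


# Hypothetical sample values, standing in for a column from read_csv.
sample = pd.Series(["00:00:01.500", "00:01:00.250", "01:00:00.000"])
seconds = to_total_seconds(sample)
```

For a single value, `pd.Timedelta("00:00:01.500").total_seconds()` applies the same conversion.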

Error tokenizing data. C error: Expected x fields in line 5, saw x

csv pandas python

I keep getting this error. I don’t even know how to identify the row that is in error as the data I am requesting is jumbled. I can’t provide a URL to the API but I will provide a sample of the first few lines of data. My code: Error: Data from API: Answer Since you don’t specify a separator