Tag: pandas

Ignore UserWarning from openpyxl using pandas

I have tons of .xlsm files that I have to load. Each Excel file has 6 sheets, so I open each file once with pandas. After each iteration I pass the df to another function and do some stuff with it. I am using pd.ExcelFile to load the file into memory just once and the…
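Here is a minimal sketch of one way to silence these warnings. It assumes they are raised from openpyxl modules and that the `openpyxl` engine is installed. `quiet_openpyxl` and `read_all_sheets` are hypothetical helper names. Scoping the filter with `warnings.catch_warnings()` keeps other UserWarnings visible in the rest of the program.

```python
import warnings
from contextlib import contextmanager

import pandas as pd


@contextmanager
def quiet_openpyxl():
    """Ignore UserWarnings issued from openpyxl modules, only inside this block."""
    with warnings.catch_warnings():
        # `module` is a regex matched against the name of the module issuing the warning.
        warnings.filterwarnings("ignore", category=UserWarning, module="openpyxl")
        yield


def read_all_sheets(path):
    """Open the workbook once and return {sheet_name: DataFrame} for every sheet."""
    with quiet_openpyxl(), pd.ExcelFile(path, engine="openpyxl") as xls:
        return {name: xls.parse(name) for name in xls.sheet_names}
```

Each call such as `sheets = read_all_sheets("report.xlsm")` reads all 6 sheets from a single open of the file. Note that `catch_warnings` changes process-wide state, so this sketch is not thread-safe.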

Merging CSV files in order of date created (Python, pandas)

I am merging 3700 CSV files with a total of 10 million rows. The files are not named sequentially, but the dates on which they were created (descending) are sequential. I have code that merges them, but I don’t know how to make it pick them up in that sequence. The following are names of files arranged in d…
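A sketch that sorts the paths by a file timestamp before concatenating, assuming all files share the same columns. `merge_csvs_by_time` and its parameters are hypothetical. Be aware that `os.path.getctime` returns the creation time on Windows but the last metadata change time on most Unix systems, so `os.path.getmtime` may be the more reliable key there.

```python
import os
from pathlib import Path

import pandas as pd


def merge_csvs_by_time(folder, pattern="*.csv", time_key=os.path.getctime, newest_first=False):
    """Concatenate the CSV files in `folder`, ordered by the timestamp from `time_key`."""
    # Sort paths by timestamp; reverse=True puts the newest file first.
    files = sorted(Path(folder).glob(pattern), key=time_key, reverse=newest_first)
    if not files:
        raise FileNotFoundError(f"no files matching {pattern!r} in {folder!r}")
    # Read the files in that order and stack their rows, renumbering the index.
    return pd.concat((pd.read_csv(f) for f in files), ignore_index=True)
```

For example, `merge_csvs_by_time("data", newest_first=True)` reproduces the descending creation order described above.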

Keep x rows and delete the rest from a CSV file

I want to be able to specify how many rows to keep and delete the rest, while preserving the header. I found some code that lets you delete the first 5 rows, but how can I make it do what I want? For example, if I have this CSV I just want to specify a number to my
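A minimal pandas sketch, assuming the file has a single header row. `nrows` makes `read_csv` stop after `n` data rows, and reading with `dtype=str` plus `keep_default_na=False` keeps values such as `007` or empty cells from being reinterpreted on the way back out. `keep_first_rows` is a hypothetical name.

```python
import pandas as pd


def keep_first_rows(src, dst, n):
    """Write the header plus the first n data rows of src to dst."""
    # Read only n rows, keeping every value as its original text.
    head = pd.read_csv(src, nrows=n, dtype=str, keep_default_na=False)
    # index=False stops pandas from adding a row-number column.
    head.to_csv(dst, index=False)
```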

Python pandas: get distinct matches in columns

I have a dataframe that looks roughly like the output of my sample code. What I want to end up with is a list of lists, a dataframe, or something similar that tells me the distinct matches across both columns in both directions. I have tried to do it but I can’t get it to go
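Assuming "both directions" means that a pair (a, b) and its reverse (b, a) count as the same match, one sketch is to sort the two values within each row and then drop duplicate rows. The column names `col1` and `col2` are placeholders for the real ones, and the values in both columns must be mutually comparable (for example, all strings).

```python
import numpy as np
import pandas as pd


def distinct_unordered_pairs(df, a="col1", b="col2"):
    """Return the distinct (a, b) pairs, treating (x, y) and (y, x) as the same."""
    # Sort each row's two values so both orders of a pair look identical.
    pairs = pd.DataFrame(np.sort(df[[a, b]].to_numpy(), axis=1), columns=[a, b])
    return pairs.drop_duplicates().reset_index(drop=True)


df = pd.DataFrame({"col1": ["x", "y", "x", "z"], "col2": ["y", "x", "z", "x"]})
result = distinct_unordered_pairs(df)
as_lists = result.values.tolist()  # list-of-lists form of the same result
```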

LightGBM does not accept the dtypes of my data

I’m trying to use LGBMClassifier, and for some reason it does not accept the dtypes of my data (none of the features are accepted; I tested them). When we look at my data (pd.DataFrame.info()), we can clearly see that all dtypes are either category, float, or int. When I eventually try to train my LGBMClassifier …
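Because `info()` can look fine while the object actually passed to `fit` differs, a quick check is to list the columns whose dtype is not numeric, bool, or category. To my understanding, those are the pandas dtypes LightGBM’s DataFrame handling accepts. This is a diagnostic sketch with a hypothetical helper name; it does not address the (truncated) error itself.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def non_numeric_columns(df):
    """Return {column: dtype name} for columns that are not numeric, bool, or category."""
    return {
        col: str(dtype)
        for col, dtype in df.dtypes.items()
        # is_numeric_dtype is also True for bool; category is checked separately.
        if not (is_numeric_dtype(dtype) or isinstance(dtype, pd.CategoricalDtype))
    }
```

Running it on the exact frame passed to `fit` (not a copy inspected earlier) shows which columns, if any, still need converting.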

Confused about index slicing in Python

I am slicing my hour column, and slice(1, 3) and slice(0, 3) give me the same results. What am I missing? Answer: There is a space at the beginning of the values in dataset["hour"], so slice(0, 3) gives " 02" while slice(1, 3) gives "02". This was answered by @Ba…
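The effect the answer describes can be reproduced with a small hypothetical sample whose values carry a leading space. Stripping whitespace first makes position 0 the first digit again, after which the slice can be converted to an integer.

```python
import pandas as pd

# Hypothetical sample data: each value starts with a space, as in the question.
dataset = pd.DataFrame({"hour": [" 02:15", " 13:40"]})

with_space = dataset["hour"].str.slice(0, 3)     # includes the leading space
without_space = dataset["hour"].str.slice(1, 3)  # starts after the space

# Remove surrounding whitespace, then take the first two characters as the hour.
hours = dataset["hour"].str.strip().str.slice(0, 2).astype(int)
```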