I have tons of .xlsm files that I have to load. Each Excel file has 6 sheets. Because of that, I’m opening each Excel file like this, using pandas: After each iteration I pass the df to another function and do some stuff with it. I am using pd.ExcelFile to load the file into memory just once and the…
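A minimal sketch of the pattern the question describes, assuming the goal is to open each workbook once and hand every sheet to a separate function. `iter_sheets` and `process` are hypothetical names, and reading .xlsm/.xlsx files assumes the openpyxl engine is installed.

```python
import pandas as pd


def iter_sheets(path):
    """Yield (sheet_name, DataFrame) pairs, opening the workbook only once."""
    # pd.ExcelFile reads the workbook a single time; each parse() call then
    # extracts one sheet from the already-open file.
    with pd.ExcelFile(path) as xls:
        for name in xls.sheet_names:
            yield name, xls.parse(name)


def process(name, df):
    # Hypothetical stand-in for "the other function" in the question.
    return name, len(df)


# Usage (hypothetical file name):
# for name, df in iter_sheets("report.xlsm"):
#     process(name, df)
```

If all six sheets fit in memory together, `pd.read_excel(path, sheet_name=None)` returns a dict mapping sheet names to DataFrames in a single call.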
Tag: pandas
Merging CSV files in order of creation date (Python, pandas)
I am merging 3700 CSV files with a total of 10 million rows. The files do not have sequential names, but the dates on which they were created (descending) are sequential. I use the following code to merge them, but I do not know how to pick them up in that sequence. The following are names of files arranged in d…
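A minimal sketch, assuming all files sit in one folder: sort the paths by a file timestamp before reading, then concatenate once. The function name and the `key`/`newest_first` parameters are hypothetical; note that "creation time" is platform dependent, as the comment explains.

```python
import glob
import os

import pandas as pd


def merge_in_creation_order(folder, key=os.path.getctime, newest_first=False):
    """Concatenate every CSV in `folder`, ordered by a file timestamp."""
    # Caveat: os.path.getctime is the creation time on Windows, but on most
    # Unix systems it is the last metadata change; os.path.getmtime
    # (last modification) may be a better proxy there.
    paths = sorted(glob.glob(os.path.join(folder, "*.csv")),
                   key=key, reverse=newest_first)
    # One pd.concat over a generator avoids repeatedly copying a growing frame.
    return pd.concat((pd.read_csv(p) for p in paths), ignore_index=True)
```

With 3700 files, building the list of frames and calling `pd.concat` once is usually much faster than appending to a DataFrame inside the loop.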
Keep x rows and delete the rest from a CSV file
I want to be able to specify how many rows to keep and delete the rest, while preserving the header. I found some code which lets you delete the first 5 rows, but how can I make it do what I want? For example, if I have this CSV I just want to specify a number to my
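A minimal sketch using only the standard library, so the kept rows are copied byte-for-byte as parsed text: write the header, then at most `n` data rows. `keep_first_rows` is a hypothetical helper name; writing to a separate output file is an assumption to avoid reading and truncating the same file at once.

```python
import csv
from itertools import islice


def keep_first_rows(src, dst, n):
    """Copy the header plus the first n data rows of src into dst."""
    with open(src, newline="") as fin, open(dst, "w", newline="") as fout:
        reader = csv.reader(fin)
        writer = csv.writer(fout)
        writer.writerow(next(reader))        # the header is always kept
        writer.writerows(islice(reader, n))  # then at most n data rows
```

With pandas, `pd.read_csv(src, nrows=n).to_csv(dst, index=False)` does the same job, though it may reformat values (for example floats) on the way back out.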
How to add hours to the original time in a dataframe and then generate a new date?
The data in test.csv looks like this: I want to remove the +00:00 from upload_time, add 8 hours to upload_time, and then generate a new column new_upload_time. I use this code to do it. result.csv: Although I have implemented it, I feel that the code is a bit complicated. Is there a simpler way? Answer Do the formatti…
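A minimal sketch, assuming `upload_time` holds strings like "2021-03-01 10:15:00+00:00" (the sample values below are hypothetical, since the question's data is not shown). Parsing with `utc=True` and then removing the time zone drops the "+00:00" suffix, and a `Timedelta` adds the 8 hours.

```python
import pandas as pd


def add_hours(df, col="upload_time", hours=8, new_col="new_upload_time"):
    out = df.copy()
    # Parse the strings as UTC, then drop the time-zone information so the
    # "+00:00" suffix disappears from the output.
    ts = pd.to_datetime(out[col], utc=True).dt.tz_localize(None)
    out[col] = ts
    out[new_col] = ts + pd.Timedelta(hours=hours)
    return out


df = pd.DataFrame({"upload_time": ["2021-03-01 10:15:00+00:00",
                                   "2021-03-01 20:30:00+00:00"]})
print(add_hours(df))
```

If the 8 hours really mean "convert to a UTC+8 time zone", `.dt.tz_convert(...)` with the appropriate zone name states that intent more directly than a fixed offset.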
python pandas get distinct matches in columns
I have a dataframe which looks a bit like what this code gives: What I want to end up with is a list of lists, a dataframe, or something similar that tells me the distinct matches across both columns in both directions. It’d be something like this: I have tried to do it but I can’t get it to go
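One possible reading of "distinct matches in both directions" is that (A, B) and (B, A) should count as the same pair. Under that assumption, a minimal sketch sorts the two values within each row and then drops duplicates; `distinct_pairs` and the column names `col1`/`col2` are hypothetical.

```python
import numpy as np
import pandas as pd


def distinct_pairs(df, a="col1", b="col2"):
    # Sort the two values within each row so (x, y) and (y, x) become identical.
    pairs = np.sort(df[[a, b]].to_numpy(), axis=1)
    # Drop repeated pairs (first occurrence wins) and return a list of lists.
    out = pd.DataFrame(pairs, columns=[a, b]).drop_duplicates()
    return out.values.tolist()
```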
pandas, merge duplicates if row contains wildcard text
I have a dataset of duplicates (ID). The dataset contains both information and emails. I’m trying to concatenate the emails (if the row contains the character @) and then remove the duplicates. My original dataset: What I wish to accomplish: My current code is a modification of Eric Ed Lohmar’s code and gives the followin…
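A minimal sketch, assuming the emails live in one text column (called `contact` here, a hypothetical name) alongside other values: join every cell containing "@" per ID, then keep one row per ID. Other columns are taken from the first row of each ID, which is an assumption about what should survive.

```python
import pandas as pd


def merge_duplicate_emails(df, id_col="ID", text_col="contact", sep="; "):
    # Treat a cell as an e-mail address if it contains "@".
    is_email = df[text_col].astype(str).str.contains("@", regex=False)
    # Join each ID's addresses; dict.fromkeys removes repeats but keeps order.
    emails = (df[is_email]
              .groupby(id_col)[text_col]
              .agg(lambda s: sep.join(dict.fromkeys(s))))
    # One row per ID, then overwrite the text column with the joined e-mails
    # (aligned on ID; IDs without any e-mail end up with NaN).
    out = df.drop_duplicates(id_col).set_index(id_col)
    out[text_col] = emails
    return out.reset_index()
```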
Apply transformation only on string columns with Pandas, ignoring numeric data
So, I have a pretty large dataframe with 85 columns and almost 90,000 rows, and I wanted to use str.lower() on all of them. However, several columns contain numerical data. Is there an easy solution for this? Then, after using something like df.applymap(str.lower) I would get: Currently it’s…
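A minimal sketch: skip numeric columns entirely, and in the remaining columns lower-case only the cells that are actual strings, so mixed columns and NaN values pass through. `lower_string_columns` is a hypothetical name.

```python
import pandas as pd


def lower_string_columns(df):
    out = df.copy()
    # Numeric columns are left untouched, so their dtypes are preserved.
    text_cols = out.select_dtypes(exclude="number").columns
    for col in text_cols:
        # Only real str values are lowered; numbers or NaN in mixed columns stay as-is.
        out[col] = out[col].map(lambda v: v.lower() if isinstance(v, str) else v)
    return out
```

Unlike `df.applymap(str.lower)`, this never calls `str.lower` on an int or float, which is what raises the error in the original approach.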
LightGBM does not accept the dtypes of my data
I’m trying to use LGBMClassifier and, for some reason, it does not accept the dtypes of my data (none of the features is accepted; I tested this). Looking at my data, we can clearly see that all dtypes are either category, float or int (pd.DataFrame.info()). When I eventually try to train my LGBMClassifier …
Confused by index slicing in Python
I am slicing my hour column, and when I use slice(1, 3) and slice(0, 3) I get the same results. What am I missing? See the image below. Answer There is a space at the beginning of dataset["hour"]. So slice(0, 3) is " 02" while slice(1, 3) is "02". This was answered by @Ba…
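A short illustration of the answer's point, using hypothetical values that start with a space: the leading space occupies position 0, so the two slices differ by that character. Stripping whitespace first removes the surprise.

```python
import pandas as pd

# Hypothetical values: each string starts with a space, as in the answer.
hour = pd.Series([" 02:30", " 13:45"])

with_space = hour.str.slice(0, 3)     # keeps the leading space: " 02"
without_space = hour.str.slice(1, 3)  # skips it: "02"

# Stripping the whitespace first makes the slice positions intuitive again.
clean = hour.str.strip().str.slice(0, 2)
```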
How to replicate the same values based on the index value of another column in Python
I have a dataframe like the one below, and I want to add another column whose value is replicated until a certain condition is met. Now I want to add another column which contains additional information about the dataframe. For instance, I want to replicate Yes until id is B, No when it is below B, and Yes from C to D and
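One possible reading of the question, stated as an assumption: rows from the first A through the last B get "Yes", rows from the first C through the last D get "Yes", and everything else gets "No". A minimal sketch of that labelling by row position; `label_ranges`, the `flag` column and the range list are hypothetical.

```python
import numpy as np
import pandas as pd


def label_ranges(df, ranges, id_col="id", new_col="flag", default="No"):
    """For each (start_id, end_id, value), give `value` to every row from the
    first start_id through the last end_id (by position); others get `default`."""
    out = df.copy()
    out[new_col] = default
    pos = out.columns.get_loc(new_col)
    ids = out[id_col].to_numpy()
    for start_id, end_id, value in ranges:
        start = np.flatnonzero(ids == start_id)[0]  # first occurrence of start_id
        stop = np.flatnonzero(ids == end_id)[-1]    # last occurrence of end_id
        out.iloc[start:stop + 1, pos] = value
    return out
```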