I have a table with some company information that we’re trying to clean up. In the first column is a clean company name, but not necessarily the correct one. In the second column, there is the correct company name, but often not very clean / missing. Here is an example. Name Info Nike Nike, a footwear m…
Tag: pandas
How to split in train and test by month
I have a dataframe structured like this. I have data for all days and months from 2018 to 2021, with around 50k observations. How can I aggregate all the data for the same month and perform a train-test split for each month, i.e. for all the data of the months of January, February, March and so on? Answer t…
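A minimal sketch of one way to approach a per-month split, since the question's dataframe is not shown. The column names `date` and `value`, the 80/20 ratio, and the use of `DataFrame.sample` instead of a dedicated splitting helper are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical daily data covering 2018-2021; column names are assumptions.
dates = pd.date_range("2018-01-01", "2021-12-31", freq="D")
df = pd.DataFrame({"date": dates, "value": np.arange(len(dates))})

# Group all rows that share a calendar month (all Januaries together, etc.)
# and draw a random 80% of each group as the training part.
splits = {}
for month, group in df.groupby(df["date"].dt.month):
    train = group.sample(frac=0.8, random_state=0)
    test = group.drop(train.index)  # the remaining rows of that month
    splits[month] = (train, test)

january_train, january_test = splits[1]
```

Grouping on `dt.month` pools the same month across years; grouping on `df["date"].dt.to_period("M")` instead would give one split per year-month.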
Plotting top 10 Values in Big Data
I need help plotting some categorical and numerical values in Python. The code is given below: However, the data size is so huge (big data) that I’m not even able to make a meaningful plot in Python. Basically, I just want to take the top 5 or top 10 values in Python and make a plot of that as given b…
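As a rough sketch of the "top N" idea, the data can be reduced before plotting. The frame, the column names `category` and `amount`, and the helper names below are hypothetical, since the original code is not shown.

```python
import pandas as pd

# Hypothetical data; the real columns are not shown in the question.
df = pd.DataFrame({
    "category": ["a", "a", "a", "b", "b", "c"],
    "amount": [1, 2, 3, 4, 5, 20],
})

def top_categories(frame, col, n=10):
    """Return the n most frequent values of a categorical column."""
    return frame[col].value_counts().head(n)

def top_totals(frame, key, value, n=10):
    """Return the n largest per-group sums of a numerical column."""
    return frame.groupby(key)[value].sum().nlargest(n)

most_frequent = top_categories(df, "category", n=2)
largest_sums = top_totals(df, "category", "amount", n=2)
```

Either result is a small Series, so calling `.plot.bar()` on it (which needs matplotlib installed) draws only the top entries rather than the whole dataset.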
Using Functions rather than one liners Python
I am trying to use Python functions to perform data preprocessing, and I want the code to be efficient. I have about 8 CSV files that I load and then analyse. How would one achieve that using a function and .head() to read all the CSV files? For instance, I have loaded my data
Quarter end from today’s date
I’m trying to get a quarter end date based on today’s date, but I want to strip the time first. If I do that, the below code throws an error on the last line: AttributeError: ‘datetime.datetime’ object has no attribute ‘to_period’. Could you advise me how to improve it …
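The error arises because `to_period` is a pandas `Timestamp` method, not a `datetime.datetime` one. A minimal sketch of one possible fix follows; the helper name `quarter_end` is hypothetical and the original code is not shown.

```python
import datetime

import pandas as pd

def quarter_end(value):
    """Return the last day (at midnight) of the quarter containing value."""
    # Wrapping in pd.Timestamp makes to_period available even when a plain
    # datetime.datetime is passed in; normalize() strips the time of day.
    ts = pd.Timestamp(value).normalize()
    # Period.end_time is the last nanosecond of the quarter, so normalize again.
    return ts.to_period("Q").end_time.normalize()

today_quarter_end = quarter_end(datetime.datetime.today())
```

For example, `quarter_end("2021-05-17 13:45")` gives `Timestamp("2021-06-30")`.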
What’s the most pandastic way to join 2 instances of pd.Index?
I have 2 pandas index instances that come from different functions / a bit of a complicated mask to get to those. I would now like to combine those, i.e., define a ‘combined index’ that holds all labels contained in either of the two. My friend pd.concat() cannot be applied to 2 index instances. W…
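One option for combining two indexes is `Index.union`. The sketch below uses two small made-up indexes, since the originals come from masks that are not shown.

```python
import pandas as pd

# Hypothetical stand-ins for the two indexes produced elsewhere.
idx_a = pd.Index(["a", "b", "c"])
idx_b = pd.Index(["c", "d"])

# union keeps every label found in either index, without duplicates,
# and sorts the result when the labels are sortable.
combined = idx_a.union(idx_b)

# If the original order matters more than sorting, append then deduplicate.
combined_in_order = idx_b.append(idx_a).unique()
```

The combined index can then be used like any other, for example with `df.loc[combined]` when all labels exist in the frame.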
Get the sum of each column, with recursive values in each cell
Let p be a parameter that can be any float or integer. For example, let p=4: time 1 2 3 4 5 Numbers a1 a1*(0.5)^(1/p)^(2-1) a1*(0.5)^(1/p)^(2-1) a1*(0.5)^(1/p)^(3-1) a1*(0.5)^(1/p)^(4-1) Numbers nan a2 a2*(0.5)^(1/p)^(3-2) a2*(0.5)^(1/p)^(4-2) a2*(0.5)^(1/p)^(5-2) Numbers nan nan a3 a3*(0.5)^(1/p)^(4-3) a3*(0.5)^(1/p)^…
pandas string replace multiple characters in a cell
I want to replace 1 with 4, 2 with 5, and 3 with 6. This is the desired output: How can I achieve this using pd.str.replace()? Answer: Try .replace (not .str.replace) with the option regex=True. Output:
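The suggestion in the answer can be sketched as follows. The sample frame and the column name `col` are assumptions, since the question's data is not shown.

```python
import pandas as pd

# Hypothetical sample data standing in for the question's column.
df = pd.DataFrame({"col": ["123", "321", "112"]})

# With regex=True, each dict key is treated as a pattern and replaced
# wherever it occurs inside a cell, not only when the whole cell matches.
mapping = {"1": "4", "2": "5", "3": "6"}
df["col"] = df["col"].replace(mapping, regex=True)
```

This works cleanly here because none of the replacement characters (4, 5, 6) is itself a key in the mapping; with overlapping keys and values, a single-pass approach such as `str.translate` would be safer.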
Python & Beautiful Soup – Extract text between a specific tag and class combination
I’m new to using Beautiful Soup and web scraping in general; I’m trying to build a dataframe that has the title, content, and publish date from a blog post style website (everything’s on one page, there’s a title, publish date, and then the post’s content). I’m able to get …
Convert Array to dataframe with Longitude, Latitude coordinates
Imported Libraries I am trying to create a heatmap out of my Strava dataset (which turns out to be a CSV file of 155479 rows with geographical coordinates). I first tried to display the whole dataset on Folium using Python; the problem is that Folium seemed to crash when I tried to upload the whole dataset (it was…