I have a pandas df with mixed formatting for a specific column. It contains the qtr and year. I’m hoping to split this column into separate columns. But the formatting contains a space or a second dash between qtr and year. I’m hoping to include a function that splits the column by a blank space o…
Tag: pandas
cumcount based on certain values sorted by date in Pandas dataframe
I have the a dataframe df that looks like sorted by ID and Date in ascending and descending order respectively: I would like to add two new columns called Number of 1 which is defined to be cumulative count of number 1 occurred in Place for each ID and another column called Recent Number of 1 which is defined…
How to display the nearest station with the distance in a database
The train station dataframe is in this format: The house dataframe is in this format i manage to find the closest station to each flat in distance, but how can i include a column which indicates what the closest station is Here is the code that i did to find the shortest distance, but i cant seems to find a
FastAPI is very slow in returning a large amount of JSON data
I have a FastAPI GET endpoint that is returning a large amount of JSON data (~160,000 rows and 45 columns). Unsurprisingly, it is extremely slow to return the data using json.dumps(). I am first reading the data from a file using json.loads() and filtering it per the inputted parameters. Is there a faster way…
Process columns based on column names in another column
I like to select cells for processing by choosing column names contained in a different column. For clarity, input and output are given below. Column ‘a’ contains the column names for setting the value to None for each row. I tried to code as below but keep getting errors. Input Output Answer Chec…
How to calculate the outliers in a Pandas dataframe while excluding NaN values
I have a pandas dataframe that should look like this. Some values in this dataframe are outliers. I came across this method of calculating the outliers in every colum using the z score: My goal is to create a column Is Outlier and put a True/False on each row that has/doesn’t have at least one outlier a…
Find the unique values of the first column but not present in the second column and return the corresponding row number of the value in first column
I have a csv file and there are a few columns here. Some of these columns have values and some don’t. What code should I write to get the output in the example below? Expected result The necessary conditions are as follows: ‘Mahmut’ should be contained in the name column, and the number in t…
Combine two pandas Series with overlap into one column
Let us say I have two Pandas series A and B that I wish to combine into one series. I want to set the values of overlapping rows to 0 (or some other arbitrary value). Into How would I go about doing this in pandas. Answer We can do concat then check the dup index , drop reindex back
Searching word frequency in pandas from dict
Here is the code which I am using: I want the output dataframe to look for keys in the dictionary and search for them in the dataframe but the key can be only be counted one time against an id. Simply put: Group by id. For each group: see if at least one word in all sentences of a group
InvalidSchema: No connection adapters were found. When working with Python Web scraper
I am rather new to Web Scraping I have scrapped one of the zip files seen here. The goal is to append them into a final data frame called final_df. Below is a snip of my code that runs well. This works well for one year of zip files such as 2017 however I am curious if we could get