Tag: pandas

GroupBy Pandas with ratio

I am working on a dataset which looks something like this: I am trying to do 2 things: find the length of the longest sequence of each type, and find the ratio of A/B and B/A for those sequences for each ID. Ratio attribute explanation: calculate the total amount in the longest sequence for each ID (say length n). If the s…

Remove trailing .0 from strings of entire DataFrame

character-trimming pandas python regexp-replace string

Hi, I would like to remove every “.0” at the end of a string across an entire DataFrame, and it needs to be an exact match. Let’s make an example df: The desired output: I tried using replace but it didn’t work for some reason (I read maybe because replace only replaces entire strings and no…
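
A minimal sketch of one way to strip a trailing ".0", assuming every column can be treated as a string. The frame below is made-up illustration data, since the question's example is not shown here. The pattern is anchored with `$`, so values such as "10.05" or "2.00" are left alone.

```python
import pandas as pd

# Hypothetical example data; the original question's frame is not shown.
df = pd.DataFrame({"a": ["1.0", "10.05", "abc.0"], "b": ["2.00", "3.0", "4"]})

# Work column by column with the .str accessor. The escaped dot matches a
# literal "." and "$" anchors the match at the end of the string.
cleaned = df.astype(str).apply(lambda col: col.str.replace(r"\.0$", "", regex=True))

print(cleaned)
```

Using `regex=True` explicitly matters here: without it, recent pandas versions treat the pattern as a literal string.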

Automatic data wrangling on Pandas with multiple dataframes using lists and loops

append dataframe loops pandas python

For professional purposes I need to produce reports that include new entries every week. I have 16 dataframes with the same column names (the df names are week1, week2… week16). I created a list of the dataframes and then a loop. I wanted to test renaming the column with index = 1 and I did not succeed. I …
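
The excerpt does not show why the rename failed. One common cause is that `rename` returns a new frame, so reassigning the loop variable leaves the list unchanged. The sketch below assumes that is the situation; the frames and the name `new_name` are hypothetical.

```python
import pandas as pd

# Hypothetical stand-ins for week1 ... week16 (three frames for brevity).
frames = [pd.DataFrame({"a": [i], "b": [i * 2]}) for i in range(3)]

# rename() returns a new DataFrame, so rebuild the list with the results
# instead of reassigning the loop variable.
frames = [f.rename(columns={f.columns[1]: "new_name"}) for f in frames]

print([list(f.columns) for f in frames])
```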

Extract specific symbols from pandas cells, then replace them with values from a dict where they are keys

pandas python

My data looks like this (these are two of the columns): These are MSC codes, corresponding to different areas of science. I need to replace each code with the corresponding subject from this dict here: https://mathscinet.ams.org/msnhtml/msc2020.pdf, some of them are: """ 00 General and …

How to find all columns containing a string and put them in new columns?

dataframe loops pandas python

I was wondering how I could find all values that start with ‘orange’ in all the columns and put them into new columns. Expected output: Answer: Let’s try stack, then filter by str.contains: df1: Or melt for the same order as OP: df1: regex ^orange: ^ asserts position at start of a line orange m…
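
The answer's code is not included in this excerpt. The following is a rough sketch of the stack-then-filter idea it describes. The data and the `orange_` column prefix are hypothetical.

```python
import pandas as pd

# Hypothetical data; the question's frame is not shown.
df = pd.DataFrame({
    "c1": ["orange juice", "apple", "banana"],
    "c2": ["kiwi", "orange peel", "pear"],
    "c3": ["orange zest", "grape", "plum"],
})

# stack() turns the frame into one long Series indexed by (row, column).
stacked = df.stack()

# Keep only values that start with "orange" (^ anchors at the start).
matches = stacked[stacked.str.contains(r"^orange")]

# Collect the matches per original row, then spread them into new columns.
found = matches.groupby(level=0).agg(list)
new_cols = pd.DataFrame(found.tolist(), index=found.index).add_prefix("orange_")

result = df.join(new_cols)
print(result)
```

Rows without any match end up with missing values in the new columns.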

Shift column position to right based on criteria using Pandas

dataframe pandas pandas-groupby python python-3.x

I have a dataframe that looks like the one below. I would like to shift values by 1 cell to the right if there is NA in the column dep_id. I tried the below but it wasn’t working. Is there an efficient and elegant approach to shifting column positions on big data? For example, I expect my output to look like what is shown below.

Applying abbreviation to the column of a dataframe based on another column of the same dataframe

nlp pandas pandas-groupby python text-classification

I have two columns in the dataframe, one of which is a class and another is a description. In the description I have some abbreviations. I want to expand these abbreviations based on the class value. I have a dictionary with class as key and in the value I have another dictionary with abbreviations and its fu…
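
A minimal sketch of class-dependent abbreviation expansion, assuming abbreviations appear as whitespace-separated words. The nested dictionary, class names, and descriptions below are all hypothetical.

```python
import pandas as pd

# Hypothetical nested mapping: class -> {abbreviation -> full form}.
abbreviations = {
    "medical": {"bp": "blood pressure", "hr": "heart rate"},
    "finance": {"bp": "basis points"},
}

df = pd.DataFrame({
    "class": ["medical", "finance", "other"],
    "description": ["bp and hr normal", "rate up 25 bp", "bp unknown"],
})

def expand(row):
    # Pick the mapping for this row's class; unknown classes expand nothing.
    mapping = abbreviations.get(row["class"], {})
    return " ".join(mapping.get(word, word) for word in row["description"].split())

df["expanded"] = df.apply(expand, axis=1)
print(df["expanded"].tolist())
```

Row-wise `apply` is simple but slow on large frames. Grouping by class and applying one replacement per group may scale better.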

Bar chart plotting issue: TypeError: ‘AxesSubplot’ object is not iterable

matplotlib numpy pandas python seaborn

Shown below is the categorical data detail for the bar chart, which comes from a specific DataFrame column, i.e. coast. The function defined below is used to produce the bar chart. However, the bar chart appears without the values on the bars, as shown below, and the error message below appears …

Shading between two lines with Matplotlib

matplotlib pandas pandas-datareader python

I want to use the axvspan() function to visualize a DataFrame that I obtained using Pandas DataReader. But when I use the following code, I get an error and there is no shading in the subplots. What should I do? Thank you. Answer: Try looping over all subplots and adding axvspan to the specific AxesSubplot instead…
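
The answer's code is cut off in this excerpt. A sketch of the looping idea, assuming the subplots come from `DataFrame.plot(subplots=True)`, might look like this. The data and the shaded range are made up.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import numpy as np
import pandas as pd

# Hypothetical data standing in for the DataReader result.
df = pd.DataFrame({"x": np.arange(10.0), "y": np.arange(10.0) ** 2})

# subplots=True returns an array with one Axes per column.
axes = df.plot(subplots=True)

# axvspan is a method of a single Axes, so call it on each one in turn.
for ax in axes:
    ax.axvspan(2, 5, color="grey", alpha=0.3)
```

Calling `axvspan` on the array itself fails because the array is a container of Axes objects, not an Axes.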

How to get all the rows with the same values on a certain set of columns as another specified row in Pandas?

dataframe pandas python

In a setup similar to this: My question is how to get ALL the rows in the dataframe with the same values on a certain set of columns (let’s say, for example, {B,C}) as another specified row (for example, the row with index 3). I want this (index 3, set {B,C}): The problem now is that in
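
One possible approach, sketched with made-up data since the question's frame is not shown: compare the chosen columns against the reference row and keep rows where every column matches.

```python
import pandas as pd

# Hypothetical data; only the column names B and C come from the question.
df = pd.DataFrame({
    "A": [1, 2, 3, 4, 5],
    "B": [0, 1, 0, 1, 1],
    "C": [7, 8, 7, 8, 9],
})

cols = ["B", "C"]
ref = df.loc[3, cols]  # reference row (index 3), restricted to the chosen columns

# Comparing a DataFrame with a Series aligns on column labels;
# all(axis=1) keeps rows where every chosen column matches.
mask = (df[cols] == ref).all(axis=1)
result = df[mask]
print(result)
```

Note that `==` never treats missing values as equal, so rows with NaN in the chosen columns would need separate handling.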