Tag: pandas

Finding elements in list with highest weightage compared to other elements

I have a list of values that add up to 100 (percentage). I need to find the values that constitute the highest percentages as compared to others. How do I decide the criteria for filtering the data? Help me with the logic. Below are a few samples and expected output: The list is already sorted. This is not a ‘top

Split a dataframe based on a specific cumsum value

cumsum dataframe pandas python

I have a working solution, but it seems cumbersome and I am wondering if there is a better way to achieve what I want. I need to achieve two things: Split a dataframe into two dataframes based on a specific cumsum value. If a row needs to be split to fulfill the cumsum condition, then this must happen. An example
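
One possible way to approach the split described above, as a minimal sketch rather than the asker's own solution. The helper name `split_at_cumsum`, the column name `value`, and the sample numbers are hypothetical, and the sketch assumes all values are non-negative.

```python
import pandas as pd

def split_at_cumsum(df, col, limit):
    """Split df in two so that the first part sums to `limit` in `col`.

    Hypothetical helper; assumes non-negative values in `col`.
    A row that straddles the limit is divided between both parts.
    """
    prev = df[col].cumsum() - df[col]  # cumulative sum before each row
    # Share of each row that still fits under the limit (between 0 and the row value).
    head_share = (limit - prev).clip(lower=0).clip(upper=df[col])

    first = df.copy()
    first[col] = head_share
    second = df.copy()
    second[col] = df[col] - head_share

    # Drop rows that contribute nothing to a given part.
    return first[first[col] > 0], second[second[col] > 0]

df = pd.DataFrame({"name": ["a", "b", "c"], "value": [3, 4, 5]})  # made-up data
first, second = split_at_cumsum(df, "value", 5)
```

Here row `b` is divided: 2 units go to the first frame to reach the limit of 5, and the remaining 2 units go to the second frame.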

Exporting many excel files based on column value in pandas dataframe while maintaining formatting?

pandas pandas.excelwriter python

I have a dataframe that I would like to split into individual files based on unique strings in a column “names.” I can do this with a simple function like: The caveat is that I would like to write out these files with conditional formatting. I’ve been able to achieve the desired formatting using ExcelWriter in the following block: Is there

Background with range on seaborn based on two columns

matplotlib pandas python seaborn

I am trying to add a background to my line plots that shows a range from value x (column “Min”) to value y (column “Max”) for each year. My dataset looks like this: I used the following code: I would like something like this, with the range being my “Min” and “Max” by year. Is it possible to do

Filter DataFrame based on partial matching string from list

pandas python

I have a dataframe with lots of categories. Here is a list of some of them: I want to filter the dataframe based on string matching. I don’t want to pass the entire row name; I want to pass something like [‘Совкомбанк’, ‘Тинькофф’]. The expected result is: I tried df = df[df[column_name].isin(values)] but it didn’t work. Answer .isin will check for exact
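
Since `.isin` only matches whole values, one common alternative is a substring match with `Series.str.contains`. The following is a minimal sketch, not necessarily the approach of the truncated answer; the helper name `filter_partial`, the column name `category`, and the sample rows are assumptions made for illustration.

```python
import re
import pandas as pd

def filter_partial(df, column_name, values):
    """Keep rows whose `column_name` contains any of `values` as a substring."""
    # Escape each value so it is matched literally, then join into one regex.
    pattern = "|".join(re.escape(v) for v in values)
    # na=False treats missing values as non-matching.
    mask = df[column_name].str.contains(pattern, na=False)
    return df[mask]

# Made-up sample data.
df = pd.DataFrame(
    {"category": ["ПАО Совкомбанк", "Тинькофф Банк", "Сбербанк", None]}
)
result = filter_partial(df, "category", ["Совкомбанк", "Тинькофф"])
```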

Pandas average of previous rows fulfilling condition

dataframe numpy pandas performance python

I have a huge DataFrame (>20m rows) with each row containing a timestamp and a numeric variable X. I want to assign a new column where, for each row, the value in this new column is the average of X in the previous rows within a specified time window, e.g. the average of all rows with timestamps no more
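
A time-based rolling window is one way this kind of "average of previous rows" could be expressed without an explicit loop. This is only a sketch: the column name `ts`, the 30-second window, and the sample rows are assumptions, since the excerpt does not state the actual window. `closed="left"` excludes the current row, so only earlier rows inside the window are averaged.

```python
import pandas as pd

# Made-up sample data; the timestamps must be sorted for a time-based window.
df = pd.DataFrame(
    {
        "ts": pd.to_datetime(
            [
                "2021-01-01 00:00:00",
                "2021-01-01 00:00:10",
                "2021-01-01 00:00:20",
                "2021-01-01 00:01:30",
            ]
        ),
        "X": [1.0, 3.0, 5.0, 7.0],
    }
)

# Index X by timestamp so the window can be given as a duration.
x_by_time = df.set_index("ts")["X"]

# For each row, average X over the interval [ts - 30s, ts), current row excluded.
df["prev_avg"] = x_by_time.rolling("30s", closed="left").mean().to_numpy()

prev_avg = df["prev_avg"].tolist()
```

Rows with no earlier observation inside the window get NaN.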

Sorting a table in python with alphabet and numbers

dataframe pandas python sorting

I have the following table:

Column1  Column2
99       QA
65       CD
134      LL
N12      OO
127      KK
Q23      MM
1        AA
A10      KL
K9       MA

I would like to sort the table such that the numbers are sorted in descending order first then the alphabets in descending order. How do I do that? The output should look something like the
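
One reading of the requirement is: purely numeric values first, largest to smallest, followed by the remaining values in descending string order. The sketch below implements that reading only; the expected output is cut off in the excerpt, so the exact intended order is an assumption, as are the helper column names `_is_num` and `_num` and the choice to store Column1 as strings.

```python
import pandas as pd

# The table from the question, with Column1 held as strings (an assumption).
df = pd.DataFrame(
    {
        "Column1": ["99", "65", "134", "N12", "127", "Q23", "1", "A10", "K9"],
        "Column2": ["QA", "CD", "LL", "OO", "KK", "MM", "AA", "KL", "MA"],
    }
)

# Values that are not purely numeric become NaN here.
num = pd.to_numeric(df["Column1"], errors="coerce")

out = (
    df.assign(_is_num=num.notna(), _num=num.fillna(0))
    # Numeric rows first, then by numeric value, then by the raw string.
    .sort_values(["_is_num", "_num", "Column1"], ascending=[False, False, False])
    .drop(columns=["_is_num", "_num"])
    .reset_index(drop=True)
)

sorted_keys = out["Column1"].tolist()
```

Note that the non-numeric part is compared as plain strings, so `K9` sorts ahead of a hypothetical `K10`; a natural-sort key would be needed if that matters.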

Select plot linestyle with plotly pandas backend

matplotlib pandas plot plotly python

I want to be able to select the linestyle with the pandas plot method with the plotly backend. Matplotlib: When I use the matplotlib backend in pandas, I can do: which allows me to select the linestyle for each column. The output is: Plotly backend: With plotly I can do How can I select the linestyle of a given line

Lambda and vectorize in python

lambda numpy pandas python

I’m learning data analysis. While performing a vectorized operation with a lambda function, it runs the first time, but when I run it again it shows the error TypeError: <lambda>() takes 1 positional argument but 2 were given. Sample data from the tips.csv file: This is the image of my first run, which doesn’t show any error. Answer: You can vectorize the solution a different way, with numpy.where
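
To illustrate what a `numpy.where` version of a row-wise lambda might look like, here is a minimal sketch. The original lambda is not shown in the excerpt, so the condition (tip share above 15%), the new column name `tip_level`, and the sample rows are all hypothetical.

```python
import numpy as np
import pandas as pd

# Made-up rows in the shape of a tips dataset.
tips = pd.DataFrame({"total_bill": [10.0, 20.0, 40.0], "tip": [2.0, 2.0, 8.0]})

# The condition is evaluated on whole columns at once, so no lambda is needed.
is_generous = tips["tip"] / tips["total_bill"] > 0.15  # hypothetical rule

# np.where(condition, value_if_true, value_if_false)
tips["tip_level"] = np.where(is_generous, "high", "low")

levels = tips["tip_level"].tolist()
```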

Recode multiple values in several columns in Python [similar to R]

dataframe numpy pandas python r

I am trying to translate my R script to Python. I have survey data with several date of birth and education level columns for each family member (from family member 1 to member 10). Here is a sample: I had a function in R to check the logic and recode wrong education levels in all columns. Like this and