I have a list of values that add up to 100 (percentage). I need to find the values that constitute the highest percentages as compared to others. How do I decide the criteria for filtering the data? Help me with the logic. Below are a few samples and expected output: The list is already sorted. This is not a ‘top
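Two common ways to turn "highest compared to the others" into a filtering rule are a cumulative-coverage cutoff, which keeps values until they account for a chosen share of the total, and a largest-gap cutoff, which stops at the biggest drop between consecutive sorted values. The sketch below shows both on a hypothetical sorted list. The 80% threshold is an assumption, because the question's samples and expected outputs are not shown here. Note that the two rules can disagree on the same input.

```python
import numpy as np
import pandas as pd


def by_coverage(values, coverage=80.0):
    """Keep the leading values until their running total first reaches `coverage`.

    Assumes `values` is sorted in descending order and sums to 100.
    The default threshold of 80 is an arbitrary, hypothetical choice.
    """
    s = pd.Series(values, dtype=float)
    # Running total *before* each value; keep a value while that total is still below the target
    before = s.cumsum().shift(fill_value=0.0)
    return s[before < coverage].tolist()


def by_largest_gap(values):
    """Keep the values that come before the biggest drop between neighbours."""
    arr = np.asarray(values, dtype=float)
    if arr.size < 2:
        return arr.tolist()
    # np.diff is negative for a descending list; the most negative step is the biggest drop
    cut = int(np.argmin(np.diff(arr))) + 1
    return arr[:cut].tolist()


sample = [40, 30, 15, 10, 5]  # hypothetical input
print(by_coverage(sample))     # [40.0, 30.0, 15.0]
print(by_largest_gap(sample))  # [40.0, 30.0]
```

The coverage rule adapts to how concentrated the distribution is, while the gap rule looks only at the shape of the sorted list; which one matches the expected outputs depends on the samples.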
Tag: pandas
Split a dataframe based on a specific cumsum value
I have a working solution, but it seems cumbersome, and I am wondering if there is a better way to achieve what I want. I need to achieve two things: Split a dataframe into two dataframes based on a specific cumsum value. If a row needs to be split to fulfill the cumsum condition, then this must happen. An example
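A minimal sketch of the split, assuming a single numeric column whose running total should stop exactly at the target. The function name `split_on_cumsum` and the sample columns `item` and `value` are hypothetical. The row where the cumulative sum crosses the target is divided: the part needed to reach the target goes to the first frame and the remainder starts the second.

```python
import numpy as np
import pandas as pd


def split_on_cumsum(df, col, target):
    """Split `df` into (head, tail) so that head[col].sum() == target.

    The row where the running total crosses `target` is divided between the
    two frames. If the total never reaches `target`, tail is empty.
    """
    df = df.astype({col: float})  # avoid int/float dtype clashes when a row is split
    cum = df[col].cumsum()
    reached = (cum >= target).to_numpy()
    if not reached.any():
        return df.copy(), df.iloc[0:0].copy()

    i = int(np.argmax(reached))  # position of the first row reaching the target
    needed = target - (cum.iloc[i] - df[col].iloc[i])  # share of row i that belongs to head
    pos = df.columns.get_loc(col)

    head = df.iloc[: i + 1].copy()
    tail = df.iloc[i:].copy()
    head.iloc[-1, pos] = needed
    tail.iloc[0, pos] = df[col].iloc[i] - needed
    if tail.iloc[0, pos] == 0:  # exact hit: nothing left over from row i
        tail = tail.iloc[1:]
    return head, tail


frame = pd.DataFrame({"item": ["a", "b", "c", "d"], "value": [10, 20, 30, 40]})
head, tail = split_on_cumsum(frame, "value", 45)
print(head["value"].tolist())  # [10.0, 20.0, 15.0]
print(tail["value"].tolist())  # [15.0, 40.0]
```

Because the split row keeps its other columns in both halves, row "c" appears once in each frame with its value divided 15/15.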
Exporting many Excel files based on column value in pandas dataframe while maintaining formatting?
I have a dataframe that I would like to split into individual files based on the unique strings in a column “names.” I can do this with a simple function like: The caveat is that I would like to write out these files with conditional formatting. I’ve been able to achieve the desired formatting using ExcelWriter in the following block: Is there
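One way to combine the two steps is to group by the `names` column and open a separate `pd.ExcelWriter` per group, applying the conditional format inside each writer. This sketch assumes the xlsxwriter engine is installed; the `score` column, the threshold, and the red fill are hypothetical stand-ins for whatever formatting the original block applies.

```python
import pandas as pd


def split_by_column(df, column):
    """Return {value: sub-DataFrame} for every unique value in `column`."""
    return {key: group.reset_index(drop=True) for key, group in df.groupby(column)}


def export_with_formatting(df, column, value_col, threshold, out_dir="."):
    """Write one .xlsx file per unique value in `column`.

    Cells in `value_col` greater than `threshold` get a red fill through an
    Excel conditional format. Needs the xlsxwriter package to be installed.
    """
    for key, group in split_by_column(df, column).items():
        with pd.ExcelWriter(f"{out_dir}/{key}.xlsx", engine="xlsxwriter") as writer:
            group.to_excel(writer, sheet_name="Sheet1", index=False)
            workbook = writer.book
            worksheet = writer.sheets["Sheet1"]
            red = workbook.add_format({"bg_color": "#FFC7CE", "font_color": "#9C0006"})
            col_idx = group.columns.get_loc(value_col)
            # Row 0 holds the header, so data occupies rows 1..len(group)
            worksheet.conditional_format(
                1, col_idx, len(group), col_idx,
                {"type": "cell", "criteria": ">", "value": threshold, "format": red},
            )


data = pd.DataFrame({"names": ["alice", "bob", "alice"], "score": [40, 75, 90]})
parts = split_by_column(data, "names")
print({k: len(v) for k, v in parts.items()})  # {'alice': 2, 'bob': 1}
# export_with_formatting(data, "names", "score", 50)  # would write alice.xlsx and bob.xlsx
```

Keeping the splitting separate from the writing means the formatting code is written once and reused for every file.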
Background with range on seaborn based on two columns
I am trying to add a background to several of my line plots that shows a range from value x (column “Min”) to value y (column “Max”) for each year. My dataset looks like this: I used the following code: I would like something like this, with the range being my “Min” and “Max” by year. Is it possible to do
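A shaded range between two columns can be drawn with matplotlib's `Axes.fill_between`, and since seaborn draws on matplotlib Axes, passing the same `ax` to `sns.lineplot(..., ax=ax)` should layer the lines on top of the band. This sketch uses plain matplotlib so it runs without seaborn; the `Year` and `Value` columns and their numbers are hypothetical, and it assumes one Min/Max pair per year (aggregate first if there are several).

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data: the "Min"/"Max" columns from the question plus a year and one line series
df = pd.DataFrame({
    "Year": [2018, 2019, 2020, 2021],
    "Min": [1.0, 1.5, 1.2, 2.0],
    "Max": [3.0, 3.5, 2.8, 4.0],
    "Value": [2.0, 2.6, 2.1, 3.1],
})

fig, ax = plt.subplots()
# Shaded band between Min and Max for each year, drawn first so it sits behind the line
ax.fill_between(df["Year"], df["Min"], df["Max"], color="grey", alpha=0.3, label="Min-Max range")
ax.plot(df["Year"], df["Value"], label="Value")
ax.legend()
```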
Filter DataFrame based on partial matching string from list
I have a dataframe with lots of categories. Here is a list of some of them: I want to filter the dataframe based on string matching. I don’t want to pass the entire row name; I want to pass something like [‘Совкомбанк’, ‘Тинькофф’]. The expected result is: I tried df = df[df[column_name].isin(values)] but it didn’t work. Answer .isin will check for exact
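Since `.isin` only matches whole values, partial matching needs a substring test instead. A sketch using `Series.str.contains` with the search terms joined into one regular expression; `re.escape` guards against special characters in the terms. The column name `bank` and the rows are hypothetical. `na=False` treats missing values as non-matches, and `case=False` can be added for case-insensitive matching.

```python
import re

import pandas as pd

# Hypothetical rows; only the search terms come from the question
df = pd.DataFrame({"bank": ["ПАО Совкомбанк", "АО Тинькофф Банк", "ПАО Сбербанк", None]})
values = ["Совкомбанк", "Тинькофф"]

# "Совкомбанк|Тинькофф": a row matches if it contains any of the terms
pattern = "|".join(map(re.escape, values))
filtered = df[df["bank"].str.contains(pattern, na=False)]
print(filtered)
```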
Pandas average of previous rows fulfilling condition
I have a huge dataframe (>20M rows), with each row containing a timestamp and a numeric variable X. I want to assign a new column where, for each row, the value is the average of X over the previous rows within a specified time window, e.g. the average of all rows with timestamps no more
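A vectorized option for this is a time-based rolling window with `closed="left"`, which excludes the current row so each value averages only earlier rows inside the window. This is a sketch under assumptions: the column names `ts` and `X_prev_avg` and the 5-second window are hypothetical, and the rows must be sorted by timestamp. Rows with no earlier timestamp inside the window get NaN.

```python
import pandas as pd

# Hypothetical data: a timestamp column "ts" and the numeric variable X
df = pd.DataFrame({
    "ts": pd.to_datetime([
        "2024-01-01 00:00:00",
        "2024-01-01 00:00:01",
        "2024-01-01 00:00:02",
        "2024-01-01 00:00:10",
    ]),
    "X": [1.0, 2.0, 3.0, 4.0],
})
df = df.sort_values("ts").reset_index(drop=True)  # time-based windows need a sorted index

# Window [ts - 5s, ts): closed="left" leaves the current row out of its own average
df["X_prev_avg"] = df.set_index("ts")["X"].rolling("5s", closed="left").mean().to_numpy()
print(df["X_prev_avg"].tolist())  # [nan, 1.0, 1.5, nan]
```

Because the window is computed by pandas in a single pass rather than a Python loop, this approach should scale to tens of millions of rows far better than row-by-row filtering.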
Sorting a table in Python with letters and numbers
I have the following table:

Column1  Column2
99       QA
65       CD
134      LL
N12      OO
127      KK
Q23      MM
1        AA
A10      KL
K9       MA

I would like to sort the table so that the numbers come first in descending order, followed by the values containing letters, also in descending order. How do I do that? The output should look something like the
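One approach is to split the rows into purely numeric and non-numeric values with `pd.to_numeric(errors="coerce")`, sort each part separately, and concatenate them. The helper name `sort_numbers_then_text` is hypothetical. Since the expected output is cut off above, it assumes "descending" for the letter values means plain string order, so "K9" sorts above "K10".

```python
import pandas as pd


def sort_numbers_then_text(df, col):
    """Numeric values of `col` first (descending), then the rest (descending string order)."""
    num = pd.to_numeric(df[col], errors="coerce")  # NaN for values such as "N12"
    is_num = num.notna()
    numbers = (
        df[is_num]
        .assign(_key=num[is_num])  # temporary numeric sort key
        .sort_values("_key", ascending=False)
        .drop(columns="_key")
    )
    letters = df[~is_num].sort_values(col, ascending=False)
    return pd.concat([numbers, letters], ignore_index=True)


table = pd.DataFrame({
    "Column1": ["99", "65", "134", "N12", "127", "Q23", "1", "A10", "K9"],
    "Column2": ["QA", "CD", "LL", "OO", "KK", "MM", "AA", "KL", "MA"],
})
result = sort_numbers_then_text(table, "Column1")
print(result["Column1"].tolist())
# ['134', '127', '99', '65', '1', 'Q23', 'N12', 'K9', 'A10']
```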
Select plot linestyle with plotly pandas backend
I want to be able to select the linestyle with the pandas plot method when using the plotly backend. Matplotlib: When I use the matplotlib backend in pandas, I can do: which allows me to select the linestyle for each column. The output is: Plotly backend: With plotly I can do: How can I select the linestyle of a given line
Lambda and vectorize in Python
I’m learning data analysis. While performing a vectorized operation with a lambda function, it ran fine the first time, but when I ran it again it showed the error TypeError: <lambda>() takes 1 positional argument but 2 were given. Sample data from the tips.csv file: This is the image of the code I ran first, which doesn’t show any error. Answer You can vectorize the solution in a different way, with numpy.where
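The original code isn't shown, so this is an assumption: one common cause of `<lambda>() takes 1 positional argument but 2 were given` is calling a one-argument lambda with two inputs, for example through `np.vectorize(f)(col_a, col_b)`. The sketch shows a two-argument lambda that matches its call, and the `numpy.where` alternative from the answer, which avoids a Python-level function entirely. The rows mimic the tips dataset's `total_bill` and `tip` columns, and the 15% rule and column names `generous_vec`/`generous` are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical rows shaped like the tips dataset
tips = pd.DataFrame({"total_bill": [16.99, 10.34, 21.01, 23.68], "tip": [1.01, 1.66, 3.50, 3.31]})

# The lambda must accept as many arguments as are passed to the vectorized function
is_generous = np.vectorize(lambda bill, tip: tip / bill > 0.15)
tips["generous_vec"] = is_generous(tips["total_bill"], tips["tip"])

# numpy.where computes the same condition on whole columns at once
tips["generous"] = np.where(tips["tip"] / tips["total_bill"] > 0.15, "yes", "no")
print(tips)
```

`np.vectorize` is essentially a loop under the hood, so `np.where` is usually the faster and simpler choice for a condition like this.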
Recode multiple values in several columns in Python [similar to R]
I am trying to translate my R script to Python. I have survey data with several date-of-birth and education-level columns, one for each family member (from family member 1 to member 10). Here is a sample: I had a function in R to check the logic and recode wrong education levels in all columns, like this and
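In pandas, the same value can be recoded across many columns at once by selecting them as a list and calling `replace`, and a per-member logic check can loop over the member numbers and use `Series.mask`. Everything specific here is hypothetical: the column names `birth_year_N`/`edu_N`, the survey year, the code 99 for "invalid", and the rule that members under 6 cannot have an education level above 0. Substitute the actual rules from the R function.

```python
import numpy as np
import pandas as pd

# Hypothetical layout: one birth-year and one education column per family member
df = pd.DataFrame({
    "birth_year_1": [1980, 2018], "edu_1": [3, 2],
    "birth_year_2": [2015, 1990], "edu_2": [4, 99],
})
SURVEY_YEAR = 2023  # hypothetical

edu_cols = [c for c in df.columns if c.startswith("edu_")]

# Recode the same value in every education column at once (99 -> missing, hypothetical code)
df[edu_cols] = df[edu_cols].replace({99: np.nan})

# Per-member logic check (hypothetical rule): under-6s cannot have education above 0
for col in edu_cols:
    member = col.split("_")[1]
    age = SURVEY_YEAR - df[f"birth_year_{member}"]
    df[col] = df[col].mask((age < 6) & (df[col] > 0), 0)

print(df)
```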