I have a dataframe with missing values. For each index in a column group, I want to replace these values separately. If all of the values in a group are missing, I want to replace them with 1. If only some of the values are missing, I want to replace them with data from an imputed dataframe. dataframe 1
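The question's own data is not shown in this excerpt, so the following is only a minimal sketch of one way to read the rule. It assumes hypothetical columns `group` and `value`, and a hypothetical `imputed` dataframe that shares the same row index.

```python
import numpy as np
import pandas as pd

# Hypothetical data: group "a" is partly missing, group "b" is entirely missing.
df = pd.DataFrame({
    "group": ["a", "a", "b", "b"],
    "value": [1.0, np.nan, np.nan, np.nan],
})
# Hypothetical imputed dataframe aligned on the same row index.
imputed = pd.DataFrame({"value": [1.0, 7.0, 8.0, 9.0]})

# True for every row that belongs to a group whose values are all missing.
all_missing = df["value"].isna().groupby(df["group"]).transform("all")

# Fill gaps from the imputed data, then overwrite fully missing groups with 1.
filled = df["value"].fillna(imputed["value"])
df["value"] = filled.mask(all_missing, 1.0)
```

With this sample data, the gap in group "a" is taken from `imputed`, while every row of group "b" becomes 1.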
Tag: dataframe
Is it possible access a list stored in a dataframe in a vectorized manner?
Considering a dataframe like so: I want to create a new column ‘extracted_value’ which would be the value in the list at the position given by ‘indexes’ (list = [0, 1, 2], indexes = 0 -> 0, indexes = 1 -> 1, and so on). Doing it with iterrows() is extremely slow, as I work with dataframes containing several million rows.
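One possible approach, sketched under assumptions: the column holding the lists is called `list` here (the excerpt does not show its real name), and every list has the same length, so the lists can be stacked into a 2-D NumPy array and indexed in a single step.

```python
import numpy as np
import pandas as pd

# Hypothetical data; the column name "list" is an assumption.
df = pd.DataFrame({
    "list": [[0, 1, 2], [10, 11, 12], [20, 21, 22]],
    "indexes": [0, 1, 2],
})

# Stack the equal-length lists into an array of shape (n_rows, list_len).
values = np.array(df["list"].tolist())

# Pick one element per row: row i, column indexes[i].
rows = np.arange(len(df))
df["extracted_value"] = values[rows, df["indexes"].to_numpy()]

# Fallback for lists of unequal length: not vectorized, but it avoids
# the per-row overhead of iterrows().
fallback = [lst[i] for lst, i in zip(df["list"], df["indexes"])]
```

The array route only applies when the lists are rectangular; for ragged lists the list comprehension is the safer choice.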
How to perform split/merge/melt with Python and polars?
I have a data transformation problem where the original data consists of “blocks” of three rows of data, where the first row denotes a ‘parent’ and the two others are related children. A minimum working example looks like this: In reality, there are up to 15 Providers (so up to 30 columns), but they are not necessary for the example.
Can I get a sub-DataFrame according to first letter in columns names?
I want to get only the columns whose names start with ‘Q1’ and those starting with ‘Q3’. I know that this is possible by doing: But since my real df is too large (more than 70 variables), I am looking for a way to get new_df using only the desired leading characters of the column names. My example dataframe is: df has
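A minimal sketch of prefix-based column selection using `DataFrame.filter` with a regular expression. The column names below are hypothetical, since the example dataframe is cut off in this excerpt.

```python
import pandas as pd

# Hypothetical columns standing in for the real survey variables.
df = pd.DataFrame({
    "Q1_a": [1, 2],
    "Q1_b": [3, 4],
    "Q2_a": [5, 6],
    "Q3_a": [7, 8],
})

# Keep only the columns whose names begin with "Q1" or "Q3".
new_df = df.filter(regex=r"^(Q1|Q3)")
```

The `^` anchor matters here: without it, the pattern would also match names that merely contain "Q1" or "Q3" somewhere in the middle.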
How to sort MultiIndex using values from a given column
I have a DataFrame with a two-level index and a column with numerical values. I want to sort it by the level-0 and level-1 index in such a way that the order of the level-0 index is determined by the sum of the values in the Value column (descending), and the order of the level-1 index is also determined by the values in the Value column.
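One way this could be done, as a sketch rather than a definitive answer: compute each level-0 group's total with `transform`, then sort on that total and on Value. The sample data and index names are hypothetical, and sorting descending within each group is an assumption, since the question does not state a direction for level 1.

```python
import pandas as pd

# Hypothetical two-level index and values.
idx = pd.MultiIndex.from_tuples(
    [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y")],
    names=["L0", "L1"],
)
df = pd.DataFrame({"Value": [1, 2, 5, 4]}, index=idx)

# Sum of Value per level-0 group, broadcast back to every row.
group_total = df.groupby(level=0)["Value"].transform("sum")

# Sort by group total first, then by Value within each group (both descending).
out = (
    df.assign(_total=group_total)
      .sort_values(["_total", "Value"], ascending=[False, False])
      .drop(columns="_total")
)
```

If two level-0 groups can have the same total, their rows may interleave; adding the level-0 key as an extra sort key between `_total` and `Value` would keep each group together.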
Add Categorical Column with Specific Count
I’m trying to create a new categorical column of countries with specific percentage values. Take the following dataset, for instance: I’m trying the following script to get the new column: However, I’m getting all the countries with equal counts. I want a specific count for each country: Desired Output What would be the ideal way of getting the desired output? Any
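A sketch of one way to get exact counts instead of random draws: turn each share into a row count, repeat the labels that many times, and shuffle. The country names, shares, and `id` column are hypothetical, and the sketch assumes the rounded counts add up to the number of rows.

```python
import numpy as np
import pandas as pd

# Hypothetical dataset and target shares.
df = pd.DataFrame({"id": range(10)})
shares = {"USA": 0.5, "Canada": 0.3, "Mexico": 0.2}

# Convert each share into a whole number of rows.
counts = [round(p * len(df)) for p in shares.values()]
# Assumption: rounding leaves no rows unassigned; adjust the counts otherwise.
assert sum(counts) == len(df)

# Repeat each label by its count, then shuffle the order reproducibly.
labels = np.repeat(list(shares), counts)
rng = np.random.default_rng(0)
df["country"] = pd.Categorical(rng.permutation(labels), categories=list(shares))
```

Sampling with weights (for example via `numpy.random.Generator.choice` with `p=`) only matches the target shares on average, whereas this construction matches them exactly.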
How can I use Python to convert multiple columns in the same row to another row?
I have an Excel file with multiple title names as columns in the same row as the data. I need to sort the data, convert the column names to rows, and assign them to the data under the “column names”. My expected output is for it to turn out like this:
Python pandas: How to find the difference between two dataframes based on a single column
I have two dataframes. I am trying to find the difference between these two dataframes based on the column Fruit. This is what I am doing now, but I am not getting the expected output. Expected output Answer You can use the negated isin:
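The answer's original code is not included in this excerpt. As an illustration only, a negated `isin` filter on the Fruit column might look like the following; the sample values and the `Qty` column are hypothetical.

```python
import pandas as pd

# Hypothetical data; only the "Fruit" column name comes from the question.
df1 = pd.DataFrame({"Fruit": ["apple", "banana", "cherry"], "Qty": [1, 2, 3]})
df2 = pd.DataFrame({"Fruit": ["banana"], "Qty": [2]})

# Keep the rows of df1 whose Fruit does not appear in df2.
diff = df1[~df1["Fruit"].isin(df2["Fruit"])]
```

This gives a one-directional difference (rows in `df1` but not in `df2`); swapping the two frames gives the other direction.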
How can I use python conditionals to map columns in a dataframe with duplicates in them?
I am trying to create a mapping where there are duplicates in certain columns in a dataframe. Here are two examples of dataframes I am working with: Here is what I need: three pieces of conditional Python logic that do the following: when we see the first issue_status of 100 and trading_state of None, map F in the reason column. when we see the
How to save a list in a pandas dataframe cell to a HDF5 table format?
I have a dataframe that I want to save in the appendable format to an HDF5 file. The dataframe looks like this: And the code that replicates the issue is: Unfortunately, it returns this error: I am aware that I can save each value in a separate column. This does not help my extended use case, as there might be