I’m trying to create a new categorical column of countries with specific percentage values. Take the following dataset, for instance: I’m trying the following script to get the new column: However, I’m getting all the countries with an equal count. I want a specific count for each country: Desired output: What would be the ideal way of getting the desired output? Any
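As a rough illustration of assigning countries in fixed proportions, here is a minimal sketch. The dataset and the percentages are not shown in the excerpt, so the frame, the country names, and the shares below are all hypothetical. It builds an exact number of labels per country and shuffles them, rather than sampling with probabilities (which would only give approximate counts).

```python
import numpy as np
import pandas as pd

# Hypothetical data and shares; the original dataset and percentages are not shown.
df = pd.DataFrame({"id": range(100)})
shares = {"USA": 0.5, "India": 0.3, "UK": 0.2}

rng = np.random.default_rng(0)

# Build exactly round(share * n) labels per country, then shuffle them in place.
counts = {country: int(round(share * len(df))) for country, share in shares.items()}
labels = np.repeat(list(counts.keys()), list(counts.values()))
rng.shuffle(labels)

df["country"] = pd.Categorical(labels)
print(df["country"].value_counts())
```

This assumes the rounded counts add up to the number of rows; with other shares or row counts, the remainder would need to be distributed explicitly.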
Tag: pandas
Trim leading zeros using Python pandas without changing the datatype of any columns
I have a CSV file of around 42000 lines and around 80 columns from which I need to remove leading zeros, so I am using pandas to_csv to save it back to a text file, which removes the leading zeros. Any column may contain null values in any row, but those columns are getting converted to Float datatype and getting
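One way to avoid the float conversion, sketched here under assumptions, is to read every column as text and strip the zeros with a regular expression instead of relying on numeric parsing. The CSV content and column names below are hypothetical stand-ins for the real file, and this is not necessarily the approach the original thread settled on.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for the real file.
raw = "id,qty,price\n001,0120,0.50\n002,,007\n003,0,10\n"

# dtype=str keeps every column as text; keep_default_na=False keeps empty cells as "".
df = pd.read_csv(io.StringIO(raw), dtype=str, keep_default_na=False)

# Drop leading zeros only when another digit follows, so "0" and "0.50" survive.
cleaned = df.apply(lambda col: col.str.replace(r"^0+(?=\d)", "", regex=True))

print(cleaned.to_csv(index=False))
```

Because nothing is ever parsed as a number, empty cells stay empty and no column changes type.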
Remove part of a string from pd.to_datetime() unconverted values
I tried to convert a column of dates to datetime using pd.to_datetime(df, format='%Y-%m-%d_%H-%M-%S') but I received the error ValueError: unconverted data remains: .1 I ran: to identify the problem. 119/1037808 dates in the date column have an extra “.1” at the end of them. Other than the “.1”, the dates are fine. How can I remove the “.1” from the
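A possible way to handle the stray suffix, shown as a sketch rather than the thread's accepted answer, is to strip a trailing ".1" from the strings before parsing. The column name "date" and the sample values are assumptions.

```python
import pandas as pd

# Hypothetical sample; the column name "date" is an assumption.
df = pd.DataFrame({"date": ["2020-01-02_03-04-05", "2020-01-02_03-04-06.1"]})

# Remove a literal ".1" only when it sits at the very end of the string.
stripped = df["date"].str.replace(r"\.1$", "", regex=True)

# With the suffix gone, the original format string parses every row.
df["date"] = pd.to_datetime(stripped, format="%Y-%m-%d_%H-%M-%S")
print(df)
```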
How can I use Python to convert multiple columns in the same row to another row?
I have an Excel file in which multiple title names appear as columns within the same row as the data. I need to sort the data, convert the column names to rows, and assign them to the data under the “column names”. My expected output is for it to turn out like this:
Python pandas: How to find the difference between two dataframes based on a single column
I have two dataframes. I am trying to find out the difference between these two dataframes based on the column Fruit. This is what I am doing now, but I am not getting the expected output. Expected output: Answer: You can use the negated isin:
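The answer's code is not included in the excerpt. A minimal sketch of the negated isin technique, assuming two small hypothetical frames that share only the column name Fruit with the question, might look like this:

```python
import pandas as pd

# Hypothetical frames; only the column name "Fruit" comes from the question.
df1 = pd.DataFrame({"Fruit": ["Apple", "Banana", "Cherry"], "Qty": [1, 2, 3]})
df2 = pd.DataFrame({"Fruit": ["Banana", "Durian"], "Qty": [5, 6]})

# isin marks rows of df1 whose Fruit also appears in df2; ~ negates that mask.
only_in_df1 = df1[~df1["Fruit"].isin(df2["Fruit"])]
print(only_in_df1)
```

Swapping df1 and df2 gives the rows that exist only in the second frame.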
How can I use python conditionals to map columns in a dataframe with duplicates in them?
I am trying to create a mapping where there are duplicates in certain columns in a dataframe. Here are two examples of dataframes I am working with: Here is what I need: Python logic with three conditions that does the following: when we see the first issue_status of 100 and a trading_state of None, map F in the reason column; when we see the
Transform Pandas column to get a key value pair in a column post group by
My DataFrame: Output required: Approach tried so far: Answer: Use GroupBy.apply with a lambda function: Duplicate keys cannot exist in a Python dictionary, so you can aggregate the values first, e.g. by sum:
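The answer's code is missing from the excerpt, so the following is only a sketch of the idea. The column names "group", "key", and "value" and the data are hypothetical. It shows both the plain GroupBy.apply version and the variant that sums duplicated keys first.

```python
import pandas as pd

# Hypothetical columns and data; the original frame is not shown.
df = pd.DataFrame({
    "group": ["a", "a", "a", "b"],
    "key": ["x", "y", "x", "z"],
    "value": [1, 2, 3, 4],
})

# One dict per group; a key repeated within a group keeps only its last value.
as_dict = df.groupby("group")[["key", "value"]].apply(
    lambda g: dict(zip(g["key"], g["value"]))
)

# Sum duplicated keys first, then build the dicts, so no value is lost.
summed = (
    df.groupby(["group", "key"])["value"].sum()
      .reset_index()
      .groupby("group")[["key", "value"]]
      .apply(lambda g: dict(zip(g["key"], g["value"])))
)
print(as_dict)
print(summed)
```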
Chain df.str.split() in pandas dataframe
Edit: 2022NOV21 How do we chain df.col.str.split(), since this returns the split columns if expand=True? I am trying to split a column after performing .melt(). If I use assign, I end up using the original column, and the melted column does not even exist. Answer: Using expand converts it into a DataFrame, which you do not really
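Since the answer is cut off, here is one possible way to keep the chain going, offered as an assumption rather than the answer's actual code. Passing a lambda to assign makes it operate on the melted frame, and indexing the split result with .str[i] avoids expand altogether. The frame and the underscore separator are hypothetical.

```python
import pandas as pd

# Hypothetical wide frame; column names and the "_" separator are assumptions.
df = pd.DataFrame({"id": [1, 2], "a_x": [10, 20], "b_y": [30, 40]})

out = (
    df.melt(id_vars="id")
      # Each lambda receives the melted frame, so the "variable" column exists here.
      .assign(
          prefix=lambda d: d["variable"].str.split("_").str[0],
          suffix=lambda d: d["variable"].str.split("_").str[1],
      )
)
print(out)
```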
How to save a list in a pandas dataframe cell to a HDF5 table format?
I have a dataframe that I want to save in the appendable format to an HDF5 file. The dataframe looks like this: And the code that replicates the issue is: Unfortunately, it returns this error: I am aware that I can save each value in a separate column. This does not help my extended use case, as there might be
Add values to new column from a dict with keys matching the index of a dataframe
I have a dictionary that, for example's sake, looks like: I have a dataframe that has the same index values as the keys in this dict. I want to add each value from the dict to the dataframe. I feel like doing a check for every row of the DF, checking the index value, matching it to the one in
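A row-by-row check is usually unnecessary here. As a minimal sketch, assuming a hypothetical dict and frame (neither is shown in the excerpt), the index can be mapped through the dict in one step:

```python
import pandas as pd

# Hypothetical dict and frame; the originals are not shown.
values = {"a": 1, "b": 2, "c": 3}
df = pd.DataFrame({"x": [10, 20, 30]}, index=["a", "b", "c"])

# Index.map looks up each index label in the dict; a label missing from it would give NaN.
df["new"] = df.index.map(values)
print(df)
```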