Tag: pandas

Add Categorical Column with Specific Count

I’m trying to create a new categorical column of countries with specific percentage values. Take the following dataset, for instance: I’m trying the following script to get the new column: However, every country ends up with the same count. I want a specific count for each country. Desired output: What would be the ideal way to get the desired output? Any
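A minimal sketch, assuming the target shares are given as a dict of fractions that sum to 1. It converts each share into a whole number of rows, repeats each label that many times, and shuffles the labels. The function name `add_country_column` and the `country` column name are hypothetical.

```python
import numpy as np
import pandas as pd

def add_country_column(df, shares, seed=None):
    """Hypothetical helper: label rows so each country gets its share of rows.

    `shares` maps country -> fraction of rows (assumed to sum to 1).
    """
    n = len(df)
    countries = list(shares)
    # Turn fractions into whole-row counts; the last country absorbs rounding leftovers.
    counts = [round(shares[c] * n) for c in countries[:-1]]
    counts.append(n - sum(counts))
    # Repeat each label by its count, then shuffle so labels are not grouped together.
    labels = np.repeat(countries, counts)
    rng = np.random.default_rng(seed)
    rng.shuffle(labels)
    out = df.copy()
    out["country"] = pd.Categorical(labels, categories=countries)
    return out
```

With 10 rows and shares of 0.5 / 0.3 / 0.2, the counts come out as 5 / 3 / 2. With few rows and awkward fractions, rounding means the counts only approximate the percentages.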

Trim leading zeros using Python pandas without changing the datatype of any columns

export-to-csv pandas python

I have a CSV file of around 42,000 lines and around 80 columns, from which I need to remove leading zeros. I am using pandas to_csv to save it back to a text file, which removes the leading zeros. Any column may contain null values in any row, but those columns are getting converted to the float datatype and getting
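One way to avoid the float conversion is to read every column as a string, sketched below. This is a different approach from relying on numeric parsing to drop the zeros. With `dtype=str`, blank cells stay as missing values inside an object column instead of forcing the column to float. The regex keeps at least one digit, so `"000"` becomes `"0"` rather than an empty string. The helper name is hypothetical.

```python
import io
import pandas as pd

def strip_leading_zeros(path_or_buffer):
    """Hypothetical helper: read a CSV as text and drop leading zeros per cell."""
    # Every column is read as text, so missing cells do not turn columns into float.
    df = pd.read_csv(path_or_buffer, dtype=str)
    # Remove leading zeros but keep the last digit; missing cells stay missing.
    return df.apply(lambda col: col.str.replace(r"^0+(?=\d)", "", regex=True))

# Small in-memory demo; for a real file, pass its path and then call
# result.to_csv("out.csv", index=False) to write it back.
demo = strip_leading_zeros(io.StringIO("a,b\n007,0010\n,000\n42,\n"))
```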

Remove part of a string from pd.to_datetime() unconverted values

datetime pandas python string

I tried to convert a column of dates to datetime using pd.to_datetime(df, format='%Y-%m-%d_%H-%M-%S') but I received the error “ValueError: unconverted data remains: .1”. I ran: to identify the problem. 119 of the 1,037,808 dates in the date column have an extra “.1” at the end. Other than the “.1”, the dates are fine. How can I remove the “.1” from the
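A sketch, assuming the dates are a string column (a Series) rather than the whole DataFrame. It strips a trailing `.1` with an anchored regex and then parses with the original format. The helper that lists unparseable values uses `errors="coerce"`, which turns failures into `NaT` so they can be inspected. Both function names are hypothetical.

```python
import pandas as pd

FMT = "%Y-%m-%d_%H-%M-%S"

def find_unparseable(s):
    # Values that fail the exact format become NaT; return the original strings.
    return s[pd.to_datetime(s, format=FMT, errors="coerce").isna()]

def parse_dates(s):
    # Remove a trailing ".1" only at the end of the string, then parse strictly.
    cleaned = s.str.replace(r"\.1$", "", regex=True)
    return pd.to_datetime(cleaned, format=FMT)
```

Usage would look like `df["date"] = parse_dates(df["date"])`, where `"date"` stands in for the real column name.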

How can I use Python to convert multiple columns in the same row to another row?

dataframe excel pandas python

I have an Excel file that has multiple title names as columns within the same row where the data is given. I need to sort the data, convert the column names to rows, and assign them to the data under those “column names”. My expected output is for it to turn out like this:
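The exact layout isn't shown, so this is only a sketch on a hypothetical wide table. In practice the frame would come from `pd.read_excel`. `DataFrame.melt` turns each title column into rows, producing one `(id, title, value)` row per cell. The result is then sorted.

```python
import pandas as pd

# Hypothetical wide layout: one row per record, one column per title.
wide = pd.DataFrame({
    "id": [1, 2],
    "Title A": [10, 30],
    "Title B": [20, 40],
})

# melt converts the title columns into rows: one (id, title, value) row per cell.
long = (
    wide.melt(id_vars="id", var_name="title", value_name="value")
        .sort_values(["id", "title"])
        .reset_index(drop=True)
)
```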

Python pandas: How to find the difference between two dataframes based on a single column

dataframe pandas python

I have two dataframes, and I am trying to find the difference between them based on the column Fruit. This is what I am doing now, but I am not getting the expected output: Expected output: Answer: You can use the negated isin:
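A sketch of the negated `isin` approach on hypothetical data. `~` inverts the boolean mask, which keeps only the rows whose `Fruit` value is absent from the other frame.

```python
import pandas as pd

# Hypothetical example frames.
df1 = pd.DataFrame({"Fruit": ["apple", "banana", "cherry"], "qty": [1, 2, 3]})
df2 = pd.DataFrame({"Fruit": ["banana", "date"], "qty": [5, 6]})

# Rows of df1 whose Fruit does not appear anywhere in df2["Fruit"].
only_in_df1 = df1[~df1["Fruit"].isin(df2["Fruit"])]
# The reverse direction, if a symmetric difference is needed.
only_in_df2 = df2[~df2["Fruit"].isin(df1["Fruit"])]
```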

How can I use python conditionals to map columns in a dataframe with duplicates in them?

dataframe pandas python

I am trying to create a mapping where there are duplicates in certain columns of a dataframe. Here are two examples of dataframes I am working with: Here is what I need: three conditional Python rules that do the following: When we see the first issue_status of 100 and a trading_state of None, map F in the reason column. When we see the
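Only the first rule is fully stated, so this sketch covers just that rule on hypothetical data. It assumes "first" means the first matching row in the whole frame. A running count of matches (`cumsum`) equals 1 only on the first match.

```python
import numpy as np
import pandas as pd

# Hypothetical data; only the first rule from the question is implemented.
df = pd.DataFrame({
    "issue_status": [100, 100, 200, 100],
    "trading_state": [None, None, "open", None],
})

# Rows matching the rule: issue_status is 100 and trading_state is missing.
match = (df["issue_status"] == 100) & df["trading_state"].isna()
# The running count of matches is 1 only at the first matching row.
first_match = match & (match.cumsum() == 1)
df["reason"] = np.where(first_match, "F", None)
```

If "first" should instead apply within each group of duplicates, replace `match.cumsum()` with `match.groupby(df[key]).cumsum()`, where `key` is the duplicated column. That column name is not given in the question.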

Transform a pandas column to get key-value pairs in a column after a group by

pandas python

My DataFrame: Output required: Approach tried so far: Answer: Use GroupBy.apply with a lambda function: Duplicate keys cannot exist in a Python dictionary, so you can aggregate the values first, e.g. by sum:
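A sketch of both steps on hypothetical columns `group`, `key` and `value`. First, duplicate keys within a group are summed. Then `GroupBy.apply` with a lambda builds one dict per group. Selecting `["key", "value"]` before `apply` keeps the grouping column out of the function's input.

```python
import pandas as pd

# Hypothetical data; group "b" has a duplicated key "x".
df = pd.DataFrame({
    "group": ["a", "a", "b", "b", "b"],
    "key":   ["x", "y", "x", "x", "z"],
    "value": [1, 2, 3, 4, 5],
})

# Sum first so each (group, key) pair occurs once; a dict cannot hold duplicate keys.
summed = df.groupby(["group", "key"], as_index=False)["value"].sum()

# Collapse each group's key/value columns into a single dict.
pairs = summed.groupby("group")[["key", "value"]].apply(
    lambda g: dict(zip(g["key"], g["value"]))
)
```

`pairs` is a Series indexed by group, with one dict per group. Use `pairs.reset_index(name="pairs")` to turn it back into a column.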

Chain df.str.split() in pandas dataframe

chain melt pandas python split

Edit: 2022NOV21. How do we chain df.col.str.split(), given that it returns the split columns when expand=True? I am trying to split a column after performing .melt(). If I use assign, I end up using the original column, and the melted column does not even exist. Answer: Using expand converts it into a DataFrame, which you do not really
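One way to keep the chain going without `expand=True` is sketched below on hypothetical column names. It passes lambdas to `assign`. Each lambda receives the DataFrame at that point in the chain, so the melted column already exists when it is split. `.str[0]` and `.str[1]` pick the pieces out of each split list.

```python
import pandas as pd

# Hypothetical wide data whose column names encode "metric_year".
df = pd.DataFrame({
    "id": [1, 2],
    "sales_2021": [10, 20],
    "sales_2022": [30, 40],
})

out = (
    df.melt(id_vars="id", var_name="metric_year", value_name="amount")
      # Each lambda sees the melted frame, so "metric_year" is available here.
      .assign(
          metric=lambda d: d["metric_year"].str.split("_").str[0],
          year=lambda d: d["metric_year"].str.split("_").str[1].astype(int),
      )
      .drop(columns="metric_year")
)
```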

How to save a list in a pandas dataframe cell to a HDF5 table format?

dataframe hdf5 pandas pytables python

I have a dataframe that I want to save to an HDF5 file in the appendable (table) format. The dataframe looks like this: And the code that replicates the issue is: Unfortunately, it returns this error: I am aware that I can save each value in a separate column. This does not help my extended use case, as there might be
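A common workaround, sketched under assumptions, is to serialize each list to a JSON string before writing and parse it back after reading. The table format cannot store Python list objects in a column, but it can store strings. `min_itemsize` reserves width for longer strings in later appends; the 500-character default here is an arbitrary assumption. The helper names are hypothetical, and writing requires PyTables to be installed.

```python
import json
import pandas as pd

def encode_lists(df, col):
    # Replace each list in `col` with its JSON string form.
    out = df.copy()
    out[col] = out[col].map(json.dumps)
    return out

def decode_lists(df, col):
    # Inverse of encode_lists: parse each JSON string back into a list.
    out = df.copy()
    out[col] = out[col].map(json.loads)
    return out

def append_to_hdf(df, path, key, col, max_len=500):
    # Hypothetical helper (needs PyTables). min_itemsize fixes the string
    # width of `col` so later appends with longer lists still fit.
    encode_lists(df, col).to_hdf(
        path, key=key, format="table", append=True, min_itemsize={col: max_len}
    )
```

After `pd.read_hdf(path, key)`, call `decode_lists` on the same column to get the lists back.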

Add values to new column from a dict with keys matching the index of a dataframe

pandas python

I have a dictionary that, for example’s sake, looks like: I have a dataframe that has the same index values as the keys in this dict. I want to add each value from the dict to the dataframe. I feel like doing a check for every row of the DF, checking the index value, matching it to the one in
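No row-by-row check is needed, as this sketch on hypothetical data shows. `Index.map` looks up each index label in the dict in one vectorized step, and labels missing from the dict become `NaN`. Assigning `pd.Series(values)` to the column would give the same result through index alignment.

```python
import pandas as pd

# Hypothetical data: dict keys match the DataFrame's index labels.
values = {"a": 10, "b": 20, "c": 30}
df = pd.DataFrame({"x": [1, 2, 3, 4]}, index=["a", "b", "c", "d"])

# Look up every index label in the dict; "d" has no entry and becomes NaN.
df["from_dict"] = df.index.map(values)
```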