Tag: pandas

How to replace NaN value in column in Dataframe based on values from another column in same dataframe

data-science dataframe numpy pandas python

Below is the DataFrame I’m working with. I want to replace NaN values in the ‘Score’ column using values from the ‘Country’ and ‘Sectors’ columns. Below is the code which I’ve tried. I want to replace only NaN values specific to country == ‘USA’ and Sectors == …
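A minimal sketch of one way to do a conditional fill like this with a boolean mask and .loc. The sample rows, the sector name "Tech" and the fill value 0.0 are assumptions for illustration; the excerpt is cut off before it shows the real ones.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "Country": ["USA", "USA", "India", "USA"],
    "Sectors": ["Tech", "Tech", "Tech", "Energy"],
    "Score": [1.0, np.nan, np.nan, np.nan],
})

# Select rows that match both conditions AND are missing a score.
mask = (df["Country"] == "USA") & (df["Sectors"] == "Tech") & df["Score"].isna()

# Assumed fill value; substitute whatever the real rule requires.
df.loc[mask, "Score"] = 0.0
```

Rows that fail either condition keep their NaN, and existing scores are never overwritten because the mask includes isna().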

Python: How to One Hot Encode a Feature with multiple values?

dataframe pandas python

I have the following dataframe df, with the names of the cities an aircraft travels through in the route column and its ticket_price. I want to obtain the individual city names from route and one hot encode them. Dataframe (df) Required Dataframe (df_encoded) Code I have performed some preprocessing on the route col…
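One possible approach, sketched under the assumption that the cities in route are joined by a single separator character ("-" here, which the excerpt does not confirm): Series.str.get_dummies splits each string and builds one indicator column per distinct value. The city codes and prices are made up.

```python
import pandas as pd

# Hypothetical data; the "-" separator is an assumption.
df = pd.DataFrame({
    "route": ["DEL-BOM", "BOM-BLR-DEL", "BLR"],
    "ticket_price": [100, 250, 80],
})

# One indicator column per city that appears anywhere in route.
city_dummies = df["route"].str.get_dummies(sep="-")

# Combine the price with the encoded cities.
df_encoded = pd.concat([df[["ticket_price"]], city_dummies], axis=1)
```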

Using Python Pandas, read multiple folder paths written in xlsx file and process each csv file separately

pandas python

I have an Excel file named F_Path.xlsx listing the folder paths like below: Answer: Try the following:
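The answer's code is not included in the excerpt, so the following is only a sketch of the general pattern: loop over the folders, find the CSV files in each, and handle every file on its own. The function name process_folders and the row-count "processing" step are hypothetical placeholders.

```python
from pathlib import Path

import pandas as pd


def process_folders(folders):
    """Return {csv_path: row_count} for every CSV found directly in each folder."""
    results = {}
    for folder in folders:
        for csv_path in sorted(Path(folder).glob("*.csv")):
            frame = pd.read_csv(csv_path)
            # Placeholder processing step: record the number of rows.
            results[str(csv_path)] = len(frame)
    return results


# Usage, assuming the sheet has a header row, the paths sit in its first
# column, and an Excel engine such as openpyxl is installed:
# folders = pd.read_excel("F_Path.xlsx").iloc[:, 0].dropna()
# summary = process_folders(folders)
```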

How to drop rows in one DataFrame based on one similar column in another Dataframe that has a different number of rows

dataframe duplicates pandas python

I have two DataFrames that are completely dissimilar except for certain values in one particular column: How would I go about finding the matching values in the Email column of df and the Contact column of df2, and then dropping the whole row in df based on that match? Output I’m looking for (index numb…
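A short sketch of one common way to do this: build a boolean mask with Series.isin and keep only the rows of df whose Email does not appear in df2's Contact column. The column names come from the question; the rows are invented.

```python
import pandas as pd

# Hypothetical data; only the Email and Contact column names are from the question.
df = pd.DataFrame({
    "Name": ["Ann", "Bob", "Cy"],
    "Email": ["a@x.com", "b@x.com", "c@x.com"],
})
df2 = pd.DataFrame({
    "Contact": ["b@x.com", "z@x.com"],
    "Dept": ["Sales", "Ops"],
})

# Keep rows of df whose Email is NOT among df2's Contact values.
df_filtered = df[~df["Email"].isin(df2["Contact"])]
```

The two frames can have different lengths because isin compares values, not row positions. The original index labels are kept; chain .reset_index(drop=True) if a fresh index is wanted.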

pandas .diff() but use first cell as difference between last cell in prior column

dataframe pandas python

Say that I have a df in the following format: and I would like to get the difference of the 2020 column by using df['delta'] = df['2020'].diff(). This will obviously return NaN for the first value in the column. How can I make it so that it automatically interprets that diff as the diff…
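A sketch under one reading of the title: the first delta should be the first 2020 value minus the last value of the preceding column. The '2019' column name and the numbers are assumptions, since the excerpt does not show the data.

```python
import pandas as pd

# Hypothetical data; '2019' stands in for "the prior column".
df = pd.DataFrame({"2019": [10, 12, 15], "2020": [18, 21, 25]})

# Ordinary row-to-row difference; the first entry is NaN.
df["delta"] = df["2020"].diff()

# Assumed interpretation: first 2020 value minus last 2019 value.
df.loc[df.index[0], "delta"] = df["2020"].iloc[0] - df["2019"].iloc[-1]
```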

Pandas merge indexing not behaving as expected

dataframe join pandas python

I am trying to perform an anti-join in effectively one line. However, my one-line solution is not giving me the same results that I receive when breaking up the code into two lines (which behaves as expected). Specifically, the single-line solution results in a dataframe with fewer rows. The goal of my anti-j…
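For reference, a sketch of one common single-expression anti-join pattern using merge with indicator=True. This is not the asker's code, which the excerpt does not show; the frame names, the key column and the data are hypothetical.

```python
import pandas as pd

# Hypothetical frames joined on a column named "key".
left = pd.DataFrame({"key": [1, 2, 3, 4], "val": list("abcd")})
right = pd.DataFrame({"key": [2, 4, 4]})

anti = (
    left.merge(
        right[["key"]].drop_duplicates(),  # avoid duplicating left rows
        on="key",
        how="left",
        indicator=True,                    # adds the _merge column
    )
    .loc[lambda d: d["_merge"] == "left_only"]  # rows with no match in right
    .drop(columns="_merge")
)
```

Filtering inside the chain with a callable keeps the mask tied to the merged frame's own index, which matters because merge returns a new index rather than reusing the one from left.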

Python calculated Timedelta 50 years in future, should be same day

pandas python timedelta timestamp

This is a follow-up to “Calculating new column value in dataframe based on next rows column value”. The solution in the previous question worked for a column holding hh:mm:ss values as a string. I tried applying (no pun intended) the same logic to calculate the 1 second difference on a column of pandas Timestamp…
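A sketch of how a "next row minus one second" column might be computed directly on Timestamp values, without going through strings. The column names start and end and the sample times are assumptions; the excerpt does not show the real frame.

```python
import pandas as pd

# Hypothetical data: each row's end is one second before the next row's start.
df = pd.DataFrame({
    "start": pd.to_datetime([
        "2020-01-01 09:00:00",
        "2020-01-01 09:30:00",
        "2020-01-01 10:15:00",
    ])
})

# shift(-1) pulls the next row's value up; the last row has no successor.
df["end"] = df["start"].shift(-1) - pd.Timedelta(seconds=1)
```

Because the arithmetic stays in datetime space, the date part is carried along with the time and the last row simply becomes NaT.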

Pandas deleting rows based on same string in columns

data-cleaning dataframe pandas python

Hello, I am using a pandas DataFrame to clean this file and want to delete rows which contain the manufacturer’s name in the buy-box seller column. For example, row 1 will be deleted because it contains the string ‘Goli’ in the Buy-Box seller column. Answer: There are missing values, so first replace them b…
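A sketch in the spirit of the answer's first step (replace missing values, then compare row by row). The "Manufacturer" column name and the sample rows are assumptions; only "Buy-Box seller" and the value "Goli" appear in the excerpt.

```python
import pandas as pd

# Hypothetical data; "Manufacturer" is an assumed column name.
df = pd.DataFrame({
    "Manufacturer": ["Goli", "Acme", "Zen"],
    "Buy-Box seller": ["Goli Nutrition", "Other Shop", None],
})

# Replace missing sellers with an empty string so substring checks are safe.
seller = df["Buy-Box seller"].fillna("")

# True where the manufacturer's name occurs inside the seller string.
is_own = pd.Series(
    [m in s for m, s in zip(df["Manufacturer"], seller)],
    index=df.index,
)

# Keep only rows where the seller does not contain the manufacturer's name.
cleaned = df[~is_own]
```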

Autofill datetime in Pandas by previous increment

datetime pandas python time-series

Given previous datetime values in a Pandas DataFrame, either as an index or as values in a column, is there a way to “autofill” remaining time increments based on the previous fixed increments? For example, given: I would like to apply a function to yield: B 2013-01-01 09:00:00 0.0 2013-…
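A sketch of one way to extend a fixed increment, assuming the timestamps live in a column (here called time, a made-up name), the missing values are all at the end, and the step is the last observed difference. The 5-second spacing is invented for the example.

```python
import pandas as pd

# Hypothetical data: two known timestamps followed by two missing ones.
df = pd.DataFrame({
    "time": pd.to_datetime(
        ["2013-01-01 09:00:00", "2013-01-01 09:00:05", None, None]
    ),
    "B": [0.0, 1.0, 2.0, 3.0],
})

known = df["time"].dropna()
step = known.diff().iloc[-1]            # last observed increment
missing = df["time"].isna()
n_missing = int(missing.sum())

# Continue the sequence from the last known timestamp.
fill = pd.date_range(start=known.iloc[-1] + step, periods=n_missing, freq=step)
df.loc[missing, "time"] = fill.to_numpy()
```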

Pandas quantile function not returning the correct number of given quantiles

linear-interpolation numpy pandas python

I have a dataframe with over 2,000 records that has multiple columns with various balances. Based on the balance amount, I want to assign each record to a bucket. I am trying to split each balance column into quantiles with the following buckets: 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9. Concretely, translating the balances i…
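A sketch of how decile cut points and bucket labels can be computed with Series.quantile and pd.qcut. The balance column name and the randomly generated data are assumptions; the excerpt does not show the real columns.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 2,000 continuous balances.
rng = np.random.default_rng(0)
df = pd.DataFrame({"balance": rng.gamma(2.0, 1000.0, size=2000)})

# The nine probabilities 0.1 ... 0.9 give nine cut points (ten buckets).
probs = np.arange(1, 10) / 10
cutoffs = df["balance"].quantile(probs)

# Assign each balance to a decile bucket labelled 0..9.
df["bucket"] = pd.qcut(df["balance"], q=10, labels=False, duplicates="drop")
```

With duplicates="drop", pd.qcut discards repeated bin edges, so a column with many tied balances (for example, many zeros) can end up with fewer than ten buckets. Whether that applies to the data in the question is not visible from the excerpt.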