Tag: pandas

How would I find the longest string per row in a data frame and print the row number if it exceeds a certain length?

I want to write a program which searches through a data frame and if any of the items in it are above 50 characters long, print the row number and ask if you want to continue through the data frame. I tried using this, but I don’t want to drop the rows, just print the row numbers where the strings
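One possible way to report the rows without dropping them, as a minimal sketch: it assumes every cell can be measured as text via `str()` and that missing values count as length 0. The helper name `rows_with_long_strings` and the sample data are hypothetical.

```python
import pandas as pd


def rows_with_long_strings(df: pd.DataFrame, limit: int = 50) -> list:
    """Return the index labels of rows whose longest cell (as text) exceeds `limit`."""
    # Measure every cell as text; missing values are treated as length 0 (assumption).
    lengths = df.apply(lambda col: col.map(lambda v: 0 if pd.isna(v) else len(str(v))))
    longest_per_row = lengths.max(axis=1)
    return longest_per_row[longest_per_row > limit].index.tolist()


# Hypothetical sample data: rows 0 and 1 each contain one over-long string.
df = pd.DataFrame({"a": ["short", "x" * 60, "ok"], "b": ["y" * 51, "fine", None]})

long_rows = rows_with_long_strings(df, limit=50)
for row in long_rows:
    print(f"Row {row} has a value longer than 50 characters")
```

An `input()` prompt inside the loop could ask whether to continue; it is left out here so the sketch runs non-interactively.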

How to normalise a date column in pandas dataframe to the same format

excel pandas python

I have a dataframe made from pulling in different Excel sheets. I am trying to normalise the date_time column to a standard DD/MM/YYYY format. Is that possible?
1 DATE Column 3 Column 4 Column 5 Column 6
2 01/03/2021 00:00
3 01/03/2021 00:00
4 01/03/2021 00:00
5 01/03/2021 00:00
6 01/03/2021 00:00
… …
122350 11/24/2022
122351 11/24/2022
122352
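A hedged sketch of one way to normalise such a column. It rests on an assumption the excerpt does not confirm: values with a time part are day-first (`%d/%m/%Y %H:%M`) and date-only values are month-first (`%m/%d/%Y`). The output column name `DATE_NORMALISED` and the sample values are illustrative.

```python
import pandas as pd

# Hypothetical sample mirroring the two shapes seen in the excerpt.
df = pd.DataFrame({"DATE": ["01/03/2021 00:00", "11/24/2022", None]})

# Assumption: timestamped values are day-first, date-only values are month-first.
# Each pass turns values that do not match its format into NaT instead of raising.
with_time = pd.to_datetime(df["DATE"], format="%d/%m/%Y %H:%M", errors="coerce")
date_only = pd.to_datetime(df["DATE"], format="%m/%d/%Y", errors="coerce")

# Prefer the timestamped parse and fall back to the date-only parse.
parsed = with_time.fillna(date_only)

# Render everything as DD/MM/YYYY text; unparseable values stay missing.
df["DATE_NORMALISED"] = parsed.dt.strftime("%d/%m/%Y")
```

Parsing each known format explicitly avoids letting pandas guess the day/month order row by row, which is where mixed-source columns usually go wrong.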

Need to find and replace/correct list from another df column

pandas python

I have a list, say F = [Jonii, Max, anna, xyz, etc.], and a df which contains 2 columns: Name and Corrected_Name. I need to search for each string from the list in df[Name] and replace it with df[Corrected_Name]. E.g., in the above, the code will search for the list items in df[Name] and, if one is found, such as “Jonii”, replace it with “Jon” which
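A minimal sketch of one reading of this question: build a lookup from `Name` to `Corrected_Name` and apply it to the list, leaving unmatched items unchanged. The row mapping "anna" to "Anna" is a made-up example; only "Jonii" to "Jon" comes from the question.

```python
import pandas as pd

F = ["Jonii", "Max", "anna", "xyz"]

# Hypothetical correction table; only the Jonii -> Jon pair is from the question.
df = pd.DataFrame({"Name": ["Jonii", "anna"], "Corrected_Name": ["Jon", "Anna"]})

# Build a plain dict once so each list item is a constant-time lookup.
lookup = dict(zip(df["Name"], df["Corrected_Name"]))

# Replace items found in df["Name"]; keep everything else as it was.
corrected = [lookup.get(name, name) for name in F]
print(corrected)
```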

Pandas: replacing nan values conditionally within a group

dataframe pandas python

I have a dataframe with missing values. For each index in a column group, I want to replace these values separately. If all of the values in a group are missing, I want to replace the values with 1. If only some of the values are missing, I want to replace them with data from an imputed dataframe dataframe 1
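A rough sketch of the two rules described, under several assumptions: groups are identified by a hypothetical `group` column, the values live in a hypothetical `value` column, and the imputed dataframe shares the same index and column name.

```python
import numpy as np
import pandas as pd

# Hypothetical data: group "a" is partly missing, group "b" is entirely missing.
df = pd.DataFrame(
    {"group": ["a", "a", "b", "b"], "value": [np.nan, 2.0, np.nan, np.nan]}
)
# Assumed imputed frame, aligned on the same index as df.
imputed = pd.DataFrame({"value": [1.5, 2.0, 9.0, 9.0]})

# True for every row that belongs to a group where all values are missing.
all_missing = df["value"].isna().groupby(df["group"]).transform("all")

# Rule 2: fill remaining gaps from the imputed frame.
# Rule 1: overwrite fully-missing groups with 1.
df["value"] = df["value"].fillna(imputed["value"]).mask(all_missing, 1.0)
```

The fully-missing flag is computed before any filling, so the imputed values cannot hide a group that started out empty.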

Insert $ in front of numbers in column using Pandas

numpy pandas python

I wish to add a dollar symbol in front of all the values in my column. Data Desired Doing I believe I have to map this, not 100% sure. Any suggestion is appreciated. Answer You can also try
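As an illustration only, and not necessarily what the original answer suggested, string concatenation on the column is one common approach. The column names `price` and `price_label` are hypothetical.

```python
import pandas as pd

# Hypothetical numeric column.
df = pd.DataFrame({"price": [10, 25, 300]})

# Convert to text first, then prepend the symbol; the result is a string column.
df["price_label"] = "$" + df["price"].astype(str)
print(df)
```

Writing to a separate column keeps the numeric values available for arithmetic, since the labelled column is text.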

Is it possible to access a list stored in a dataframe in a vectorized manner?

dataframe pandas python vectorization

Considering a dataframe like so: I want to create a new column ‘extracted_value’ which would be the value contained in the list at ‘indexes’ index (list = [0, 1, 2], indexes = 0 -> 0, indexes = 1 -> 1, and so on) Doing it with iterrows() is extremely slow as I work with dataframes containing multiple millions of lines.
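A sketch of one option, assuming the columns are named `list` and `indexes` as the description suggests. It is not truly vectorized, because object columns holding Python lists cannot be, but a list comprehension over `zip` avoids the per-row overhead of `iterrows()`.

```python
import pandas as pd

# Hypothetical data shaped like the description: a list column and an index column.
df = pd.DataFrame(
    {"list": [[0, 1, 2], [10, 11, 12], [20, 21, 22]], "indexes": [0, 1, 2]}
)

# Pair each list with its index and pick the element directly.
df["extracted_value"] = [lst[i] for lst, i in zip(df["list"], df["indexes"])]
print(df)
```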

Can I get a sub-DataFrame according to the first letters in column names?

dataframe pandas python

I want to get only columns whose names start with ‘Q1’ and those starting with ‘Q3’. I know that this is possible by doing: But since my real df is too large (more than 70 variables), I am looking for a way to get the new_df by using only the desired first letters in the column titles. My example dataframe is: df has
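One way to select by prefix, sketched with `DataFrame.filter` and a regular expression anchored at the start of the column name. The column names below are invented for the example.

```python
import pandas as pd

# Hypothetical frame with a mix of prefixes.
df = pd.DataFrame(
    {"Q1_a": [1], "Q2_a": [2], "Q3_a": [3], "Q1_b": [4], "other": [5]}
)

# Keep columns whose names start with Q1 or Q3; original column order is preserved.
new_df = df.filter(regex=r"^(Q1|Q3)")
print(new_df.columns.tolist())
```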

How to sort MultiIndex using values from a given column

dataframe pandas python

I have a DataFrame with a 2-level index and a column with numerical values. I want to sort it by level-0 and level-1 index in such a way that the order of the 0-level index is determined by the sum of values from the Value column (descending), and the order of the 1-level index is also determined by the values in the Value column.
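A minimal sketch of one approach: attach each level-0 group's total as a temporary column, sort by that total and then by `Value`, and drop the helper. It assumes the inner ordering is also descending; the level names `L0` and `L1`, the helper column `_total`, and the data are hypothetical.

```python
import pandas as pd

idx = pd.MultiIndex.from_tuples(
    [("A", "x"), ("A", "y"), ("B", "x"), ("B", "y")], names=["L0", "L1"]
)
df = pd.DataFrame({"Value": [1, 2, 10, 5]}, index=idx)

# Total of Value per level-0 label, broadcast back to every row of that group.
group_total = df.groupby(level=0)["Value"].transform("sum")

sorted_df = (
    df.assign(_total=group_total)
    # Groups with the largest total first, then largest Value first within a group.
    .sort_values(["_total", "Value"], ascending=[False, False])
    .drop(columns="_total")
)
print(sorted_df)
```

If two level-0 groups share the same total, their rows can interleave, so a tie-breaker on the level-0 label would be needed in that case.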

Why are Python dates such a mess and what can I do about it?

date datetime numpy pandas python

A common source of errors in my Python codebase is dates. Specifically, the different implementations of dates and datetimes, and how comparisons are handled between them. These are the date types in my codebase. You can print them to see: Is there a canonical date representation in Python? I suppose x7: datetime.date is probably closest… Also, note comparisons are a
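One pragmatic option, sketched under assumptions: convert every incoming value to a single type at the boundary of the code, here `pandas.Timestamp`, so that comparisons are always like-for-like. The sample values are hypothetical stand-ins for the types listed in the question, and all are timezone-naive.

```python
import datetime as dt

import numpy as np
import pandas as pd

# Hypothetical mix of date representations (all timezone-naive).
values = [
    dt.date(2022, 1, 1),
    dt.datetime(2022, 1, 1, 12, 0),
    np.datetime64("2022-01-02"),
    pd.Timestamp("2022-01-03"),
    "2022-01-04",
]

# Normalise everything to pandas.Timestamp before comparing.
normalised = [pd.Timestamp(v) for v in values]

# Comparisons are now between identical types.
print(normalised[0] < normalised[1])
```

A plain `datetime.date` becomes midnight of that day, which is worth keeping in mind when comparing against values that carry a time of day.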