I want to write a program which searches through a data frame and, if any of the items in it are more than 50 characters long, prints the row number and asks whether to continue through the data frame. I tried using this, but I don’t want to drop the rows, just print the row numbers where the strings
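A minimal sketch of the length check described in this excerpt, not the asker's own code. The function name `rows_with_long_strings`, the sample frame, and the choice to convert every cell to a string before measuring are all assumptions made for illustration; the interactive "continue?" prompt is left out.

```python
import pandas as pd

def rows_with_long_strings(df, max_len=50):
    """Return the index labels of rows holding any cell longer than max_len."""
    # Convert every cell to str, then measure each cell's length column by column
    lengths = df.astype(str).apply(lambda col: col.str.len())
    # A row qualifies if at least one of its cells exceeds the limit
    mask = (lengths > max_len).any(axis=1)
    return df.index[mask].tolist()

# Hypothetical sample data
df = pd.DataFrame({
    "a": ["short", "x" * 60, "ok"],
    "b": ["y" * 51, "fine", "fine"],
})
long_rows = rows_with_long_strings(df)
print(long_rows)
```

No rows are dropped here; the frame is only inspected, and the returned labels can be looped over with a prompt if needed.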
Tag: pandas
How to normalise a date column in a pandas dataframe to the same format
I have a dataframe made from pulling in different Excel sheets. I am trying to normalise the date_time column to just a standard DD/MM/YYYY format. Is that possible?
1       DATE  Column 3  Column 4  Column 5  Column 6
2       01/03/2021 00:00
3       01/03/2021 00:00
4       01/03/2021 00:00
5       01/03/2021 00:00
6       01/03/2021 00:00
…       …
122350  11/24/2022
122351  11/24/2022
122352
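One possible approach, sketched under a loud assumption: both layouts visible in the excerpt are month-first (`11/24/2022` can only be month-first, but `01/03/2021` is ambiguous and may really be day-first in the asker's data). Each known layout is parsed with an explicit format, the results are combined, and the dates are rendered as DD/MM/YYYY strings. The series `s` is hypothetical sample data.

```python
import pandas as pd

# Hypothetical sample mixing the two layouts seen in the excerpt
s = pd.Series(["01/03/2021 00:00", "11/24/2022"])

# Try the layout with a time part first; rows that do not match become NaT
parsed = pd.to_datetime(s, format="%m/%d/%Y %H:%M", errors="coerce")
# Fill the gaps using the date-only layout (assumed month-first as well)
parsed = parsed.fillna(pd.to_datetime(s, format="%m/%d/%Y", errors="coerce"))

# Render every date in a single DD/MM/YYYY string format
normalised = parsed.dt.strftime("%d/%m/%Y")
print(normalised.tolist())
```

If some sheets are actually day-first, the format strings need to be swapped for those sheets before combining them, since the text alone cannot disambiguate values such as `01/03/2021`.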
Need to find and replace/correct list from another df column
I have a list, say F = [Jonii, Max, anna, xyz, etc.], and a df which contains 2 columns: Name and Corrected_Name. I need to search for each string from the list in df[Name] and replace it with df[Corrected_Name]. For example, the code will search for the list items in df[Name] and, if one is found, such as “Jonii”, replace it with “Jon” which
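A minimal sketch of the lookup this excerpt describes: build a dictionary from the two columns and use it to correct the list. Only the pair “Jonii” -> “Jon” comes from the question; the second row of `df` is hypothetical, and leaving unmatched names unchanged is an assumption.

```python
import pandas as pd

F = ["Jonii", "Max", "anna", "xyz"]

# Only the Jonii -> Jon pair is from the question; the second row is made up
df = pd.DataFrame({
    "Name": ["Jonii", "Maxx"],
    "Corrected_Name": ["Jon", "Max"],
})

# Map each misspelt name to its corrected form
mapping = dict(zip(df["Name"], df["Corrected_Name"]))

# Replace names found in the mapping; keep the others as they are
corrected = [mapping.get(name, name) for name in F]
print(corrected)
```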
Pandas: replacing nan values conditionally within a group
I have a dataframe with missing values. For each index in a column group, I want to replace these values separately. If all of the values in a group are missing, I want to replace them with 1. If only some of the values are missing, I want to replace them with data from an imputed dataframe. dataframe 1
Insert $ in front of numbers in column using Pandas
I wish to add a dollar symbol in front of all the values in my column. Data Desired Doing I believe I have to map this, but I am not 100% sure. Any suggestion is appreciated. Answer You can also try
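The answer's code is not preserved in this excerpt. One possible approach, not necessarily the one the answer proposed, is to convert the column to strings and prepend the symbol. The column name `price` and the sample values are hypothetical.

```python
import pandas as pd

# Hypothetical data; the question's actual column is not shown in the excerpt
df = pd.DataFrame({"price": [10, 250]})

# Convert the numbers to strings and prepend the dollar sign
df["price"] = "$" + df["price"].astype(str)
print(df["price"].tolist())
```

After this step the column holds strings, so it can no longer be used for arithmetic without stripping the symbol again.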
Is it possible to access a list stored in a dataframe in a vectorized manner?
Considering a dataframe like so: I want to create a new column ‘extracted_value’ which would be the value contained in the list at the position given by ‘indexes’ (list = [0, 1, 2], indexes = 0 -> 0, indexes = 1 -> 1, and so on). Doing it with iterrows() is extremely slow as I work with dataframes containing multiple millions of lines.
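Lists stored in an object column cannot be indexed by a truly vectorized pandas operation, but a list comprehension over the two columns avoids the per-row overhead of iterrows(). The sketch below assumes the list column is named `list` (the excerpt does not show the real name) and uses hypothetical sample data.

```python
import pandas as pd

# Hypothetical frame; the column name "list" is an assumption
df = pd.DataFrame({
    "list": [[0, 1, 2], [3, 4, 5]],
    "indexes": [0, 2],
})

# Pair each list with its index and pick the element directly
df["extracted_value"] = [lst[i] for lst, i in zip(df["list"], df["indexes"])]
print(df["extracted_value"].tolist())
```

If all lists have the same length, converting the column to a 2-D NumPy array and using integer indexing would be another option.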
groupby in pandas with custom function over a subset of rows in each group
I have a pandas DataFrame of the following format: Input: where (version, branch) is a MultiIndex. PROBLEM DESCRIPTION: I want to groupby version and set the values in the column X with branch overall to the sum of the values in the column X for the remaining branches (having the same version), weighted by the values in the column N.
Can I get a sub-DataFrame according to the first letters in column names?
I want to get only columns whose names start with ‘Q1’ and those starting with ‘Q3’. I know that this is possible by doing: But since my real df is too large (more than 70 variables), I am looking for a way to get the new_df by using only the desired first letters of the column names. My example dataframe is: df has
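One way to select columns by prefix is `DataFrame.filter` with a regular expression anchored at the start of the name. The column names below are hypothetical, since the example dataframe is cut off in the excerpt.

```python
import pandas as pd

# Hypothetical columns; the question's real dataframe is not shown
df = pd.DataFrame({
    "Q1_a": [1, 2],
    "Q2_a": [3, 4],
    "Q3_a": [5, 6],
    "other": [7, 8],
})

# Keep only the columns whose names start with Q1 or Q3
new_df = df.filter(regex=r"^(Q1|Q3)")
print(new_df.columns.tolist())
```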
How to sort MultiIndex using values from a given column
I have a DataFrame with a 2-level index and a column with numerical values. I want to sort it by the level-0 and level-1 index in such a way that the order of the level-0 index is determined by the sum of values from the Value column (descending), and the order of the level-1 index is also determined by the values in the Value column.
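A minimal sketch of one way to get this ordering: compute each level-0 group's total with `transform`, sort on that total and then on Value, and drop the helper column. The sample data, the helper column name `_total`, and the descending order within each group are assumptions.

```python
import pandas as pd

# Hypothetical data with a 2-level index
index = pd.MultiIndex.from_tuples(
    [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y")],
    names=["lvl0", "lvl1"],
)
df = pd.DataFrame({"Value": [1, 5, 10, 2]}, index=index)

# Attach each level-0 group's total to every one of its rows
df["_total"] = df.groupby(level=0)["Value"].transform("sum")

# Sort groups by total (descending), then rows within a group by Value
result = (
    df.sort_values(["_total", "Value"], ascending=[False, False])
      .drop(columns="_total")
)
print(result)
```

Two level-0 groups with identical totals would interleave under this sort; adding the level-0 name as a tie-breaking sort key would keep them apart.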
Why are Python dates such a mess and what can I do about it?
A common source of errors in my Python codebase is dates. Specifically, the different implementations of dates and datetimes, and how comparisons are handled between them. These are the date types in my codebase: You can print them to see: Is there a canonical date representation in Python? I suppose x7: datetime.date is probably closest… Also, note comparisons are a