I have a list containing the names of some columns of a dataframe. The values in these columns are either ‘Yes’ or ‘No’. I want to filter out rows where any of those columns contains ‘Yes’. I was thinking of using a for loop to iterate through the list, because the list is built from user input. So instead of
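A minimal pandas sketch of that filter, assuming a DataFrame df and a hypothetical user-supplied list cols; a boolean mask replaces the loop entirely:

```python
import pandas as pd

df = pd.DataFrame({
    "a": ["Yes", "No", "No"],
    "b": ["No", "No", "Yes"],
    "c": ["No", "No", "No"],
})
cols = ["a", "b"]  # hypothetical: the user-supplied column names

# True for rows where at least one of the listed columns equals "Yes"
mask = df[cols].eq("Yes").any(axis=1)

rows_with_yes = df[mask]      # keep the matching rows
rows_without_yes = df[~mask]  # or drop them instead
```

Because eq/any vectorize over all the listed columns at once, the list can be any length without changing the code.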
Tag: dataframe
Expand row based on integer in column and split into number of months between dates
I have the following dataframe:

id  date_start           date_end             reporting_month  reporting_month_number  months_length
1   2022-03-31 23:56:22  2022-05-01 23:56:22  2022-03          1                       3
2   2022-03-31 23:48:48  2022-06-01 23:48:48  2022-03          1                       4
3   2022-03-31 23:47:36  2022-08-01 23:47:36  2022-03          1                       6

I would like to split each id row so I can have a row for each of the months_length, starting on the date of reporting_month,
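One way to sketch this in pandas, assuming the frame above and that each extra row advances reporting_month by one month from date_start:

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2, 3],
    "date_start": pd.to_datetime([
        "2022-03-31 23:56:22", "2022-03-31 23:48:48", "2022-03-31 23:47:36",
    ]),
    "months_length": [3, 4, 6],
})

# Repeat each row months_length times, then number the copies per id
out = df.loc[df.index.repeat(df["months_length"])].copy()
out["month_offset"] = out.groupby("id").cumcount()

# Advance the month by each copy's offset
out["reporting_month"] = (
    out["date_start"].dt.to_period("M") + out["month_offset"]
).astype(str)
out["reporting_month_number"] = out["month_offset"] + 1
```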
Python dataframe columns from row & column header
I have a data frame df1 like this: I want to make another df with a separate column for each df1 column-header/row-name pair. Basically, I have used the pandas.DataFrame.describe function and now I would like to transpose it this way. Answer: You can unstack your DataFrame into a Series, flatten the Index, turn it back into a DataFrame and
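A sketch of the unstack-and-flatten idea the answer describes, using a toy df1 since the original frame isn't shown:

```python
import pandas as pd

df1 = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
desc = df1.describe()

# unstack() yields a Series whose MultiIndex pairs each column with a statistic
s = desc.unstack()

# Flatten the (column, statistic) pairs into single labels
s.index = [f"{col}_{stat}" for col, stat in s.index]

# Back to a one-row DataFrame with one column per pair
wide = s.to_frame().T
print(wide)  # columns like a_count, a_mean, ..., b_std, b_max
```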
ValueError: All arrays must be of the same length
Can someone help with this error when converting a JSON file to a data frame, please? I’m trying to convert the JSON text file to a data frame but get the same-length-array error. I have tried double [[]] around the ‘data’ key but it still doesn’t work. The text file is at https://stackoverflowtez.filecloudonline.com/ui/core/index.html?mode=single&path=/SHARED/%212CkRNC5x55IO6kGJjEwTViZ4mGmwG/9aINFGD2QxaELHFL&shareto=#expl-tabl. A portion of the JSON is pasted below, the
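The linked file isn't reproduced here, but this error typically means the top-level JSON dict has values of unequal lengths. A hedged sketch, assuming the records sit under a 'data' key as the question suggests:

```python
import json
import pandas as pd

with open("data.json") as f:  # placeholder path for the downloaded text file
    raw = json.load(f)

# pd.DataFrame(raw) needs every top-level value to have the same length;
# normalizing only the nested record list sidesteps that requirement
df = pd.json_normalize(raw, record_path=["data"])
```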
How to detect a braking process in a Python dataframe
I have some trips, and each trip contains different steps; the data frame looks like the following: I want to know whether, on a trip X, the cyclist has braked (speed has decreased by at least 30%). The problem is that the duration between every two steps is different each time. For example, in 6 seconds, the speed of a
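A simplified sketch of one detection strategy, assuming columns trip_id and speed with one row per step; it only compares consecutive steps and ignores the uneven durations the question raises:

```python
import pandas as pd

df = pd.DataFrame({
    "trip_id": [1, 1, 1, 2, 2],
    "speed":   [20.0, 18.0, 12.0, 15.0, 16.0],
})

# Speed at the previous step of the same trip
prev = df.groupby("trip_id")["speed"].shift()

# Braking step: speed fell to 70% or less of the previous step's speed
df["braking"] = df["speed"] <= prev * 0.7

# Trips on which at least one braking step occurred
braking_trips = df.loc[df["braking"], "trip_id"].unique()
```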
PySpark: sum all the values of a Map column into a new column
I have a dataframe which looks like this. I want to sum all the row-wise decimal values of the map and store the result in a new column. My approach is not working: it says it can be applied only to int. Answer: Since your values are of float type, the initial value passed within the aggregate should match the type
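A sketch of the fix the answer points at: give aggregate a float zero so the accumulator's type matches the map's double values (the column names here are made up):

```python
from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame(
    [({"a": 1.5, "b": 2.5},)], ["scores"]  # hypothetical map<string,double> column
)

# aggregate needs a zero value of the same type as the map values,
# hence lit(0.0) (double) rather than lit(0) (int)
df = df.withColumn(
    "total",
    F.aggregate(F.map_values("scores"), F.lit(0.0), lambda acc, x: acc + x),
)
```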
How to compare each date in a cell with all the dates in a column
I have a dataframe with three columns, let’s say. I want to compare each date in the Date column with all the other dates in the Date column and only keep those rows which lie within 6 months of at least one of the other dates. Desired Output: I have tried a couple of approaches such as nested loops, but I got
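One vectorized sketch, approximating “6 months” as 183 days (an assumption) and using NumPy broadcasting instead of nested loops:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Date": pd.to_datetime(["2021-01-15", "2021-05-01", "2023-03-10"])})

# All pairwise gaps via broadcasting; a row survives if some *other* row's
# date lies within the window
dates = df["Date"].to_numpy()
gaps = np.abs(dates[:, None] - dates[None, :])
close = gaps <= np.timedelta64(183, "D")
np.fill_diagonal(close, False)   # ignore each date's match with itself
df_kept = df[close.any(axis=1)]
```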
Pandas (How to Fix): List is actually a string and the value of length is misleading
I have a dataframe with a list of years in the first column. A second column shows the number of years listed in each row, but its values are misleading, which made me think that the content of each cell is actually a pure string rather than a list. It seems that way when I checked the type. When I convert the column to a list using to_list(), it shows:
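A sketch of what is likely happening and the usual fix: the cells are strings that merely look like lists, so length counts characters; ast.literal_eval turns them back into real lists:

```python
import ast
import pandas as pd

df = pd.DataFrame({"years": ["[2001, 2002]", "[1999]"]})  # strings, not lists

# On raw strings, len counts characters, hence the misleading numbers
print(df["years"].str.len())   # 12, 6  (character counts)

# Parse each string into a real list, then measure again
df["years"] = df["years"].apply(ast.literal_eval)
print(df["years"].str.len())   # 2, 1   (element counts)
```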
PySpark: find an existing set of rows in a dataframe and replace them with values from another dataframe
I have a PySpark dataframe_Old (dfo) as below:

Id   neighbor_sid  neighbor         division
a1   1100          Naalehu          Hawaii
a2   1101          key-west-fl      Miami
a3   1102          lubbock          Texas
a10  1202          bay-terraces     California

I have a PySpark dataframe_new (dfn) as below:

Id   neighbor_sid  neighbor         division
a1   1100          Naalehu          Hawaii
a2   1111          key-largo-fl     Miami
a3   1103          grapevine        Texas
a4   1115          meriden-ct       Connecticut
a12  2002          east-louisville  Kentucky
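One common reading of “replace”, assuming Id is the key: drop old rows whose Id also appears in dfn, then append all of dfn. A sketch:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

dfo = spark.createDataFrame(
    [("a1", 1100, "Naalehu", "Hawaii"),
     ("a2", 1101, "key-west-fl", "Miami"),
     ("a3", 1102, "lubbock", "Texas"),
     ("a10", 1202, "bay-terraces", "California")],
    ["Id", "neighbor_sid", "neighbor", "division"],
)
dfn = spark.createDataFrame(
    [("a1", 1100, "Naalehu", "Hawaii"),
     ("a2", 1111, "key-largo-fl", "Miami"),
     ("a3", 1103, "grapevine", "Texas"),
     ("a4", 1115, "meriden-ct", "Connecticut"),
     ("a12", 2002, "east-louisville", "Kentucky")],
    ["Id", "neighbor_sid", "neighbor", "division"],
)

# Keep old rows whose Id is absent from the new frame, then add all new rows:
# rows sharing an Id are effectively replaced by their dfn version
result = dfo.join(dfn, on="Id", how="left_anti").unionByName(dfn)
```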
How can I define a function for a new dataframe with cleaned data
I have several dataframes that I each need to reduce to a given time span. So that I don’t have to repeat the same code block over and over again, I would like to write a function. Currently everything is done by the following code: my approach: unfortunately, this does not work. Answer: There are a
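Without the original code block, here is a hedged sketch of such a function, assuming each frame has a datetime column (called "date" here) to filter on:

```python
import pandas as pd

def reduce_to_span(df, start, end, date_col="date"):
    """Return only the rows whose date_col lies within [start, end]."""
    mask = df[date_col].between(pd.Timestamp(start), pd.Timestamp(end))
    return df.loc[mask].copy()

# Two toy frames standing in for the several real ones
df1 = pd.DataFrame({"date": pd.to_datetime(["2022-01-05", "2022-09-01"]), "v": [1, 2]})
df2 = pd.DataFrame({"date": pd.to_datetime(["2022-03-15", "2021-12-31"]), "v": [3, 4]})

# The same reduction applied to every dataframe without repeating the block
df1, df2 = (reduce_to_span(d, "2022-01-01", "2022-06-30") for d in (df1, df2))
```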