Tag: dataframe

expand row based on integer in column and split into number of months between dates

I have the following dataframe:

id  date_start           date_end             reporting_month  reporting_month_number  months_length
1   2022-03-31 23:56:22  2022-05-01 23:56:22  2022-03          1                       3
2   2022-03-31 23:48:48  2022-06-01 23:48:48  2022-03          1                       4
3   2022-03-31 23:47:36  2022-08-01 23:47:36  2022-03          1                       6

I would like to split each id row so I can have a row for each of the months_length, starting on the date of reporting_month,
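The excerpt is cut off before the desired output is shown, so the following is only a minimal sketch of one way to expand the rows. It assumes that each id should get months_length rows, that reporting_month advances by one month per row, and that reporting_month_number should count up from 1; these are assumptions, not details confirmed by the question.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id": [1, 2, 3],
    "date_start": pd.to_datetime(
        ["2022-03-31 23:56:22", "2022-03-31 23:48:48", "2022-03-31 23:47:36"]
    ),
    "date_end": pd.to_datetime(
        ["2022-05-01 23:56:22", "2022-06-01 23:48:48", "2022-08-01 23:47:36"]
    ),
    "reporting_month": ["2022-03", "2022-03", "2022-03"],
    "reporting_month_number": [1, 1, 1],
    "months_length": [3, 4, 6],
})

# Repeat each row months_length times.
expanded = df.loc[df.index.repeat(df["months_length"])].reset_index(drop=True)

# 0-based position of each repeated row within its id.
offset = expanded.groupby("id").cumcount()

# Assumption: the reporting month moves forward by one month per repeated row.
start = pd.PeriodIndex(expanded["reporting_month"], freq="M")
expanded["reporting_month"] = (start + offset.to_numpy()).strftime("%Y-%m")

# Assumption: the month number counts 1, 2, 3, ... within each id.
expanded["reporting_month_number"] = offset + 1

print(expanded[["id", "reporting_month", "reporting_month_number"]])
```

Working with monthly periods rather than timestamps avoids end-of-month issues such as adding one month to the 31st.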

Python dataframe columns from row&column header

I have a data frame df1 like this: I want to make another data frame with a separate column for each pair of df1 column header and row name: Basically, I have used the pandas.DataFrame.describe function and now I would like to transpose the result this way. Answer: You can unstack your DataFrame into a Series, flatten the Index, turn it back into a DataFrame and
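The steps named in the answer (unstack, flatten the index, convert back to a DataFrame) might look roughly like the sketch below. The sample data and the "column_statistic" naming pattern are assumptions, because the original tables are not visible in the excerpt, and the final transpose is one guess at how the truncated answer continues.

```python
import pandas as pd

# Hypothetical input; the original df1 is not shown in the excerpt.
df1 = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [4.0, 5.0, 6.0]})

stats = df1.describe()

# unstack() turns the table into a Series indexed by (column, statistic).
s = stats.unstack()

# Flatten the two-level index into single labels such as "a_mean".
s.index = [f"{col}_{stat}" for col, stat in s.index]

# Turn the Series back into a one-row DataFrame.
wide = s.to_frame().T

print(wide)
```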

ValueError: All arrays must be of the same length

Can someone please help with an error when converting a JSON file to a data frame? I’m trying to convert the JSON text file to a data frame but get the "arrays must be of the same length" error. I have tried wrapping the ‘data’ in double brackets [[]], but it still doesn’t work. The text file is at https://stackoverflowtez.filecloudonline.com/ui/core/index.html?mode=single&path=/SHARED/%212CkRNC5x55IO6kGJjEwTViZ4mGmwG/9aINFGD2QxaELHFL&shareto=#expl-tabl. A portion of JSON pasted below, the
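The JSON itself is not included in the excerpt, so the payload below is hypothetical. The sketch only illustrates the usual cause of this error, a dict of lists with unequal lengths passed to pd.DataFrame, and one possible workaround that pads the shorter columns with NaN.

```python
import json

import pandas as pd

# Hypothetical payload: "value" has one entry fewer than "time".
raw = '{"data": {"time": [1, 2, 3], "value": [10, 20]}}'
payload = json.loads(raw)

error_message = None
try:
    # A dict of lists with different lengths cannot form a rectangular table.
    pd.DataFrame(payload["data"])
except ValueError as exc:
    error_message = str(exc)

print(error_message)

# Wrapping each list in a Series lets pandas align on the index
# and fill the missing positions with NaN.
df = pd.DataFrame({key: pd.Series(values) for key, values in payload["data"].items()})

print(df)
```

Whether padding with NaN is appropriate depends on what the unequal lengths mean in the real file; if the lists describe different entities, separate data frames may be the better fit.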

how to detect a braking process in python dataframe

I have some trips, and each trip contains different steps. The data frame looks like the following: I want to know whether, on a trip X, the cyclist has braked (the speed has decreased by at least 30%). The problem is that the duration between two consecutive steps is different each time. For example, in 6 seconds, the speed of a
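The data frame is not shown in the excerpt, so the column names trip_id, seconds and speed below are hypothetical. As a minimal sketch, the check compares each step with the previous step of the same trip and flags a drop of at least 30%. It deliberately ignores how much time passed between the two steps, which the question identifies as the hard part.

```python
import pandas as pd

# Hypothetical data: two trips with unevenly spaced steps.
steps = pd.DataFrame({
    "trip_id": ["A", "A", "A", "B", "B", "B"],
    "seconds": [0, 6, 9, 0, 4, 10],
    "speed": [20.0, 19.0, 12.0, 15.0, 14.0, 13.0],
})

steps = steps.sort_values(["trip_id", "seconds"])

# Relative speed change versus the previous step within the same trip.
rel_change = steps.groupby("trip_id")["speed"].pct_change()

# A step counts as braking when the speed fell by 30% or more.
steps["braked"] = rel_change <= -0.30

# One flag per trip: did any step qualify?
trip_braked = steps.groupby("trip_id")["braked"].any()

print(trip_braked)
```

To account for the uneven spacing, one option would be to require that the difference in the seconds column between the two compared steps stays below a chosen limit.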

Pyspark find existing set of rows in a dataframe and replace it with values from another dataframe

I have a Pyspark dataframe_Old (dfo) as below:

Id   neighbor_sid  neighbor      division
a1   1100          Naalehu       Hawaii
a2   1101          key-west-fl   Miami
a3   1102          lubbock       Texas
a10  1202          bay-terraces  California

I have a Pyspark dataframe_new (dfn) as below:

Id   neighbor_sid  neighbor         division
a1   1100          Naalehu          Hawaii
a2   1111          key-largo-fl     Miami
a3   1103          grapevine        Texas
a4   1115          meriden-ct       Connecticut
a12  2002          east-louisville  Kentucky

how can i def function for new Dataframe with Cleaned data

I have several dataframes that all need to be reduced to the same time span. So that I don’t have to repeat the code block over and over again, I would like to write a function. Currently everything is done, without a function, by the following code: my approach: unfortunately this does not work. Answer: There are a
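Neither the original code nor the answer is visible in the excerpt, so this is only a sketch of what such a helper could look like. The function name restrict_to_span, the timestamp column and the sample frames are all hypothetical.

```python
import pandas as pd


def restrict_to_span(df, start, end, time_col="timestamp"):
    """Return a copy of df limited to rows with start <= time_col <= end."""
    ts = pd.to_datetime(df[time_col])
    mask = (ts >= pd.Timestamp(start)) & (ts <= pd.Timestamp(end))
    # Return a new frame so the caller's original data stays untouched.
    return df.loc[mask].copy()


# Hypothetical frames covering different date ranges.
frames = {
    "sensor_a": pd.DataFrame({
        "timestamp": pd.date_range("2022-01-01", periods=5, freq="D"),
        "value": range(5),
    }),
    "sensor_b": pd.DataFrame({
        "timestamp": pd.date_range("2022-01-03", periods=5, freq="D"),
        "value": range(5),
    }),
}

# Apply the same time span to every frame.
trimmed = {
    name: restrict_to_span(frame, "2022-01-02", "2022-01-04")
    for name, frame in frames.items()
}

for name, frame in trimmed.items():
    print(name, len(frame))
```

Returning the filtered frame instead of reassigning the argument inside the function matters here: rebinding a parameter name inside a function does not change the caller's variable, which is a common reason such helpers appear not to work.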
