class_id  class  code  id
8         XYZ    A     1
8         XYZ    B     2
9         ABC    C     3

I have a dataframe like the one above. I want to transform it so that the ‘codes’ column below collects, for each class, all the unique (code, id) pairs in JSON format.

class_id  class  codes
8         XYZ    [{‘code’: ‘A’, ‘id’: 1},…
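A minimal sketch of one way to do this, assuming the column names shown in the question (`class_id`, `class`, `code`, `id`). The variable names `rows` and `out` are made up for illustration, and the list-of-dicts layout is only a guess at the desired output, since the excerpt is cut off.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "class_id": [8, 8, 9],
    "class": ["XYZ", "XYZ", "ABC"],
    "code": ["A", "B", "C"],
    "id": [1, 2, 3],
})

rows = []
for (class_id, cls), group in df.groupby(["class_id", "class"]):
    # Keep each (code, id) pair once, then turn the pairs into a list of dicts.
    pairs = group[["code", "id"]].drop_duplicates().to_dict("records")
    rows.append({"class_id": class_id, "class": cls, "codes": pairs})

out = pd.DataFrame(rows)
print(out)
```

Each cell of `codes` holds a plain Python list of dicts, so it can be passed to `json.dumps` if a JSON string is needed.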
Tag: pandas
Concat multiple dataframes and manage those that don’t exist
I am trying to concatenate some dataframes (30 dataframes of 24h data) that were created automatically from CSV files, but sometimes a CSV doesn’t exist, so the dataframe wasn’t created (df1, df2, df4, df8, df9,…). I want to create a weekly dataframe from 7 concatenated dataframes, but the function…
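One possible approach, sketched under assumptions: instead of separate variables such as `df1`, `df2`, keep the daily frames in a dict keyed by day number, so a missing day is just a missing key. The dict `daily`, the helper `weekly_frame`, and the sample columns are all hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for the daily frames: day number -> DataFrame.
# Days whose CSV was missing (3, 5, 6, 7 here) simply have no entry.
daily = {
    day: pd.DataFrame({"day": [day], "value": [day * 10]})
    for day in (1, 2, 4, 8, 9)
}

def weekly_frame(daily, first_day, n_days=7):
    """Concatenate whichever of the n_days frames actually exist."""
    present = [daily[d] for d in range(first_day, first_day + n_days) if d in daily]
    if not present:
        # No data at all for this week: return an empty frame.
        return pd.DataFrame()
    return pd.concat(present, ignore_index=True)

week1 = weekly_frame(daily, 1)   # days 1..7
week2 = weekly_frame(daily, 8)   # days 8..14
week3 = weekly_frame(daily, 15)  # days 15..21, none present
```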
Lag of values from 1 month ago
My initial dataset has only 2 columns, date and value. What I’m trying to do is, for each date, get the value from the previous month (columns m-1 and m-12). The problem I’m having is when the day doesn’t exist in the previous month, like the 29th of February, in which case I want to leave it empty, and most …
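A rough sketch of the month lag, assuming unique dates and columns named `date` and `value`. The helper `lag_months` and the sample data are made up. Subtracting `pd.DateOffset(months=1)` clamps to the last valid day of the month (29 March becomes 28 February), so the sketch blanks out any row where the day of month changed.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "date": pd.to_datetime(["2021-01-28", "2021-02-28", "2021-03-28", "2021-03-29"]),
    "value": [10.0, 20.0, 30.0, 40.0],
})

# Lookup table: date -> value (dates are assumed to be unique).
lookup = df.set_index("date")["value"]

def lag_months(dates, months):
    shifted = dates - pd.DateOffset(months=months)
    # DateOffset clamps to the end of the month; keep only exact day matches.
    same_day = shifted.dt.day == dates.dt.day
    return shifted.map(lookup).where(same_day)

df["m-1"] = lag_months(df["date"], 1)
df["m-12"] = lag_months(df["date"], 12)
print(df)
```

Here 29 March gets no `m-1` value because 29 February 2021 does not exist, while 28 March picks up the value from 28 February.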
Extracting feature names from sklearn column transformer
I’m using sklearn.pipeline to transform my features and fit a model, so my general flow looks like this: column transformer –> general pipeline –> model. I would like to be able to extract feature names from the column transformer (since the following step, general pipeline applies the…
Distance Matrix Haversine
I am working on a data frame that looks like this: I’m trying to make a Haversine distance matrix. Basically, for each zone, I would like to calculate the distance between it and all the others in the dataframe. So there should be only 0s on the diagonal. Here is the Haversine function that I use but I …
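The asker's own function is not shown in the excerpt, so the following is only a sketch of the standard Haversine formula, vectorised with NumPy broadcasting. The column names `zone`, `lat`, `lon`, the sample coordinates, and the mean Earth radius of 6371 km are assumptions.

```python
import numpy as np
import pandas as pd

def haversine_matrix(lat_deg, lon_deg, radius_km=6371.0):
    """Pairwise great-circle distances in km between all points."""
    lat = np.radians(np.asarray(lat_deg, dtype=float))
    lon = np.radians(np.asarray(lon_deg, dtype=float))
    # Broadcasting builds every pairwise difference at once.
    dlat = lat[:, None] - lat[None, :]
    dlon = lon[:, None] - lon[None, :]
    a = (np.sin(dlat / 2) ** 2
         + np.cos(lat)[:, None] * np.cos(lat)[None, :] * np.sin(dlon / 2) ** 2)
    # Clip guards against tiny floating-point overshoots outside [0, 1].
    return 2 * radius_km * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))

# Hypothetical zones.
zones = pd.DataFrame({
    "zone": ["A", "B", "C"],
    "lat": [0.0, 0.0, 45.0],
    "lon": [0.0, 90.0, 0.0],
})

dist = pd.DataFrame(
    haversine_matrix(zones["lat"], zones["lon"]),
    index=zones["zone"],
    columns=zones["zone"],
)
print(dist.round(1))
```

The result is symmetric with zeros on the diagonal, since each zone is at distance 0 from itself.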
Selecting first row from each subgroup (pandas)
How to select the subset of rows where distance is lowest, grouping by date and p columns? Ideally, the returned dataframe should contain: Answer One way is to use groupby + idxmin to get the index of the smallest distance per group, then use loc to get the desired output: Output:
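The code from the answer is not included in the excerpt. A minimal sketch of the `groupby` + `idxmin` + `loc` approach it describes could look like this, using the column names `date`, `p`, and `distance` from the question and made-up sample values.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "date": ["2021-01-01", "2021-01-01", "2021-01-01", "2021-01-02", "2021-01-02"],
    "p": ["a", "a", "b", "a", "a"],
    "distance": [3, 1, 2, 5, 4],
})

# Index label of the smallest distance within each (date, p) group.
idx = df.groupby(["date", "p"])["distance"].idxmin()

# Pull the full rows for those labels.
out = df.loc[idx]
print(out)
```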
Parse pandas series with a list of dicts into new columns
I have a pandas series containing a list of dictionaries. I’d like to parse the contents of the dicts with some condition and store the results into new columns. Here’s some data to work with: I’d like to parse the contents of each dictionary with some conditional logic. Check for each dicts…
Looping through Pandas Dataframe Columns to count values
I have a column with 14000+ rows and there are only two numbers in this column. When I try this I get an equal 12/12 return for each counter. That is definitely not correct. How do I loop through and count the Yes and Nos? Answer You can do it like this:
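The answer's code is missing from the excerpt. One common way to count the values without an explicit loop is `Series.value_counts`, sketched here with a hypothetical column name `answer` and made-up data.

```python
import pandas as pd

# Hypothetical column holding only two distinct values.
df = pd.DataFrame({"answer": ["Yes", "No", "Yes", "Yes", "No"]})

# value_counts tallies each distinct value in a single pass.
counts = df["answer"].value_counts()

yes_count = int(counts.get("Yes", 0))
no_count = int(counts.get("No", 0))
print(yes_count, no_count)
```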
How do I loop column names in a pandas dataframe?
I am new to Python and have never really used Pandas, so forgive me if this doesn’t make sense. I am trying to create a df based on frontend data I am sending to a flask route. The data is looped through and appended for each row. My only problem is that I don’t know how to get the df
Filter dataframe based on 2 columns
I have a big dataframe:

city       Flow
Berlin     False
Berlin     True
Vienna     False
Vienna     True
Vienna     False
Frankfurt  True
Frankfurt  False

I want to remove only the rows where city is Vienna and Flow is False, using Python. The resulting dataframe should be:

city       Flow
Berlin     False
Berlin     True
Vienna     True
Frankfurt  True
Frankfu…
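A minimal sketch of the filter, assuming `Flow` is a boolean column (if it holds the strings "True"/"False", the comparison would need adjusting). The variable names `mask` and `out` are illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "city": ["Berlin", "Berlin", "Vienna", "Vienna", "Vienna", "Frankfurt", "Frankfurt"],
    "Flow": [False, True, False, True, False, True, False],
})

# Rows to drop: city is Vienna AND Flow is False.
mask = (df["city"] == "Vienna") & (~df["Flow"])

# Keep everything else.
out = df[~mask]
print(out)
```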