Tag: pandas

Losing cell formats when accessing rows

In some circumstances the format (int, float, etc.) of a cell is lost when accessing it via its row. In that example the first column has integers and the second floats, but the 111 is converted into 111.0. The output I would expect is like this. I have an idea why this happens, but IMHO this isn't user-friendly. Can I
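The behaviour described in this excerpt can be reproduced with a small frame. The following is only a sketch: the column names `a` and `b` and every value except 111 are made up for illustration, since the original example is not shown here. Selecting a whole row returns a Series, which holds a single dtype, so a row mixing integers and floats is upcast to float. Reading the cell directly, or converting the frame to object dtype first, are two possible ways around that.

```python
import pandas as pd

# Hypothetical data: first column integers, second column floats.
df = pd.DataFrame({"a": [111, 222], "b": [1.5, 2.5]})

# A row is returned as a Series with one common dtype (float64 here),
# so the integer 111 comes back as 111.0.
row = df.loc[0]

# Scalar access reads from the column directly and keeps its integer dtype.
value = df.at[0, "a"]

# Casting to object first keeps each cell's own type in the row Series,
# at the cost of slower, non-vectorized storage.
row_obj = df.astype(object).loc[0]
```

Scalar access with `df.at` is usually the lighter option when only a few cells are needed; the object-dtype route suits cases where the whole row has to be passed around with its original types.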

Extract a value from a JSON string stored in a pandas data frame column

I have a pandas dataframe with a column named json2 which contains a JSON string coming from an API call: "{'obj': [{'timestp': '2022-12-03', 'followers': 281475, 'avg_likes_per_post': 7557, 'avg_comments_per_post': 182, 'avg_views_per_post': 57148, 'engagement_rate': 2.6848}, {'timestp': '2022-12-02', 'followers': 281475, 'avg_likes_per_post': 7557, 'avg_comments_per_post': 182, 'avg_views_per_post': 57148, 'engagement_rate': 2.6848}]}" I want to make a function that iterates over the column and extracts the number
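One possible approach is sketched below. The excerpt is cut off before it says which number is wanted, so the choice of `followers` from the first entry is purely an assumption, as are the helper name `first_value` and the shortened sample string. Because the string uses single quotes, it is a Python literal rather than strict JSON, which is why the sketch parses it with `ast.literal_eval` instead of `json.loads`.

```python
import ast

import pandas as pd

# Shortened version of the string from the question (fewer keys per entry).
raw = (
    "{'obj': [{'timestp': '2022-12-03', 'followers': 281475, 'engagement_rate': 2.6848}, "
    "{'timestp': '2022-12-02', 'followers': 281475, 'engagement_rate': 2.6848}]}"
)
df = pd.DataFrame({"json2": [raw]})


def first_value(text, key="followers"):
    """Return `key` from the first entry under 'obj', or None if parsing fails.

    Hypothetical helper: the key and the choice of the first entry are assumptions.
    """
    try:
        data = ast.literal_eval(text)  # single-quoted text is a Python literal, not JSON
        return data["obj"][0][key]
    except (ValueError, SyntaxError, KeyError, IndexError, TypeError):
        return None


# Apply the helper to every cell of the column.
df["followers"] = df["json2"].apply(first_value)
```

If the API actually returns double-quoted JSON, `json.loads` would be the more natural parser in place of `ast.literal_eval`.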

Change dates to quarters in JSON file Python

I’m trying to convert the dates inside a JSON file to their respective quarter and year. My JSON file is formatted below: The current code I’m using is an attempt at using pandas.Series.dt.quarter, as seen below: The issue I face is that my code doesn’t recognize the object name “lastDate”. My ideal output should have the dates ultimately replaced
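Since neither the JSON layout nor the desired output format is shown in this excerpt, the following is only a sketch under assumptions: it supposes a flat list of records that each carry a `lastDate` key in ISO format, and it renders the result as strings such as `2022Q4`. The `.dt` accessor only works on datetime-like Series, so the column is parsed with `pd.to_datetime` first.

```python
import json

import pandas as pd

# Hypothetical records; the real file structure is not shown in the question.
records = json.loads('[{"lastDate": "2022-12-03"}, {"lastDate": "2022-02-14"}]')
df = pd.DataFrame(records)

# Parse the strings into datetimes so the .dt accessor becomes available.
dates = pd.to_datetime(df["lastDate"])

# Quarter number alone (1 to 4).
df["quarter"] = dates.dt.quarter

# Year and quarter together, replacing the original date strings.
df["lastDate"] = dates.dt.to_period("Q").astype(str)
```

If `lastDate` sits inside nested objects, `pd.json_normalize` is one way to flatten the records before this step.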

How to use pd.apply() to instantiate new columns?

Instead of doing this: I want to do this in one line or function. Below is what I tried: But I just get Exception has occurred: ValueError. What can I do here? Answer: Looks like you can replace your whole code with a reindex: NB. By default the fill value is NaN; if you really want None, use fill_value=None. If
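The answer's code is not included in this excerpt, so the following is only a guess at what a reindex-based solution might look like. The frame and the column names `a`, `b`, and `c` are hypothetical. Passing the existing columns plus the new names to `reindex` creates the missing columns in one call and fills them with NaN by default.

```python
import pandas as pd

# Hypothetical starting frame and new column names.
df = pd.DataFrame({"a": [1, 2]})
new_cols = ["b", "c"]

# Keep the existing columns and append the new ones; columns that do not
# exist yet are created and filled with NaN.
out = df.reindex(columns=[*df.columns, *new_cols])
```

This avoids `apply` entirely, which is often simpler when the new columns start out empty.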

Calculate average temperature/humidity between 2 dates pandas data frames

I have the following data frames:

df3

Harvest_date  Starting_date
2022-10-06    2022-08-06
2022-02-22    2021-12-22

df (I have all temp and humid starting from 2021-01-01 till the present)

date                 temp  humid
2022-10-06 00:30:00  2     30
2022-10-06 00:01:00  1     30
2022-10-06 00:01:30  0     30
2022-10-06 00:02:00  0     30
2022-10-06 00:02:30  -2    30

I would like to calculate the avg temperature and humidity between
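A minimal sketch of one way to compute such averages, using the sample rows from the question. Several things are assumptions: the window is taken to run from Starting_date to the end of the Harvest_date day, the helper name `window_means` is made up, and the output column names `avg_temp` and `avg_humid` are illustrative only.

```python
import pandas as pd

df3 = pd.DataFrame({
    "Harvest_date": pd.to_datetime(["2022-10-06", "2022-02-22"]),
    "Starting_date": pd.to_datetime(["2022-08-06", "2021-12-22"]),
})
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2022-10-06 00:30:00", "2022-10-06 00:01:00", "2022-10-06 00:01:30",
        "2022-10-06 00:02:00", "2022-10-06 00:02:30",
    ]),
    "temp": [2, 1, 0, 0, -2],
    "humid": [30, 30, 30, 30, 30],
})


def window_means(row):
    """Mean temp and humid for readings between the row's two dates."""
    # Assumption: include every reading taken on the harvest day itself.
    end = row["Harvest_date"] + pd.Timedelta(days=1)
    mask = (df["date"] >= row["Starting_date"]) & (df["date"] < end)
    return df.loc[mask, ["temp", "humid"]].mean()


# One row of averages per date pair; pairs with no readings give NaN.
means = df3.apply(window_means, axis=1)
result = df3.join(means.add_prefix("avg_"))
```

With only the five sample readings, the second date pair has no matching rows and therefore yields NaN. For large frames, a row-wise `apply` like this can be slow, and an interval-based join may be worth considering instead.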

Pandas lagged rolling average on aggregate data with multiple groups and missing dates

I’d like to calculate a lagged rolling average on a complicated time-series dataset. Consider the toy example as follows: This results in the following DataFrame: Now I’d like to add a column representing the average weight per fruit for the previous 7 days: wgt_per_frt_prev_7d. It should be defined as the sum of all the fruit weights divided by the sum
