Tag: pandas

Losing cell formats when accessing rows

In some circumstances the format (int, float, etc.) of a cell is lost when accessing it via its row. In that example the first column has integers and the second floats, but the 111 is converted into 111.0. The output I would expect is like this. I have an idea why this happens, but IMHO this isn’t user friendly. Can I
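A minimal sketch of the behaviour described above, using a made-up two-column frame (the original example is not shown in the excerpt). Selecting a row returns a single Series, which needs one common dtype, so the integer is upcast to float; going through the column first avoids that.

```python
import pandas as pd

# Hypothetical data: one integer column and one float column.
df = pd.DataFrame({"a": [111, 222], "b": [1.5, 2.5]})

row = df.iloc[0]              # the row is one Series with a single dtype (float64)
via_row = row["a"]            # 111.0, the int was upcast together with the float

via_column = df["a"].iloc[0]  # column first: the int64 dtype is preserved
via_at = df.at[0, "a"]        # scalar access by label also keeps the integer

print(via_row, via_column, via_at)
```

If whole rows are needed with their original types, iterating with `df.itertuples()` is another option, since each field keeps its column's type.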

Extract a value from a JSON string stored in a pandas data frame column

dictionary json pandas python

I have a pandas dataframe with a column named json2 which contains a JSON string coming from an API call: "{'obj': [{'timestp': '2022-12-03', 'followers': 281475, 'avg_likes_per_post': 7557, 'avg_comments_per_post': 182, 'avg_views_per_post': 57148, 'engagement_rate': 2.6848}, {'timestp': '2022-12-02', 'followers': 281475, 'avg_likes_per_post': 7557, 'avg_comments_per_post': 182, 'avg_views_per_post': 57148, 'engagement_rate': 2.6848}]}" I want to make a function that iterates over the column and extracts the number
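One possible approach, sketched under assumptions: the string uses single quotes, so it is a Python literal rather than strict JSON, and `ast.literal_eval` can parse it where `json.loads` would fail. The excerpt is cut off before saying which number is wanted, so the `followers` field and the helper name `latest_value` are purely illustrative.

```python
import ast
import pandas as pd

# Shortened copy of the string from the question (fewer fields, same shape).
raw = (
    "{'obj': [{'timestp': '2022-12-03', 'followers': 281475, 'engagement_rate': 2.6848}, "
    "{'timestp': '2022-12-02', 'followers': 281475, 'engagement_rate': 2.6848}]}"
)
df = pd.DataFrame({"json2": [raw]})

def latest_value(cell, field="followers"):
    """Parse one cell and return `field` from the entry with the newest 'timestp'."""
    records = ast.literal_eval(cell)["obj"]
    newest = max(records, key=lambda r: r["timestp"])  # ISO dates sort correctly as strings
    return newest[field]

# Hypothetical target: the follower count of the most recent entry.
df["followers"] = df["json2"].apply(latest_value)
print(df["followers"])
```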

Change dates to quarters in JSON file Python

dataframe json pandas python

I’m trying to convert the dates inside a JSON file to their respective quarter and year. My JSON file is formatted below: The current code I’m using is an attempt at using pandas.Series.dt.quarter, as seen below: The issue I face is that my code doesn’t recognize the object name “lastDate”. My ideal output should have the dates ultimately replaced
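A rough sketch of the conversion, assuming the JSON has already been loaded into a list of records that each carry a "lastDate" key. The actual file layout is not shown in the excerpt, so the `records` structure here is hypothetical.

```python
import pandas as pd

# Hypothetical records; only the "lastDate" key comes from the question.
records = [{"lastDate": "2022-01-15"}, {"lastDate": "2022-11-30"}]
df = pd.DataFrame(records)

# Parse the strings first: the .dt accessor only works on datetime-like columns.
dates = pd.to_datetime(df["lastDate"])

# Replace each date with its quarter and year, e.g. "2022Q1".
df["lastDate"] = dates.dt.to_period("Q").astype(str)

print(df.to_dict("records"))
```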

How to use pd.apply() to instantiate new columns?

dataframe pandas python

Instead of doing this: I want to do this in one line or function. Below is what I tried: But I just get Exception has occurred: ValueError. What can I do here? Answer: Looks like you can replace your whole code with a reindex. NB: by default the fill value is NaN; if you really want None, use fill_value=None. If
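A small illustration of the reindex idea from the answer, with made-up column names since the original code is not shown. Reindexing on columns keeps the existing ones and creates any missing ones filled with NaN.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2]})

# Hypothetical names for the columns to be created.
new_cols = ["b", "c"]

# Existing columns are kept as they are; the new ones are added and filled with NaN.
out = df.reindex(columns=[*df.columns, *new_cols])
print(out)
```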

How do I group into different dates based on change in another column values in Pandas

pandas python python-3.x

I have data that looks like this. What I would like to do is group by ID and CD and get the start and stop dates for each change. I tried using groupby and the agg function, but it groups all the A rows together even though they need to be separated, since there is a B in between two A's. What I
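One common way to keep separate runs apart is to number consecutive blocks with a shift-and-cumsum comparison and add that number to the grouping keys. The sketch below assumes a `date` column and uses invented rows; only the ID and CD column names come from the question.

```python
import pandas as pd

# Invented sample: two runs of A separated by one B.
df = pd.DataFrame({
    "ID": [1, 1, 1, 1, 1],
    "CD": ["A", "A", "B", "A", "A"],
    "date": pd.to_datetime(
        ["2022-01-01", "2022-01-02", "2022-01-03", "2022-01-04", "2022-01-05"]
    ),
})

# A new block starts whenever ID or CD differs from the previous row.
changed = (df["ID"] != df["ID"].shift()) | (df["CD"] != df["CD"].shift())
df["block"] = changed.cumsum()

# Grouping on the block number keeps the two runs of A apart.
spans = (
    df.groupby(["ID", "CD", "block"], sort=False)
      .agg(start=("date", "min"), stop=("date", "max"))
      .reset_index()
)
print(spans)
```

This relies on the rows already being in chronological order within each ID.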

Calculate average temperature/humidity between 2 dates pandas data frames

pandas python

I have the following data frames:

df3

Harvest_date  Starting_date
2022-10-06    2022-08-06
2022-02-22    2021-12-22

df (I have all temp and humid starting from 2021-01-01 till the present)

date                 temp  humid
2022-10-06 00:30:00  2     30
2022-10-06 00:01:00  1     30
2022-10-06 00:01:30  0     30
2022-10-06 00:02:00  0     30
2022-10-06 00:02:30  -2    30

I would like to calculate the avg temperature and humidity between
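A straightforward sketch, assuming the goal is one average per df3 row over the readings that fall between Starting_date and Harvest_date. The df3 values are taken from the question; the sensor readings are invented so that each window contains some data, and `window_mean` is a hypothetical helper name.

```python
import pandas as pd

df3 = pd.DataFrame({
    "Harvest_date": pd.to_datetime(["2022-10-06", "2022-02-22"]),
    "Starting_date": pd.to_datetime(["2022-08-06", "2021-12-22"]),
})

# Invented readings; the real frame holds data from 2021-01-01 onwards.
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2022-08-10 00:30:00", "2022-09-01 12:00:00",
        "2022-01-15 06:00:00", "2022-12-01 00:00:00",
    ]),
    "temp": [2, 4, -2, 10],
    "humid": [30, 40, 50, 60],
})

def window_mean(row):
    """Mean temp and humid of readings within [Starting_date, Harvest_date]."""
    # Both bounds are inclusive, but Harvest_date is midnight, so readings
    # later on the harvest day itself are not included.
    mask = df["date"].between(row["Starting_date"], row["Harvest_date"])
    return df.loc[mask, ["temp", "humid"]].mean()

result = df3.join(df3.apply(window_mean, axis=1))
print(result)
```

With many rows in df3 a row-wise apply can become slow; it is shown here because it is the easiest version to read.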

TypeError: TimeGrouper.__init__() got multiple values for argument 'freq'

pandas python typeerror

What am I doing wrong? This is all the code needed to reproduce. Result: Pandas version 1.5.1, Python version 3.10.6. Answer: This seems to be a bug. It looks like the weirdness is because Grouper.__new__() instantiates a TimeGrouper if you pass freq as a kwarg, but not if you pass freq as a positional argument. I don’t know why it
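The reproducing code is not included in the excerpt, so the following only sketches the keyword form, which sidesteps the positional-argument path the answer describes. The frame and its column names are made up.

```python
import pandas as pd

# Made-up data with a datetime column to group on.
df = pd.DataFrame({
    "date": pd.to_datetime(
        ["2022-01-01 08:00", "2022-01-01 20:00", "2022-01-02 09:00"]
    ),
    "value": [1, 2, 3],
})

# Pass key and freq by keyword rather than by position.
grouper = pd.Grouper(key="date", freq="D")
print(type(grouper).__name__)  # shows which class Grouper.__new__() actually built

daily = df.groupby(grouper)["value"].sum()
print(daily)
```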

Find all possible paths in a python graph data structure without using recursive function

dataframe graph-theory pandas python recursion

I have a serious issue with finding all possible paths in my csv file, which looks like this:

Source   Target   Source_repo  Target_repo
SOURCE1  Target2  repo-1       repo-2
SOURCE5  Target3  repo-5       repo-3
SOURCE8  Target5  repo-8       repo-5

There are a large number of lines in the dataset, more than 5000. I want to generate all possible paths like this in and return
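A non-recursive sketch using an explicit stack (iterative depth-first search). The expected output is cut off in the excerpt, so this assumes "all possible paths" means every simple path of at least one edge, and it uses the Source_repo and Target_repo values from the sample as nodes; both choices are assumptions.

```python
from collections import defaultdict

# Edges taken from the Source_repo / Target_repo columns of the sample rows.
edges = [("repo-1", "repo-2"), ("repo-5", "repo-3"), ("repo-8", "repo-5")]

def all_paths(edges):
    """Return every simple path with at least one edge, without recursion."""
    graph = defaultdict(list)
    for src, dst in edges:
        graph[src].append(dst)

    paths = []
    stack = [[src] for src in list(graph)]  # start one partial path per source node
    while stack:
        path = stack.pop()
        for nxt in graph.get(path[-1], []):
            if nxt in path:          # skip nodes already on this path (avoids cycles)
                continue
            new_path = path + [nxt]
            paths.append(new_path)   # record the path, then try to extend it further
            stack.append(new_path)
    return paths

for p in all_paths(edges):
    print(" -> ".join(p))
```

The number of simple paths can grow exponentially with the number of edges, so on a 5000-line file it may be worth limiting the start nodes or the path length.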

Pandas lagged rolling average on aggregate data with multiple groups and missing dates

dataframe pandas python rolling-computation time-series

I’d like to calculate a lagged rolling average on a complicated time-series dataset. Consider the toy example as follows: This results in the following DataFrame: Now I’d like to add a column representing the average weight per fruit for the previous 7 days: wgt_per_frt_prev_7d. It should be defined as the sum of all the fruit weights divided by the sum
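A tentative sketch of one way to get a lagged, time-based window per group. The toy data and the end of the definition are missing from the excerpt, so the column names (`group`, `weight`, `count`) and the choice of denominator (the sum of fruit counts) are assumptions. A `"7D"` window with `closed="left"` covers the seven days before each row and excludes the row itself, and because it is based on dates it tolerates missing days.

```python
import pandas as pd

# Invented data: two groups with gaps between dates.
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2022-01-01", "2022-01-02", "2022-01-05", "2022-01-20",
        "2022-01-01", "2022-01-03",
    ]),
    "group": ["x", "x", "x", "x", "y", "y"],
    "weight": [10.0, 20.0, 30.0, 40.0, 5.0, 15.0],
    "count": [1, 2, 3, 4, 1, 3],
})
df = df.sort_values(["group", "date"]).reset_index(drop=True)

def prev_7d_ratio(g):
    """Sum of weight over sum of count in the 7 days before each row of one group."""
    sums = (
        g.set_index("date")[["weight", "count"]]
         .rolling("7D", closed="left")   # window is [t - 7 days, t)
         .sum()
    )
    ratio = sums["weight"] / sums["count"]
    return pd.Series(ratio.to_numpy(), index=g.index)  # restore the original row labels

parts = [prev_7d_ratio(g) for _, g in df.groupby("group")]
df["wgt_per_frt_prev_7d"] = pd.concat(parts)
print(df)
```

Rows with no data in the preceding seven days get NaN.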