In some circumstances the format (int, float, etc.) of a cell is lost when accessing it via its row. In this example the first column holds integers and the second floats, but the 111 is converted into 111.0. The output I would expect is like this: I have an idea why this happens. But IMHO this isn’t user frie…
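A minimal sketch of the behaviour described above, assuming a two-column frame with one integer and one float column (the column names `a` and `b` and the values are made up for illustration). Selecting a row produces a single Series, and a Series has one dtype, so the integer is upcast to float. Going through the column, or using the scalar accessor `.at`, keeps the integer.

```python
import pandas as pd

# Hypothetical data: "a" holds integers, "b" holds floats.
df = pd.DataFrame({"a": [111, 222], "b": [1.5, 2.5]})

# A row is returned as one Series, which can only have one dtype.
# Mixing int and float columns therefore upcasts everything to float64.
row = df.loc[0]
via_row = row["a"]           # float, the integer format is lost

# Selecting the column first keeps the column's own dtype.
via_column = df["a"].loc[0]  # integer

# The scalar accessor reads the cell directly from its column.
via_at = df.at[0, "a"]       # integer

print(row.dtype, type(via_row).__name__, type(via_column).__name__, type(via_at).__name__)
```

If whole rows are needed with their original types, iterating with `df.itertuples()` is one option, since each field is read from its own column.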
Tag: pandas
Extract a value from a JSON string stored in a pandas data frame column
I have a pandas DataFrame with a column named json2, which contains a JSON string coming from an API call: "{'obj': [{'timestp': '2022-12-03', 'followers': 281475, 'avg_likes_per_post': 7557, 'avg_comments_per_post': 182, 'avg_…
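One possible approach, sketched under assumptions: the string in the excerpt uses single quotes, so it is a Python literal rather than strict JSON, and `json.loads` would reject it. `ast.literal_eval` can parse it safely. The sample string below is a shortened, hypothetical stand-in for the truncated one in the question, and `extract_field` is a helper name invented for this sketch.

```python
import ast

# Hypothetical sample, shaped like the (truncated) string in the question.
raw = "{'obj': [{'timestp': '2022-12-03', 'followers': 281475, 'avg_likes_per_post': 7557}]}"


def extract_field(text, field):
    """Parse a Python-literal style string and return `field` from the first 'obj' entry."""
    parsed = ast.literal_eval(text)  # evaluates literals only, never arbitrary code
    return parsed["obj"][0][field]


followers = extract_field(raw, "followers")
print(followers)
```

On a DataFrame, the same helper could be applied per row, for example `df["json2"].apply(lambda s: extract_field(s, "followers"))`, assuming every row has the same shape.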
Change dates to quarters in JSON file Python
I’m trying to convert the dates inside a JSON file to their respective quarter and year. My JSON file is formatted below: The current code I’m using is an attempt at using pandas.Series.dt.quarter, as seen below: The issue I face is that my code doesn’t recognize the object name “…
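The date-to-quarter conversion itself can be sketched without pandas. This assumes the dates are ISO strings such as `2022-12-03`; the JSON layout from the question is not shown in the excerpt, so the function below only handles a single date string, and the `Q4 2022` output format is an assumption.

```python
from datetime import date


def to_quarter_label(iso_date):
    """Turn an ISO date string into a 'Q<n> <year>' label (label format is an assumption)."""
    d = date.fromisoformat(iso_date)
    quarter = (d.month - 1) // 3 + 1  # months 1-3 -> 1, 4-6 -> 2, 7-9 -> 3, 10-12 -> 4
    return f"Q{quarter} {d.year}"


print(to_quarter_label("2022-12-03"))
```

With pandas, `pd.to_datetime(series).dt.quarter` and `.dt.year` give the same two numbers for a whole column at once.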
Find if words from one sentence are found in corresponding row of another column also containing sentences (Pandas)
I have a dataframe that looks like this: and I have this code that works as a solution, but it takes forever on larger datasets. I know there has to be an easier way to solve it, so I’m just looking to see if anyone knows of a more concise/elegant way to find a count of matching words between corresponding
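A small sketch of one way to count matching words between two sentences, using set intersection. The original data and code are not shown in the excerpt, so the matching rule here (case-insensitive, whitespace-separated, each distinct word counted once) is an assumption, and `count_shared_words` is a name made up for this sketch.

```python
def count_shared_words(sentence_a, sentence_b):
    """Count distinct words that appear in both sentences (case-insensitive)."""
    words_a = set(sentence_a.lower().split())
    words_b = set(sentence_b.lower().split())
    return len(words_a & words_b)  # size of the set intersection


print(count_shared_words("The quick brown fox", "A quick red fox"))
```

For two DataFrame columns, this could be applied row by row with a list comprehension over `zip(df[col_a], df[col_b])`, which avoids the overhead of a row-wise `apply`.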
How to use pd.apply() to instantiate new columns?
Instead of doing this: I want to do this in one line or function. Below is what I tried: But I just get Exception has occurred: ValueError. What can I do here? Answer Looks like you can replace your whole code with a reindex: NB. By default the fill value is NaN; if you really want None, use fill_value=None. I…
How do I group into different dates based on change in another column values in Pandas
I have data that looks like this What I would like to do is group by ID and CD and get the start and stop date for each change. I tried using groupby and the agg function, but it groups all A rows together even though they need to be separated, since there is a B in between the 2 A runs. What I
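The key point is that only consecutive rows with the same value should be grouped. `itertools.groupby` from the standard library behaves this way, so it can illustrate the idea. The rows below are hypothetical (the real data is not shown in the excerpt) and are assumed to be sorted by ID and date already.

```python
from itertools import groupby

# Hypothetical rows: (ID, CD, date), already sorted by ID and date.
rows = [
    ("1", "A", "2022-01-01"),
    ("1", "A", "2022-01-02"),
    ("1", "B", "2022-01-03"),
    ("1", "A", "2022-01-04"),
]


def change_runs(sorted_rows):
    """Return (ID, CD, start, stop) for each run of consecutive rows with the same ID and CD."""
    runs = []
    # groupby starts a new group whenever the key changes, so the two A runs stay separate.
    for (id_, cd), group in groupby(sorted_rows, key=lambda r: (r[0], r[1])):
        dates = [r[2] for r in group]
        runs.append((id_, cd, dates[0], dates[-1]))
    return runs


for run in change_runs(rows):
    print(run)
```

In pandas, a comparable effect is usually obtained by building a run identifier, for example `(df["CD"] != df["CD"].shift()).cumsum()`, and grouping on it together with the ID.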
Calculate average temperature/humidity between 2 dates pandas data frames
I have the following data frames: df3 Harvest_date Starting_date 2022-10-06 2022-08-06 2022-02-22 2021-12-22 df (I have all temp and humid starting from 2021-01-01 till the present) date temp humid 2022-10-06 00:30:00 2 30 2022-10-06 00:01:00 1 30 2022-10-06 00:01:30 0 30 2022-10-06 00:02:00 0 30 2022-10-06 0…
TypeError: TimeGrouper.__init__() got multiple values for argument ‘freq’
What am I doing wrong? This is all the code needed to reproduce. Result: Pandas version 1.5.1, Python version 3.10.6. Answer This seems to be a bug. It looks like the weirdness is because Grouper.__new__() instantiates a TimeGrouper if you pass freq as a kwarg, but not if you pass freq as a positional argument…
Find all possible paths in a python graph data structure without using recursive function
I have a serious issue with finding all possible paths in my csv file that looks like this: Source Target Source_repo Target_repo SOURCE1 Target2 repo-1 repo-2 SOURCE5 Target3 repo-5 repo-3 SOURCE8 Target5 repo-8 repo-5 There are a large number of lines in the dataset, more than 5000. I want to generate a…
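A minimal sketch of path enumeration without recursion, using an explicit stack in place of the call stack. The edge list and node names below are hypothetical, since the excerpt does not show how sources and targets link up, and "path" is taken here to mean a maximal simple path (no node visited twice) from a given start node.

```python
def all_paths(edges, start):
    """Enumerate maximal simple paths from `start` using an explicit stack (no recursion)."""
    # Build an adjacency list from (source, target) pairs.
    graph = {}
    for source, target in edges:
        graph.setdefault(source, []).append(target)

    paths = []
    stack = [[start]]  # each stack entry is a partial path
    while stack:
        path = stack.pop()
        # Skip nodes already on the path so cycles cannot loop forever.
        successors = [n for n in graph.get(path[-1], []) if n not in path]
        if not successors:
            paths.append(path)  # nothing left to extend: the path is complete
        for node in successors:
            stack.append(path + [node])
    return paths


# Hypothetical edges, e.g. rows of (Source, Target) read from the CSV.
edges = [("a", "b"), ("b", "c"), ("a", "c")]
print(all_paths(edges, "a"))
```

The number of paths can grow exponentially with the number of edges, so on a file with thousands of lines it may be worth limiting the path length or the set of start nodes.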
Pandas lagged rolling average on aggregate data with multiple groups and missing dates
I’d like to calculate a lagged rolling average on a complicated time-series dataset. Consider the toy example as follows: This results in the following DataFrame: Now I’d like to add a column representing the average weight per fruit for the previous 7 days: wgt_per_frt_prev_7d. It should be defin…