In some circumstances the type (int, float, etc.) of a cell is lost when it is accessed via its row. In the example, the first column holds integers and the second floats, but 111 is converted to 111.0. The output I would expect is like this: I have an idea why this happens, but IMHO this isn’t user-friendly. Can I
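A minimal sketch of what is likely going on, using a hypothetical two-column frame (the column names ints and floats are assumptions): a row returned by df.iloc[0] is a single Series, and a Series has only one dtype, so pandas upcasts 111 to float64. Selecting with a list, or iterating with itertuples(), keeps the per-column types.

```python
import pandas as pd

# Hypothetical frame: one integer column, one float column
df = pd.DataFrame({"ints": [111, 222], "floats": [1.5, 2.5]})

row = df.iloc[0]      # a Series, which has one dtype shared by all its values
print(row.dtype)      # float64, so 111 becomes 111.0
print(row["ints"])    # 111.0

# Option 1: select with a list to get a one-row DataFrame; per-column dtypes survive
first = df.iloc[[0]]
print(first.dtypes)

# Option 2: itertuples() yields one namedtuple per row, built column by column
for rec in df.itertuples(index=False):
    print(rec.ints, rec.floats)  # 111 1.5, then 222 2.5
```

Note that if the frame also has an object (e.g. string) column, the row Series becomes object dtype and the integers are no longer upcast.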
Tag: pandas
Extract a value from a JSON string stored in a pandas data frame column
I have a pandas DataFrame with a column named json2 that contains a JSON string coming from an API call: "{'obj': [{'timestp': '2022-12-03', 'followers': 281475, 'avg_likes_per_post': 7557, 'avg_comments_per_post': 182, 'avg_views_per_post': 57148, 'engagement_rate': 2.6848}, {'timestp': '2022-12-02', 'followers': 281475, 'avg_likes_per_post': 7557, 'avg_comments_per_post': 182, 'avg_views_per_post': 57148, 'engagement_rate': 2.6848}]}" I want to write a function that iterates over the column and extracts the number
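Because the stored string uses single quotes, it is a Python literal rather than strict JSON, so json.loads would reject it; ast.literal_eval can parse it safely instead. A minimal sketch, assuming the goal is to pull one field (followers here) from the first entry of obj; the helper name extract_field and the shortened sample string are hypothetical.

```python
import ast

import pandas as pd

def extract_field(raw: str, field: str, index: int = 0):
    # The string uses single quotes, so it is a Python literal, not strict JSON;
    # ast.literal_eval parses it without executing arbitrary code
    data = ast.literal_eval(raw)
    return data["obj"][index][field]

# Shortened version of the string from the question
df = pd.DataFrame({
    "json2": ["{'obj': [{'timestp': '2022-12-03', 'followers': 281475, 'engagement_rate': 2.6848}]}"]
})

# Extra keyword arguments to Series.apply are forwarded to the function
df["followers"] = df["json2"].apply(extract_field, field="followers")
print(df["followers"].tolist())  # [281475]
```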
Change dates to quarters in JSON file Python
I’m trying to convert the dates inside a JSON file to their respective quarter and year. My JSON file is formatted as below: The code I’m currently using attempts to use pandas.Series.dt.quarter, as seen below: The issue I face is that my code doesn’t recognize the key "lastDate". My ideal output should have the dates ultimately replaced
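A minimal sketch, assuming the JSON file holds a list of records, each with a lastDate string (the sample records and the name field are hypothetical, since the actual file layout is not shown). Loading the records into a DataFrame first gives a real lastDate column that the .dt accessor can work on, and to_period("Q") produces labels such as 2022Q2.

```python
import json

import pandas as pd

# Hypothetical file contents; with a real file: records = json.load(open("data.json"))
raw = '[{"name": "a", "lastDate": "2022-05-14"}, {"name": "b", "lastDate": "2021-11-30"}]'
records = json.loads(raw)

df = pd.DataFrame(records)
# Parse the strings, convert to quarterly periods, and store them back as text
df["lastDate"] = pd.to_datetime(df["lastDate"]).dt.to_period("Q").astype(str)

updated = df.to_dict(orient="records")
print(json.dumps(updated, indent=2))
```

If a different label is wanted (for example "Q2 2022"), dt.quarter and dt.year can be combined into a string instead of using to_period.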
Find if words from one sentence are found in corresponding row of another column also containing sentences (Pandas)
I have a DataFrame that looks like this: and I have this code that works as a solution, but it takes forever on larger datasets. I know there has to be an easier way, so I’m looking to see if anyone knows a more concise/elegant way to find a count of matching words between corresponding
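One common way to speed this up is to compare word sets row by row rather than looping over individual words. A sketch with hypothetical column names sentence_a and sentence_b; note that it counts distinct shared words, case-insensitively, and splits only on whitespace, so punctuation would need stripping first.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    "sentence_a": ["the quick brown fox", "hello world"],
    "sentence_b": ["a quick red fox", "goodbye world hello"],
})

def count_common_words(a: str, b: str) -> int:
    # Lower-case, split on whitespace, and count the words both sets share
    return len(set(a.lower().split()) & set(b.lower().split()))

# A list comprehension over zipped columns avoids the overhead of DataFrame.apply
df["matches"] = [count_common_words(a, b) for a, b in zip(df["sentence_a"], df["sentence_b"])]
print(df["matches"].tolist())  # [2, 2]
```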
How to use pd.apply() to instantiate new columns?
Instead of doing this: I want to do this in one line or function. Below is what I tried: But I just get "Exception has occurred: ValueError". What can I do here? Answer: It looks like you can replace your whole code with a reindex: NB. By default the fill value is NaN; if you really want None, use fill_value=None. If
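A sketch of the reindex approach from the answer, with hypothetical column names: passing the existing columns plus the new ones to reindex(columns=...) adds all the new columns in one call, filled with NaN by default.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2]})
new_cols = ["b", "c"]  # hypothetical names of the columns to create

# Existing columns keep their data; the new ones are added and filled with NaN
df = df.reindex(columns=[*df.columns, *new_cols])
print(df)
```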
How do I group into different dates based on change in another column values in Pandas
I have data that looks like this: What I would like to do is group by ID and CD and get the start and stop dates for each change. I tried using the groupby and agg functions, but they group all the A rows together even though they need to be kept separate, since there is a B between the two A runs. What I
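A common pattern for splitting consecutive runs is to flag every row where ID or CD differs from the previous row and take a cumulative sum, which gives each run its own group number. A sketch with hypothetical sample data and a hypothetical DATE column:

```python
import pandas as pd

# Hypothetical sample data: A, A, B, A, A for one ID
df = pd.DataFrame({
    "ID": [1, 1, 1, 1, 1],
    "CD": ["A", "A", "B", "A", "A"],
    "DATE": pd.to_datetime(["2022-01-01", "2022-01-02", "2022-01-03", "2022-01-04", "2022-01-05"]),
})
df = df.sort_values(["ID", "DATE"])

# True where ID or CD changes from the previous row; cumsum numbers each run
run = (df[["ID", "CD"]] != df[["ID", "CD"]].shift()).any(axis=1).cumsum()

out = (
    df.groupby(run)
      .agg(ID=("ID", "first"), CD=("CD", "first"),
           START=("DATE", "min"), STOP=("DATE", "max"))
      .reset_index(drop=True)
)
print(out)
```

The two A runs end up in separate rows because the B row between them starts a new run number.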
Calculate average temperature/humidity between 2 dates pandas data frames
I have the following data frames:

df3
Harvest_date  Starting_date
2022-10-06    2022-08-06
2022-02-22    2021-12-22

df (I have all temp and humid starting from 2021-01-01 till the present)
date                 temp  humid
2022-10-06 00:30:00     2     30
2022-10-06 00:01:00     1     30
2022-10-06 00:01:30     0     30
2022-10-06 00:02:00     0     30
2022-10-06 00:02:30    -2     30

I would like to calculate the avg temperature and humidity between
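A straightforward sketch: for each row of df3, build a boolean mask over df["date"] and average the matching rows. The df here is invented sample data, and treating the window as inclusive of the whole harvest day is an assumption. This scans df once per df3 row, which is fine for a handful of harvests but slow for many.

```python
import pandas as pd

df3 = pd.DataFrame({
    "Harvest_date": pd.to_datetime(["2022-10-06", "2022-02-22"]),
    "Starting_date": pd.to_datetime(["2022-08-06", "2021-12-22"]),
})
# Invented readings for illustration
df = pd.DataFrame({
    "date": pd.to_datetime(["2021-12-22 00:00:30", "2022-01-15 12:00:00",
                            "2022-09-01 08:00:00", "2022-10-06 00:00:30"]),
    "temp": [2, 4, 10, 0],
    "humid": [30, 40, 50, 30],
})

def window_mean(row):
    # Assumed window: from Starting_date up to the end of Harvest_date
    end = row["Harvest_date"] + pd.Timedelta(days=1)
    mask = (df["date"] >= row["Starting_date"]) & (df["date"] < end)
    return df.loc[mask, ["temp", "humid"]].mean()

means = df3.apply(window_mean, axis=1)  # one row of means per df3 row
df3["avg_temp"] = means["temp"]
df3["avg_humid"] = means["humid"]
print(df3)
```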
TypeError: TimeGrouper.__init__() got multiple values for argument ‘freq’
What am I doing wrong? This is all the code needed to reproduce. Result: Pandas version 1.5.1, Python version 3.10.6. Answer: This seems to be a bug. It looks like the weirdness is because Grouper.__new__() instantiates a TimeGrouper if you pass freq as a kwarg, but not if you pass freq as a positional argument. I don’t know why it
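Since the answer points at Grouper.__new__() only dispatching to TimeGrouper when freq is passed as a keyword, one workaround sketch is to pass both key and freq by keyword. The frame and column names are hypothetical; the original reproduction code is not shown, so this only illustrates the keyword-argument form.

```python
import pandas as pd

# Hypothetical data
df = pd.DataFrame({
    "date": pd.to_datetime(["2022-01-01", "2022-01-01", "2022-01-02"]),
    "value": [1, 2, 3],
})

# key and freq passed as keywords, so Grouper builds a time-based grouper
daily = df.groupby(pd.Grouper(key="date", freq="D"))["value"].sum()
print(daily.tolist())  # [3, 3]
```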
Find all possible paths in a python graph data structure without using recursive function
I have a serious issue with finding all possible paths in my CSV file, which looks like this:

Source   Target   Source_repo  Target_repo
SOURCE1  Target2  repo-1       repo-2
SOURCE5  Target3  repo-5       repo-3
SOURCE8  Target5  repo-8       repo-5

There are a large number of lines in the dataset, more than 5000. I want to generate all possible paths like this in and return
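An iterative depth-first search with an explicit stack avoids recursion entirely. This sketch assumes the edges run from Source_repo to Target_repo (the question does not say which columns form the graph), starts from nodes that never appear as a target, and skips nodes already on the current path so cycles cannot loop forever. Be aware that the number of paths can grow exponentially with the size of the graph.

```python
from collections import defaultdict

import pandas as pd

# Sample rows from the question (only the repo columns are used here)
df = pd.DataFrame({
    "Source_repo": ["repo-1", "repo-5", "repo-8"],
    "Target_repo": ["repo-2", "repo-3", "repo-5"],
})

def all_paths(edges):
    graph = defaultdict(list)
    targets = set()
    for src, dst in edges:
        graph[src].append(dst)
        targets.add(dst)
    # Roots: nodes with outgoing edges that are never a target
    roots = [n for n in graph if n not in targets]
    paths = []
    for root in roots:
        stack = [[root]]  # explicit stack of partial paths instead of recursion
        while stack:
            path = stack.pop()
            # Skip children already on this path to avoid cycles
            children = [c for c in graph.get(path[-1], []) if c not in path]
            if not children:
                paths.append(path)  # no way to extend: record a complete path
            for child in children:
                stack.append(path + [child])
    return paths

paths = all_paths(zip(df["Source_repo"], df["Target_repo"]))
print(paths)  # [['repo-1', 'repo-2'], ['repo-8', 'repo-5', 'repo-3']]
```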
Pandas lagged rolling average on aggregate data with multiple groups and missing dates
I’d like to calculate a lagged rolling average on a complicated time-series dataset. Consider the toy example as follows: This results in the following DataFrame: Now I’d like to add a column representing the average weight per fruit for the previous 7 days: wgt_per_frt_prev_7d. It should be defined as the sum of all the fruit weights divided by the sum
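A sketch under assumptions, since the toy example is not shown: each row is assumed to have hypothetical columns date, group, n_fruits and weight (the total weight of that row's fruit). Missing dates are filled with zeros per group so that a 7-row window spans 7 calendar days, and shift(1) excludes the current day so only the previous 7 days count.

```python
import pandas as pd

# Hypothetical data with gaps in the dates
df = pd.DataFrame({
    "date": pd.to_datetime(["2022-01-01", "2022-01-02", "2022-01-04", "2022-01-01", "2022-01-03"]),
    "group": ["apple", "apple", "apple", "pear", "pear"],
    "n_fruits": [2, 1, 3, 4, 2],
    "weight": [300.0, 160.0, 420.0, 800.0, 380.0],
})

# One row per (group, date), summing any duplicates
daily = df.groupby(["group", "date"], as_index=False)[["n_fruits", "weight"]].sum()

parts = []
for name, g in daily.groupby("group"):
    # Fill missing days with zeros so 7 rows == 7 calendar days
    full = pd.date_range(g["date"].min(), g["date"].max(), freq="D")
    g = g.set_index("date")[["n_fruits", "weight"]].reindex(full, fill_value=0)
    # Sum over a 7-day window, then shift so the current day is excluded
    prev = g.rolling(7, min_periods=1).sum().shift(1)
    # Ratio of sums; 0/0 (no fruit in the window) gives NaN
    g["wgt_per_frt_prev_7d"] = prev["weight"] / prev["n_fruits"]
    g["group"] = name
    parts.append(g.rename_axis("date").reset_index())

result = pd.concat(parts, ignore_index=True)
print(result)
```

Dividing the summed weights by the summed counts (rather than averaging daily averages) matches the definition in the question of total weight over total fruit.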