I have the dataframe below, which comes from JSON, and I am trying to format it for database insertion. I am splitting using .tolist() but get an error for None entries. I tried fillna and replace to insert a dummy list, e.g. [0,0,0], but they will only let me replace with a string. Any suggestions welcome. This works: #df_split_…
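One possible way around the None entries, sketched under assumptions: the column name `coords`, the output names `x`, `y`, `z` and the sample data are all hypothetical, since the original dataframe is not shown. The idea is to substitute the dummy list element by element before calling .tolist(), rather than through fillna.

```python
import pandas as pd

# Hypothetical data: a column of 3-element lists with one missing entry.
df = pd.DataFrame({"id": [1, 2, 3], "coords": [[1, 2, 3], None, [7, 8, 9]]})

DUMMY = [0, 0, 0]

# Swap anything that is not a list (None/NaN) for the dummy list, one element at a time.
filled = df["coords"].apply(lambda v: v if isinstance(v, list) else DUMMY)

# Expand the lists into separate columns, keeping the original index.
split = pd.DataFrame(filled.tolist(), index=df.index, columns=["x", "y", "z"])

# Replace the list column with the expanded columns.
result = df.drop(columns="coords").join(split)
print(result)
```

Whether a row of zeros or a row of NULLs is the better placeholder depends on the target table, which the excerpt does not describe.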
Tag: dataframe
Compare 2 string columns in data frame and add 1 to key if different
I’m having a hard time trying to figure this out. I have a data frame with multiple columns after merging 2. I have 2 lists: I need to compare each group of variables and, if they are different, add 1 to the key column. I was trying to use something like this: Answer: I would avoid loops. Is there any…
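A loop-free comparison could look like the sketch below. The column names, the two lists and the data are hypothetical (the originals are not shown), and it assumes the key should grow by 1 for every pair of columns that differs in a row.

```python
import pandas as pd

# Hypothetical merged frame: *_x columns from one source, *_y from the other.
df = pd.DataFrame({
    "key": [0, 0, 0],
    "a_x": ["red", "blue", "green"],
    "a_y": ["red", "BLUE", "green"],
    "b_x": ["s", "m", "l"],
    "b_y": ["s", "m", "xl"],
})

# The two lists of column names, paired by position.
left_cols = ["a_x", "b_x"]
right_cols = ["a_y", "b_y"]

# Compare the paired columns element-wise; to_numpy() sidesteps label alignment.
differs = df[left_cols].to_numpy() != df[right_cols].to_numpy()

# Add the number of differing pairs in each row to the key column.
df["key"] += differs.sum(axis=1)
print(df)
```

If only one increment per row is wanted regardless of how many pairs differ, `differs.any(axis=1).astype(int)` can replace the sum.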
Python: How to filter a Pandas DataFrame using Values from a Series?
Context: I am currently processing some data and have encountered a problem. I would like to filter a Pandas DataFrame using values from a Series. However, this always throws the following error: ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all(). Code Question…
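This error generally appears when a whole Series is used where a single True/False is expected (for example in an `if` or with `and`/`or`). The question's code is not shown, so the following is only a guess at the intent: keep the rows whose column value occurs in the Series, using `isin`. The names `df`, `wanted` and the `id` column are hypothetical.

```python
import pandas as pd

# Hypothetical frame and the Series of values to filter by.
df = pd.DataFrame({"id": [1, 2, 3, 4], "value": ["a", "b", "c", "d"]})
wanted = pd.Series([2, 4])

# isin builds an element-wise boolean mask, so no Series is ever
# evaluated as a single truth value.
mask = df["id"].isin(wanted)
filtered = df[mask]
print(filtered)
```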
Pandas split list upon DataFrame creation
I have a JSON file coming in, which I am doing some operations/trimming on. The result looks like this: When applying df = pd.DataFrame(user, index=[0]) I get the following DataFrame: When applying df = pd.DataFrame(user) I get: I am aware of why that happens; however, neither is what I want. I’d like t…
What is pandas equivalent of the following SQL?
OK, I have a dataframe that looks like the following: In SQL, to filter unique segments (segment_id) by travelmode, I would do: What is the pandas equivalent of this expression? Answer: Maybe: as suggested in this post.
Pandas get difference from first row at a set dtime with groupby
I have a dataframe with [Group], [DTime] and [Value] columns. For each [Group], I’m trying to find the difference between the first [Value] and every subsequent value from a set [DTime]; for this example, say it’s the start of the df at 2015-01-01. Ultimately I would like to plot a timeseries of […
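As a rough sketch of the calculation described, assuming "first" means the earliest row per group on or after the chosen date: the sample data and the `Diff` column name are hypothetical.

```python
import pandas as pd

# Hypothetical data with the three columns named in the question.
df = pd.DataFrame({
    "Group": ["A", "A", "A", "B", "B", "B"],
    "DTime": pd.to_datetime(["2015-01-01", "2015-01-02", "2015-01-03"] * 2),
    "Value": [10, 12, 15, 100, 90, 95],
})

start = pd.Timestamp("2015-01-01")

# Keep rows from the chosen start time onward, ordered within each group.
subset = df[df["DTime"] >= start].sort_values(["Group", "DTime"]).copy()

# transform("first") broadcasts each group's first Value back onto its rows.
first_value = subset.groupby("Group")["Value"].transform("first")
subset["Diff"] = subset["Value"] - first_value
print(subset)
```

From here, `subset.pivot(index="DTime", columns="Group", values="Diff")` would give one column per group, which is a convenient shape for plotting.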
Add averages to existing plot with pandas.DataFrame
I have a pandas DataFrame of the form and I want to plot the last 7 days together with the average over the weekdays. I can create / plot the average by using and I can create / plot the last 7 days by using but I fail to combine them into a single plot since the average uses weekday…
Pandas Dataframe: Retrieve the Maximum Value in a Pandas Dataframe using .groupby and .idxmax()
I have a Pandas DataFrame that contains a series of Airbnb prices grouped by neighbourhood group, neighbourhood and room_type. My objective is to return the maximum average price for each room_type per neighbourhood and return only this. My approach to this was to use .groupby and .idxmax() to get the maximum …
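One reading of this objective, sketched with hypothetical column names (`neighbourhood_group`, `neighbourhood`, `room_type`, `price`) and made-up data: average the price per neighbourhood and room type first, then use .idxmax() to pick, for each neighbourhood group and room type, the neighbourhood with the highest average.

```python
import pandas as pd

# Hypothetical listings data.
df = pd.DataFrame({
    "neighbourhood_group": ["North", "North", "North", "North", "South", "South"],
    "neighbourhood": ["N1", "N1", "N2", "N2", "S1", "S2"],
    "room_type": ["Private room"] * 6,
    "price": [50, 70, 90, 110, 40, 30],
})

# Step 1: average price per (group, neighbourhood, room_type), as a flat frame.
avg = df.groupby(
    ["neighbourhood_group", "neighbourhood", "room_type"], as_index=False
)["price"].mean()

# Step 2: idxmax returns the row label of the highest average in each group.
best_idx = avg.groupby(["neighbourhood_group", "room_type"])["price"].idxmax()

# Step 3: select only those rows.
best = avg.loc[best_idx].reset_index(drop=True)
print(best)
```

Keeping `as_index=False` in the first step means the labels returned by idxmax are plain row positions in `avg`, so `avg.loc[...]` can use them directly.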
Not able to perform operations on resulting dataframe after “join” operation in PySpark
Here I have created three dataframes: df, rule_df and query_df. I’ve performed an inner join on rule_df and query_df, and stored the resulting dataframe in join_df. However, when I try to simply print the columns of the join_df dataframe, I get the following error: The resultant dataframe is not behaving as…
Is there any function to get multiple timeseries with .get and create a dataframe in Pandas?
I get multiple time series in Series format with a DatetimeIndex, which I want to resample and convert to a dataframe with multiple columns, each column representing one time series. I am using separate functions to create the dataframe, for example, .get(), .resample(), pd.concat(). Since it is not f…
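The fetch, resample and concat steps can be folded into one expression, as in this sketch. It is not a single built-in function: the dictionary `series_by_name` stands in for whatever .get() returns in the question, and the series names, data and daily frequency are assumptions.

```python
import pandas as pd

# Hypothetical source: two series sampled every 12 hours.
idx = pd.date_range("2021-01-01", periods=6, freq=pd.Timedelta(hours=12))
series_by_name = {
    "sensor_a": pd.Series([1, 2, 3, 4, 5, 6], index=idx),
    "sensor_b": pd.Series([10, 20, 30, 40, 50, 60], index=idx),
}

# Resample each series to daily means, then align them side by side.
# Passing a dict to pd.concat turns the keys into column names.
combined = pd.concat(
    {name: s.resample("D").mean() for name, s in series_by_name.items()},
    axis=1,
)
print(combined)
```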