I have the below dataframe, which comes from JSON, that I am trying to format ready for db insertion. I am splitting using .tolist() but getting an error for the None entries. I tried fillna and replace to insert a dummy list, i.e. [0,0,0], but they will only let me replace with a string. Any suggestions welcome. This works: #df_split_batl = df_split_batl.fillna('xx') #df_split_batl = df_split_batl.replace('xx','yy') but
Tag: dataframe
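The excerpt above ends before its answer. A minimal sketch of one way to substitute a placeholder list for the None entries before calling .tolist() — the column name and data here are hypothetical, since the original frame is not shown:

```python
import pandas as pd

# Hypothetical stand-in for the asker's frame: a column of lists with gaps
df_split_batl = pd.DataFrame({"batl": [[1, 2, 3], None, [4, 5, 6]]})

# fillna() rejects list-likes as a fill value, so substitute the dummy
# list element-wise with apply() instead
df_split_batl["batl"] = df_split_batl["batl"].apply(
    lambda v: v if isinstance(v, list) else [0, 0, 0]
)

# Now every row holds a list and .tolist() splits cleanly into columns
split = pd.DataFrame(df_split_batl["batl"].tolist(), columns=["a", "b", "c"])
```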
Compare 2 string columns in a DataFrame and add 1 to key if different
I’m having a hard time trying to figure this out. I have a data frame with multiple columns after merging 2. I have 2 lists: I need to compare each group of variables and, if they are different, add 1 to the key column. I was trying to use something like this: Answer I would avoid loops. Is there any
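A vectorised sketch of the loop-free comparison the answer hints at, using made-up column pairs since the asker's lists are not shown (after a merge, matching columns typically get _x/_y suffixes):

```python
import pandas as pd

# Hypothetical merged frame: the _x/_y pairs came from merging two frames
df = pd.DataFrame({
    "key": [0, 0, 0],
    "name_x": ["a", "b", "c"], "name_y": ["a", "z", "c"],
    "city_x": ["ny", "la", "sf"], "city_y": ["ny", "la", "dc"],
})

pairs = [("name_x", "name_y"), ("city_x", "city_y")]
# Each mismatching pair contributes 1 to key, with no explicit row loop
df["key"] += sum((df[a] != df[b]).astype(int) for a, b in pairs)
```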
Python: How to filter a Pandas DataFrame using Values from a Series?
Context I am currently processing some data and encountered a problem. I would like to filter a Pandas DataFrame using values from a Series. However, this always throws the following error: ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all(). Code Question Does anyone have an idea what this error means and how
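That ValueError appears whenever a whole Series is used where a single boolean is expected (e.g. in a plain `if` or chained with `and`). For membership filtering, the elementwise test is Series.isin(). A minimal sketch with made-up data:

```python
import pandas as pd

df = pd.DataFrame({"id": [1, 2, 3, 4], "val": ["a", "b", "c", "d"]})
wanted = pd.Series([2, 4])

# isin() checks each element of df["id"] against the values in `wanted`,
# returning a boolean mask the same length as the frame
filtered = df[df["id"].isin(wanted)]
```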
Pandas split list upon DataFrame creation
I have a JSON file coming in, which I am doing some operations/trimming on. The result looks like this: When applying df = pd.DataFrame(user, index=[0]) I get the following Dataframe: When applying df = pd.DataFrame(user) I get: I am aware as to why that happens, however neither is what I want. I’d like the following: However I am not sure
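The trimmed JSON and the desired frame are both elided above, so this is only a guess at the usual shape of the problem: a dict with one list value, expanded into one numbered column per element instead of one row per element. The `user` dict and its keys below are invented:

```python
import pandas as pd

# Hypothetical trimmed result; the real `user` dict is not shown above
user = {"name": "alice", "scores": [1, 2, 3]}

# Keep the scalar fields, then expand the list into numbered columns
row = {k: v for k, v in user.items() if k != "scores"}
row.update({f"score_{i}": s for i, s in enumerate(user["scores"], start=1)})

# One row, one column per list element
df = pd.DataFrame(row, index=[0])
```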
What is pandas equivalent of the following SQL?
OK, I have a dataframe that looks like the following: In SQL, to filter unique segments (segment_id) by travelmode I would do: What is the pandas equivalent of this expression? Answer Maybe: as suggested in this post.
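The SQL itself is elided above, but for a query of the shape `SELECT DISTINCT segment_id FROM t WHERE travelmode = '…'` one pandas equivalent is a boolean filter followed by .unique(). The data below is made up:

```python
import pandas as pd

df = pd.DataFrame({
    "segment_id": [1, 1, 2, 3],
    "travelmode": ["car", "bus", "car", "car"],
})

# Filter rows first, then deduplicate the surviving segment ids
unique_segments = df.loc[df["travelmode"] == "car", "segment_id"].unique()
```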
Pandas get difference from first row at a set dtime with groupby
If I have a dataframe with [Group], [DTime] and [Value] columns, for each [Group] I’m trying to find the difference between the first [Value] and every subsequent value from a set [DTime]; for this example say it’s the start of the df at 2015-01-01. Ultimately I would like to plot a timeseries of [Difference] with a trace for each [Group]
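A sketch of the per-group difference using groupby().transform("first"), assuming the frame is sorted by DTime within each Group so "first" is the row at the set start date. The data is invented:

```python
import pandas as pd

df = pd.DataFrame({
    "Group": ["A", "A", "B", "B"],
    "DTime": pd.to_datetime(["2015-01-01", "2015-01-02"] * 2),
    "Value": [10.0, 13.0, 5.0, 9.0],
})

# transform("first") broadcasts each group's first Value back onto
# every row of that group, so the subtraction is fully vectorised
df["Difference"] = df["Value"] - df.groupby("Group")["Value"].transform("first")
```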
Add averages to existing plot with pandas.DataFrame
I have a pandas data-frame of the form and I want to plot the last 7 days together with the average over the weekdays. I can create/plot the average by using and I can create/plot the last 7 days by using but I fail to combine them into a single plot, since the average uses weekday
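The weekday averages are indexed 0–6 while the last 7 days are indexed by date, which is why the two plots won't line up. One way around it is to map the averages back onto the concrete dates of the last 7 days so both traces share an x-axis. A sketch with synthetic data, since the real frame is not shown:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import pandas as pd

idx = pd.date_range("2024-01-01", periods=14, freq="D")
df = pd.DataFrame({"value": range(14)}, index=idx)

last7 = df.tail(7)
# Average per weekday (0=Monday .. 6=Sunday) over the whole history
weekday_avg = df["value"].groupby(df.index.weekday).mean()
# Re-index the averages onto real dates so both lines share the x-axis
avg_on_dates = last7.index.weekday.map(weekday_avg)

ax = last7["value"].plot(label="last 7 days")
ax.plot(last7.index, avg_on_dates, label="weekday average")
ax.legend()
```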
Pandas Dataframe: Retrieve the Maximum Value in a Pandas Dataframe using .groupby and .idxmax()
I have a Pandas Dataframe that contains a series of Airbnb prices grouped by neighbourhood_group, neighbourhood and room_type. My objective is to return the maximum average price for each room_type per neighbourhood and return only this. My approach was to use .groupby and .idxmax() to get the maximum values w.r.t. the index, and then iterate through
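Instead of iterating, .idxmax() can do the second step directly: average first, then select the row holding each room_type's maximum by label. The prices below are made up:

```python
import pandas as pd

# Made-up Airbnb-style data
df = pd.DataFrame({
    "neighbourhood_group": ["Brooklyn", "Brooklyn", "Manhattan", "Manhattan"],
    "room_type": ["Entire home", "Private room", "Entire home", "Private room"],
    "price": [150, 60, 220, 90],
})

# Average price per (neighbourhood_group, room_type) ...
avg = (df.groupby(["neighbourhood_group", "room_type"])["price"]
         .mean().reset_index())

# ... then keep only the row holding each room_type's maximum average
best = avg.loc[avg.groupby("room_type")["price"].idxmax()]
```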
Not able to perform operations on resulting dataframe after “join” operation in PySpark
Here I have created three dataframes: df, rule_df and query_df. I performed an inner join on rule_df and query_df, and stored the resulting dataframe in join_df. However, when I try to simply print the columns of the join_df dataframe, I get the following error: The resulting dataframe is not behaving as one; I am not able to perform any dataframe operations on it.
Is there any function to get multiple timeseries with .get and create a dataframe in Pandas?
I get multiple time series data in series format with datetimeindex, which I want to resample and convert to a dataframe with multiple columns with each column representing each time series. I am using separate functions to create the dataframe, for example, .get(), .resample(), pd.concat(). Since it is not following the DRY principle (Don’t Repeat Yourself) and I can be
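One DRY-friendly shape for this is a single helper that fetches, resamples and concatenates every series in one pass. The `fetch` function below is a hypothetical stand-in for the asker's .get() calls, and the frequencies are invented:

```python
import pandas as pd

# Hypothetical fetcher standing in for the repeated .get() calls
def fetch(name):
    idx = pd.date_range("2024-01-01", periods=4, freq="h")
    return pd.Series(range(4), index=idx, name=name)

def build_frame(names, freq="2h"):
    """Fetch, resample and concat every named series in one pass."""
    return pd.concat(
        {name: fetch(name).resample(freq).mean() for name in names},
        axis=1,  # one column per time series
    )

df = build_frame(["a", "b"])
```

Passing a dict to pd.concat makes the dict keys the column names, so the loop body never repeats the per-series boilerplate.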