
Tag: dataframe

Group by and find top n value_counts pandas

I have a dataframe of taxi data with two columns that looks like this: Basically, each row represents a taxi pickup in that neighborhood in that borough. Now, I want to find the top 5 neighborhoods in each borough by number of pickups. I tried this: Which gives me something like this: How do I filter it so
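One way to approach this kind of "top n per group" question is to count rows per (group, value) pair, sort the counts, and keep the first n rows of each group. The sketch below is a minimal illustration, not the asker's code: the column names `borough` and `neighborhood`, the helper `top_n_per_group`, and the sample rows are all assumptions made for the example.

```python
import pandas as pd


def top_n_per_group(df, group_col, value_col, n=5):
    """Return the n most frequent value_col entries within each group_col."""
    # Count rows for every (group, value) pair.
    counts = (
        df.groupby([group_col, value_col])
        .size()
        .reset_index(name="pickups")
    )
    # Sort so the largest counts come first inside each group.
    counts = counts.sort_values([group_col, "pickups"], ascending=[True, False])
    # head(n) on a groupby keeps the first n rows of every group.
    return counts.groupby(group_col).head(n).reset_index(drop=True)


# Hypothetical sample data: one row per pickup.
trips = pd.DataFrame({
    "borough": ["Manhattan"] * 4 + ["Brooklyn"] * 3,
    "neighborhood": ["Midtown", "Midtown", "Harlem", "SoHo",
                     "Bushwick", "Bushwick", "Dumbo"],
})

top = top_n_per_group(trips, "borough", "neighborhood", n=1)
print(top)
```

With `n=5` the same helper would return up to five neighborhoods per borough; ties at the cutoff are broken by sort order rather than reported together.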

Converting pandas.DataFrame to bytes

I need to convert the data stored in a pandas.DataFrame into a byte string where each column can have a separate data type (integer or floating point). Here is a simple set of data: and df looks something like this: The DataFrame knows about the types of each column df.dtypes so I’d like to do something like this: This typically works
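A possible approach, sketched here under assumptions, is to go through a NumPy record array, which keeps one dtype per column and can be dumped to bytes row by row. The column names and values below are made up for illustration, and this is not necessarily the method the original question or its answers used.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one integer column and one floating-point column.
df = pd.DataFrame({
    "trips": np.array([1, 2, 3], dtype="int32"),
    "fare": np.array([0.5, 1.5, 2.5], dtype="float64"),
})

# to_records builds a record array whose fields mirror df.dtypes.
records = df.to_records(index=False)

# tobytes serialises the records one row after another.
raw = records.tobytes()

# Reading the bytes back needs the same record dtype; copy() makes the
# array writable and independent of the bytes object.
restored = pd.DataFrame(np.frombuffer(raw, dtype=records.dtype).copy())
print(len(raw), restored.dtypes.to_dict())
```

The byte string carries no type information of its own, so the record dtype has to be stored or agreed on separately by whoever decodes it.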

How to read a Parquet file into Pandas DataFrame?

How to read a modestly sized Parquet data-set into an in-memory Pandas DataFrame without setting up a cluster computing infrastructure such as Hadoop or Spark? This is only a moderate amount of data that I would like to read in-memory with a simple Python script on a laptop. The data does not reside on HDFS. It is either on the
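For a local file, `pandas.read_parquet` can load Parquet data directly into a DataFrame without any cluster setup, provided a Parquet engine such as pyarrow or fastparquet is installed. The sketch below writes a small made-up table to a temporary file and reads it back; the helper name `load_parquet`, the file name, and the sample columns are assumptions for the example.

```python
import os
import tempfile

import pandas as pd


def load_parquet(path, columns=None):
    """Read a Parquet file into an in-memory DataFrame.

    Passing `columns` loads only those columns, which can save memory.
    """
    return pd.read_parquet(path, columns=columns)


# Hypothetical sample table, written out only so there is a file to read.
sample = pd.DataFrame({"trip_id": [1, 2, 3], "fare": [7.5, 12.0, 3.25]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.parquet")
    try:
        sample.to_parquet(path)
        loaded = load_parquet(path, columns=["fare"])
    except ImportError:
        # pandas raises ImportError when no Parquet engine is installed.
        loaded = None

print(loaded)
```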

Iterating through pandas groupby groups

I have a pandas dataframe school_df that looks like this: Each row represents one project by that school. I’d like to add two columns: for each unique school_id, a count of how many projects were posted before that date and a count of how many projects were completed before that date. The code below works, but I have ~300,000 unique
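The counts described here can be sketched with a sorted-array lookup inside each group: for every project, count how many dates in the same school fall strictly before its posting date. The column names `date_posted` and `date_completed`, the helper names, and the sample rows are assumptions, since the original table is not shown. The example still visits each school once, so it illustrates the logic rather than a tuned solution for hundreds of thousands of groups.

```python
import numpy as np
import pandas as pd


def count_before(events, cutoffs):
    """For each cutoff, count the events that are strictly earlier."""
    sorted_events = np.sort(events.dropna().to_numpy())
    # side="left" returns the number of events < cutoff.
    return np.searchsorted(sorted_events, cutoffs.to_numpy(), side="left")


def add_prior_counts(df):
    out = df.copy()
    out["n_posted_before"] = 0
    out["n_completed_before"] = 0
    for _, group in out.groupby("school_id"):
        posted = group["date_posted"]
        out.loc[group.index, "n_posted_before"] = count_before(posted, posted)
        out.loc[group.index, "n_completed_before"] = count_before(
            group["date_completed"], posted
        )
    return out


# Hypothetical data: None marks a project that was never completed.
school_df = pd.DataFrame({
    "school_id": ["A", "A", "A", "B"],
    "date_posted": pd.to_datetime(
        ["2020-01-01", "2020-03-01", "2020-01-15", "2020-05-01"]),
    "date_completed": pd.to_datetime(
        ["2020-02-01", None, "2020-01-20", None]),
})

result = add_prior_counts(school_df)
print(result)
```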

how to switch columns rows in a pandas dataframe

I have the following dataframe: I tried with pivot table but I get the following error: Is there any alternative to pivot table to do this? Answer: You can use df = df.T to transpose the dataframe. This switches the dataframe round so that the rows become columns. You could also call df.transpose(), which does the same thing.
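As a small illustration of the transpose, the following sketch uses a made-up two-by-two frame; the labels and values are assumptions, since the original data is not shown.

```python
import pandas as pd

# Hypothetical frame: rows "x" and "y", columns "a" and "b".
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=["x", "y"])

# .T swaps the axes: former columns become the index and vice versa.
flipped = df.T

# transpose() is the method form of the same operation.
same = df.transpose()

print(flipped)
```

Note that transposing a frame with mixed column types can leave every resulting column with the generic object dtype.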

Shuffle DataFrame rows

I have the following DataFrame: The DataFrame is read from a CSV file. All rows that have Type 1 are on top, followed by the rows with Type 2, followed by the rows with Type 3, etc. I would like to shuffle the order of the DataFrame’s rows so that all Types are mixed. A possible result could be: How
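One common way to shuffle rows is to sample the whole frame without replacement. The sketch below assumes a made-up `Type` column and values, and the fixed `random_state` is only there to make the example repeatable.

```python
import pandas as pd

# Hypothetical data sorted by Type, as described in the question.
df = pd.DataFrame({
    "value": [10, 11, 20, 21, 30, 31],
    "Type": [1, 1, 2, 2, 3, 3],
})

# frac=1 draws every row exactly once, in random order.
# reset_index(drop=True) discards the old row labels after shuffling.
shuffled = df.sample(frac=1, random_state=0).reset_index(drop=True)

print(shuffled)
```

Dropping `random_state` gives a different order on each run; keeping the original index (by omitting `reset_index`) makes it possible to trace rows back to their source positions.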

How to loop over grouped Pandas dataframe?

DataFrame: Code: I’m trying to just loop over the aggregated data, but I get the error: ValueError: too many values to unpack. @EdChum, here’s the expected output: The output is not the problem, I wish to loop over every group. Answer: df.groupby('l_customer_id_i').agg(lambda x: ','.join(x)) already returns a DataFrame, so you can no longer loop over the groups. In general: df.groupby(…)
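The difference between looping over groups and looping over an aggregated result can be sketched as follows. The column `l_customer_id_i` comes from the answer; the `email` column and the sample values are assumptions for the example.

```python
import pandas as pd

# Hypothetical data: several rows per customer.
df = pd.DataFrame({
    "l_customer_id_i": [1, 1, 2],
    "email": ["a@x.org", "b@x.org", "c@x.org"],
})

# Iterating the GroupBy object itself yields (key, sub-DataFrame) pairs.
joined = {}
for customer_id, group in df.groupby("l_customer_id_i"):
    joined[customer_id] = ",".join(group["email"])

# After agg() the result is an ordinary DataFrame indexed by the group key,
# so it is iterated row by row instead, for example with iterrows().
aggregated = df.groupby("l_customer_id_i").agg(lambda x: ",".join(x))
rows = [(customer_id, row["email"]) for customer_id, row in aggregated.iterrows()]

print(joined)
print(rows)
```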