Tag: dataframe

How to add header row to a pandas DataFrame

I am reading a CSV file into pandas. This CSV file consists of four columns and some rows, but does not have a header row, which I want to add. I have been trying the following: But when I apply the code, I get the following error: What exactly does the error mean? And what would be a clean way
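
One common way to handle a headerless CSV is to supply the column names while reading it. The sketch below is only an illustration: the inline CSV text and the names "a" to "d" are made up, since the excerpt shows neither the file nor the intended header.

```python
import io

import pandas as pd

# Hypothetical stand-in for a four-column CSV file without a header row.
csv_text = "1,2,3,4\n5,6,7,8\n"

# Hypothetical column names; replace with the real header.
column_names = ["a", "b", "c", "d"]

# header=None tells pandas the first line is data, not a header;
# names= supplies the column labels to use instead.
df = pd.read_csv(io.StringIO(csv_text), header=None, names=column_names)
print(df)
```

With a real file, the path would take the place of the io.StringIO object.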

How to read a Parquet file into Pandas DataFrame?

How to read a modestly sized Parquet dataset into an in-memory Pandas DataFrame without setting up a cluster computing infrastructure such as Hadoop or Spark? This is only a moderate amount of data that I would like to read into memory with a simple Python script on a laptop. The data does not reside on HDFS. It is either on the
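
For a local file, pandas can read Parquet directly through pandas.read_parquet, provided an engine such as pyarrow or fastparquet is installed. The helper name load_parquet and the example path are hypothetical; this is a minimal sketch, not a statement about where the asker's data lives.

```python
import pandas as pd


def load_parquet(path, columns=None):
    """Read a Parquet file into an in-memory DataFrame.

    Assumes a Parquet engine (pyarrow or fastparquet) is installed.
    Passing a list of column names reads only those columns.
    """
    return pd.read_parquet(path, columns=columns)


# Hypothetical usage:
# df = load_parquet("example.parquet")
```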

Pandas dataframe from nested dictionary

My dictionary looks like this: I want to get a dataframe that looks like this: I tried calling pandas.from_dict(), but it did not give me the desired result. So, what is the most elegant, practical way to achieve this? EDIT: In reality, my dictionary is of depth 4, so I’d like to see a solution for that case, or ideally,
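
One possible approach for a dictionary of arbitrary depth is to flatten it into tuple keys and build a MultiIndex from them. The dictionary below is invented for illustration, because the excerpt shows neither the input nor the desired output, and the sketch assumes every branch has the same depth.

```python
import pandas as pd


def flatten(nested, prefix=()):
    """Yield (key_path, leaf_value) pairs from an arbitrarily nested dict."""
    for key, value in nested.items():
        path = prefix + (key,)
        if isinstance(value, dict):
            yield from flatten(value, path)
        else:
            yield path, value


# Hypothetical depth-3 dictionary; the same code handles depth 4 or more.
nested = {
    "A": {"x": {"p": 1, "q": 2}},
    "B": {"x": {"p": 3, "q": 4}},
}

flat = dict(flatten(nested))

# Each key path becomes one entry of a MultiIndex.
index = pd.MultiIndex.from_tuples(list(flat.keys()))
series = pd.Series(list(flat.values()), index=index)

# Move the innermost key level into the columns.
df = series.unstack()
print(df)
```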

Iterating through pandas groupby groups

I have a pandas dataframe school_df that looks like this: Each row represents one project by that school. I’d like to add two columns: for each unique school_id, a count of how many projects were posted before that date and a count of how many projects were completed before that date. The code below works, but I have ~300,000 unique
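
A vectorized alternative to looping over groups is groupby(...).cumcount() on a date-sorted frame. The sketch below covers only the "posted before" count; the column name date_posted and the sample rows are hypothetical, and it assumes posting dates are distinct within each school (tied dates would be counted in sort order).

```python
import pandas as pd

# Hypothetical data; the real school_df is not shown in the excerpt.
school_df = pd.DataFrame(
    {
        "school_id": ["s1", "s1", "s2", "s1", "s2"],
        "date_posted": pd.to_datetime(
            ["2020-01-03", "2020-01-01", "2020-02-01", "2020-01-02", "2020-01-15"]
        ),
    }
)

# Sort so that earlier projects come first within each school.
school_df = school_df.sort_values(["school_id", "date_posted"])

# cumcount() numbers rows within each group starting at 0, which equals
# the number of earlier projects for that school after sorting.
school_df["n_posted_before"] = school_df.groupby("school_id").cumcount()
print(school_df.sort_index())
```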

Shuffle DataFrame rows

I have the following DataFrame: The DataFrame is read from a CSV file. All rows which have Type 1 are on top, followed by the rows with Type 2, followed by the rows with Type 3, etc. I would like to shuffle the order of the DataFrame’s rows so that all types are mixed. A possible result could be: How
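
A widely used idiom for shuffling rows is DataFrame.sample with frac=1, which draws every row once in random order. The frame below is a made-up stand-in for the one in the question; random_state is set only to make the sketch repeatable.

```python
import pandas as pd

# Hypothetical frame with rows grouped by Type, as described above.
df = pd.DataFrame({"value": range(6), "Type": [1, 1, 2, 2, 3, 3]})

# frac=1 samples 100% of the rows without replacement, i.e. a permutation.
# reset_index(drop=True) discards the old row labels after shuffling.
shuffled = df.sample(frac=1, random_state=42).reset_index(drop=True)
print(shuffled)
```

Omitting random_state gives a different order on each run.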
