Tag: dataframe

How to add header row to a pandas DataFrame

I am reading a CSV file into pandas. This CSV file consists of four columns and some rows, but does not have a header row, which I want to add. I have been trying the following: But when I apply the code, I get the following error: What exactly does the error mean? And what would be a clean way
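A minimal sketch of one common approach, assuming the column names can be supplied at read time. The sample data and the names "a" to "d" are placeholders, not taken from the question:

```python
import io

import pandas as pd

# Hypothetical stand-in for the headerless four-column CSV file.
csv_text = "1,2,3,4\n5,6,7,8\n"

# header=None tells pandas the first line is data, not column labels;
# names= supplies the labels to use instead (placeholder names here).
df = pd.read_csv(io.StringIO(csv_text), header=None, names=["a", "b", "c", "d"])
print(df)
```

With a real file, the io.StringIO object would be replaced by the file path.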

How to read a Parquet file into Pandas DataFrame?

blaze dataframe pandas parquet python

How to read a modestly sized Parquet dataset into an in-memory Pandas DataFrame without setting up a cluster computing infrastructure such as Hadoop or Spark? This is only a moderate amount of data that I would like to read in-memory with a simple Python script on a laptop. The data does not reside on HDFS. It is either on the

Pandas dataframe from nested dictionary

dataframe dictionary pandas python

My dictionary looks like this: I want to get a dataframe that looks like this: I tried calling pandas.from_dict(), but it did not give me the desired result. So, what is the most elegant, practical way to achieve this? EDIT: In reality, my dictionary is of depth 4, so I’d like to see a solution for that case, or ideally,
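One possible approach for arbitrary depth, sketched under assumptions: the dictionary below is invented for illustration, and flatten is a hypothetical helper, not a pandas function. Note that from_dict is a classmethod of pandas.DataFrame rather than a top-level pandas function. The idea is to flatten the nesting into tuple keys, which pandas turns into a MultiIndex:

```python
import pandas as pd


def flatten(nested, prefix=()):
    """Yield (key_path, leaf_value) pairs from an arbitrarily nested dict."""
    for key, value in nested.items():
        path = prefix + (key,)
        if isinstance(value, dict):
            yield from flatten(value, path)
        else:
            yield path, value


# Hypothetical depth-3 dictionary; the same helper works for depth 4 or more,
# provided every leaf sits at the same depth.
nested = {
    "A": {"x": {"p": 1, "q": 2}, "y": {"p": 3, "q": 4}},
    "B": {"x": {"p": 5, "q": 6}, "y": {"p": 7, "q": 8}},
}

# Tuple keys become the levels of a MultiIndex.
series = pd.Series(dict(flatten(nested)))

# Move the innermost level into the columns.
df = series.unstack()
print(df)
```

Whether the innermost keys belong in the columns depends on the desired layout; unstack accepts a level argument to choose a different one.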

How to remove string value from column in pandas dataframe

dataframe lambda pandas python regex

I am trying to write some code that splits a string in a dataframe column at each comma (so it becomes a list) and removes a certain string from that list if it is present. After removing the unwanted string, I want to join the list elements again with commas. My dataframe looks like this: So basically my goal is to
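The split, filter and join steps described above could be sketched as follows. The column name "tags", the sample rows and the unwanted value are assumptions made for illustration:

```python
import pandas as pd

# Hypothetical data; the real column contents are not shown in the question.
df = pd.DataFrame({"tags": ["a,b,c", "b", "c,a", ""]})
unwanted = "b"


def drop_item(cell, item):
    """Split on commas, drop every exact match of item, and re-join."""
    parts = [part for part in cell.split(",") if part != item]
    return ",".join(parts)


df["tags"] = df["tags"].apply(lambda cell: drop_item(cell, unwanted))
print(df)
```

This compares whole list elements, so an unwanted value of "b" leaves an element such as "abc" untouched, which a plain substring replacement would not.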

How to add a constant column in a Spark DataFrame?

apache-spark apache-spark-sql dataframe pyspark python

I want to add a column in a DataFrame with some arbitrary value (that is the same for each row). I get an error when I use withColumn as follows: It seems that I can trick the function into working as I want by adding and subtracting one of the other columns (so they add to zero) and then adding

How to switch columns rows in a pandas dataframe

dataframe pandas python transpose

I have the following dataframe: I tried with a pivot table but I get the following error: Is there an alternative to a pivot table for this? Answer You can use df = df.T to transpose the dataframe. This switches the dataframe round so that the rows become columns. You could also use df.transpose().
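A small illustration of the transpose described in the answer, using an invented two-by-two frame:

```python
import pandas as pd

# Hypothetical frame; row labels r1 and r2 are placeholders.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=["r1", "r2"])

# .T is shorthand for .transpose(): row labels become column labels
# and column labels become row labels.
transposed = df.T
print(transposed)
```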

Remove name, dtype from pandas output of dataframe or series

dataframe output-formatting pandas python series

I have an output file like this from a pandas function. I’m trying to get an output with just the second column, i.e., by deleting the top and bottom rows and the first column. How do I do that? Answer You want just the .values attribute: You can convert to a list or access each value:
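A sketch of what the answer describes, using an invented Series since the original output is not shown:

```python
import pandas as pd

# Hypothetical Series; printing it directly would show the index on the
# left and a "Name: score, dtype: float64" footer at the bottom.
s = pd.Series([10.5, 20.25, 30.0], index=["a", "b", "c"], name="score")

arr = s.values         # NumPy array holding only the data
as_list = s.tolist()   # plain Python list
first = arr[0]         # access a single value

print(arr)
print(as_list)
```

If the goal is only a cleaner printout, s.to_string(index=False) is another option that omits the index and the footer.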

Shuffle DataFrame rows

dataframe pandas permutation python shuffle

I have the following DataFrame: The DataFrame is read from a CSV file. All rows which have Type 1 are on top, followed by the rows with Type 2, followed by the rows with Type 3, etc. I would like to shuffle the order of the DataFrame’s rows so that all Types are mixed. A possible result could be: How
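One common way to do this, sketched with invented data: sampling the full frame without replacement returns every row in random order. The column name Col1 and the seed are assumptions:

```python
import pandas as pd

# Hypothetical frame, sorted by Type as in the question.
df = pd.DataFrame({"Col1": range(6), "Type": [1, 1, 2, 2, 3, 3]})

# frac=1 samples 100% of the rows without replacement, i.e. a permutation.
# random_state is fixed here only to make the result reproducible.
# reset_index(drop=True) discards the old row labels.
shuffled = df.sample(frac=1, random_state=42).reset_index(drop=True)
print(shuffled)
```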

Python Pandas replace NaN in one column with value from corresponding row of second column

dataframe fillna nan pandas python

I am working with this Pandas DataFrame in Python. I need to replace all NaNs in the Temp_Rating column with the value from the Farheit column. This is what I need: If I do a Boolean selection, I can pick out only one of these columns at a time. The problem is if I then try to join them, I
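A minimal sketch of one way to do the replacement, assuming both columns share the same index. The values are invented; only the column names come from the question:

```python
import numpy as np
import pandas as pd

# Hypothetical values for the two columns named in the question.
df = pd.DataFrame(
    {
        "Temp_Rating": [np.nan, 3.0, np.nan],
        "Farheit": [73.0, 65.0, 80.0],
    }
)

# fillna accepts a Series and aligns it on the index, so each NaN is
# filled from the same row of the other column.
df["Temp_Rating"] = df["Temp_Rating"].fillna(df["Farheit"])
print(df)
```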