I am trying to balance a data frame by random undersampling of the majority class. That part works; however, I also want to save the rows that were removed from the data frame (undersampled) to a new data frame. How do I accomplish this? This is the code I am using to undersample the data frame
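A minimal sketch of one way to do this, assuming a hypothetical frame `df` with a `label` column where class 0 is the majority: keep the sampled majority rows' index, and the removed rows are simply the majority rows dropped from that index.

```python
import pandas as pd

# Hypothetical example data: "label" is the class column (assumption).
df = pd.DataFrame({
    "feature": range(10),
    "label": [0] * 7 + [1] * 3,   # class 0 is the majority
})

minority = df[df["label"] == 1]
majority = df[df["label"] == 0]

# Randomly undersample the majority class down to the minority count.
kept_majority = majority.sample(n=len(minority), random_state=42)
balanced = pd.concat([kept_majority, minority])

# The removed rows are the majority rows NOT in the kept sample.
removed = majority.drop(kept_majority.index)
```

Because `sample()` keeps the original index labels, `drop(kept_majority.index)` recovers exactly the rows that were undersampled away.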
Tag: dataframe
How to clean data so that the correct arrival code is there for the city pair?
From the picture, the CSV is laid out so that column 1 is the City Pair (Departure – Arrival), column 2 is meant to be the Departure Code, and column 3 is meant to be the Arrival Code. As you can see for row 319 in the first column,
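One possible approach, assuming the city pair is formatted as `"Departure - Arrival"` and the departure city/code columns are trustworthy (both assumptions): build a city-to-code lookup from the departure columns, then re-derive each arrival code from the arrival city.

```python
import pandas as pd

# Hypothetical sample (assumption): the last arrival code is wrong.
df = pd.DataFrame({
    "city_pair": ["Paris - London", "London - Paris", "Berlin - Paris"],
    "dep_code": ["PAR", "LON", "BER"],
    "arr_code": ["LON", "PAR", "XXX"],  # "XXX" should be "PAR"
})

# Split the pair into departure and arrival city names.
cities = df["city_pair"].str.split(" - ", expand=True)
df["dep_city"], df["arr_city"] = cities[0], cities[1]

# Build a city -> code lookup from the trusted departure columns.
city_to_code = dict(zip(df["dep_city"], df["dep_code"]))

# Re-derive the arrival code; keep the old value for unknown cities.
df["arr_code"] = df["arr_city"].map(city_to_code).fillna(df["arr_code"])
```

This only fixes arrival codes for cities that also appear as a departure somewhere in the data; anything else is left untouched.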
Appending new value to the dataframe
The above code prints the same value twice. Why is it not appending NSEI at the end of the stocksList dataframe? Full code: Answer: Relying on the length of the index of a dataframe with a reworked index is not reliable. Here is a simple example demonstrating how it can fail. Input: Pre-processing: Attempt to append
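A small sketch of the failure mode described in the answer, using a hypothetical `stocksList` frame: after filtering, the index keeps gaps, so `len(df)` can collide with a label that already exists, and `.loc` overwrites that row instead of appending.

```python
import pandas as pd

stocksList = pd.DataFrame({"symbol": ["AAPL", "MSFT", "GOOG"]})

# After filtering, the index keeps its gaps ([1, 2] here), so len(df)
# can collide with an EXISTING label and .loc silently overwrites.
filtered = stocksList[stocksList["symbol"] != "AAPL"]
filtered.loc[len(filtered)] = "NSEI"   # len == 2: overwrites the row labeled 2

# Fix: reset the index (or use pd.concat) before appending.
safe = stocksList[stocksList["symbol"] != "AAPL"].reset_index(drop=True)
safe.loc[len(safe)] = "NSEI"           # index is [0, 1]; label 2 is genuinely new
```

In the broken case `filtered` ends up with two rows (`GOOG` replaced by `NSEI`); in the fixed case `safe` has three.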
How to return an empty value or None on pandas dataframe?
SAMPLE DATA: https://docs.google.com/spreadsheets/d/1s6MzBu5lFcc-uUZ9B6CI1YR7P1fDSm4cByFwKt3ckgc/edit?usp=sharing I have this function that uses textacy to extract the source attribution. This automatically returns the speaker, cue, and content of the quotes. In my dataset, some paragraphs have several quotations, but I only need the first one; that's why I put the break in the for loop. My problem now is that some of the original data
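A minimal sketch of the "first quote or None" pattern, using a stand-in `extract_quotes` generator rather than textacy's real extractor (which is what the question actually uses): `next()` with a default replaces the for-loop-plus-break and naturally yields `None` for paragraphs with no quotations.

```python
# Stand-in extractor (hypothetical): yields (speaker, cue, content)
# triples, like textacy's quotation extraction conceptually does.
def extract_quotes(text):
    if '"' in text:
        yield ("someone", "said", text.split('"')[1])

def first_quote(text):
    # next() with a default returns the first triple, or None when
    # the generator produces nothing -- no break needed.
    return next(extract_quotes(text), None)
```

With this, `first_quote('He said "hello" today')` returns a triple and `first_quote("no quotes here")` returns `None`, which keeps the output column aligned with the original rows.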
Iterate over matched column values based on another column in a pandas dataframe
This is a follow-up to "extract column value based on another column pandas dataframe". I have more than one row that matches the column value, and I want to know how to iterate efficiently to retrieve each value when there are multiple matches. The dataframe is: The below will always pick p3: So I tried to iterate like: And it prints for
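A short sketch with hypothetical column names `key` and `val` (the question's frame isn't shown): boolean indexing already returns every matching row, so there is no need to loop looking for matches one by one.

```python
import pandas as pd

# Hypothetical frame where several rows share the same key (assumption).
df = pd.DataFrame({"key": ["a", "b", "a", "a"],
                   "val": ["p1", "p2", "p3", "p4"]})

# Boolean indexing returns ALL matching rows, not just one of them.
matches = df.loc[df["key"] == "a", "val"]
values = list(matches)            # every matched value, in row order

# Row-wise iteration over the matches, if other columns are needed too:
for row in df[df["key"] == "a"].itertuples():
    pass  # row.key and row.val are available here
```

`df.loc[mask, "val"]` gives a Series of all matches; picking a single element (e.g. with `.iloc[-1]`) is what makes code appear to "always pick p3".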
Summing duplicate rows
I have a database with more than 300 duplicates that look like this: I want, for each duplicate shipment_id, only original_cost to be added together while the rates remain as they are. For these duplicates: it should look something like this: Is there any way to do this? Answer: Group by the duplicate values (['shipment_id', 'rate']) and use transform on
Pivot and merge two pandas dataframes
I have two dataframes (taken from pd.to_clipboard(); I suggest using pd.read_clipboard()), df_a: and df_b: What I am looking to do is add a third column to df_a, say ThirdVal, which contains the value in df_b where the DateField and Team align. My issue is that df_b is transposed and formatted differently from df_a. I have looked into pd.pivot() but have
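One way to sketch this, assuming df_a is long (one row per DateField/Team) and df_b is wide with one column per team (the actual frames aren't reproduced here, so both shapes are assumptions): un-pivot df_b with `melt` and merge on the shared keys.

```python
import pandas as pd

# Hypothetical shapes (assumption): df_a long, df_b wide per team.
df_a = pd.DataFrame({
    "DateField": ["2024-01-01", "2024-01-01", "2024-01-02"],
    "Team": ["red", "blue", "red"],
})
df_b = pd.DataFrame({
    "DateField": ["2024-01-01", "2024-01-02"],
    "red": [1.0, 3.0],
    "blue": [2.0, 4.0],
})

# Un-pivot df_b back to long form, then merge on the shared keys.
long_b = df_b.melt(id_vars="DateField", var_name="Team",
                   value_name="ThirdVal")
result = df_a.merge(long_b, on=["DateField", "Team"], how="left")
```

`how="left"` keeps every row of df_a and leaves `ThirdVal` as NaN where df_b has no matching date/team pair.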
Pandas DataFrame Dividing a column by itself taking first element and divide all the rows and so on
I have a DataFrame from Pandas, df1: Now I want to divide every row by the first row, column by column: for each column, take its first element as the standard denominator and divide every value in that column by it. For example:
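This kind of column-wise normalization needs no explicit loop; a minimal sketch with hypothetical columns `a` and `b` (the question's df1 isn't shown):

```python
import pandas as pd

# Hypothetical df1 (assumption): two numeric columns.
df1 = pd.DataFrame({"a": [2.0, 4.0, 6.0], "b": [10.0, 20.0, 30.0]})

# Divide every row by the first row, column-wise. df1.iloc[0] is a
# Series indexed by column name, so div() aligns it against columns.
normalized = df1.div(df1.iloc[0])
```

The first row of `normalized` is all 1.0 by construction, and each later row shows its values as multiples of the first row.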
Converting pandas dataframe to PySpark dataframe drops index
I’ve got a pandas dataframe called data_clean. It looks like this: I want to convert it to a Spark dataframe, so I use the createDataFrame() method: sparkDF = spark.createDataFrame(data_clean) However, that seems to drop the index column (the one that has the names ali, anthony, bill, etc) from the original dataframe. The output of is The docs say createDataFrame() can
Fastest way to append a row to an existing data frame?
I know this question has been asked many a time, but none of the solutions already posted on this site is ideal. I have tested various methods found here and timed them using IPython; I will post the results below. songs is a DataFrame with 4464 rows (initially) and 15 columns. I am fully aware DataFrame indexes are IMMUTABLE, so
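The usual answer to the timing results above: appending row-by-row rebuilds the frame each time and is quadratic overall, so collect new rows in a plain Python list and build the frame once. A minimal sketch on a hypothetical `songs` frame:

```python
import pandas as pd

# Hypothetical songs frame (assumption): tiny stand-in for 4464 rows.
songs = pd.DataFrame({"title": ["a"], "plays": [1]})

# Accumulate new rows cheaply in a list of dicts...
new_rows = [{"title": "b", "plays": 2}, {"title": "c", "plays": 3}]

# ...then pay the DataFrame-construction cost exactly once.
songs = pd.concat([songs, pd.DataFrame(new_rows)], ignore_index=True)
```

One `pd.concat` at the end allocates the result a single time, instead of copying the whole frame on every appended row.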