
How to add multiple columns to a dataframe based on calculations

I have a CSV dataset (more than 8 million rows) that I load into a dataframe. The CSV has columns like:

JavaScript

I am able to load the dataset into my dataframe, but then I need to add multiple calculated columns to the dataframe, with values computed for each row. In other words, unlike this SO question, I do not want the rows of the new columns to have the same initial value (col 1 all NaN, col 2 all “dogs”, etc.).

Right now, I can add my columns by doing something like:

JavaScript

But this seems inefficient, since the entire dataset is processed N times (once for each call).

It seems that I should be able to calculate all of the new columns in a single pass, but I am missing the conceptual approach.

Examples:

JavaScript

Update 1

Following the information from MoRe, I was able to get the essentials working. I needed to extend it by adding the column names, and then specify the index in the merge.

JavaScript
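A rough sketch of what this update describes, not the asker's actual code: the new columns are computed in one pass, given names, and merged back on the index. The column names (`a`, `b`, `diff`, `ratio`), the sample data, and the `calc` function are all hypothetical.

```python
import pandas as pd

# Hypothetical data with a non-default index, to show the index is preserved.
df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]}, index=[100, 101, 102])

def calc(row):
    # Hypothetical calculations: all new values for one row, returned together.
    return [row["b"] - row["a"], row["b"] / row["a"]]

# One pass over the rows; result_type="expand" turns each list into columns 0, 1.
new_cols = df.apply(calc, axis=1, result_type="expand")

# Add the column names.
new_cols.columns = ["diff", "ratio"]

# Merge back onto the original frame, matching rows by index.
df = df.merge(new_cols, left_index=True, right_index=True)
```

Merging on the index (rather than on a column) keeps each computed row aligned with the row it came from, even when the index is not 0..N-1.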


Answer

JavaScript

If I understood you correctly, this is your answer. But before anything else, consider installing swifter with pip :) First create a Series of lists, then convert it to columns…
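The "Series of lists, then columns" idea might look roughly like the sketch below, using plain pandas `apply`. The column names (`a`, `b`, `total`, `product`), the sample data, and the `calc` function are hypothetical, since the real dataset's columns and calculations are not shown here.

```python
import pandas as pd

# Hypothetical data; stands in for the large CSV-backed dataframe.
df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

def calc(row):
    # Hypothetical calculations: compute every new value for a row in one call.
    return [row["a"] + row["b"], row["a"] * row["b"]]

# A single pass over the rows gives a Series whose elements are lists.
series_of_lists = df.apply(calc, axis=1)

# Convert the lists into named columns, reusing the original index.
new_cols = pd.DataFrame(
    series_of_lists.tolist(),
    index=df.index,
    columns=["total", "product"],
)

# Attach the new columns to the original dataframe.
df = df.join(new_cols)
```

The dataset is traversed once instead of once per new column. Where the calculations can be written as whole-column arithmetic (for example `df["a"] + df["b"]`), that is usually faster still than any row-wise `apply`.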

swifter is a simple library (at least I think it is simple) that has only one useful method: apply

JavaScript

It uses parallel processing to improve speed on large datasets… on small ones it does not help and can even be slower.

https://pypi.org/project/swifter/

User contributions licensed under: CC BY-SA