Tag: dataframe

Dealing with huge pandas data frames

I have a huge database (about 500 GB) and was able to load it into pandas. The database contains something like 39705210 observations. As you can imagine, Python has a hard time even opening it. Now I am trying to use Dask to export it to CSV in 20 partitions like this: However when I am trying to

How to divide in Pandas (Python)

I generated the following code: In the second line of the code, where I try to divide Second Dose by First Dose, I do not get the right results. Below is an example of the output I get: Instead of getting 527.85 for % Partially Vaccinated, I should get 5606041/5870786 = 0.95. Does anyone know what I am doing wrong in …
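The asker's code is not shown in this excerpt, so the cause of the wrong result is unknown. As a minimal sketch of the intended division only, the example below assumes (hypothetically) that the dose counts were read in as text with thousands separators, which is one common reason column arithmetic misbehaves. The column names come from the excerpt; the single-row sample frame and the string formatting are illustrative assumptions.

```python
import pandas as pd

# Hypothetical sample row using the two figures quoted in the excerpt.
# Assumption: the counts arrived as strings with thousands separators.
df = pd.DataFrame({
    "First Dose": ["5,870,786"],
    "Second Dose": ["5,606,041"],
})

# Convert both columns to numbers before doing any arithmetic.
for col in ["First Dose", "Second Dose"]:
    cleaned = df[col].astype(str).str.replace(",", "", regex=False)
    df[col] = pd.to_numeric(cleaned)

# Element-wise division of one column by the other.
df["% Partially Vaccinated"] = df["Second Dose"] / df["First Dose"]

ratio = round(float(df["% Partially Vaccinated"].iloc[0]), 2)
print(ratio)  # 0.95
```

If the columns are already numeric, the conversion loop is harmless and the division line alone is enough.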

How to cross-reference data in Pandas dataframes?

I’m working with data that has two separate IDs per item. When we pull data from most sources, we get a PLU/SKU. However, in one of our sources, we get an item number from our on-prem point-of-sale system. To solve this by hand, we have a master list that contains both the PLU and item number for each it…
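One way to do this kind of lookup in pandas is a left merge against the master list. The sketch below is illustrative only: the frame names (`master`, `pos_sales`), the column names (`plu`, `item_number`, `units_sold`), and all values are hypothetical, since the excerpt does not show the real data.

```python
import pandas as pd

# Hypothetical master list: one row per item, carrying both identifiers.
master = pd.DataFrame({
    "plu": [4011, 4225, 3283],
    "item_number": [1001, 1002, 1003],
})

# Hypothetical point-of-sale export that only knows the item number.
pos_sales = pd.DataFrame({
    "item_number": [1002, 1001, 1004],
    "units_sold": [5, 12, 3],
})

# Left merge keeps every sales row and attaches the PLU where one exists.
# validate="many_to_one" raises if the master list has duplicate item numbers.
merged = pos_sales.merge(
    master,
    on="item_number",
    how="left",
    validate="many_to_one",
    indicator=True,
)

# Rows marked "left_only" had no match in the master list.
unmatched = merged[merged["_merge"] == "left_only"]
print(merged)
print(unmatched["item_number"].tolist())
```

The `indicator=True` column makes it easy to list item numbers missing from the master list instead of silently carrying empty PLU values forward.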