Tag: dataframe

Select rows of a data frame in a loop based on list of indexes

I have one data frame and two (or more) lists of indexes. I want to create a loop that selects rows of the data, using one list per iteration, so that in the first iteration the data has the rows listed in idx1: 0, 2, 4. How can I do that? This is a simplified example; in my actual code,
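A minimal sketch of the loop described in this question, assuming pandas and positional indexes. The frame contents and the second list `idx2` are hypothetical, since the excerpt only shows `idx1`:

```python
import pandas as pd

# Hypothetical data; the question's real frame is not shown in the excerpt.
df = pd.DataFrame({"value": [10, 11, 12, 13, 14, 15]})
idx1 = [0, 2, 4]   # from the question
idx2 = [1, 3]      # assumed second list

subsets = []
for idx in (idx1, idx2):
    # iloc selects by integer position; use df.loc[idx] if the lists hold index labels.
    subset = df.iloc[idx]
    subsets.append(subset)
    print(subset)
```

Collecting the selections in a list keeps each iteration's rows available after the loop; whether `iloc` or `loc` is right depends on what the index lists actually contain.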

Is there a way to have SQLAlchemy NOT change 1 to True and 0 to False for BIT columns?

csv dataframe dtype python sqlalchemy

I am using SQLAlchemy to read data from a SQL Server database and then writing the table data to a CSV file to hand off later. However, I noticed that when there is a 1 or 0 in a SQL Server table field, the CSV output has True or False instead. I know that to Python it’s still a number, since True

Get value from Spark dataframe when rows are dictionaries

apache-spark data-extraction dataframe pyspark python

I have a PySpark dataframe that looks like this: Values Column {[0.0, 54.04, 48…. Sector A {[0.0, 55.4800000… Sector A If I show the first element of the column ‘Values’ without truncating the data, it looks like this: {[0.0, 54.04, 48.19, 68.59, 61.81, 54.730000000000004, 48.51, 57.03…

Dealing with huge pandas data frames

dask dask-dataframe dataframe pandas python

I have a huge database (500 GB or so) and was able to load it into pandas. The database contains something like 39705210 observations. As you can imagine, Python has a hard time even opening it. Now I am trying to use Dask to export it to CSV in 20 partitions like this: However, when I am trying to

Groupby column and create lists for other columns, preserving order

apache-spark apache-spark-sql dataframe pyspark python

I have a PySpark dataframe which looks like this: I want to group by (or partition by) the ID column, and the lists for col1 and col2 should then be created in timestamp order. My approach: But this is not returning the lists for col1 and col2. Answer I don’t think the order can be reliably preserved us…

Transform python dictionaries with keys and corresponding lists to pandas dataframe

dataframe dictionary pandas python

I am trying to transform multiple dictionaries with keys and corresponding lists into a pandas dataframe and can’t find the right way to transform them. For the pandas data frame, the keys are the index column and the lists How can I transform python dictionaries with keys and corresponding lists (in…
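One possible approach, sketched under the assumption that every list has the same length and that each key should become a row label. The dictionaries and the column names `x`, `y`, `z` are hypothetical, since the excerpt does not show the real data:

```python
import pandas as pd

# Hypothetical dictionaries; the question's real data is not shown.
d1 = {"a": [1, 2, 3], "b": [4, 5, 6]}
d2 = {"c": [7, 8, 9]}

# Merge the dictionaries, then build one row per key.
merged = {**d1, **d2}
df = pd.DataFrame.from_dict(merged, orient="index", columns=["x", "y", "z"])
print(df)
```

With `orient="index"` the dictionary keys become the index and each list becomes a row; dropping that argument would instead make each key a column.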

How to divide in pandas (Python)

dataframe division pandas python

I wrote the following code: In the second line of the code, where I try to divide Second Dose by First Dose, I do not get the right results. Below is an example of the output I get: Instead of getting 527.85 for % Partially Vaccinated I should get 5606041/5870786 = 0.95. Does anyone know what I am doing wrong in …

Format pandas dataframe output into a text file as a table (formatted and aligned to the max length of the data or header, whichever is longer)

dataframe pandas python

I have the above data frame and would like to save the output to a file as pipe-delimited data like below. So far I have tried pd.to_csv and pd.to_string(); both output the data in tabular format, however, the data is not aligned to the max length of the column header or the data. to_string() to_csv() Answ…
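A rough sketch of one way to get the alignment this question asks for, not the truncated answer itself. The helper `to_aligned_pipe_table` and the sample frame are hypothetical; each column is padded to the longer of its header and its widest value:

```python
import pandas as pd

# Hypothetical data; the question's frame is not shown in the excerpt.
df = pd.DataFrame({"name": ["apple", "kiwi"], "quantity": [3, 12000]})

def to_aligned_pipe_table(frame):
    """Return the frame as pipe-delimited text with left-aligned, padded columns."""
    text = frame.astype(str)
    # Column width = max(header length, longest value in that column).
    widths = {
        col: max(len(str(col)), int(text[col].str.len().max()))
        for col in text.columns
    }
    header = "|".join(str(col).ljust(widths[col]) for col in text.columns)
    rows = [
        "|".join(row[col].ljust(widths[col]) for col in text.columns)
        for _, row in text.iterrows()
    ]
    return "\n".join([header, *rows])

table = to_aligned_pipe_table(df)
print(table)
```

The resulting string can be written to a file with an ordinary `open(..., "w")` call. The sketch assumes a non-empty frame with unique column names.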

How to cross-reference data in Pandas dataframes?

dataframe pandas python

I’m working with data that has two separate IDs per item. When we pull data from most sources, we get a PLU/SKU—however, in one of our sources, we get an item number from our on-prem point-of-sale system. To solve this by hand, we have a master list that contains both the PLU and item number for each it…

How can I merge and aggregate two dataframes in Pandas while subtracting column values?

dataframe pandas python

I’m working on a rudimentary inventory system and am having trouble finding a solution to this obstacle. I’ve got two Pandas dataframes, both sharing two columns: PLU and QTY. PLU acts as an item identifier, and QTY is the quantity of the item in one dataframe, while being the quantity sold in ano…
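A minimal sketch of the subtraction described in this question, assuming one frame holds stock on hand and the other holds sales. The frame names `stock` and `sold` and all values are hypothetical; only the `PLU` and `QTY` columns come from the question:

```python
import pandas as pd

# Hypothetical data using the PLU and QTY columns named in the question.
stock = pd.DataFrame({"PLU": [1001, 1002, 1003], "QTY": [10, 5, 8]})
sold = pd.DataFrame({"PLU": [1001, 1003, 1003], "QTY": [3, 2, 1]})

# Aggregate sales per item first, since a PLU may appear on several rows.
sold_total = sold.groupby("PLU")["QTY"].sum()

# Subtract on the aligned PLU index; items with no sales are treated as 0 sold.
remaining = (
    stock.set_index("PLU")["QTY"]
    .sub(sold_total, fill_value=0)
    .astype(int)
    .reset_index()
)
print(remaining)
```

Aligning on the `PLU` index avoids a manual merge step. Note that `fill_value=0` also keeps any PLU that appears only in `sold`, which then shows up with a negative quantity.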