I have a DataFrame variable called data with about 6 million rows and I’d like to process it 50 rows at a time. I have the following code: However, it seems the slices obtained are not 50 rows in length. In fact, they seem to be random (although they stay consistent every time I re-run the program). The first one…
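The code from the question is not shown in this excerpt, so the cause of the uneven slices cannot be confirmed. One possible cause is label-based slicing on an index that is not a clean 0..n-1 range. A minimal sketch of positional chunking with iloc, assuming a small stand-in frame (the column name value and the helper iter_chunks are hypothetical):

```python
import pandas as pd

# Hypothetical stand-in for the ~6-million-row frame in the question.
data = pd.DataFrame({"value": range(230)})

chunk_size = 50


def iter_chunks(frame, size):
    """Yield consecutive positional slices of at most `size` rows."""
    for start in range(0, len(frame), size):
        # iloc slices by position, so a shuffled or non-default index
        # does not change how many rows each chunk holds.
        yield frame.iloc[start:start + size]


chunk_lengths = [len(chunk) for chunk in iter_chunks(data, chunk_size)]
print(chunk_lengths)
```

Every chunk holds 50 rows except possibly the last one, which holds the remainder.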
Tag: pandas
How to add rows with identical items in different columns in Pandas together
I have a sample dataframe that looks like below. I’d like to eventually group row 1 and row 3 together, since they contain identical items in different columns. I’ve spent a lot of time trying to solve this, but have not encountered a good solution yet. What steps should I take to reach the below …
Reading arrays from .txt file as numbers instead of strings
I’m using an automatic data acquisition software that exports the data as .txt files. I then imported the file into python (using the pandas package and turning the columns into arrays) but I’m facing a problem. Python can’t “read” the data because the automatic data acquisition …
How to color dataframe based on each group?
I have a dataframe as below. I want to color the dataframe with a different color for each Variable, across all columns. In the above figure, the selected variables should have one color, the variable type below should be in another color, etc. How can I color the dataframe in different colors for each Variable group? An…
How to calculate 12 month rolling sum based on groupby?
I am trying to calculate the 12 month rolling sum for the number of orders and revenue based on a person’s name using Python for the following dataframe: In order to give the following output: The rolling sum should add up all the totals in the past 12 months grouped by the name column. I have tried the…
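The question's dataframe and attempted code are not included in this excerpt. As a rough sketch only, assuming one row per name per month and hypothetical column names (name, month, orders, revenue), a grouped rolling window of 12 rows could look like this:

```python
import pandas as pd

# Hypothetical data: one row per person per month (an assumption, the
# original frame is not shown in the excerpt).
df = pd.DataFrame({
    "name": ["Ann"] * 14 + ["Bob"] * 3,
    "month": list(pd.date_range("2020-01-01", periods=14, freq="MS"))
             + list(pd.date_range("2020-01-01", periods=3, freq="MS")),
    "orders": [1] * 14 + [2] * 3,
    "revenue": [10.0] * 14 + [5.0] * 3,
})

# The window is counted in rows, so rows must be in date order per name.
df = df.sort_values(["name", "month"])

rolled = (
    df.groupby("name")[["orders", "revenue"]]
      .rolling(window=12, min_periods=1)   # up to 12 monthly rows per name
      .sum()
      .reset_index(level=0, drop=True)     # drop the group level, keep the row index
)

# Join back on the original row index with suffixed column names.
result = df.join(rolled.add_suffix("_12m"))
print(result.tail())
```

If months can be missing for a person, a row-count window no longer equals 12 calendar months, and the data would need to be reindexed to a full monthly range first.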
Get the index of a datetime in a dataframe pandas
for row in data['column_name']: if row.date() < datetime.today().date(): print(row,????) I was not able to get the index of a datetime in a specific column of a dataframe. I used this code, and I want to return the index of the row along with the row. Could anyone help me, please? Answer You can try wi…
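The answer is cut off in this excerpt, so the following is not necessarily what it proposed. One way to get the index next to each value is Series.items(), and a boolean mask avoids the loop entirely. A small sketch with made-up dates and index labels:

```python
import pandas as pd

# Hypothetical data; the column name follows the question, the dates and
# index labels are invented for illustration.
data = pd.DataFrame(
    {"column_name": pd.to_datetime(["2000-01-01", "2200-01-01", "2001-06-15"])},
    index=["a", "b", "c"],
)

today = pd.Timestamp.today().normalize()  # midnight of the current day

# Option 1: Series.items() yields (index label, value) pairs.
for idx, ts in data["column_name"].items():
    if ts.date() < today.date():
        print(idx, ts)

# Option 2: a boolean mask selects the index labels without a Python loop.
past_index = data.index[data["column_name"].dt.normalize() < today]
print(past_index.tolist())
```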
Combine two tables based on certain criteria using python
I have two tables (table1, table2) of the following:
table1:
ID     Filename
12345  12345.txt
12346  12346.txt
12347  12347.txt
12348  12348.txt
12349  12349.txt
12350  12350.txt
table2 (contains the path where the table1 files are present):
Path
/table/text3/12349.txt
/table/text1/12345.txt
/table/text2/12346.txt
/table/te…
How to optimize time while converting list to dataframe?(Part II)
I didn’t get any proper answers to my previous question: How to optimize time while converting list to dataframe? Let me explain the example in more detail. Let’s consider the data frame more precisely as: I want the output dataframe, when converted to CSV, as: The characters PH, AG, AD, N should not be mapped. It…
Converting list to comma separated integers to be substituted in IN clause of Pandas dataframe query
I have a list of integers that contains EMPI_ID values. I have a variable that defines the SQL query. Columns for the dataframe: When I try to convert them to comma-separated integer values, the sql_string holds str values and fails to fetch the data from the database. Please advise how I can change the query to subst…
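The original query and list are not shown in this excerpt. A hedged sketch of two common approaches, using an in-memory SQLite database as a stand-in (the table name patients and its contents are hypothetical, and the placeholder style ? is specific to sqlite3; other drivers use different markers):

```python
import sqlite3

import pandas as pd

# Hypothetical in-memory database standing in for the real one.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patients (EMPI_ID INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO patients VALUES (?, ?)",
    [(1, "a"), (2, "b"), (3, "c"), (4, "d")],
)

empi_ids = [1, 3, 4]

# Approach 1: build the comma-separated text directly. Each integer must
# be converted with str() before joining.
in_list = ", ".join(str(i) for i in empi_ids)

# Approach 2: one placeholder per value, with the values passed separately
# so the driver handles quoting and types.
placeholders = ", ".join("?" for _ in empi_ids)
sql_string = (
    f"SELECT EMPI_ID, name FROM patients "
    f"WHERE EMPI_ID IN ({placeholders}) ORDER BY EMPI_ID"
)
result = pd.read_sql_query(sql_string, conn, params=empi_ids)
print(result)
conn.close()
```

Passing the values as parameters is usually the safer choice, since it avoids interpolating data into the SQL text.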
Send column names that contain a certain string to a list in pandas
I have the following dataframe that contains columns like: I would like the column names that contain _main sent to a list. such as: How best could I do that? Thanks very much! Answer try via filter() method: OR Via boolean masking:
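The code snippets from the answer are missing in this excerpt. The two approaches it names could look roughly like the sketch below, where the column names are invented for illustration:

```python
import pandas as pd

# Hypothetical columns; the question's actual column names are not shown.
df = pd.DataFrame({
    "a_main": [1, 2],
    "a_sub": [3, 4],
    "b_main": [5, 6],
    "c": [7, 8],
})

# Via filter(): keep columns whose name contains the substring "_main".
via_filter = df.filter(like="_main").columns.tolist()

# Via boolean masking on the column index.
via_mask = df.columns[df.columns.str.contains("_main")].tolist()

print(via_filter)
print(via_mask)
```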