Tag: dataframe

Group by Issue with Years Pandas

I’m following the answer for this StackOverflow post to group a column of years by decades to make it easier for me to visualize later, but I’m not getting the same results. It seems like when DSM did it, it yielded integers for years, while mine is yielding floats for years. I’ve implemented: My Results: Picture of Results Answer You
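The original code is not shown in this excerpt, so the following is only a minimal sketch of decade binning. It assumes the float years come from a float-typed column (for example, one that contains missing values); the column name `year` and the sample values are hypothetical.

```python
import pandas as pd

# Hypothetical data: the missing value forces the column to float64.
df = pd.DataFrame({"year": [1987.0, 1991.0, 1999.0, 2003.0, None]})

# Floor division on a float column keeps the float dtype (1980.0, 1990.0, ...).
decade_float = (df["year"] // 10) * 10

# Casting to the nullable Int64 dtype gives integer decades and keeps the missing value.
df["decade"] = decade_float.astype("Int64")

# Rows with a missing decade are dropped by groupby by default.
counts = df.groupby("decade")["year"].count()
print(counts)
```

If the column has no missing values, a plain `astype(int)` before the floor division would likely be enough.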

Pandas: efficiently inserting a large number of rows

dataframe numpy pandas performance python

I have a large dataframe in this format, call this df:

index  val1  val2
0      0.2   0.1
1      0.5   0.7
2      0.3   0.4

I have a row I will be inserting, call this myrow:

index  val1  val2
-1     0.9   0.9

I wish to insert this row 3 times after every row in the original dataframe, i.e.: index val1 val2 0
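One possible approach, sketched here with NumPy, is to build the whole result array in one step instead of inserting rows one at a time. The expected output is cut off in the excerpt, so giving every inserted row the index label -1 is an assumption based on the index shown for myrow; `n_copies`, `block`, and `index` are names chosen for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"val1": [0.2, 0.5, 0.3], "val2": [0.1, 0.7, 0.4]})
myrow = np.array([0.9, 0.9])
n_copies = 3  # number of times myrow follows each original row

# One block per original row: the row itself, then n_copies of myrow.
block = np.empty((len(df), n_copies + 1, df.shape[1]))
block[:, 0, :] = df.to_numpy()
block[:, 1:, :] = myrow  # broadcast across all blocks

# Matching index: the original label, then -1 for each inserted row (assumed).
index = np.full((len(df), n_copies + 1), -1)
index[:, 0] = df.index.to_numpy()

result = pd.DataFrame(
    block.reshape(-1, df.shape[1]),
    columns=df.columns,
    index=index.ravel(),
)
print(result)
```

This sketch assumes all columns are numeric, since the values pass through a single float array.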

Why does matplotlib.pyplot.savefig() mess up image outputs for very large pandas.plotting.scatter_matrix()?

dataframe matplotlib pandas python

I was trying to compute the pandas.plotting.scatter_matrix() values for very large pandas.DataFrame() (relatively speaking for this specific operation, most libraries either run OOM most of the time or implement a row count check of 50000, see vaex-scatter). The ‘Time series’ DataFrame shape I have is (10000000, 41). Every value is either a float or an integer. Q1: So the first

Python 3 – How do I extract data from SQL database and process the data and append to pandas dataframe row by row?

dataframe mysql pandas python python-3.x

I have a MySQL database; its columns are: I need to extract data from it, process the data, and add it to a pandas DataFrame. I know how to extract data from the SQL database, and I have already implemented a way to pass the data to a DataFrame, but it is extremely slow (about 30 seconds), whereas when I

Python Pandas Dataframe enrichment (from another)

dataframe merge numpy pandas python

I would like to enrich a dataframe (df1) from another (df2) by adding a new column in df1 and filling it based on what I find in df2. The two dataframes differ in size and in column names. I would like to do something like the VLOOKUP function in Excel. This is what I’ve done but I
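A VLOOKUP-style enrichment is commonly written as a left merge. The sketch below is illustrative only: the column names (`product_id`, `id`, `label`) and the data are hypothetical, since the excerpt does not show the real frames.

```python
import pandas as pd

# Hypothetical frames with different sizes and different key column names.
df1 = pd.DataFrame({"product_id": [1, 2, 3], "qty": [10, 5, 7]})
df2 = pd.DataFrame({"id": [1, 3], "label": ["apple", "cherry"]})

# A left merge keeps every row of df1; keys missing from df2 get NaN in the new column.
enriched = df1.merge(
    df2, how="left", left_on="product_id", right_on="id"
).drop(columns="id")
print(enriched)
```

Unlike VLOOKUP, a merge returns one row per match, so duplicate keys in df2 would multiply rows in the result.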

Checking if it is a holiday based on date using holidays library - Python

dataframe pandas python python-3.x python-holidays

I have a dataset from the last 3 years, and I would like to add a new column based on holidays. When I try this: I get the result. Now I want to create a new column in my existing dataset with true/false values indicating whether the date is a holiday. I tried to use the below code snippet. The result I was expecting

I want to filter rows from data frame where the year is 2020 and 2021 using re.search and re.match functions

dataframe pandas python

Data Frame: I want a data frame that contains only the rows for the years 2020 and 2021, using the search and match methods. Answer
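The data frame itself is not shown in the excerpt, so this is a sketch under assumptions: a hypothetical string column named `date` whose values start with the year. It shows how `re.match` and `re.search` could each be used to build a boolean mask.

```python
import re
import pandas as pd

# Hypothetical data; the real frame is not shown in the question.
df = pd.DataFrame(
    {"date": ["2019-12-31", "2020-01-15", "2021-06-30", "2022-03-01"]}
)

# re.match only matches at the start of the string, so it suits values that begin with the year.
starts_with_year = df["date"].apply(lambda s: re.match(r"202[01]", s) is not None)

# re.search scans the whole string; word boundaries avoid hits inside longer numbers.
contains_year = df["date"].apply(lambda s: re.search(r"\b202[01]\b", s) is not None)

filtered = df[starts_with_year]
print(filtered)
```

For a column already parsed as datetimes, `df["date"].dt.year.isin([2020, 2021])` would likely be simpler than a regex.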

Check if values in one dataframe match values from another, updating dataframe

dataframe pandas python python-3.x

Let’s say I have 2 dataframes; both have different lengths but the same number of columns. Let’s assume that some of the data in df1 is outdated and I’ve received a new dataframe that contains some new data, which may or may not already exist in the outdated dataframe. I want to find out if any of the

How can I split a cell in a pandas dataframe and keep the delimiter in another column?

dataframe pandas python split

persons John New York Janet New York Mike Denver Michelle Texas I want to split into 2 columns: person and city. I tried this: and it gives me this: What I want is to split by cities and keep the separator in the city column like this: Answer You can use regex with a capture group: Read more on how
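The answer's own code is cut off in the excerpt. As an illustration of the capture-group idea, here is a sketch that uses `Series.str.extract` with named groups, which may differ from what the answer actually used. It assumes the set of city names is known in advance; the `cities` list is built from the sample data.

```python
import re
import pandas as pd

df = pd.DataFrame(
    {"persons": ["John New York", "Janet New York", "Mike Denver", "Michelle Texas"]}
)

# Assumption: the possible cities are known up front.
cities = ["New York", "Denver", "Texas"]
city_alternatives = "|".join(re.escape(c) for c in cities)

# Named capture groups become the column names of the extracted frame.
pattern = rf"^(?P<person>.*?)\s+(?P<city>{city_alternatives})$"
extracted = df["persons"].str.extract(pattern)

result = df.join(extracted)
print(result)
```

Rows that do not end in one of the listed cities would get NaN in both new columns.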

DataFrame comparison with SQL Server table and upload just the differences

dataframe pandas python sql sql-server

I have an SQL table (table_1) that contains data, and I have a Python script that reads a CSV and creates a dataframe. I want to compare the dataframe with the SQL table data and then insert the missing data from the dataframe into the SQL table. I went around and read this comparing pandas dataframe with sqlite table via
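A rough sketch of the compare-then-insert idea follows. It uses an in-memory SQLite database as a stand-in for SQL Server so that it runs without a server; the columns `id` and `name` and all data are hypothetical. Only the table name table_1 comes from the question.

```python
import sqlite3
import pandas as pd

# Stand-in database (assumption): SQLite in memory instead of SQL Server.
conn = sqlite3.connect(":memory:")
pd.DataFrame({"id": [1, 2], "name": ["a", "b"]}).to_sql("table_1", conn, index=False)

# Hypothetical dataframe, as if it had been read from the CSV.
df = pd.DataFrame({"id": [2, 3, 4], "name": ["b", "c", "d"]})

# Load the existing rows and mark which dataframe rows are absent from the table.
existing = pd.read_sql("SELECT id, name FROM table_1", conn)
merged = df.merge(existing, how="left", on=["id", "name"], indicator=True)
missing = merged.loc[merged["_merge"] == "left_only", ["id", "name"]]

# Append only the rows that were not already present.
missing.to_sql("table_1", conn, if_exists="append", index=False)

total = pd.read_sql("SELECT COUNT(*) AS n FROM table_1", conn)["n"].iloc[0]
print(missing)
print(total)
```

Reading the whole table into memory may not scale to very large tables; in that case, loading the dataframe into a staging table and comparing in SQL might be a better fit.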