Tag: pandas

How to set values in a dataframe according to multiple conditions and different groups?

I have a complex mapping problem. I have a table with groups. Each group has a list of 7 products. The product list may contain duplicate products. My goal: I have to map 5 products in the list of 7 products. If the value of the column rank is equal to 1, 2, 3 then Product_1, Product_2, Product_3. If

Most efficient way to split up a dataframe into smaller dataframes

pandas performance python

I am writing a Python program that will split a large dataframe (tens of thousands of rows) into smaller dataframes based on a column value. It needs to be fairly efficient, because the user can change how the dataframe is broken up, and I would like the output to update dynamically. Example input: id Column_1 Column_2 1 Oct
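One common way to do this kind of split is a single `groupby` pass, sketched below. The sample rows and the helper name `split_by_column` are assumptions made for illustration (the question's own example is cut off); only the column names come from the excerpt.

```python
import pandas as pd

# Hypothetical data; only the column names are taken from the excerpt.
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "Column_1": ["Oct", "Nov", "Oct", "Dec"],
    "Column_2": [10, 20, 30, 40],
})

def split_by_column(frame, column):
    """Return a dict mapping each distinct value in `column` to its sub-frame."""
    # One groupby pass; sort=False keeps groups in order of first appearance.
    return {key: group for key, group in frame.groupby(column, sort=False)}

parts = split_by_column(df, "Column_1")
oct_rows = parts["Oct"]  # the rows whose Column_1 equals "Oct"
```

Because the split column is just an argument, calling the helper again with a different column name rebuilds the pieces when the user changes the grouping.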

TypeError when plotting multiple scatter graphs?

pandas python scatter-plot typeerror

I am having an issue when trying to plot scatter graphs. I am using the Tiingo API to get stock data and have plotted 4 separate histograms for my data with no issue (code below). However, when I run this code to try to do something similar with a scatter graph, I get a TypeError: no numerical data to plot, although

How can I create a column in one DataFrame containing the count of matches in another DataFrame’s column?

pandas python

I have two pandas DataFrames. The first one, df1, contains a column of file paths and a column of lists containing which users have read access to these file paths. The second DataFrame, df2, contains a list of all possible users. I’ve created an example below: The end goal is to create a new column df2['read_count'], which should take each
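A minimal sketch of one possible approach, assuming `read_count` is meant to be the number of file paths each user can read (the excerpt is cut off before it says so). The column names `path`, `users` and `user` and all of the data are hypothetical.

```python
import pandas as pd

# Hypothetical stand-ins for the question's df1 and df2.
df1 = pd.DataFrame({
    "path": ["/a", "/b", "/c"],
    "users": [["alice", "bob"], ["alice"], ["carol", "alice"]],
})
df2 = pd.DataFrame({"user": ["alice", "bob", "carol", "dave"]})

# Flatten the lists to one user per row, then count occurrences of each user.
counts = df1["users"].explode().value_counts()

# Look up each user's count; users who appear in no list get 0.
df2["read_count"] = df2["user"].map(counts).fillna(0).astype(int)
```

This counts a user twice if the same name is repeated inside a single list; deduplicating each list first would avoid that.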

Performing calculations on DataFrames of different lengths

pandas python

I have two different DataFrames that look something like this:

Lat    Lon
28.13  -87.62
28.12  -87.65
……     ……

Calculated_Dist_m
34.5
101.7
……

The first DataFrame (name=df) (consisting of the Lat and Lon columns) has just over 1000 rows (values) in it. The second DataFrame (name=new_calc_dist) (consisting of the Calculated_Dist_m column) has over 30000 rows (values) in it. I want to

vlookup in pandas python

pandas python

I have two dataframes. I want to check if a column from the first dataframe contains values that are in a column of the second dataframe, and if it does, create a column and add 1 to the row where it contains a value from the first column.

first df:

A header  Another header
First     apple
Second    orange
third     banana
fourth    tea

desired
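For a lookup like this, `Series.isin` is a common starting point. The sketch below assumes an exact-value match is wanted; the second dataframe and its column name `item` are hypothetical, since the excerpt does not show them.

```python
import pandas as pd

# First dataframe as given in the excerpt.
df1 = pd.DataFrame({
    "A header": ["First", "Second", "third", "fourth"],
    "Another header": ["apple", "orange", "banana", "tea"],
})

# Hypothetical second dataframe; the question's real one is not shown.
df2 = pd.DataFrame({"item": ["orange", "tea"]})

# 1 where the value also appears in df2["item"], otherwise 0.
df1["match"] = df1["Another header"].isin(df2["item"]).astype(int)
```

If a substring match is intended instead of an exact match, `isin` is not enough and a different approach is needed.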

How to take sample of data from very unbalanced DataFrame so as to not lose too many ‘1’?

dataframe pandas python sampling

I have a Pandas DataFrame like the one below, with an ID and a Target variable (for a machine learning model). My DataFrame is really large and unbalanced, so I need to sample from it. The class balance of the DataFrame looks like this: 99.60% – 0, 0.40% – 1

ID   TARGET
111  1
222  1
333  0
444  1
…
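One way to shrink such a frame without losing positives is to keep every row with TARGET equal to 1 and sample only the zeros. This is a sketch under assumptions: the data is synthetic, and the size of 100 sampled zeros is an arbitrary choice.

```python
import pandas as pd

# Synthetic data with roughly the imbalance described: 4 ones in 1000 rows.
df = pd.DataFrame({"ID": range(1000), "TARGET": [1] * 4 + [0] * 996})

# Keep every positive row.
positives = df[df["TARGET"] == 1]

# Randomly sample only the majority class; random_state makes it repeatable.
negatives = df[df["TARGET"] == 0].sample(n=100, random_state=0)

# Combine and shuffle so the positives are not clustered together.
sample = pd.concat([positives, negatives]).sample(frac=1, random_state=0)
```

Sampling this way changes the class ratio in the result, which may matter when interpreting a model trained on it.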

How to combine multiple rows into one row WITHOUT group by in Python Pandas?

data-science pandas python

I tried searching for this on Stack Overflow and found a lot of similar titles, but the problem isn’t quite the same (it appears very different). I also read some of the documentation (not all of it from Pandas) but couldn’t find any method to do this. Suppose I have a dataframe like: How do I combine this into one row in Pandas?