Tag: pandas

Find the median of a column based on criteria within dataframe and insert this as a new column

I’m trying to look in a dataframe, and find the median of data within a column based on another column. I have a dataframe with ‘zipcode’ data and ‘price’ data. I want to find the median of the ‘price’ based on the ‘zipcode’, and report it in a new column. When I run the program as is, I get a
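
A minimal sketch of one common way to do this, assuming the column names `zipcode` and `price` from the question. The sample values and the new column name `median_price` are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame(
    {
        "zipcode": ["98101", "98101", "98101", "98052", "98052"],
        "price": [300000, 500000, 400000, 700000, 900000],
    }
)

# transform("median") returns one value per original row, so the result
# can be assigned straight back as a new column.
df["median_price"] = df.groupby("zipcode")["price"].transform("median")

print(df)
```

Using `transform` rather than `agg` keeps the original row count, which is what makes the direct column assignment possible.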

pandas – creating new rows of combination with value of 0

dataframe matrix pandas python sparse-matrix

I have a pandas dataframe like

user_id  music_id  has_rating
A        a         1
B        b         1

and I would like to automatically add new rows for each user_id and music_id pair that the user hasn’t rated, like

user_id  music_id  has_rating
A        a         1
A        b         0
B        a         0
B        b         1

for each of user_id and music_id combination pairs those
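
One possible approach, sketched with the two-row example from the question: build every user_id and music_id pair with `MultiIndex.from_product` and reindex against it, filling the missing pairs with 0. The variable names here are illustrative.

```python
import pandas as pd

# The two-row example from the question.
ratings = pd.DataFrame(
    {"user_id": ["A", "B"], "music_id": ["a", "b"], "has_rating": [1, 1]}
)

# Every possible (user_id, music_id) pair.
full_index = pd.MultiIndex.from_product(
    [ratings["user_id"].unique(), ratings["music_id"].unique()],
    names=["user_id", "music_id"],
)

# Reindex onto the full set of pairs; pairs with no rating get 0.
filled = (
    ratings.set_index(["user_id", "music_id"])
    .reindex(full_index, fill_value=0)
    .reset_index()
)

print(filled)
```

Note that this materialises the full dense grid, so it may not suit very large, sparse user and item sets.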

ValueError: Invalid classes inferred from unique values of `y` in XGBoost

machine-learning pandas python xgboost

I’m new to the Data Science field and I’m trying to apply XGBoost to a table with 5 rows × 46 columns, where the last column is my target column. The error I’m getting is Can anyone help me with the resolution? Answer I think you need to have the classes numbered from 0 to n-1 where n is
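
A small sketch of the relabelling the answer seems to describe, using `pd.factorize` to map arbitrary class values onto 0 to n-1. The target values below are hypothetical, and the XGBoost call itself is left out.

```python
import pandas as pd

# Hypothetical target column whose labels are not 0..n-1.
y = pd.Series([3, 7, 7, 10, 3], name="target")

# sort=True assigns codes in sorted order of the original labels.
codes, classes = pd.factorize(y, sort=True)

print(codes)    # integer codes to pass to the classifier as y
print(classes)  # original labels; classes[codes] maps predictions back
```

Keeping `classes` around lets predicted codes be translated back to the original labels afterwards.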

Print side by side .describe() in pandas

numpy pandas python

Hello, I have two columns that I’m calling describe() on to get their stats. I have something like this I want to print desk1 and desk2 below each category. I am doing this: I get this: And my desired output is this: I would like to not create a dataframe. Any solutions? Thanks in advance Answer What about
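
The desired output is not shown in the excerpt, so the following is only a guess at the layout: each statistic is printed once, with the desk1 and desk2 values listed beneath it, without building a combined dataframe. The sample values are made up.

```python
import pandas as pd

# Hypothetical data; only the names desk1 and desk2 come from the question.
df = pd.DataFrame(
    {"desk1": [1.0, 2.0, 3.0, 4.0], "desk2": [10.0, 20.0, 30.0, 40.0]}
)

stats1 = df["desk1"].describe()
stats2 = df["desk2"].describe()

# Collect plain strings: one heading per statistic, then both values.
lines = []
for stat in stats1.index:
    lines.append(str(stat))
    lines.append(f"  desk1: {stats1[stat]:.2f}")
    lines.append(f"  desk2: {stats2[stat]:.2f}")

print("\n".join(lines))
```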

Combining multiple CSVs in pandas

dataframe pandas python

I have multiple csv files (which I’ve moved into pandas dataframes) in a folder, each of which holds monthly website data, and I need to combine them by copying the Value column from each to make a new dataframe (which will ultimately be exported to another csv). A new csv file will be added to the folder each month, so I
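
A rough sketch of one way to pick up every csv in a folder, including ones added later, and collect each file's `Value` column into a single dataframe. The helper name `combine_value_columns`, the `Metric` column, and the month-based file names are assumptions for illustration; only the `Value` column comes from the question.

```python
import tempfile
from pathlib import Path

import pandas as pd


def combine_value_columns(folder):
    """Collect the Value column of every csv in folder, one column per file."""
    columns = {}
    for csv_path in sorted(Path(folder).glob("*.csv")):
        # Use the file name (without extension) as the new column name.
        columns[csv_path.stem] = pd.read_csv(csv_path)["Value"]
    # Columns are aligned on row position, so files should share row order.
    return pd.DataFrame(columns)


# Demo with two throwaway files in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    for month, values in {"2022-01": [1, 2], "2022-02": [3, 4]}.items():
        pd.DataFrame({"Metric": ["visits", "clicks"], "Value": values}).to_csv(
            Path(tmp) / f"{month}.csv", index=False
        )
    combined = combine_value_columns(tmp)

print(combined)
```

Because the folder is globbed on every run, a newly added monthly file shows up as an extra column without any code change.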

Concatenate columns at the end of a MultiIndex columns DataFrame

pandas python

Consider the following DataFrames df: and df1: I want to concatenate the two DataFrames such that the resulting DataFrame is: What I run is pandas.concat([df1, df2], axis=1).sort_index(level="kind", axis=1) but that results in i.e. the column potato is appended at the beginning of df["A"] whereas I want it appended to the end. Answer Add parameter sort_remaining=False in DataFrame.sort_index:
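
The frames themselves are not shown in the excerpt, so this sketch uses made-up stand-ins: `df` with sub-columns x and y under A and B, and `df1` with a potato column under each. The second level name `item` is an assumption; `kind` and `potato` come from the question.

```python
import pandas as pd

# Hypothetical stand-ins for the frames elided in the question.
cols = pd.MultiIndex.from_product([["A", "B"], ["x", "y"]], names=["kind", "item"])
df = pd.DataFrame([[1, 2, 3, 4]], columns=cols)

extra = pd.MultiIndex.from_product([["A", "B"], ["potato"]], names=["kind", "item"])
df1 = pd.DataFrame([[5, 6]], columns=extra)

# Sort only on the "kind" level; sort_remaining=False leaves the order
# within each kind as it was after concatenation, so potato stays last.
result = pd.concat([df, df1], axis=1).sort_index(
    level="kind", axis=1, sort_remaining=False
)

print(result)
```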

Collapsing rows

pandas python

I have the following table below: I would like to collapse Code_1 and Code_2 columns based on ID and Date. Based on what I have found online, I have tried the below snippet of code but it does not seem to be working. df = df.groupby(['ID','Date']).agg(''.join) DF: ID Date Count_Code1 Count_Code2 Code_1 Code_2 A1 2022-02-02 90 0 AAAA NaN A1 2022-02-02
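
One likely reason the snippet fails is that `''.join` raises on NaN values, since NaN is a float rather than a string. A minimal sketch that drops NaN before joining follows; the rows are invented, the Count columns are left out because the question does not say how they should be combined, and `join_codes` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd

# Hypothetical rows; only the column names come from the question.
df = pd.DataFrame(
    {
        "ID": ["A1", "A1", "B2"],
        "Date": ["2022-02-02", "2022-02-02", "2022-03-01"],
        "Code_1": ["AAAA", np.nan, "CCCC"],
        "Code_2": [np.nan, "BBBB", np.nan],
    }
)


def join_codes(values):
    """Concatenate the non-missing codes in a group into one string."""
    return "".join(values.dropna().astype(str))


collapsed = df.groupby(["ID", "Date"], as_index=False)[["Code_1", "Code_2"]].agg(
    join_codes
)

print(collapsed)
```

Groups with no code at all end up with an empty string in this sketch; replacing that with NaN afterwards is a matter of preference.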

Python Pandas how to test equality between pandas columns that are category data types

pandas python

I have large datasets that I cross-join with python pandas. Both datasets load in pandas and I convert all ‘object’ columns to ‘category’. The issue is I need to pd.query() against various ‘category’ dtype columns. When doing so with ‘category’ columns it returns an error (I expect this because not all columns have the same values (e.g. subsets and supersets
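
Comparing two categorical columns whose category sets differ raises a TypeError in pandas. One possible workaround, sketched here with made-up column names `left` and `right`, is to cast both columns to a shared `CategoricalDtype` built from the union of their categories before comparing.

```python
import pandas as pd

# Hypothetical columns with overlapping but different category sets.
df = pd.DataFrame(
    {
        "left": pd.Categorical(["a", "b", "c"]),
        "right": pd.Categorical(["a", "x", "c"]),
    }
)

# Build one dtype that covers the categories of both columns.
all_categories = sorted(
    set(df["left"].cat.categories) | set(df["right"].cat.categories)
)
shared = pd.CategoricalDtype(all_categories)

left = df["left"].astype(shared)
right = df["right"].astype(shared)

# With identical categories on both sides, elementwise == works.
matches = df[left == right]

print(matches)
```

Casting to `object` or `str` before comparing is a simpler alternative, at the cost of losing the memory savings of the category dtype for that step.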

‘pyarrow’ is not installed – Snowpark stored procedure with Python

pandas python snowflake-cloud-data-platform snowpark

I have created this basic stored procedure to query a Snowflake table based on a customer id: It works fine but I would like my sproc to return a JSON object for the whole result set. I modified it thusly: It compiles without errors but fails at runtime with this error: What am I missing here? Answer You need to

How to append dataframe to an existing excel file with some rows of data in it?

dataframe excel pandas python

I have an excel sheet which has 150 rows of data. Now, I want to append a dataframe to that sheet without deleting or replacing the data using python. I have tried code like this, it is deleting the existing content from the excel and writing the dataframe into it. And other solutions provided here but with no outcome. Any