Tag: pandas

Change in pandas Series.tz_convert behaviour in v1.X (or thereabouts)?

I am seeing a difference in the behaviour of Series.tz_convert between pandas 0.20.1 and 1.2.4, but I don’t understand the cause and cannot find where this change is documented, if it is intentional. Here is some test code: Under pandas 0.20.1 it gives this output: But under 1.2.4 we get this: Looks lik…

Standardizing a set of columns in a pandas dataframe with sklearn

pandas python scikit-learn standardized

I have a table with four columns: CustomerID, Recency, Frequency and Revenue. I need to standardize (scale) the columns Recency, Frequency and Revenue and save the column CustomerID. I used this code: But the result is a table without the column CustomerID. Is there any way to get a table with the correspondi…
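One possible approach, as a minimal sketch: scale only the three numeric columns and join the result back onto CustomerID. The asker's code and data are not shown in the excerpt, so the sample values and the names `cols_to_scale` and `scaled` are assumptions made for illustration.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical sample data; the original table is not shown in the excerpt.
df = pd.DataFrame({
    "CustomerID": [101, 102, 103, 104],
    "Recency": [10.0, 20.0, 30.0, 40.0],
    "Frequency": [1.0, 2.0, 3.0, 4.0],
    "Revenue": [100.0, 200.0, 300.0, 400.0],
})

cols_to_scale = ["Recency", "Frequency", "Revenue"]

# fit_transform returns a plain NumPy array, so rebuild a DataFrame
# that reuses the original index and column names.
scaled_values = StandardScaler().fit_transform(df[cols_to_scale])
scaled_part = pd.DataFrame(scaled_values, columns=cols_to_scale, index=df.index)

# Join the untouched CustomerID column with the scaled columns.
scaled = df[["CustomerID"]].join(scaled_part)
```

Because the scaled frame reuses `df.index`, the join lines each CustomerID up with its own scaled row.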

Is there an easy way to establish a hierarchy between entities using only 2 ID fields?

dataframe pandas python python-3.x

I have a table with 2 fields like so: Account_ID Parent_ID x y x1 y x2 y y z y1 z y2 z z z z a z1 a a a b b Both ID fields are in int64 format. The first field represents accounts which could be controlled by a parent account, which could itself be controlled by

Python pandas dataframe populate hierarchical levels from parent child

pandas python recursion

I have the following dataframe which contains Parent child relation: I would like to get a new dataframe which contains e.g. all children of parent a: child level1 level2 level x d a b – b a – – c a – – f a c – h a c f g a c – I do not know how

Python DataFrame: Map two dataframes based on day of month?

dataframe numpy pandas python python-3.x

I have two dataframes. The month_data dataframe has every day from the start of the month to the end. student_df holds only the days on which each student was present. I’m trying to map the two dataframes so that the remaining days for each student are marked as absent. month_data month_data = pd.DataFrame({'day_…

KMeans clustering from all possible combinations of 2 columns not producing correct output

cluster-analysis k-means matplotlib pandas python

I have a 4 column dataframe which I extracted from the iris dataset. I use kmeans to plot 3 clusters from all possible combinations of 2 columns. However, there seems to be something wrong with the output, especially since the cluster centers are not placed at the center of the clusters. I have provided examp…

Is there a function to write certain values of a dataframe to a .txt file in Python?

dataframe file pandas python text-files

I have a dataframe as follows: Basically I would like to write the dataframe to a txt file, such that every row consists of the index and the subsequent column name only, excluding the zeroes. For example: The dataset is quite big, about 1k rows, 16k columns. Is there any way I can do this using a function in…
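A rough sketch of one way to do this without a dedicated pandas function. The asker's frame and desired output are not shown in the excerpt, so the sample data, the space-separated line format, and the helper names `nonzero_lines` and `write_nonzero` are all assumptions.

```python
import io
import pandas as pd

# Hypothetical sample data; the real frame (about 1k rows, 16k columns) is not shown.
df = pd.DataFrame(
    {"A": [1, 0, 2], "B": [0, 0, 3], "C": [4, 0, 0]},
    index=["r1", "r2", "r3"],
)

def nonzero_lines(frame):
    """Yield one line per row: the index label, then the names of its non-zero columns."""
    values = frame.to_numpy()
    for label, row in zip(frame.index, values):
        names = frame.columns[row != 0]  # boolean mask over the column labels
        yield " ".join([str(label), *map(str, names)])

def write_nonzero(frame, handle):
    """Write the lines to any open text handle (a file or an in-memory buffer)."""
    for line in nonzero_lines(frame):
        handle.write(line + "\n")

# An in-memory buffer stands in for the .txt file here.
buffer = io.StringIO()
write_nonzero(df, buffer)
```

To write a real file, pass a handle opened with `open(path, "w")` instead of the buffer. Converting the frame to a NumPy array once avoids building a Series for every row, which may matter at 16k columns.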

import 2 dataframes from a function in a different python file

dataframe function pandas python

I have a Python file called Pre_Processing_File.py. It has a function, pre_Processing, which loads a text file, creates 3 dataframes (userListing_DF, PrivAcc, allAccountsDF) and returns them. What I want to do is create another script and import the 3 DFs from the …

Pandas: using column of date to calculate number of days

pandas python

I am using an AirBnb dataset. I have a column, ‘host_since’. The column contains dates in the format ‘DD/MM/YYYY’: for example, 24/09/2008. The column’s data shows the date that an individual became a host. I want to create a new column in my dataframe that contains the…
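A minimal sketch of the date arithmetic the title describes. The excerpt is cut off before it says what the new column should hold, so counting days up to a fixed reference date is an assumption, as are the sample rows and the column name `days_as_host`.

```python
import pandas as pd

# Hypothetical sample rows in the DD/MM/YYYY format described above.
df = pd.DataFrame({"host_since": ["24/09/2008", "01/01/2015"]})

# Parse with an explicit format so day and month are not swapped.
df["host_since"] = pd.to_datetime(df["host_since"], format="%d/%m/%Y")

# Assumed reference date; pd.Timestamp.today().normalize() is an alternative.
reference = pd.Timestamp("2021-01-01")

# Subtracting datetimes gives timedeltas; .dt.days extracts whole days.
df["days_as_host"] = (reference - df["host_since"]).dt.days
```

Passing `format` explicitly matters here because a value such as 01/02/2015 would otherwise be ambiguous between day-first and month-first parsing.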

How to match multiple columns from two dataframes that have different sizes?

dataframe pandas python

One similar solution is found here, where the asker only has a single dataframe and their requirement was to match a fixed string value: result = df.loc[(df['Col1'] == 'Team2') & (df['Col2'] == 'Medium'), 'Col3'].values[0] Howeve…