I am seeing a difference in the behaviour of Series.tz_convert between pandas 0.20.1 and 1.2.4, but I don’t understand the cause and cannot find where this change is documented, if it is intentional. Here is some test code: Under pandas 0.20.1 it gives this output: But under 1.2.4 we get this: Looks like under 1.2.4 the Series.tz_convert routine just doesn’t
Tag: pandas
Standardizing a set of columns in a pandas dataframe with sklearn
I have a table with four columns: CustomerID, Recency, Frequency and Revenue. I need to standardize (scale) the columns Recency, Frequency and Revenue while keeping the column CustomerID. I used this code: But the result is a table without the column CustomerID. Is there any way to get a table with the corresponding CustomerID and the scaled columns? Answer fit_transform
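The asker's code is not shown in the excerpt, so the following is only a minimal sketch of one way to keep CustomerID next to the scaled columns. The sample values and the names `cols_to_scale`, `scaled` and `result` are assumptions made for illustration. `StandardScaler.fit_transform` returns a plain NumPy array, which is why the column labels and the index have to be reattached before the result is joined back to CustomerID.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical sample data; the real table is not shown in the question.
df = pd.DataFrame({
    "CustomerID": [101, 102, 103, 104],
    "Recency": [10, 20, 30, 40],
    "Frequency": [1, 2, 3, 4],
    "Revenue": [100.0, 200.0, 300.0, 400.0],
})

cols_to_scale = ["Recency", "Frequency", "Revenue"]

# fit_transform returns a NumPy array, so rebuild a DataFrame that
# carries the original column names and row index.
scaled = pd.DataFrame(
    StandardScaler().fit_transform(df[cols_to_scale]),
    columns=cols_to_scale,
    index=df.index,
)

# Put the untouched CustomerID column back in front of the scaled columns.
result = pd.concat([df[["CustomerID"]], scaled], axis=1)
print(result)
```

Building a new frame rather than overwriting columns in place leaves the original `df` unchanged, which can be handy if the unscaled values are needed later.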
Is there an easy way to establish a hierarchy between entities using only 2 ID fields?
I have a table with 2 fields like so: Account_ID Parent_ID x y x1 y x2 y y z y1 z y2 z z z z a z1 a a a b b The ID fields are both in int64 format. The first field represents accounts which could be controlled by a parent account which could be itself controlled by
Python pandas dataframe populate hierarchical levels from parent child
I have the following dataframe which contains Parent child relation: I would like to get a new dataframe which contains e.g. all children of parent a: child level1 level2 level x d a b – b a – – c a – – f a c – h a c f g a c – I do not know how
Python DataFrame: Map two dataframes based on day of month?
I have two dataframes. month_data dataframe has days from start of the month to the end. student_df with each student’s only present data. I’m trying to map both dataframes so that the remaining days left for each student should be marked as absent. month_data: month_data = pd.DataFrame({'day_of_month': pd.date_range('01/01/2021', '31/01/2021')}) student_df final_df Answer You can create a new dataframe containing all dates and
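Since student_df and the rest of the answer are cut off in the excerpt, here is a hedged sketch of one common approach: build every (student, day) combination, left-merge the attendance records onto it, and fill the gaps with "absent". The column names `student` and `status` and the sample rows are assumptions, not taken from the question. The cross merge needs pandas 1.2 or later.

```python
import pandas as pd

# ISO dates are used here to avoid day/month ambiguity when parsing.
month_data = pd.DataFrame(
    {"day_of_month": pd.date_range("2021-01-01", "2021-01-31")}
)

# Hypothetical layout: one row per day on which a student was present.
student_df = pd.DataFrame({
    "student": ["A", "A", "B"],
    "day_of_month": pd.to_datetime(["2021-01-01", "2021-01-02", "2021-01-01"]),
    "status": ["present", "present", "present"],
})

# Every student paired with every day of the month.
students = student_df[["student"]].drop_duplicates()
all_days = students.merge(month_data, how="cross")

# Keep all (student, day) pairs; days without a record get NaN in "status".
final_df = all_days.merge(student_df, on=["student", "day_of_month"], how="left")
final_df["status"] = final_df["status"].fillna("absent")
```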
KMeans clustering from all possible combinations of 2 columns not producing correct output
I have a 4 column dataframe which I extracted from the iris dataset. I use kmeans to plot 3 clusters from all possible combinations of 2 columns. However, there seems to be something wrong with the output, especially since the cluster centers are not placed at the center of the clusters. I have provided examples of the output. Only cluster_1
Is there a function to write certain values of a dataframe to a .txt file in Python?
I have a dataframe as follows: Basically I would like to write the dataframe to a txt file, such that every row consists of the index and the subsequent column name only, excluding the zeroes. For example: The dataset is quite big, about 1k rows, 16k columns. Is there any way I can do this using a function in Pandas?
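The example dataframe and the desired output are missing from the excerpt, so the sketch below assumes one output line per row: the index label followed by the names of the columns whose value is non-zero, separated by spaces. The helper name `nonzero_lines`, the sample data and the file name `nonzero_columns.txt` are all hypothetical.

```python
import pandas as pd

# Hypothetical small frame standing in for the 1k x 16k dataset.
df = pd.DataFrame(
    {"a": [1, 0, 0], "b": [0, 0, 2], "c": [3, 0, 4]},
    index=["r1", "r2", "r3"],
)

def nonzero_lines(frame):
    """Return one string per row: index label plus non-zero column names."""
    lines = []
    # Iterating over the NumPy rows avoids building a Series for every row.
    for idx, row in zip(frame.index, frame.to_numpy()):
        names = frame.columns[row != 0]
        lines.append(" ".join([str(idx), *map(str, names)]))
    return lines

lines = nonzero_lines(df)

with open("nonzero_columns.txt", "w") as fh:
    fh.write("\n".join(lines) + "\n")
```

A row that contains only zeroes is written as its index label alone; whether such rows should be skipped instead depends on the output format the asker had in mind.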
import 2 dataframes from a function in a different python file
I have a Python file called Pre_Processing_File.py. This file has the function pre_Processing, which loads a text file and creates 3 data frames: userListing_DF, PrivAcc and allAccountsDF. The function then returns the 3 DFs. What I want to do is create another script and import the 3 DFs from the pre_Processing.py file. I have created a script called call_DFs
Pandas: using column of date to calculate number of days
I am using an AirBnb dataset. I have a column, ‘host_since’. The column contains date objects in the format of ‘DD/MM/YYYY’: for example, 24/09/2008. The column shows the date on which an individual became a host. I want to create a new column in my dataframe that contains the number of days since the host first joined. I am
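A minimal sketch of the calculation, assuming the ‘host_since’ values are stored as DD/MM/YYYY strings. The sample rows, the new column name `days_as_host` and the fixed `reference_date` are assumptions made so that the example is reproducible.

```python
import pandas as pd

# Hypothetical sample; the first value is the example from the question.
df = pd.DataFrame({"host_since": ["24/09/2008", "01/01/2020"]})

# Parse with an explicit format so 24/09/2008 is read as day/month/year.
df["host_since"] = pd.to_datetime(df["host_since"], format="%d/%m/%Y")

# Assumed reference date; replace with pd.Timestamp.today().normalize()
# to count the days up to the current date.
reference_date = pd.Timestamp("2021-06-01")

# Subtracting datetimes gives timedeltas; .dt.days extracts whole days.
df["days_as_host"] = (reference_date - df["host_since"]).dt.days
```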
How to match multiple columns from two dataframes that have different sizes?
A similar solution is found here, where the asker has only a single dataframe and the requirement was to match a fixed string value: result = df.loc[(df['Col1'] == 'Team2') & (df['Col2'] == 'Medium'), 'Col3'].values[0] However, the problem I encountered with the .loc method is that it requires the 2 dataframes to have the same size because it will