In pandas, if we have a time series and need to group it by a certain frequency (say, every two weeks), it’s possible to use the Grouper class, like this: Is there any equivalent in Spark (more specifically, using Scala) for this feature? Answer You can use the sql function window. First, you create the…
Tag: pandas
Alternative way of writing for loop and if in python when working with a dataframe to make it faster
I have a data frame named ‘plans_to_csv’ looking like this: I need to do the following analysis to determine the actual mode, but it takes too long to run. Is there an alternative way of writing this code to make it faster? Thanks a lot for your help in advance. Answer You can shift the co…
plotly axis labels don’t show
I am trying to plot something with plotly, but the axis labels don’t show. I can’t find what I’m doing wrong. Answer You just need to set xaxis_title and yaxis_title. Using Plotly Express
Pandas: automatically reorder columns based on condition
I have this table, with index columns represented by week number: I want to reorder the columns to obtain this: Weeks 1 & 2 are for 2022, weeks 45 to 52 are for 2021, so I want to reorganize the table to have weeks 1 & 2 after week 52. I wrote this code, but I would like an automatic solution: For
How can I handle invalid phone numbers using python’s phonenumbers package and apply?
I have a dataframe containing a variety of phone numbers that I want to extract the time zone for. I am using apply to loop over the series in the dataframe as follows: And this works just fine as long as x.external_number doesn’t contain a single invalid phone number; however, if one singl…
Python – Iterate through multiple dataframes and append data to a new dataframe
I have 3 pandas dataframes. I would like to append one row from each in each iteration to an existing dataframe. Example shown below: Dummy code: Please could someone point me in the right direction? Answer Concatenate them, using the keys argument to associate an index with rows from each original dataframe,…
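A small sketch of the keys approach the answer describes. The three frames and their contents are made up, and because the rest of the answer is cut off, the final interleaving step (swaplevel followed by sort_index) is an assumption about how one row from each frame could be taken in turn:

```python
import pandas as pd

# Three made-up frames standing in for the originals.
df_a = pd.DataFrame({"value": [1, 2]})
df_b = pd.DataFrame({"value": [10, 20]})
df_c = pd.DataFrame({"value": [100, 200]})

# keys= adds an outer index level recording which frame each row came from.
combined = pd.concat(
    [df_a, df_b, df_c], keys=["a", "b", "c"], names=["source", "row"]
)

# Assumption: to take one row from each frame per "iteration", put the row
# number first and sort, so rows are ordered (0, a), (0, b), (0, c), (1, a), ...
interleaved = combined.swaplevel().sort_index()
```

This avoids appending inside a Python loop, which copies the growing dataframe on every iteration.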
Matching datetime column in pandas with another datetime column and return index
I have two DataFrames – df1 and df2. Both of them contain a datetime column, say date1 and date2. I want to match each value of the date1 column to date2 and store the index in a new column. I am trying the following code: but this line throws the following error: Can only compare identically-labeled series…
Scikit-learn pipeline: Non-finite test scores error / Inconsistent number of samples
I have a dataframe with two columns of texts and only the POS tags (of the same texts), which I want to use for language classification. I am trying to use both features as part of my model. This is what the data looks like: X_train.head() This is what the shape of the data looks like: When I run my
Finding whether there is any overlap between two date periods in DataFrame
I have the following pd.DataFrame that looks like: I want to create a new column received_medidation that states whether or not (boolean) the patient received medication between admission_timestamp and end_period (even if it was for only one second). So, the boolean should state if there is any time between a…
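One way to express the overlap check, as a sketch only: two periods share some time exactly when each one starts before the other ends. The medication period columns are not visible in the excerpt, so med_start and med_end are hypothetical names, and the sample rows are made up:

```python
import pandas as pd

# Made-up rows; med_start and med_end are hypothetical column names.
df = pd.DataFrame({
    "admission_timestamp": pd.to_datetime(["2022-01-01 08:00:00"] * 3),
    "end_period": pd.to_datetime(["2022-01-02 08:00:00"] * 3),
    "med_start": pd.to_datetime(
        ["2021-12-31 20:00:00", "2022-01-02 08:00:00", "2022-01-03 00:00:00"]
    ),
    "med_end": pd.to_datetime(
        ["2022-01-01 08:00:01", "2022-01-02 12:00:00", "2022-01-04 00:00:00"]
    ),
})

# Strict inequalities: periods that only touch at an endpoint share no time,
# while an overlap of a single second still counts.
df["received_medidation"] = (
    (df["med_start"] < df["end_period"])
    & (df["med_end"] > df["admission_timestamp"])
)
```

In this sample the first row overlaps by one second and is flagged True; the other two rows start at or after end_period and are flagged False.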
Pandas – Groupby and Standardize
I have tried to tackle this for quite some time, but haven’t been able to get a pythonic way around it by using the built-in groupby and transform methods from pandas. The goal is to group the data by columns ex_date and id, then within the groups identified, standardize the column called ref_value_1 ag…
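A sketch of group-wise standardization with groupby and transform. The excerpt is cut off before it says what ref_value_1 is standardized against, so this assumes a plain z-score using each group's own mean and sample standard deviation; the data and the output column name ref_value_1_std are made up:

```python
import pandas as pd

# Made-up data with the column names from the question.
df = pd.DataFrame({
    "ex_date": ["2022-01-03"] * 3 + ["2022-01-04"] * 3,
    "id": ["A"] * 3 + ["B"] * 3,
    "ref_value_1": [1.0, 2.0, 3.0, 10.0, 20.0, 30.0],
})

grouped = df.groupby(["ex_date", "id"])["ref_value_1"]

# transform returns a result aligned with the original rows, so the group
# mean and standard deviation can be combined with the column directly.
# Assumption: z-score with the sample standard deviation (ddof=1, the default).
df["ref_value_1_std"] = (
    df["ref_value_1"] - grouped.transform("mean")
) / grouped.transform("std")
```

Groups with a single row have an undefined sample standard deviation and come out as NaN under this approach.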