I have a variable called ‘Specimen’ from a dataframe imported from an xlsx file, and I want to use the value of this variable (in this case 25) in a new dataframe. What I have: array([[25.0]], dtype=object) What I want: I have tried But this gives the following error: TypeError: RangeIndex.n…
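A minimal sketch of one way to pull the scalar out of a one-element object array and place it in a new dataframe. The array below is a hypothetical stand-in for the value read from the xlsx file, and the column name `Specimen` in the new dataframe is an assumption; the asker's own attempt and desired layout are not shown in the excerpt.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the value read from the xlsx file.
specimen = np.array([[25.0]], dtype=object)

# .item() returns the only element of a one-element array as a Python scalar.
specimen_value = specimen.item()

# Wrap the scalar in a list so pandas builds a single-row column
# (the column name "Specimen" is an assumption).
new_df = pd.DataFrame({"Specimen": [specimen_value]})
print(new_df)
```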
Tag: pandas
Linear regression prediction based on group of data in test set
I have a simple dataset which looks like this: I created a simple LR model to train and predict the target variable “sales”. And I used MAE to evaluate the model My code works well, but what I want to do is to predict the sales in the X_test grouped by hour of the day. In the above dataset example…
How to convert dataframe column into list of lists json format?
My dataframe looks like below and I want to use sales column and convert it into json files for each month as a list of lists. sales dt 156 2022-01 192 2022-01 147 2022-02 192 2022-02 Using this, I am getting this format of json files: However, I want to have: I have created this as a toy example. I
How to merge and normalize multiple dataframes with pd.merge_asof
I am trying to merge multiple dataframes using pd.merge_asof. They all contain 2 columns with datetime as index column and a variable column with floating values. They are not balanced in their indexes and times so I have to normalize the values. I can successfully merge the dfs and normalize the values like t…
Ungrouping a pandas dataframe after aggregation operation
I have used the “groupby” method on my dataframe to find the total number of people at each location. To the right of the “sum” column, I need to add a column that lists all of the people’s names at each location (ideally in separate rows, but a list would be fine too). Is there …
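A rough sketch of both variants mentioned in the question, assuming hypothetical column names `location`, `name` and `people` (the asker's real columns are not shown). Named aggregation with `list` collects the names per location, and `transform("sum")` keeps one row per person while attaching the group total.

```python
import pandas as pd

# Hypothetical data; the column names are assumptions.
df = pd.DataFrame({
    "location": ["A", "A", "B"],
    "name": ["Ann", "Bob", "Cy"],
    "people": [1, 1, 1],
})

# Variant 1: one row per location, with the total and a list of names.
summary = df.groupby("location", as_index=False).agg(
    total=("people", "sum"),
    names=("name", list),
)

# Variant 2: keep one row per person and attach the location total.
per_person = df.copy()
per_person["total"] = per_person.groupby("location")["people"].transform("sum")

print(summary)
print(per_person)
```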
How to calculate a Process Duration from a TimeSeries Dataset with Pandas
I have a huge dataset of various sensor data sorted chronologically (by timestamp) and by sensor type. I want to calculate the duration of a process in seconds by subtracting the first entry of a sensor from the last entry. This is to be done with python and pandas. Attached is an example for better understan…
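One possible sketch of the first-to-last subtraction described above, assuming hypothetical column names `timestamp` and `sensor` and rows already sorted chronologically, as the question states. The attached example data is not available, so the values here are made up for illustration.

```python
import pandas as pd

# Hypothetical sensor log; column names and values are assumptions.
df = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2022-01-01 10:00:00",
        "2022-01-01 10:00:05",
        "2022-01-01 10:00:30",
        "2022-01-01 10:01:05",
    ]),
    "sensor": ["A", "B", "A", "B"],
})

# For each sensor, subtract its first timestamp from its last one.
grouped = df.groupby("sensor")["timestamp"]
duration_s = (grouped.last() - grouped.first()).dt.total_seconds()

print(duration_s)
```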
getting “KeyError” while implementing Z-score on a dataset
I have been trying to implement z-score normalization to all of the numeric values present in combined_data with the following code: Here, combined_data is the combination of training and testing datasets as a dataframe and passed through one-hot encoding. I am seeing the following error: The dataset combined…
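A hedged sketch of z-score normalization over the numeric columns of a frame named `combined_data`. The data and column names are hypothetical, and the cause of the asker's KeyError is not visible in the excerpt; the sketch only shows that taking the column labels from the frame itself avoids indexing with labels that do not exist.

```python
import pandas as pd

# Hypothetical stand-in for the combined training/testing frame.
combined_data = pd.DataFrame({
    "age": [20.0, 30.0, 40.0],
    "income": [1.0, 2.0, 3.0],
    "city": ["x", "y", "z"],
})

# Take the numeric column labels from the frame itself.
numeric_cols = combined_data.select_dtypes(include="number").columns

# z = (x - mean) / std, using the population standard deviation (ddof=0).
means = combined_data[numeric_cols].mean()
stds = combined_data[numeric_cols].std(ddof=0)
combined_data[numeric_cols] = (combined_data[numeric_cols] - means) / stds

print(combined_data)
```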
Looking for the quickest way to find the value of a column given an index for all rows
I’m looking for a more efficient way to grab a certain value at a certain index for every row in a Pandas DataFrame than the one I have currently. Here’s the code I have so far. The .apply() function with the lambda syntax is super slow so would love a more optimised version. Let me know if you nee…
How can I filter on column names part of which contain a value in a given list?
Not sure if this has been asked elsewhere but I couldn’t find the relevant question if there was. I have a list of values e.g.: I then have colnames such as: col1_6757, col_1234, col1_5432, col1_1110, amount_1110, etc I would like to filter on the pandas dataframe such that I only retrieve those colnam…
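A small sketch of one way to do this, assuming the part to match is the suffix after the last underscore. The `values` list is hypothetical, since the asker's actual list is not shown; the column names are the ones quoted in the question.

```python
import pandas as pd

# Hypothetical list of values to look for (the asker's list is not shown).
values = ["1110", "6757"]

# Empty frame with the column names quoted in the question.
df = pd.DataFrame(
    columns=["col1_6757", "col_1234", "col1_5432", "col1_1110", "amount_1110"]
)

# Keep columns whose suffix after the last underscore is in the list.
selected = [c for c in df.columns if c.rsplit("_", 1)[-1] in values]
filtered = df[selected]

print(list(filtered.columns))
```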
Count number of days in each continuous period pandas
Suppose I have the following df N03_zero (date_code is already datetime): Millions of rows with date_code assigned to some item_code. I am trying to get the number of days in each continuous period for each item_code; none of the similar questions helped me. The expected df should be: Once days sequence br…
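A possible sketch of the usual gap-and-cumsum approach, assuming a period breaks whenever consecutive dates for an item differ by anything other than one day. The sample rows and the output column names (`period_id`, `start`, `end`, `days`) are assumptions, since the expected df is not shown in the excerpt.

```python
import pandas as pd

# Hypothetical sample with the column names from the question.
N03_zero = pd.DataFrame({
    "item_code": ["a", "a", "a", "a", "b", "b"],
    "date_code": pd.to_datetime([
        "2022-01-01", "2022-01-02", "2022-01-03",
        "2022-01-10", "2022-01-05", "2022-01-06",
    ]),
})

N03_zero = N03_zero.sort_values(["item_code", "date_code"])

# A new period starts at the first row of an item or after a gap != 1 day.
gap = N03_zero.groupby("item_code")["date_code"].diff() != pd.Timedelta(days=1)
N03_zero["period_id"] = gap.cumsum()

# One row per continuous period, with its start, end and length in days.
periods = N03_zero.groupby(["item_code", "period_id"], as_index=False).agg(
    start=("date_code", "min"),
    end=("date_code", "max"),
    days=("date_code", "size"),
)

print(periods)
```

Because the diff and cumulative sum are vectorized, this avoids a per-row `.apply()`, which matters for the millions of rows mentioned in the question.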