if statement and for loop
I am stuck with the following code. I have a column whose values I want to divide by 2 when the number is above 10, and I want to run this for all the rows. I have tried this code, but it gives the error that the truth value of a Series is ambiguous: I suppose that I need a for loop
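A minimal sketch of one way to do the conditional division described in this question without a loop. The original code is not shown in the excerpt, so the column name `value` and the sample numbers here are assumptions made purely for illustration.

```python
import pandas as pd

# Hypothetical data; the column name "value" is an assumption.
df = pd.DataFrame({"value": [4, 12, 10, 30]})

# Writing `if df["value"] > 10:` raises the ambiguity error because the
# comparison produces a whole boolean Series, not a single True/False.
mask = df["value"] > 10

# Keep values where the mask is False; replace the rest with half the value.
df["value"] = df["value"].where(~mask, df["value"] / 2)

print(df)
```

The boolean mask is applied to every row at once, so no explicit for loop is needed.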
Tag: dataframe
How to solve ValueError while checking rows in a particular column in pandas dataframe?
I’m trying to get the number of “NaN” values in a particular column using the code below. I can’t use df["column_name"].isna().sum() because I have thousands of columns and I want to check the number of null values in each column. Sometimes I also need to check for symbols present in the co…
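As a hedged illustration of the per-column count asked about here: calling `isna().sum()` on the whole DataFrame, rather than on one named column, returns one count per column. The sample frame and its column names are invented for this sketch.

```python
import numpy as np
import pandas as pd

# Hypothetical frame; the column names "a", "b", "c" are assumptions.
df = pd.DataFrame(
    {
        "a": [1.0, np.nan, 3.0],
        "b": ["x", None, None],
        "c": [1, 2, 3],
    }
)

# isna() marks every missing cell; sum() then counts them column by column.
nan_counts = df.isna().sum()

print(nan_counts)
```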
Replace multiple “less than values” in different columns in pandas dataframe
I am working with Python and pandas. I have a dataset of lab analysis where I am dealing with multiple parameters and detection limits (dl). Many of the samples are reported as below the dl (e.g. <dl, <4). For example: My goal is to replace all <dl with dl/2 as a float value. I can do this for one column…
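One possible sketch of the replacement described in this question, applied to several columns at once. It assumes that censored results are stored as strings such as "<4" and that every other cell parses as a number; the column names and the helper `half_detection_limit` are hypothetical.

```python
import pandas as pd

# Hypothetical lab results; column names and values are assumptions.
df = pd.DataFrame(
    {
        "As": ["<4", "7.5", "<0.5"],
        "Pb": ["12", "<10", "3"],
    }
)


def half_detection_limit(value):
    """Return dl/2 for a '<dl' entry, otherwise the value as a float."""
    text = str(value).strip()
    if text.startswith("<"):
        return float(text[1:]) / 2
    return float(text)


# Apply the helper to every cell of every column.
cleaned = df.apply(lambda col: col.map(half_detection_limit))

print(cleaned)
```

To restrict the conversion to some parameters only, the same `apply` call can be run on a column subset such as `df[["As", "Pb"]]`.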
Mapping complex JSON to Pandas Dataframe
Background: I have a complex nested JSON object, which I am trying to unpack into a pandas df in a very specific way. JSON Object: this is an extract, containing randomized data of the JSON object, which shows examples of the hierarchy (inc. children) for 1x family (i.e. ‘Falconer Family’), however th…
df.to_dict make duplicated index (pandas) as primary key in a nested dict
I have this data frame, which I’d like to convert to a dict in Python. I have many other categories, but showed just two for simplicity. I want the output to be like this: Answer: You can do this without assigning an additional column or aggregating using list: I created a separate function for readability …
Most efficient way to check cells and change neighbors matching a condition in a dataframe
I’m using a pandas dataframe to store a dynamic 2D game map for a roguelike-style game map editor. The player can draw and erase rooms. I need to draw walls around these changing rooms. I have this: And need this: What is the most efficient way to do this? So far I followed the approach outlined here, b…
Find a substring in cells across multiple columns in a Pandas dataframe
I have a large DataFrame with 50+ columns, which I’m simplifying below: I’m trying to find a) whether there are any instances of ‘—>’ in any of the cells across the DataFrame, and b) if so, where (optional). So far I’ve tried 2 approaches: this only works for strings not s…
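A rough sketch of one way to search every cell for a substring and report where it occurs. The question's data is not shown, so the frame below is invented, and the search string is assumed to be the ASCII arrow "-->" (the excerpt renders it typographically). Casting each column to `str` is one way to cope with columns that mix strings and other types.

```python
import numpy as np
import pandas as pd

# Hypothetical mixed-type frame; names and values are assumptions.
df = pd.DataFrame(
    {
        "a": ["x --> y", "plain", 3],
        "b": [1.5, "k-->m", None],
    }
)
needle = "-->"  # assumed spelling of the marker being searched for

# Cast each column to str so non-string cells do not break the .str accessor;
# regex=False treats the needle literally, na=False counts missing as no match.
hits = df.apply(
    lambda col: col.astype(str).str.contains(needle, regex=False, na=False)
)

# a) Is there any match anywhere in the frame?
found_any = bool(hits.to_numpy().any())

# b) Where? Collect (row label, column name) pairs for every match.
rows, cols = np.nonzero(hits.to_numpy())
locations = [(hits.index[r], hits.columns[c]) for r, c in zip(rows, cols)]

print(found_any, locations)
```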
Pandas: Check each row for condition and insert row below if condition is met
This is my first question here, as I really couldn’t figure it out with related answers: I have a list of dataframes, “df_list”; for each user I have a dataframe which basically looks like: Data: I would like to go through all the dataframes in my df_list and inside each df I would like to add…
Logical with count in PySpark
I’m new to PySpark and I have a problem to solve. I have a dataframe with 4 columns, being customers, person, is_online_store and count:
customer  PersonId  is_online_store  count
afabd2d2  4         true             1
afabd2d2  8         true             2
afabd2d2  3         true             1
afabd2d2  2         false            1
afabd2d2  4         false            1
I need to create according to the f…
Calculate difference between date column entries and date minimum PySpark
I feel like this is a stupid question, but I cannot seem to figure it out, so here goes. I have a PySpark data frame and one of the columns consists of dates. I want to compute the difference between each date in this column and the minimum date in the column, for the purpose of filtering to the past