I have a data frame of temperatures with below columns,
| datetime | 0 | 1 | 2 | 3 | |--------------------------|---|-------|-------|-------| | 21-10-2022 09:17:21.1721 | 0 | 47.79 | 43.62 | 46.22 | | 21-10-2022 09:17:31.1731 | 0 | 47.79 | 43.56 | 46.22 | | 21-10-2022 09:17:41.1741 | 0 | 47.82 | 43.59 | 46.32 | | 21-10-2022 09:17:51.1751 | 0 | 47.85 | 43.62 | 46.32 | | 21-10-2022 09:18:01.181 | 0 | 47.85 | 43.65 | 46.45 | | 21-10-2022 09:18:11.1811 | 0 | 47.85 | 43.56 | 56.42 | | 21-10-2022 09:18:21.1821 | 0 | 57.85 | 43.59 | 46.52 |
Requirement:
- To create new columns based on existing column values
- To create a list for future use using append
| datetime | 0 | 1 | 2 | 3 | temp_ng | 900 | 901 | 902 | 903 | |--------------------------|---|-------|-------|-------|---------|-----|-----|-----|-----| | 21-10-2022 09:17:21.1721 | 0 | 47.79 | 43.62 | 46.22 | 0 | 0 | 0 | 0 | 0 | | 21-10-2022 09:17:31.1731 | 0 | 47.79 | 43.56 | 46.22 | 0 | 0 | 0 | 0 | 0 | | 21-10-2022 09:17:41.1741 | 0 | 47.82 | 43.59 | 46.32 | 0 | 0 | 0 | 0 | 0 | | 21-10-2022 09:17:51.1751 | 0 | 47.85 | 43.62 | 46.32 | 0 | 0 | 0 | 0 | 0 | | 21-10-2022 09:18:01.181 | 0 | 47.85 | 43.65 | 46.45 | 0 | 0 | 0 | 0 | 0 | | 21-10-2022 09:18:11.1811 | 0 | 47.85 | 43.56 | 56.42 | 1 | 0 | 0 | 0 | 1 | | 21-10-2022 09:18:21.1821 | 0 | 57.85 | 43.59 | 46.52 | 1 | 0 | 1 | 0 | 0 |
List:
issue = ['1 is not_ok', '3 is not_ok']
Code: Works
issue = [] # creates a new list for col in range(0, 4): df.loc[df[col] >= 50, 'temp_ng'] = 1 # 1.chk all rows & create new 'temp_ng' column with 1/0 df.loc[df[col] >= 50, 900 + col] = 1 # 2.again chk all rows &new columns eg 900,901,902,903 if (df[col] >= 50).any(): # 3.again chk all rows & to create a list issue.append(str(col) +' is not_ok')
Doubt:
I have to check all rows for each column 3 times using .loc and if condition. Is there a way to reduce checking 3 times, since the condition to check >50 is the same?
Advertisement
Answer
With the dataframe you provided:
import pandas as pd df = pd.DataFrame( { "datetime": [ "21-10-2022 09:17:21.1721", "21-10-2022 09:17:31.1731", "21-10-2022 09:17:41.1741", "21-10-2022 09:17:51.1751", "21-10-2022 09:18:01.1810", "21-10-2022 09:18:11.1811", "21-10-2022 09:18:21.1821", ], 0: [0, 0, 0, 0, 0, 0, 0], 1: [47.79, 47.79, 47.82, 47.85, 47.85, 47.85, 57.85], 2: [43.62, 43.56, 43.59, 43.62, 43.65, 43.56, 43.59], 3: [46.22, 46.22, 46.32, 46.32, 46.45, 56.42, 46.52], } )
Here is one way to do it:
# Check rows and create a temporary dataframe with '90x' columns int_cols = [col for col in df.columns if isinstance(col, int)] temp = pd.DataFrame( df.apply(lambda x: [1 if x[col] >= 50 else 0 for col in int_cols], axis=1).tolist(), columns=[900 + col for col in int_cols], ) # Add 'temp_ng' column to df, then '90x' columns df["temp_ng"] = ( temp[[col for col in temp.columns if isinstance(col, int) and col >= 900]] .any(axis=1) .astype(int) ) df = pd.concat([df, temp], axis=1) # Get list of not-ok columns issues = [ f"{col} is not_ok" for col in (col for col in df.columns if isinstance(col, int) and col >= 900) if df[col].any() ]
print(df) # Output datetime 0 1 2 3 temp_ng 900 901 902 903 0 21-10-2022 ... 0 47.79 43.62 46.22 0 0 0 0 0 1 21-10-2022 ... 0 47.79 43.56 46.22 0 0 0 0 0 2 21-10-2022 ... 0 47.82 43.59 46.32 0 0 0 0 0 3 21-10-2022 ... 0 47.85 43.62 46.32 0 0 0 0 0 4 21-10-2022 ... 0 47.85 43.65 46.45 0 0 0 0 0 5 21-10-2022 ... 0 47.85 43.56 56.42 1 0 0 0 1 6 21-10-2022 ... 0 57.85 43.59 46.52 1 0 1 0 0
print(issues) # ['901 is not_ok', '903 is not_ok']