I have a data frame of temperatures with below columns,
| datetime | 0 | 1 | 2 | 3 | |--------------------------|---|-------|-------|-------| | 21-10-2022 09:17:21.1721 | 0 | 47.79 | 43.62 | 46.22 | | 21-10-2022 09:17:31.1731 | 0 | 47.79 | 43.56 | 46.22 | | 21-10-2022 09:17:41.1741 | 0 | 47.82 | 43.59 | 46.32 | | 21-10-2022 09:17:51.1751 | 0 | 47.85 | 43.62 | 46.32 | | 21-10-2022 09:18:01.181 | 0 | 47.85 | 43.65 | 46.45 | | 21-10-2022 09:18:11.1811 | 0 | 47.85 | 43.56 | 56.42 | | 21-10-2022 09:18:21.1821 | 0 | 57.85 | 43.59 | 46.52 |
Requirement:
- To create new columns based on existing column values
- To create a list for future use using append
| datetime | 0 | 1 | 2 | 3 | temp_ng | 900 | 901 | 902 | 903 | |--------------------------|---|-------|-------|-------|---------|-----|-----|-----|-----| | 21-10-2022 09:17:21.1721 | 0 | 47.79 | 43.62 | 46.22 | 0 | 0 | 0 | 0 | 0 | | 21-10-2022 09:17:31.1731 | 0 | 47.79 | 43.56 | 46.22 | 0 | 0 | 0 | 0 | 0 | | 21-10-2022 09:17:41.1741 | 0 | 47.82 | 43.59 | 46.32 | 0 | 0 | 0 | 0 | 0 | | 21-10-2022 09:17:51.1751 | 0 | 47.85 | 43.62 | 46.32 | 0 | 0 | 0 | 0 | 0 | | 21-10-2022 09:18:01.181 | 0 | 47.85 | 43.65 | 46.45 | 0 | 0 | 0 | 0 | 0 | | 21-10-2022 09:18:11.1811 | 0 | 47.85 | 43.56 | 56.42 | 1 | 0 | 0 | 0 | 1 | | 21-10-2022 09:18:21.1821 | 0 | 57.85 | 43.59 | 46.52 | 1 | 0 | 1 | 0 | 0 |
List:
issue = ['1 is not_ok', '3 is not_ok']
Code: Works
issue = [] # creates a new list
for col in range(0, 4):
df.loc[df[col] >= 50, 'temp_ng'] = 1 # 1.chk all rows & create new 'temp_ng' column with 1/0
df.loc[df[col] >= 50, 900 + col] = 1 # 2.again chk all rows &new columns eg 900,901,902,903
if (df[col] >= 50).any(): # 3.again chk all rows & to create a list
issue.append(str(col) +' is not_ok')
Doubt:
I have to check all rows for each column 3 times using .loc and if condition. Is there a way to reduce checking 3 times, since the condition to check >50 is the same?
Advertisement
Answer
With the dataframe you provided:
import pandas as pd
df = pd.DataFrame(
{
"datetime": [
"21-10-2022 09:17:21.1721",
"21-10-2022 09:17:31.1731",
"21-10-2022 09:17:41.1741",
"21-10-2022 09:17:51.1751",
"21-10-2022 09:18:01.1810",
"21-10-2022 09:18:11.1811",
"21-10-2022 09:18:21.1821",
],
0: [0, 0, 0, 0, 0, 0, 0],
1: [47.79, 47.79, 47.82, 47.85, 47.85, 47.85, 57.85],
2: [43.62, 43.56, 43.59, 43.62, 43.65, 43.56, 43.59],
3: [46.22, 46.22, 46.32, 46.32, 46.45, 56.42, 46.52],
}
)
Here is one way to do it:
# Check rows and create a temporary dataframe with '90x' columns
int_cols = [col for col in df.columns if isinstance(col, int)]
temp = pd.DataFrame(
df.apply(lambda x: [1 if x[col] >= 50 else 0 for col in int_cols], axis=1).tolist(),
columns=[900 + col for col in int_cols],
)
# Add 'temp_ng' column to df, then '90x' columns
df["temp_ng"] = (
temp[[col for col in temp.columns if isinstance(col, int) and col >= 900]]
.any(axis=1)
.astype(int)
)
df = pd.concat([df, temp], axis=1)
# Get list of not-ok columns
issues = [
f"{col} is not_ok"
for col in (col for col in df.columns if isinstance(col, int) and col >= 900)
if df[col].any()
]
print(df)
# Output
datetime 0 1 2 3 temp_ng 900 901 902 903
0 21-10-2022 ... 0 47.79 43.62 46.22 0 0 0 0 0
1 21-10-2022 ... 0 47.79 43.56 46.22 0 0 0 0 0
2 21-10-2022 ... 0 47.82 43.59 46.32 0 0 0 0 0
3 21-10-2022 ... 0 47.85 43.62 46.32 0 0 0 0 0
4 21-10-2022 ... 0 47.85 43.65 46.45 0 0 0 0 0
5 21-10-2022 ... 0 47.85 43.56 56.42 1 0 0 0 1
6 21-10-2022 ... 0 57.85 43.59 46.52 1 0 1 0 0
print(issues) # ['901 is not_ok', '903 is not_ok']