Skip to content
Advertisement

Looping dataframe to check for specific condition using if and loc

I have a data frame of temperatures with below columns,

| datetime                 | 0 | 1     | 2     | 3     |
|--------------------------|---|-------|-------|-------|
| 21-10-2022 09:17:21.1721 | 0 | 47.79 | 43.62 | 46.22 |
| 21-10-2022 09:17:31.1731 | 0 | 47.79 | 43.56 | 46.22 |
| 21-10-2022 09:17:41.1741 | 0 | 47.82 | 43.59 | 46.32 |
| 21-10-2022 09:17:51.1751 | 0 | 47.85 | 43.62 | 46.32 |
| 21-10-2022 09:18:01.181  | 0 | 47.85 | 43.65 | 46.45 |
| 21-10-2022 09:18:11.1811 | 0 | 47.85 | 43.56 | 56.42 |
| 21-10-2022 09:18:21.1821 | 0 | 57.85 | 43.59 | 46.52 |

Requirement:

  1. To create new columns based on existing column values
  2. To create a list for future use using append
| datetime                 | 0 | 1     | 2     | 3     | temp_ng | 900 | 901 | 902 | 903 |
|--------------------------|---|-------|-------|-------|---------|-----|-----|-----|-----|
| 21-10-2022 09:17:21.1721 | 0 | 47.79 | 43.62 | 46.22 | 0       | 0   | 0   | 0   | 0   |
| 21-10-2022 09:17:31.1731 | 0 | 47.79 | 43.56 | 46.22 | 0       | 0   | 0   | 0   | 0   |
| 21-10-2022 09:17:41.1741 | 0 | 47.82 | 43.59 | 46.32 | 0       | 0   | 0   | 0   | 0   |
| 21-10-2022 09:17:51.1751 | 0 | 47.85 | 43.62 | 46.32 | 0       | 0   | 0   | 0   | 0   |
| 21-10-2022 09:18:01.181  | 0 | 47.85 | 43.65 | 46.45 | 0       | 0   | 0   | 0   | 0   |
| 21-10-2022 09:18:11.1811 | 0 | 47.85 | 43.56 | 56.42 | 1       | 0   | 0   | 0   | 1   |
| 21-10-2022 09:18:21.1821 | 0 | 57.85 | 43.59 | 46.52 | 1       | 0   | 1   | 0   | 0   |

List:

issue = ['1 is not_ok', '3 is not_ok']

Code: Works

issue = []                                        # creates a new list
for col in range(0, 4): 
    df.loc[df[col] >= 50, 'temp_ng'] = 1  # 1.chk all rows & create new 'temp_ng' column with 1/0    
    df.loc[df[col] >= 50, 900 + col] = 1  # 2.again chk all rows &new columns eg 900,901,902,903
    if (df[col] >= 50).any():             # 3.again chk all rows & to create a list
        issue.append(str(col) +' is not_ok')

Doubt:

I have to check all rows for each column 3 times using .loc and if condition. Is there a way to reduce checking 3 times, since the condition to check >50 is the same?

Advertisement

Answer

With the dataframe you provided:

import pandas as pd

df = pd.DataFrame(
    {
        "datetime": [
            "21-10-2022 09:17:21.1721",
            "21-10-2022 09:17:31.1731",
            "21-10-2022 09:17:41.1741",
            "21-10-2022 09:17:51.1751",
            "21-10-2022 09:18:01.1810",
            "21-10-2022 09:18:11.1811",
            "21-10-2022 09:18:21.1821",
        ],
        0: [0, 0, 0, 0, 0, 0, 0],
        1: [47.79, 47.79, 47.82, 47.85, 47.85, 47.85, 57.85],
        2: [43.62, 43.56, 43.59, 43.62, 43.65, 43.56, 43.59],
        3: [46.22, 46.22, 46.32, 46.32, 46.45, 56.42, 46.52],
    }
)

Here is one way to do it:

# Check rows and create a temporary dataframe with '90x' columns
int_cols = [col for col in df.columns if isinstance(col, int)]
temp = pd.DataFrame(
    df.apply(lambda x: [1 if x[col] >= 50 else 0 for col in int_cols], axis=1).tolist(),
    columns=[900 + col for col in int_cols],
)

# Add 'temp_ng' column to df, then '90x' columns
df["temp_ng"] = (
    temp[[col for col in temp.columns if isinstance(col, int) and col >= 900]]
    .any(axis=1)
    .astype(int)
)
df = pd.concat([df, temp], axis=1)

# Get list of not-ok columns
issues = [
    f"{col} is not_ok"
    for col in (col for col in df.columns if isinstance(col, int) and col >= 900)
    if df[col].any()
]
print(df)
# Output
         datetime  0      1      2      3  temp_ng  900  901  902  903
0  21-10-2022 ...  0  47.79  43.62  46.22        0    0    0    0    0
1  21-10-2022 ...  0  47.79  43.56  46.22        0    0    0    0    0
2  21-10-2022 ...  0  47.82  43.59  46.32        0    0    0    0    0
3  21-10-2022 ...  0  47.85  43.62  46.32        0    0    0    0    0
4  21-10-2022 ...  0  47.85  43.65  46.45        0    0    0    0    0
5  21-10-2022 ...  0  47.85  43.56  56.42        1    0    0    0    1
6  21-10-2022 ...  0  57.85  43.59  46.52        1    0    1    0    0
print(issues)  # ['901 is not_ok', '903 is not_ok']
User contributions licensed under: CC BY-SA
8 People found this is helpful
Advertisement