
How to skip the apply function for cells with missing values in pandas

I have a dataset like the one below:

import pandas as pd
from workdays import workday, networkdays 
path = r'C:UsersuserDocumentsGitHublearningabc1test_labtatlab.xlsx'
df = pd.read_excel(path)

    start date  End date       HT            D
0   2022-02-08  NaT         indirect         BL
1   2022-01-20  NaT         direct           None
2   2022-01-23  NaT         direct           None
3   2022-01-23  NaT         direct           None
4   2022-02-07  NaT         direct           None
5   2022-02-07  NaT         direct           None
6   2022-02-09  NaT         direct           None
7   2022-02-09  NaT         direct           None
8   2022-02-10  NaT         direct           None
9   2022-02-11  2022-02-13  direct           None
10  2022-02-16  NaT         direct           None
11  2022-02-16  NaT         direct           None
12  2022-02-16  NaT         direct           None
13  2022-01-15  2022-01-21  direct           None
14  2022-01-17  2022-01-17  direct           None

I wrote this code to calculate networkdays for the rows that have a date value in the 'End date' column:

# If column 'D' is 'BL', keep that value; apply only to the remaining cells in 'D' where 'End date' and 'start date' are not null
df.loc[df['D']=='BL', 'D'] = df.apply(lambda x: networkdays(x['start date'],x['End date']) if not pd.isnull(x['End date']) else x['End date'],axis=1)

However, I get an error and I don't know what causes it. Could you please take a look?

My expected output is below:

    start date  End date    HT        D
0   2022-02-08  NaT         indirect  BL
1   2022-01-20  NaT         direct    None
2   2022-01-23  NaT         direct    None
3   2022-01-23  NaT         direct    None
4   2022-02-07  NaT         direct    None
5   2022-02-07  NaT         direct    None
6   2022-02-09  NaT         direct    None
7   2022-02-09  NaT         direct    None
8   2022-02-10  NaT         direct    None
9   2022-02-11  2022-02-13  direct    3
10  2022-02-16  NaT         direct    None
11  2022-02-16  NaT         direct    None
12  2022-02-16  NaT         direct    None
13  2022-01-15  2022-01-21  direct    5
14  2022-01-17  2022-01-17  direct    1

Answer

I believe the problem comes from how you call the apply function.

By default (axis=0), apply passes each column to the function [1]. Pass axis=1 to have it receive each row instead, which is what you need when the lambda looks up values by column name.
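To make the difference concrete, here is a minimal sketch on a tiny made-up frame (the `demo` data and the variable names are hypothetical, not taken from the question):

```python
import pandas as pd

# Tiny illustrative frame (hypothetical data, not the asker's spreadsheet).
demo = pd.DataFrame({"a": [1, 2], "b": [10, 20]})

# axis=0 (the default): the function receives each column as a Series.
per_column = demo.apply(lambda col: col.sum())

# axis=1: the function receives each row as a Series, so row["a"] works.
per_row = demo.apply(lambda row: row["a"] + row["b"], axis=1)

print(per_column.tolist())  # [3, 30]
print(per_row.tolist())     # [11, 22]
```

With axis=0 a lookup such as `x['End date']` would be applied to a whole column rather than to one row, which is why row-wise logic needs axis=1.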

Something like this might give you the expected result:

df['days'] = df.apply(
    lambda x: (
        networkdays(x['start date'], x['End date'])
        if not pd.isnull(x['End date'])
        else "can not call"
    ),
    axis=1,  # use axis=1 to work with rows instead of columns
)
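If the goal is to write into 'D' itself while leaving the 'BL' row alone, a boolean mask may be closer to what the question asks for. Note that `df['D']=='BL'` in the question selects the 'BL' rows rather than skipping them. The sketch below is only an illustration under a few assumptions: it rebuilds four of the rows from the question by hand instead of reading the Excel file, and it swaps in NumPy's `np.busday_count` for `workdays.networkdays` so that it runs without the third-party package. `np.busday_count` excludes the end date, so one day is added to get an inclusive count.

```python
import numpy as np
import pandas as pd

# Small stand-in for the asker's sheet (four of the rows shown in the question).
df = pd.DataFrame({
    "start date": pd.to_datetime(["2022-02-08", "2022-01-20", "2022-01-15", "2022-01-17"]),
    "End date": pd.to_datetime([None, None, "2022-01-21", "2022-01-17"]),
    "HT": ["indirect", "direct", "direct", "direct"],
    # Object dtype so the column can hold both text and integers.
    "D": pd.Series(["BL", None, None, None], dtype="object"),
})

# Rows to fill: D is not 'BL' and both dates are present.
mask = ~df["D"].eq("BL") & df["start date"].notna() & df["End date"].notna()

# np.busday_count counts weekdays in [begin, end), so shift the end date by
# one day to include it in the count.
start = df.loc[mask, "start date"].to_numpy().astype("datetime64[D]")
end = (df.loc[mask, "End date"] + pd.Timedelta(days=1)).to_numpy().astype("datetime64[D]")

# Only the masked rows are written; 'BL' and rows with NaT stay untouched.
df.loc[mask, "D"] = np.busday_count(start, end)

print(df)
```

On this sample the last two rows get 5 and 1, while the 'BL' row and the row without an end date keep their original values. A weekday-only count like this treats Saturday and Sunday as non-working days, so a range that ends on a weekend will count fewer days than the calendar span.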

[1]

User contributions licensed under: CC BY-SA