I have monthly dataframe (df) that is already in min – max ranges like the below:
Wind Jan Feb Nov Dec calib West 0.1-25.5 2.8-65.3 1.3-61.3 0.9-35.3 50 North 0.2-28.3 3.1-66.4 1.0-67.7 1.9-40.1 60 South 0.3-29.5 2.5-49.4 1.9-63.4 0.3-33.0 60 East 20.5 1.1-41.1 0.9-40.3 nan 50
I want to know the number of times the max wind speed was below the calib number each month. So I am trying to create a column Speed below calib (sbc) like below.
month_col = ['Jan', 'Feb', 'Nov', 'Dec'] df['sbc'] = (pd.to_numeric(df[month_col].str.extract(r"(?<=-)(d+.d+)")) < df["calib"]).sum(axis=1)
The above code is not working and I am getting the error AttributeError: 'DataFrame' object has no attribute 'str'
. How would I fix this?
Advertisement
Answer
You can use melt
:
sbc = (df.melt(['Wind', 'calib'], var_name='month') .assign(value=lambda x: x['value'].str.split('-').str[1].astype(float)) .query('value < calib').value_counts('Wind')) df['sbc'] = df['Wind'].map(sbc)
Output:
>>> df Wind Jan Feb Nov Dec calib sbc 0 West 0.1-25.5 2.8-65.3 1.3-61.3 0.9-35.3 50 2 1 North 0.2-28.3 3.1-66.4 1.0-67.7 1.9-40.1 60 2 2 South 0.3-29.5 2.5-49.4 1.9-63.4 0.3-33.0 60 3 3 East 20.5 1.1-41.1 0.9-40.3 NaN 50 2
Step by step:
- Reshape your dataframe
>>> out = df.melt(['Wind', 'calib'], var_name='month') Wind calib month value 0 West 50 Jan 0.1-25.5 1 North 60 Jan 0.2-28.3 2 South 60 Jan 0.3-29.5 3 East 50 Jan 20.5 4 West 50 Feb 2.8-65.3 5 North 60 Feb 3.1-66.4 6 South 60 Feb 2.5-49.4 7 East 50 Feb 1.1-41.1 8 West 50 Nov 1.3-61.3 9 North 60 Nov 1.0-67.7 10 South 60 Nov 1.9-63.4 11 East 50 Nov 0.9-40.3 12 West 50 Dec 0.9-35.3 13 North 60 Dec 1.9-40.1 14 South 60 Dec 0.3-33.0 15 East 50 Dec NaN
- Extract max wind from range
>>> out = out.assign(value=lambda x: x['value'].str.split('-').str[1].astype(float)) Wind calib month value 0 West 50 Jan 25.5 1 North 60 Jan 28.3 2 South 60 Jan 29.5 3 East 50 Jan NaN 4 West 50 Feb 65.3 5 North 60 Feb 66.4 6 South 60 Feb 49.4 7 East 50 Feb 41.1 8 West 50 Nov 61.3 9 North 60 Nov 67.7 10 South 60 Nov 63.4 11 East 50 Nov 40.3 12 West 50 Dec 35.3 13 North 60 Dec 40.1 14 South 60 Dec 33.0 15 East 50 Dec NaN
- Filter out rows and count
>>> out = out.query('value < calib').value_counts('Wind') Wind South 3 East 2 North 2 West 2 dtype: int64
Finally map (merge) this series to your original dataframe.