I have two DataFrames:
xx
AVERAGE_CALL_DURATION | AVERAGE_DURATION | CHANGE_OF_DETAILS |
---|---|---|
267 | 298 | 0 |
421 | 609.33 | 0.33 |
330 | 334 | 0 |
240.5 | 666.5 | 0 |
628 | 713 | 0 |
and
NoC_c
AVERAGE_CALL_DURATION | AVERAGE_DURATION | CHANGE_OF_DETAILS |
---|---|---|
-5.93 | -4.95 | 0.90 |
593.50 | 595.70 | 1.00 |
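I want to return 1 wherever a value in an xx column falls within the range given by NoC_c for the column of the same name (row 0 of NoC_c is the lower bound, row 1 the upper bound), and keep the original value otherwise.
I can do this for one column: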

    def check_between_ranges(xx, NoC_c):
        ranges = NoC_c['AVERAGE_CALL_DURATION']
        if (xx['AVERAGE_CALL_DURATION'] >= ranges.iloc[0]) and (xx['AVERAGE_CALL_DURATION'] <= ranges.iloc[1]):
            return 1
        return xx['AVERAGE_CALL_DURATION']

    xx['AVERAGE_CALL_DURATION2'] = xx.apply(lambda x: check_between_ranges(x, NoC_c), axis=1)

However, I need to avoid specifying the column name manually, as the actual DataFrames contain many more columns.
I have tried:

    a = NoC_c.columns

    def check_between_ranges(xx, NoC_c):
        ranges = NoC_c[a]
        if (xx[a] >= ranges.iloc[0]) & (xx[a] <= ranges.iloc[1]):
            return 1

    xx.apply(lambda x: check_between_ranges(x, NoC_c[a]), axis=1)

However, I get the error:

    ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

I tried the solutions listed here, but they were unsuccessful.
I also read this to address the specific error, but it didn’t help with my issue.
Any help would be appreciated.

    Traceback (most recent call last):
      File "<ipython-input-11-2affca771555>", line 10, in <module>
        xx.apply(lambda x: check_between_ranges(x, NoC_c[a]), axis=1)
      File "C:\Program Files\Anaconda3\lib\site-packages\pandas\core\frame.py", line 7552, in apply
        return op.get_result()
      File "C:\Program Files\Anaconda3\lib\site-packages\pandas\core\apply.py", line 185, in get_result
        return self.apply_standard()
      File "C:\Program Files\Anaconda3\lib\site-packages\pandas\core\apply.py", line 276, in apply_standard
        results, res_index = self.apply_series_generator()
      File "C:\Program Files\Anaconda3\lib\site-packages\pandas\core\apply.py", line 305, in apply_series_generator
        results[i] = self.f(v)
      File "<ipython-input-11-2affca771555>", line 10, in <lambda>
        xx.apply(lambda x: check_between_ranges(x, NoC_c[a]), axis=1)
      File "<ipython-input-11-2affca771555>", line 6, in check_between_ranges
        if (xx[a] >= ranges.iloc[0]) & (xx[a] <= ranges.iloc[1]):
      File "C:\Program Files\Anaconda3\lib\site-packages\pandas\core\generic.py", line 1330, in __nonzero__
        f"The truth value of a {type(self).__name__} is ambiguous. "
    ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Edit: Many thanks to @jch for the solution. I’m re-posting it here as I had to modify some of the syntax for it to work with my datasets:

    def check_between_ranges(x):
        v = []
        for c in x.index:
            if (x[c] >= NoC_c.iloc[0][c]) & (x[c] <= NoC_c.iloc[1][c]):
                v += [1]
            else:
                v += [x[c]]
        return pd.Series(v, index=x.index)

    xx.apply(check_between_ranges, axis=1)

Answer
Would this work for you?
Comparison Function

    def check_between_ranges(x):
        v = []
        for c in x.index:
            if (x[c] >= NoC_c.at[0,c]) & (x[c] <= NoC_c.at[1,c]):
                v += [1]
            else:
                v += [x[c]]
        return pd.Series(v, index=x.index)

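As background on why the function walks through one column at a time: with `apply(..., axis=1)` each row arrives as a Series, so comparing several columns at once produces a boolean Series, and `if` cannot reduce that to a single True/False. Below is a minimal sketch of that behaviour. The names `row`, `lower` and `upper` and the two-column data are made up for illustration and are not from the question.

```python
import pandas as pd

# Illustrative data (not from the question): one row of values
# plus a lower and an upper bound for each column.
row = pd.Series({"A": 267.0, "B": 298.0})
lower = pd.Series({"A": -5.93, "B": -4.95})
upper = pd.Series({"A": 593.50, "B": 595.70})

# Element-wise comparison gives one boolean per column, not a single bool.
in_range = (row >= lower) & (row <= upper)

try:
    if in_range:  # ambiguous: pandas will not choose between any() and all()
        pass
    raised = False
except ValueError:
    raised = True

# Comparing scalars, one column at a time, gives a plain boolean instead.
scalar_ok = bool((row["A"] >= lower["A"]) and (row["A"] <= upper["A"]))
```

Here `raised` ends up True because the `if` statement hits the same ValueError as in the question, while the scalar comparison evaluates without complaint.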
Execution

    xx.apply(check_between_ranges, axis=1)

Result

       AVERAGE_CALL_DURATION  AVERAGE_DURATION  CHANGE_OF_DETAILS
    0                    1.0              1.00               0.00
    1                    1.0            609.33               0.33
    2                    1.0              1.00               0.00
    3                    1.0            666.50               0.00
    4                  628.0            713.00               0.00

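If the row-wise `apply` turns out to be slow on larger frames, a vectorised variant may be worth trying. The sketch below is an alternative rather than the approach above, and it assumes that `NoC_c` has exactly two rows (lower bound first, then upper bound) and the same column names as `xx`. The sample frames are rebuilt from the tables in the question so that the block runs on its own.

```python
import pandas as pd

# Sample data rebuilt from the question's tables.
xx = pd.DataFrame({
    "AVERAGE_CALL_DURATION": [267, 421, 330, 240.5, 628],
    "AVERAGE_DURATION": [298, 609.33, 334, 666.5, 713],
    "CHANGE_OF_DETAILS": [0, 0.33, 0, 0, 0],
})
NoC_c = pd.DataFrame({
    "AVERAGE_CALL_DURATION": [-5.93, 593.50],
    "AVERAGE_DURATION": [-4.95, 595.70],
    "CHANGE_OF_DETAILS": [0.90, 1.00],
})

# Assumption: row 0 holds the lower bounds, row 1 the upper bounds.
lower = NoC_c.iloc[0]
upper = NoC_c.iloc[1]

# ge/le with axis=1 match each bound to the column of the same name.
in_range = xx.ge(lower, axis=1) & xx.le(upper, axis=1)

# Replace in-range values with 1 and keep everything else unchanged.
result = xx.mask(in_range, 1)
print(result)
```

On this sample data it should give the same values as the Result shown above.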