I have a dataset that looks like this:
Value Type X_sq -1.975767 Weather -0.540979 Fruits -2.359127 Fruits -2.815604 Corona -0.929755 Weather
I want to iterate through each row and calculate a sum of squares value for each row above (only if the Type matches). I want to put this value in the X.sq column.
So for example, in the first row, there’s nothing above. So only (-1.975767 x -1.975767). In the second row, there’s no FRUITS row above it, so it will just be -0.540979 x -0.540979. However, in the third row, when we scan all previous rows, we should find that FRUITS is already there. So we should get the last’s FRUIT’s ….. X_sq value and calculate a new sum of squares.
Value Type X_sq -1.975767 Weather -1.975767 * -1.975767 = x -0.540979 Fruits -0.540979 * -0.540979 = y -2.359127 Fruits y + ( -2.359127 x -2.359127) -2.815604 Corona -2.815604 * -2.815604 -0.929755 Weather x + (-0.929755 * -0.929755)
What would be an efficient way to do this?
def updateSS(X_sq, X_new): return X_sq + X_new**2
EDIT:
----> 1 df['sumOfSquares'] = df['avg_country_tone'].pow(2).groupby(['themes', 'suppliers_country']).cumsum() File /usr/local/Cellar/ipython/8.0.1/libexec/lib/python3.10/site-packages/pandas/core/series.py:1929, in Series.groupby(self, by, axis, level, as_index, sort, group_keys, squeeze, observed, dropna) 1925 axis = self._get_axis_number(axis) 1927 # error: Argument "squeeze" to "SeriesGroupBy" has incompatible type 1928 # "Union[bool, NoDefault]"; expected "bool" -> 1929 return SeriesGroupBy( 1930 obj=self, 1931 keys=by, 1932 axis=axis, 1933 level=level, 1934 as_index=as_index, 1935 sort=sort, 1936 group_keys=group_keys, 1937 squeeze=squeeze, # type: ignore[arg-type] 1938 observed=observed, 1939 dropna=dropna, 1940 ) File /usr/local/Cellar/ipython/8.0.1/libexec/lib/python3.10/site-packages/pandas/core/groupby/groupby.py:882, in GroupBy.__init__(self, obj, keys, axis, level, grouper, exclusions, selection, as_index, sort, group_keys, squeeze, observed, mutated, dropna) 879 if grouper is None: 880 from pandas.core.groupby.grouper import get_grouper --> 882 grouper, exclusions, obj = get_grouper( 883 obj, 884 keys, 885 axis=axis, 886 level=level, 887 sort=sort, 888 observed=observed, 889 mutated=self.mutated, 890 dropna=self.dropna, 891 ) 893 self.obj = obj 894 self.axis = obj._get_axis_number(axis) File /usr/local/Cellar/ipython/8.0.1/libexec/lib/python3.10/site-packages/pandas/core/groupby/grouper.py:882, in get_grouper(obj, key, axis, level, sort, observed, mutated, validate, dropna) 880 in_axis, level, gpr = False, gpr, None 881 else: --> 882 raise KeyError(gpr) 883 elif isinstance(gpr, Grouper) and gpr.key is not None: 884 # Add key to exclusions 885 exclusions.add(gpr.key) KeyError: 'themes' even though themes is there. Themes = type
Advertisement
Answer
Use:
df['X_sq'] = df['Value'].pow(2).groupby(df['Type']).cumsum() print(df) # Output Value Type X_sq 0 -1.975767 Weather 3.903655 1 -0.540979 Fruits 0.292658 2 -2.359127 Fruits 5.858138 3 -2.815604 Corona 7.927626 4 -0.929755 Weather 4.768100