I have a dataset that looks like this:
Value      Type     X_sq
-1.975767  Weather
-0.540979  Fruits
-2.359127  Fruits
-2.815604  Corona
-0.929755  Weather
I want to iterate through the rows and, for each one, calculate the sum of squares of its Value together with the Values of all rows above it that have the same Type. I want to put this value in the X_sq column.
For example, the first row has nothing above it, so its result is just -1.975767 * -1.975767. The second row has no Fruits row above it, so its result is just -0.540979 * -0.540979. In the third row, however, scanning the previous rows finds an earlier Fruits row, so we should take that row's X_sq value and add the new square to it.
Value      Type     X_sq
-1.975767  Weather  -1.975767 * -1.975767 = x
-0.540979  Fruits   -0.540979 * -0.540979 = y
-2.359127  Fruits   y + (-2.359127 * -2.359127)
-2.815604  Corona   -2.815604 * -2.815604
-0.929755  Weather  x + (-0.929755 * -0.929755)
What would be an efficient way to do this?
def updateSS(X_sq, X_new):
    return X_sq + X_new**2
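For comparison, a plain-Python baseline would keep the latest sum of squares for each Type in a dictionary and apply updateSS row by row. This is only a minimal sketch, assuming the rows are available as (Value, Type) tuples; the names rows, running and x_sq are illustrative and not part of the original data.

```python
from collections import defaultdict


def updateSS(X_sq, X_new):
    """Add the square of X_new to the running sum of squares X_sq."""
    return X_sq + X_new**2


# The sample rows from the question as (Value, Type) tuples (assumed layout).
rows = [
    (-1.975767, "Weather"),
    (-0.540979, "Fruits"),
    (-2.359127, "Fruits"),
    (-2.815604, "Corona"),
    (-0.929755, "Weather"),
]

running = defaultdict(float)  # latest sum of squares seen so far for each Type
x_sq = []                     # one result per row, in row order

for value, type_ in rows:
    running[type_] = updateSS(running[type_], value)
    x_sq.append(running[type_])

for (value, type_), total in zip(rows, x_sq):
    print(value, type_, round(total, 6))
```

This makes a single pass over the data, but the loop runs in Python, so it will likely be slow on a large DataFrame compared with a vectorized approach.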
EDIT:
----> 1 df['sumOfSquares'] = df['avg_country_tone'].pow(2).groupby(['themes', 'suppliers_country']).cumsum()

File /usr/local/Cellar/ipython/8.0.1/libexec/lib/python3.10/site-packages/pandas/core/series.py:1929, in Series.groupby(self, by, axis, level, as_index, sort, group_keys, squeeze, observed, dropna)
   1925 axis = self._get_axis_number(axis)
   1927 # error: Argument "squeeze" to "SeriesGroupBy" has incompatible type
   1928 # "Union[bool, NoDefault]"; expected "bool"
-> 1929 return SeriesGroupBy(
   1930     obj=self,
   1931     keys=by,
   1932     axis=axis,
   1933     level=level,
   1934     as_index=as_index,
   1935     sort=sort,
   1936     group_keys=group_keys,
   1937     squeeze=squeeze,  # type: ignore[arg-type]
   1938     observed=observed,
   1939     dropna=dropna,
   1940 )

File /usr/local/Cellar/ipython/8.0.1/libexec/lib/python3.10/site-packages/pandas/core/groupby/groupby.py:882, in GroupBy.__init__(self, obj, keys, axis, level, grouper, exclusions, selection, as_index, sort, group_keys, squeeze, observed, mutated, dropna)
    879 if grouper is None:
    880     from pandas.core.groupby.grouper import get_grouper
--> 882     grouper, exclusions, obj = get_grouper(
    883         obj,
    884         keys,
    885         axis=axis,
    886         level=level,
    887         sort=sort,
    888         observed=observed,
    889         mutated=self.mutated,
    890         dropna=self.dropna,
    891     )
    893 self.obj = obj
    894 self.axis = obj._get_axis_number(axis)

File /usr/local/Cellar/ipython/8.0.1/libexec/lib/python3.10/site-packages/pandas/core/groupby/grouper.py:882, in get_grouper(obj, key, axis, level, sort, observed, mutated, validate, dropna)
    880     in_axis, level, gpr = False, gpr, None
    881 else:
--> 882     raise KeyError(gpr)
    883 elif isinstance(gpr, Grouper) and gpr.key is not None:
    884     # Add key to exclusions
    885     exclusions.add(gpr.key)

KeyError: 'themes'
I get this KeyError even though the themes column exists in df. (In my real data, themes corresponds to Type in the example above.)
Answer
Use:
df['X_sq'] = df['Value'].pow(2).groupby(df['Type']).cumsum()
print(df)

# Output
      Value     Type      X_sq
0 -1.975767  Weather  3.903655
1 -0.540979   Fruits  0.292658
2 -2.359127   Fruits  5.858138
3 -2.815604   Corona  7.927626
4 -0.929755  Weather  4.768100
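As for the KeyError in the question's edit: the likely cause is that df['avg_country_tone'].pow(2) is a Series, and a Series has no 'themes' or 'suppliers_country' columns to look up by name. Passing the key columns themselves, as above with df['Type'], avoids the lookup. The sketch below reproduces the error and the fix; the sample values are made up for illustration, and only the column names come from the question.

```python
import pandas as pd

# Hypothetical sample data; only the column names are taken from the question.
df = pd.DataFrame({
    "avg_country_tone": [1.0, 2.0, 3.0, 4.0],
    "themes": ["Weather", "Fruits", "Fruits", "Weather"],
    "suppliers_country": ["US", "US", "US", "DE"],
})

squares = df["avg_country_tone"].pow(2)

# String keys are resolved against the Series itself, which carries no
# "themes" label, so this raises the same KeyError as in the traceback.
try:
    squares.groupby(["themes", "suppliers_country"]).cumsum()
except KeyError as err:
    print("KeyError:", err)

# Passing the key columns as Series lets pandas align them with `squares`
# by index, so each (themes, suppliers_country) pair gets its own running sum.
df["sumOfSquares"] = squares.groupby(
    [df["themes"], df["suppliers_country"]]
).cumsum()
print(df)
```

With two keys, rows are accumulated together only when both themes and suppliers_country match, so the Weather/US and Weather/DE rows in this sample keep separate totals.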