Say we have this data:
JavaScript
x
11
11
1
list1, list2, list3 = [1,2,3,4], [1990, 1990, 1990, 1991], [2009, 2009, 2009, 2009]
2
df = pd.DataFrame(list(zip(list1, list2, list3)), columns = ['Index', 'Y0', 'Y1'])
3
4
> df
5
6
Index Y0 Y1
7
1 1990 2009
8
2 1990 2009
9
3 1990 2009
10
4 1991 2009
11
I want to count, for each year, how many rows (“index”) fall within each year, but excluding the Y0.
So say we start at the first available year, 1990:
How many rows do we count? 0.
1991:
- Three (row 1, 2, 3)
1992:
- Four (row 1, 2, 3, 4)
…
2009:
- Four (row 1, 2, 3, 4)
So I want to end up with a dataframe that says:
JavaScript
1
7
1
Count Year
2
0 1990
3
3 1991
4
4. 1992
5
6
4 2009
7
My attempt:
JavaScript
1
10
10
1
df['Y0'] = pd.to_datetime(df['Y0'], format='%Y')
2
df['Y1'] = pd.to_datetime(df['Y1'], format='%Y')
3
4
# Group by the interval between Y0 and Y1
5
df = d.groupby([d['Y0'].dt.year, d['Y1'].dt.year]).agg({'count'})
6
df.columns = ['count', 'Y0 count', 'Y1 count']
7
8
# sum the total
9
df_sum = pd.DataFrame(df.groupby(df.index)['count'].sum())
10
But the result doesn’t look right.
Appreciate any help.
Advertisement
Answer
you could do:
JavaScript
1
6
1
min_year = df[['Y0', 'Y1']].values.min()
2
max_year = df[['Y0', 'Y1']].values.max()
3
year_range = np.arange(min_year, max_year+1)
4
counts = ((df[['Y0']].values < year_range) & (year_range<= df[['Y1']].values)).sum(axis=0)
5
o = pd.DataFrame({"counts": counts, 'year': year_range})
6
JavaScript
1
22
22
1
counts year
2
0 0 1990
3
1 3 1991
4
2 4 1992
5
3 4 1993
6
4 4 1994
7
5 4 1995
8
6 4 1996
9
7 4 1997
10
8 4 1998
11
9 4 1999
12
10 4 2000
13
11 4 2001
14
12 4 2002
15
13 4 2003
16
14 4 2004
17
15 4 2005
18
16 4 2006
19
17 4 2007
20
18 4 2008
21
19 4 2009
22