I have a dataset like the one below:
import pandas as pd

df = pd.DataFrame({
    'state': ['California'] * 4 + ['Florida'] * 5 + ['Minnesota'] * 3 + ['New Hampshire'],
    'population': ['50-100', '0-50', '150-200', '50-100', '0-50', '150-200',
                   '100-150', 'NA', '0-50', 'NA', '100-150', '50-100', 'NA'],
    'locale': ['rural', 'urban', 'town', 'suburb', 'suburb', 'urban', 'rural',
               'suburb', 'NA', 'town', 'town', 'urban', 'rural']
})
For each state, I want one new column per category of every other column, holding the count of rows in that category. An example row is below:
state       population=0-50  population=50-100  population=100-150  population=150-200  locale=rural  locale=urban  locale=town  locale=suburb
California                1                  2                   0                   1             1             1            1              1
EDIT: Data dump of the first 5 rows, as requested:
{'state': {0: 'Connecticut',
  1: 'Connecticut',
  2: 'Connecticut',
  3: 'Connecticut',
  4: 'Connecticut'},
 'locale': {0: 'Suburb', 1: 'Suburb', 2: 'Suburb', 3: 'Suburb', 4: 'Suburb'},
 'pct_black/hispanic': {0: '[0.6, 0.8[',
  1: '[0.6, 0.8[',
  2: '[0.6, 0.8[',
  3: '[0.6, 0.8[',
  4: '[0.6, 0.8['},
 'pct_free/reduced': {0: '[0.2, 0.4[',
  1: '[0.2, 0.4[',
  2: '[0.2, 0.4[',
  3: '[0.2, 0.4[',
  4: '[0.2, 0.4['},
 'county_connections_ratio': {0: '[0.18, 1[',
  1: '[0.18, 1[',
  2: '[0.18, 1[',
  3: '[0.18, 1[',
  4: '[0.18, 1['},
 'pp_total_raw': {0: 'NA', 1: 'NA', 2: 'NA', 3: 'NA', 4: 'NA'}}
Answer
Use pd.get_dummies + Groupby.sum(), as follows:
(pd.get_dummies(df.set_index('state'))
   .groupby('state').sum()
   .reset_index()
)
Result:
           state  population_0-50  population_100-150  population_150-200  population_50-100  population_NA  locale_NA  locale_rural  locale_suburb  locale_town  locale_urban
0     California                1                   0                   1                  2              0          0             1              1            1             1
1        Florida                2                   1                   1                  0              1          1             1              2            0             1
2      Minnesota                0                   1                   0                  1              1          0             0              0            2             1
3  New Hampshire                0                   0                   0                  0              1          0             1              0            0             0
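To see why this works, it can help to look at the intermediate one-hot frame before it is summed. The following is a minimal, self-contained sketch on the sample data from the question. The names `dummies` and `result` are only illustrative, and `dtype=int` is an assumption added here: recent pandas versions return boolean dummy columns by default, which should still sum to the same counts, but integer columns make the intermediate frame easier to read.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    'state': ['California'] * 4 + ['Florida'] * 5 + ['Minnesota'] * 3 + ['New Hampshire'],
    'population': ['50-100', '0-50', '150-200', '50-100', '0-50', '150-200',
                   '100-150', 'NA', '0-50', 'NA', '100-150', '50-100', 'NA'],
    'locale': ['rural', 'urban', 'town', 'suburb', 'suburb', 'urban', 'rural',
               'suburb', 'NA', 'town', 'town', 'urban', 'rural'],
})

# Step 1: one indicator column per (column, category) pair.
# 'state' is moved to the index so that it is not one-hot encoded itself.
dummies = pd.get_dummies(df.set_index('state'), dtype=int)
print(dummies.head(4))  # the four California rows, one 1 per original column

# Step 2: summing the indicators within each state gives category counts.
result = dummies.groupby('state').sum().reset_index()
print(result)
```

Each original row contributes exactly one 1 per source column, so the per-state sum of an indicator column is the number of rows in that category.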
If you want to exclude the entries with value NA, you can use:
(pd.get_dummies(df[df != 'NA'].set_index('state'))
   .groupby('state').sum()
   .reset_index()
)
Result:
           state  population_0-50  population_100-150  population_150-200  population_50-100  locale_rural  locale_suburb  locale_town  locale_urban
0     California                1                   0                   1                  2             1              1            1             1
1        Florida                2                   1                   1                  0             1              2            0             1
2      Minnesota                0                   1                   0                  1             0              0            2             1
3  New Hampshire                0                   0                   0                  0             1              0            0             0
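The column names here use `_` as a separator and come out in lexicographic order, while the question's example row uses names like `population=0-50` in numeric order. One possible way to match that layout is sketched below. It assumes the `prefix_sep` parameter of `pd.get_dummies` for the `=` separator and an explicit, hand-written column list (`wanted`, a name made up for this example) for the ordering; `dtype=int` is again an added assumption.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    'state': ['California'] * 4 + ['Florida'] * 5 + ['Minnesota'] * 3 + ['New Hampshire'],
    'population': ['50-100', '0-50', '150-200', '50-100', '0-50', '150-200',
                   '100-150', 'NA', '0-50', 'NA', '100-150', '50-100', 'NA'],
    'locale': ['rural', 'urban', 'town', 'suburb', 'suburb', 'urban', 'rural',
               'suburb', 'NA', 'town', 'town', 'urban', 'rural'],
})

# Mask the literal 'NA' strings to missing values (get_dummies skips them),
# and build names as "<column>=<category>" via prefix_sep.
counts = (
    pd.get_dummies(df[df != 'NA'].set_index('state'), prefix_sep='=', dtype=int)
      .groupby('state').sum()
)

# Hand-written column order taken from the example row in the question.
wanted = [
    'population=0-50', 'population=50-100', 'population=100-150', 'population=150-200',
    'locale=rural', 'locale=urban', 'locale=town', 'locale=suburb',
]

# reindex also creates any listed column that is absent, filled with 0.
counts = counts.reindex(columns=wanted, fill_value=0).reset_index()
print(counts)
```

For real data with many categories, the `wanted` list would have to be generated rather than typed out; it is spelled out here only to keep the sketch short.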