I have a two-column pandas DataFrame containing user IDs and the URLs they have visited. It looks like this:
```
   users  urls
0  user1  url1
1  user1  url3
2  user1  url5
3  user2  url2
4  user2  url4
5  user2  url5
6  user3  url1
7  user3  url4
8  user3  url5
```
I want to turn it into a vector representation, with one row per user and one column per URL, like this:
```
       url1  url2  url3  url4  url5
user1   1.0   NaN   1.0   NaN   1.0
user2   NaN   1.0   NaN   1.0   1.0
user3   1.0   NaN   NaN   1.0   1.0
```
I’ve tried different things, but keep hitting a wall. Any ideas?
Answer
What you're describing is a pivot of the `urls` column:
```python
import pandas as pd

# Make data
df = pd.DataFrame([
    ['user1', 'url1'],
    ['user1', 'url3'],
    ['user1', 'url5'],
    ['user2', 'url2'],
    ['user2', 'url4'],
    ['user2', 'url5'],
    ['user3', 'url1'],
    ['user3', 'url4'],
    ['user3', 'url5']
], columns=['users', 'urls'])

# Add a column to fill the pivoted values
df['count'] = 1

# One row per user, one column per URL; user/URL pairs that never occur become 0
new_df = df.pivot(index='users', columns='urls', values='count').fillna(0)
print(new_df)

# urls   url1  url2  url3  url4  url5
# users
# user1   1.0   0.0   1.0   0.0   1.0
# user2   0.0   1.0   0.0   1.0   1.0
# user3   1.0   0.0   0.0   1.0   1.0
```
`pivot` puts the `users` column in the index, but you can use `reset_index()` to make it a regular column again.
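A sketch of the `reset_index()` step, assuming the same sample data. `rename_axis(None, axis=1)` is an optional extra that clears the leftover `urls` label on the column axis. It also shows `pd.crosstab` as an alternative: it counts visits without the helper `count` column, and unlike `pivot` (which raises `ValueError` on duplicate index/column pairs) it still works if a user visited the same URL more than once. The names `flat` and `counts` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "users": ["user1"] * 3 + ["user2"] * 3 + ["user3"] * 3,
    "urls": ["url1", "url3", "url5", "url2", "url4", "url5", "url1", "url4", "url5"],
})
df["count"] = 1
new_df = df.pivot(index="users", columns="urls", values="count").fillna(0)

# Move "users" out of the index into an ordinary column, then drop the
# "urls" name that would otherwise remain on the column axis.
flat = new_df.reset_index().rename_axis(None, axis=1)
print(flat)

# Alternative: crosstab counts (user, url) occurrences directly, filling
# absent pairs with 0 and summing repeated visits instead of raising.
counts = pd.crosstab(df["users"], df["urls"])
print(counts)
```

With repeated visits, `crosstab` yields counts greater than 1; clip them (e.g. `counts.clip(upper=1)`) if you only want a 0/1 presence vector.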