How to turn a Pandas DataFrame into a table of vectors



I have a two-column Pandas DataFrame containing user IDs and the URLs each user has visited. It looks like this:

    users   urls
0   user1   url1
1   user1   url3
2   user1   url5
3   user2   url2
4   user2   url4
5   user2   url5
6   user3   url1
7   user3   url4
8   user3   url5

I want to turn it into a vector representation, with one row per user and one column per URL, like this:

        url1    url2    url3    url4    url5
user1   1.0     NaN     1.0     NaN     1.0
user2   NaN     1.0     NaN     1.0     1.0
user3   1.0     NaN     NaN     1.0     1.0

I’ve tried different things, but keep hitting a wall. Any ideas?

Answer

What you’re describing is a pivot on the urls column:

import pandas as pd

# Make data
df = pd.DataFrame([
               ['user1', 'url1'], 
               ['user1', 'url3'], 
               ['user1', 'url5'],
               ['user2', 'url2'],
               ['user2', 'url4'],
               ['user2', 'url5'],
               ['user3', 'url1'],
               ['user3', 'url4'],
               ['user3', 'url5']
               ], columns=['users', 'urls'])
# add column to fill pivoted values
df['count'] = 1

# pivot urls into columns; missing user/url pairs become NaN, so fill them with 0
new_df = df.pivot(index='users', columns='urls', values='count').fillna(0)
new_df

# urls   url1  url2  url3  url4  url5
# users                              
# user1   1.0   0.0   1.0   0.0   1.0
# user2   0.0   1.0   0.0   1.0   1.0
# user3   1.0   0.0   0.0   1.0   1.0

This puts the users column in the index, but you can use reset_index to make it a regular column again.
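
As a minimal sketch of that last step, assuming the same users/urls data as above (the names flat_df and vectors are only illustrative, not part of the original answer):

```python
import pandas as pd

# Same sample data as in the answer above
df = pd.DataFrame({
    'users': ['user1', 'user1', 'user1', 'user2', 'user2', 'user2',
              'user3', 'user3', 'user3'],
    'urls':  ['url1', 'url3', 'url5', 'url2', 'url4', 'url5',
              'url1', 'url4', 'url5'],
})
df['count'] = 1

new_df = df.pivot(index='users', columns='urls', values='count').fillna(0)

# Move 'users' out of the index and back into an ordinary column
flat_df = new_df.reset_index()
# Drop the leftover 'urls' label on the column axis (purely cosmetic)
flat_df.columns.name = None

# Each row, minus the 'users' column, is that user's vector
vectors = flat_df.drop(columns='users').to_numpy()
```

Note that pivot raises a ValueError if the same user/URL pair appears more than once. If that can happen in the real data, pd.crosstab(df['users'], df['urls']) is one alternative that counts occurrences instead of failing.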



Source: stackoverflow