I have a list of sub-categories that correspond to a particular category, think of it like this:
Category Sub Category
a | 1
a | 2
a | 3
b | 4
b | 5
etc…
I was wondering the best way to apply the Category value to each row of the dataframe (~800,000 rows) based on the Sub Category which is defined.
I am currently using this method, but I know its not the best or even good:
df.loc[df.Subcategory =='1', 'Category'] = 'a' df.loc[df.Subcategory =='2', 'Category'] = 'a' df.loc[df.Subcategory =='3', 'Category'] = 'a' df.loc[df.Subcategory =='4', 'Category'] = 'b' and so on...
That leaves me with a long chunk of ugly code and isnt very efficient.
I was wondering if anyone has another method that might be able to help, I am fairly new to coding so this is only the 5th or so code I’ve written and am mostly self taught so any help would be really appreciated.
Advertisement
Answer
Based on your code, it looks like you have a DataFrame column called “Subcategory” and you want to create the column “Category” based on some mapping of subcategory to category. (Your initial description suggests that you already have the “Category” column, but then there would be no point to your code.)
If I’m understanding correctly and you want to create the “Category” column, equal to “a” when subcategory == 1, equal to “a” when subcategory == 2, …, equal to “b” when subcategory == 5, and so on, then you could use the pandas map() function.
subcategory_to_category_map = { "1": "a", "2": "a", "3": "a", "4": "b", "5": "b" } df["Category"] = df["Subcategory"].map( subcategory_to_category_map )
Make sure you use the same data type in the dictionary/map as your “Subcategory” values (i.e. if they are numeric use numberic keys, and if they are strings (“1”, “2”, etc.) then use strings (as shown)). Also note that any value of “Subcategory” that is not a key in the dictionary will result in the new “Category” column having a missing value.