
How do I bin and categorize numbers in Python?

I’m not sure if binning is the correct term, but I want to implement the following for a project I am working on:

I have an array or maybe a dict describing boundaries and/or regions, for example:

boundaries = OrderedDict([(10,'red'),(20,'blue'),(55,'purple')])

The areas are indexed from 0 to 100 (for example). I want to assign each area the color of the first boundary key that it is less than, and then plot the result. For example, an area below 10 is red, an area from 10 up to (but not including) 20 is blue, and so on.

So far, I have:

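from collections import OrderedDict

# Upper bounds (exclusive) mapped to colors, in ascending order
boundaries = OrderedDict([(10, 'red'), (20, 'blue'), (55, 'purple')])
areas = range(0, 101)
binned = []
for area in areas:
    # Take the color of the first boundary this area is below
    for border in boundaries.keys():
        if area < border:
            binned.append(boundaries[border])
            break
    # Note: areas >= 55 match no boundary and are skipped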
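The nested loop scans every boundary for every area. Below is a minimal sketch of the same lookup using the standard library's `bisect` module. It assumes the keys are sorted in ascending order, as in the example. The `categorize` helper and the `'none'` fallback label for areas at or above the last boundary are hypothetical additions, since the original loop simply skips those areas:

```python
from bisect import bisect_right
from collections import OrderedDict

# Upper bounds (exclusive) mapped to colors, as in the question
boundaries = OrderedDict([(10, 'red'), (20, 'blue'), (55, 'purple')])

# Hypothetical label for areas >= the last boundary (the original loop skips them)
FALLBACK = 'none'


def categorize(value, boundaries, fallback=FALLBACK):
    """Return the color of the first boundary that `value` is strictly less than."""
    keys = list(boundaries)             # assumed sorted ascending
    colors = list(boundaries.values())
    i = bisect_right(keys, value)       # number of keys <= value
    return colors[i] if i < len(colors) else fallback


binned = [categorize(area, boundaries) for area in range(0, 101)]
```

Because `bisect_right` counts the keys that are less than or equal to the value, an area equal to a boundary (for example 10) falls into the next color. This matches the strict `area < border` test in the loop.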

I also need a way to define the colors and a package to plot them. Does anyone have ideas on how to make a 2-D color plot (the actual project will be in 2-D)? Maybe matplotlib or PIL? I have used matplotlib before, but never for this type of data.
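One possible approach in matplotlib is to pair a `ListedColormap` with a `BoundaryNorm`, so that `imshow` colors each cell by the interval its value falls in. This is a minimal sketch under a few stated assumptions. The 51×51 `grid` is a hypothetical stand-in for the real 2-D data. The light-grey "over" color for values of 55 and above is a hypothetical choice, because the question does not say what those areas should be. The `Agg` backend and `savefig` are used only so the sketch runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; drop this line for an interactive window
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import BoundaryNorm, ListedColormap

# Boundaries from the question: [0, 10) red, [10, 20) blue, [20, 55) purple
edges = [0, 10, 20, 55]
cmap = ListedColormap(['red', 'blue', 'purple'])
cmap.set_over('lightgrey')  # hypothetical color for values >= 55
norm = BoundaryNorm(edges, cmap.N)  # maps each interval to one colormap index

# Hypothetical 2-D field with values 0..100, standing in for the real data
grid = np.add.outer(np.arange(51), np.arange(51))

fig, ax = plt.subplots()
im = ax.imshow(grid, cmap=cmap, norm=norm, origin='lower')
fig.colorbar(im, ax=ax, extend='max')  # the 'max' arrow shows the "over" color
fig.savefig('areas.png')  # or plt.show() with an interactive backend
```

`BoundaryNorm` treats each interval as closed on the left and open on the right, so a value of exactly 10 is drawn blue. This matches the "less than" rule in the question.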

Also, is there a SciPy/NumPy function that already does what I’m trying to do? It would be nice if the code were short and fast. This is not for an assignment of any sort (it’s for a small experiment/data project of mine), so I don’t want to reinvent the wheel here.
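NumPy does have a vectorized function for this kind of lookup: `np.digitize`, with `np.searchsorted(..., side='right')` as an equivalent. Below is a minimal sketch assuming the boundaries from the question. The extra `'none'` label for values of 55 and above is a hypothetical addition:

```python
import numpy as np

# Upper bounds (exclusive) from the question, in ascending order
edges = np.array([10, 20, 55])
# One label per bin; 'none' is a hypothetical label for values >= 55
colors = np.array(['red', 'blue', 'purple', 'none'])

areas = np.arange(0, 101)
# With right=False (the default), digitize returns i such that
# edges[i-1] <= x < edges[i], i.e. the index of the first edge x is below
idx = np.digitize(areas, edges)
binned = colors[idx]  # fancy indexing maps each bin index to its color
```

`np.digitize` keeps the shape of its input, so the same two lines work unchanged on a 2-D array of areas.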


Answer

import collections

import matplotlib.pyplot as plt

boundaries = collections.OrderedDict([(10, 'red'), (20, 'blue'), (55, 'purple')])
areas = range(0, 101)
# Bin edges: 0 plus every boundary key -> [0, 10, 20, 55]
n, bins, patches = plt.hist(areas, [0] + list(boundaries), histtype='bar', rwidth=1.0)
# Give each bar the color of its upper boundary
for patch, color in zip(patches, boundaries.values()):
    patch.set_color(color)
plt.show()
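`plt.hist` draws one bar per bin with the number of values in it, not one colored cell per area, so it is worth checking what it counts. matplotlib computes those bar heights with `np.histogram`. Here is a small sketch that runs it directly with the same edges as the answer:

```python
import numpy as np

# Same edges the answer passes to plt.hist: [0] + list(boundaries)
bins = [0, 10, 20, 55]
areas = np.arange(0, 101)

# Bins are half-open [a, b) except the last one, which is closed: [20, 55]
counts, edges = np.histogram(areas, bins=bins)
# counts -> [10, 10, 36]; values 56..100 fall outside the edges and are not counted
```

Two consequences follow. First, an area of exactly 55 is counted in the purple bar even though it is not less than 55. Second, areas above 55 do not appear in the plot at all.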
User contributions licensed under: CC BY-SA