Plotting categorical XY data including labels using Python (e.g. BCG matrices)

Question

I like to draw 2×2 / BCG matrices. This time I have a rather big dataset (more than 50 topics and multiple values, e.g. A and B). How can I draw this using Python? The result should look similar to this: I have found a couple of questions regarding scatter plots, but none of those really deals

Accepted Answer

The code below should get pretty close to what you're looking for, I think. The basic idea is that each set of points clustered at a location is placed in a circle centered on that location. I defined the radius of the circle in a somewhat ad hoc way, just to make it look nice for the dimensions I encountered, so you might need to alter it a bit for your specific task.

First, this is just a copy/paste of your values put into a list.

    values = ['ID  Name        value_A     value_B',
              'A   topic_1     2           4',
              'B   topic_2     4           2',
              'C   topic_3     3           3',
              'D   topic_4     3           5',
              'E   topic_5     3           4',
              'F   topic_6     5           1',
              'G   topic_7     4           5',
              'H   topic_8     1           2',
              'I   topic_9     4           1',
              'J   topic_10    3           3',
              'K   topic_11    5           5',
              'L   topic_12    5           3',
              'M   topic_13    3           5',
              'N   topic_14    1           5',
              'O   topic_15    4           1',
              'P   topic_16    4           2',
              'Q   topic_17    1           5',
              'R   topic_18    2           3',
              'S   topic_19    1           2',
              'T   topic_20    5           1',
              'U   topic_21    3           4',
              'V   topic_22    2           5',
              'W   topic_23    1           3',
              'X   topic_24    3           3',
              'Y   topic_25    4           1',
              'Z   topic_26    2           4',
              '1   topic_27    2           4',
              '2   topic_28    5           4',
              '3   topic_29    3           3',
              '4   topic_30    4           4',
              '5   topic_31    3           2',
              '6   topic_32    4           2',
              '7   topic_33    2           3',
              '8   topic_34    2           3',
              '9   topic_35    2           5',
              '10  topic_36    4           2']

Next, take the data you provided above and organize it slightly differently into one list of IDs and another of values for A and B.

    import re

    # Split each row on runs of whitespace and drop the header row
    values = [re.split(r'\s+', v) for v in values][1:]
    points = [[int(v[2]), int(v[3])] for v in values]
    labels = [v[0] for v in values]

Now we need to find the unique AB pairs and their IDs. There are many ways to arrive at this from your original list; others may have better suggestions depending on your original data structure and efficiency considerations.

    unique_points = []
    n_labels = []
    for i in range(len(points)):
        if points[i] not in unique_points:
            unique_points.append(points[i])
            n_labels.append([labels[i],])
        else:
            n_labels[unique_points.index(points[i])] += [labels[i],]

For another project of mine, I designed this class to do something very similar to what you're trying to do, so I'm re-implementing it here with a couple of variations. Basically, each unique point and its accompanying IDs go into their own object, which makes it easy to plot the points in a circle centered on the unique point.

    import numpy as np
    from matplotlib import pyplot as plt

    class clique():
        def __init__(self, center, labels, r):
            self.n = len(labels)
            self.x = center[0]
            self.y = center[1]
            self.labels = labels
            self.r = r

            # The random addition below just spins the points about
            # the circle so groups of the same size look different
            self.theta = np.arange(0, 2 * np.pi, 2 * np.pi / self.n) + np.random.rand() * 2 * np.pi

            if self.n == 1:
                self.nodes_x = [self.x,]
                self.nodes_y = [self.y,]
            else:
                self.nodes_x = self.x + r * np.cos(self.theta)
                self.nodes_y = self.y + r * np.sin(self.theta)

        def draw_nodes(self, shape = 'o', color = 'k', markersize = 12):
            for i in range(len(self.nodes_x)):
                plt.plot(self.nodes_x[i], self.nodes_y[i], shape, color = color,
                         markersize = markersize)

        def label_nodes(self, color = 'w', fs = 10):
            for i in range(len(self.nodes_x)):
                plt.text(self.nodes_x[i], self.nodes_y[i], self.labels[i],
                         va = 'center', ha = 'center', color = color, fontsize = fs)

Now, create a clique object for each cluster of points and plot it.

    for i in range(len(unique_points)):
        radius = 0.05 + 0.2 / 5 * len(n_labels[i])
        G = clique(unique_points[i], n_labels[i], radius)
        G.draw_nodes()
        G.label_nodes()

And, finally, clean up the plot a little.

    plt.axis('equal') # This ensures things look circular on the
                      # figure. If you want non-equal axes and a circular
                      # look, you'll need to work out the equation for
                      # plotting in "clique" as ellipses based on the
                      # figure dimensions
    A = np.array([u[0] for u in unique_points])
    B = np.array([u[1] for u in unique_points])
    plt.xticks([min(A), max(A)], ['Low', 'High'])
    plt.yticks([min(B), max(B)], ['Low', 'High'])
    plt.xlabel('A')
    plt.ylabel('B')
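As a possible variation on the grouping step, here is a minimal sketch that collects the IDs per unique (A, B) pair with a dictionary instead of repeated list lookups, and computes the circle positions without plotting anything. The helper names group_labels and circle_positions are my own (hypothetical, not part of the answer's code), and the sample rows are just a small subset of the values listed in the answer; the radius formula is the same ad hoc one used there.

```python
import math
from collections import defaultdict


def group_labels(points, labels):
    """Map each unique (a, b) pair to the list of labels located there."""
    groups = defaultdict(list)
    for point, label in zip(points, labels):
        # Lists are not hashable, so the pair is converted to a tuple key
        groups[tuple(point)].append(label)
    return dict(groups)


def circle_positions(center, n, r, phase=0.0):
    """Return n (x, y) positions evenly spaced on a circle of radius r.

    A single point stays exactly on the center, as in the class above.
    """
    if n == 1:
        return [tuple(center)]
    cx, cy = center
    return [
        (cx + r * math.cos(phase + 2 * math.pi * k / n),
         cy + r * math.sin(phase + 2 * math.pi * k / n))
        for k in range(n)
    ]


# A few of the rows from the answer: IDs A, B, C, Z, J and X
points = [[2, 4], [4, 2], [3, 3], [2, 4], [3, 3], [3, 3]]
labels = ["A", "B", "C", "Z", "J", "X"]

groups = group_labels(points, labels)
for center, ids in groups.items():
    radius = 0.05 + 0.2 / 5 * len(ids)  # same ad hoc radius as in the answer
    print(center, ids, circle_positions(center, len(ids), radius))
```

The dictionary lookup avoids scanning unique_points for every row, which probably only matters for much larger datasets than this one. The phase argument plays the role of the random rotation in the class; passing a fixed value keeps the layout reproducible between runs.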

Answer