
Plotting categorical XY data including labels using Python (e.g. BCG matrices)

I like to draw 2×2 / BCG matrices. This time I have a rather big dataset (more than 50 topics and multiple values, e.g. A and B). How can I draw this using Python?

The result should look similar to this:

[Image: example drawing of the desired matrix]

I have found a couple of questions about scatter plots, but none of them deals well with, e.g., two topics that have identical values (see topics 3, 2, L, J, … in the drawing above).

The IDs should be displayed in the drawing, and IDs with the same set of values should not overlap but should stay close together.

Is there a way to do this? If not in Python, I am also open to other suggestions.

Here is an example dataset:

ID  Name        value_A     value_B
A   topic_1     2           4
B   topic_2     4           2
C   topic_3     3           3
D   topic_4     3           5
E   topic_5     3           4
F   topic_6     5           1
G   topic_7     4           5
H   topic_8     1           2
I   topic_9     4           1
J   topic_10    3           3
K   topic_11    5           5
L   topic_12    5           3
M   topic_13    3           5
N   topic_14    1           5
O   topic_15    4           1
P   topic_16    4           2
Q   topic_17    1           5
R   topic_18    2           3
S   topic_19    1           2
T   topic_20    5           1
U   topic_21    3           4
V   topic_22    2           5
W   topic_23    1           3
X   topic_24    3           3
Y   topic_25    4           1
Z   topic_26    2           4
1   topic_27    2           4
2   topic_28    5           4
3   topic_29    3           3
4   topic_30    4           4
5   topic_31    3           2
6   topic_32    4           2
7   topic_33    2           3
8   topic_34    2           3
9   topic_35    2           5
10  topic_36    4           2


Answer

The code below should get pretty close to what you’re looking for, I think. The basic idea is that each set of points clustered at a location is placed on a circle centered on that location. I defined the radius of the circle in a somewhat ad hoc way, just to make it look nice for the dimensions I encountered, so you might need to alter it for your specific task.
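The core geometric step described above can be sketched on its own: place n points evenly on a circle of radius r around a shared center, optionally rotated by a phase angle. This is a minimal, standard-library-only sketch; the function name `circle_positions` and its `phase` parameter are hypothetical and not part of the answer's code.

```python
import math


def circle_positions(cx, cy, n, r, phase=0.0):
    """Return n (x, y) points spaced evenly on a circle of radius r around (cx, cy).

    A single point is placed exactly on the center, mirroring the special case
    in the answer's class. `phase` (radians) rotates the whole ring.
    """
    if n == 1:
        return [(cx, cy)]
    step = 2 * math.pi / n
    return [(cx + r * math.cos(phase + k * step),
             cy + r * math.sin(phase + k * step)) for k in range(n)]


# Four IDs sharing the point (3, 3) end up at the corners of a small diamond.
print(circle_positions(3, 3, 4, 0.2))
```

Indexing the angles by an integer k (rather than stepping through a float range) guarantees the ring has exactly n points, one per label.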

First, this is just a copy/paste of your values put into a list.

values = ['ID  Name        value_A     value_B',
          'A   topic_1     2           4',
          'B   topic_2     4           2',
          'C   topic_3     3           3',
          'D   topic_4     3           5',
          'E   topic_5     3           4',
          'F   topic_6     5           1',
          'G   topic_7     4           5',
          'H   topic_8     1           2',
          'I   topic_9     4           1',
          'J   topic_10    3           3',
          'K   topic_11    5           5',
          'L   topic_12    5           3',
          'M   topic_13    3           5',
          'N   topic_14    1           5',
          'O   topic_15    4           1',
          'P   topic_16    4           2',
          'Q   topic_17    1           5',
          'R   topic_18    2           3',
          'S   topic_19    1           2',
          'T   topic_20    5           1',
          'U   topic_21    3           4',
          'V   topic_22    2           5',
          'W   topic_23    1           3',
          'X   topic_24    3           3',
          'Y   topic_25    4           1',
          'Z   topic_26    2           4',
          '1   topic_27    2           4',
          '2   topic_28    5           4',
          '3   topic_29    3           3',
          '4   topic_30    4           4',
          '5   topic_31    3           2',
          '6   topic_32    4           2',
          '7   topic_33    2           3',
          '8   topic_34    2           3',
          '9   topic_35    2           5',
          '10  topic_36    4           2']

Next, take the data you provided above and organize it slightly differently into one list of IDs and another of values for A and B.

import re
values = [re.split(r'\s+', v.strip()) for v in values][1:]  # split each row on whitespace, drop the header row
points = [[int(v[2]), int(v[3])] for v in values]
labels = [v[0] for v in values]
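If you would rather avoid the regular expression, Python's `str.split()` with no argument already splits on runs of whitespace and ignores leading and trailing whitespace, so it yields the same fields for these rows. A minimal sketch, assuming the rows are formatted like the list above (the three-row `rows` sample is just an excerpt of it):

```python
rows = ['ID  Name        value_A     value_B',
        'A   topic_1     2           4',
        '10  topic_36    4           2']

# str.split() with no separator splits on any run of whitespace.
records = [row.split() for row in rows][1:]  # drop the header row
points = [[int(r[2]), int(r[3])] for r in records]
labels = [r[0] for r in records]

print(points)  # [[2, 4], [4, 2]]
print(labels)  # ['A', '10']
```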

Now we need to find the unique (A, B) pairs and the IDs at each one. There are many ways to get this from your original list; others may have better suggestions depending on your original data structure and efficiency considerations.

unique_points = []
n_labels = []

for i in range(len(points)):
    if points[i] not in unique_points:
        unique_points.append(points[i])
        n_labels.append([labels[i],])
    else:
        n_labels[unique_points.index(points[i])] += [labels[i],]
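One alternative, assuming Python 3.7+ (where dictionaries preserve insertion order), is to key a dictionary on the (A, B) pair. The key has to be a tuple because lists are unhashable. Each lookup is then a hash lookup instead of the linear scan done by `unique_points.index(...)`, which only starts to matter for much larger datasets. A minimal sketch using a few rows from the example data:

```python
from collections import defaultdict

# A few (ID, value_A, value_B) rows taken from the example dataset.
data = [('C', 3, 3), ('J', 3, 3), ('X', 3, 3), ('3', 3, 3),
        ('A', 2, 4), ('Z', 2, 4), ('1', 2, 4), ('K', 5, 5)]

groups = defaultdict(list)  # (A, B) tuple -> list of IDs at that position
for label, a, b in data:
    groups[(a, b)].append(label)

unique_points = [list(p) for p in groups]  # same shape as the lists built above
n_labels = list(groups.values())

print(unique_points)  # [[3, 3], [2, 4], [5, 5]]
print(n_labels)       # [['C', 'J', 'X', '3'], ['A', 'Z', '1'], ['K']]
```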

For another project of mine, I had designed this class to do something very similar to what you’re trying to do, so I’m re-implementing it here with a couple of variations. Basically, each unique point and its accompanying IDs go into their own object, which makes it easy to plot the points on a circle centered on the unique point.

import numpy as np
from matplotlib import pyplot as plt


class clique():
    def __init__(self, center, labels, r):
        self.n = len(labels)
        self.x = center[0]
        self.y = center[1]
        self.labels = labels
        self.r = r
        
        # Evenly spaced angles; linspace (unlike a float-step arange) always
        # returns exactly n values. The random addition just spins the points
        # about the circle so groups of the same size look different.
        self.theta = np.linspace(0, 2 * np.pi, self.n, endpoint=False) + np.random.rand() * 2 * np.pi
        
        if self.n == 1: 
            self.nodes_x = [self.x,]
            self.nodes_y = [self.y,]
        else: 
            self.nodes_x = self.x + r * np.cos(self.theta)
            self.nodes_y = self.y + r * np.sin(self.theta)
            
    def draw_nodes(self, shape = 'o', color = 'k', markersize = 12):
        for i in range(len(self.nodes_x)):
            plt.plot(self.nodes_x[i], self.nodes_y[i], shape, color = color,
                     markersize = markersize)
    def label_nodes(self, color = 'w', fs = 10):
        for i in range(len(self.nodes_x)):
            plt.text(self.nodes_x[i], self.nodes_y[i], self.labels[i],
                     va = 'center', ha = 'center', color = color, fontsize = fs)
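Two small variations on the angle computation may be worth considering. `np.linspace(0, 2 * np.pi, n, endpoint=False)` always returns exactly n angles, whereas `np.arange` with a floating-point step can occasionally produce an extra element because of rounding (NumPy's documentation recommends `linspace` for non-integer steps). Drawing the random rotation from a seeded `np.random.default_rng` also makes the layout reproducible between runs. The helper name `ring_angles` and the seed value 42 are hypothetical choices for this sketch:

```python
import numpy as np


def ring_angles(n, rng):
    """Return exactly n evenly spaced angles (radians), spun by a random offset."""
    offset = rng.random() * 2 * np.pi  # random rotation, as in the answer's class
    return np.linspace(0, 2 * np.pi, n, endpoint=False) + offset


rng = np.random.default_rng(42)  # fixed seed -> identical layout on every run
theta = ring_angles(4, rng)
print(len(theta))  # 4
```

Inside the class you would presumably pass a generator into `__init__` and replace the `self.theta = ...` line with a call like this.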

Now, create a clique object for each cluster of points and plot it.

for i in range(len(unique_points)):
    radius = 0.05 + 0.2 / 5 * len(n_labels[i])
    G = clique(unique_points[i], n_labels[i], radius)
    G.draw_nodes()
    G.label_nodes()
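The radius above (0.05 plus 0.04 per ID) is ad hoc, as noted. One alternative, sketched here under the assumption that the positions lie on an integer grid one unit apart, is to pick the smallest radius that keeps neighboring markers on a ring at least `min_gap` data units apart. Adjacent points on a ring of n points are separated by a chord of length 2r·sin(π/n), so r = min_gap / (2·sin(π/n)). The helper `ring_radius` and its defaults (`min_gap=0.25`, `max_radius=0.45`) are hypothetical values you would tune to your marker size:

```python
import math


def ring_radius(n, min_gap=0.25, max_radius=0.45):
    """Smallest radius that keeps adjacent points on an n-point ring min_gap apart.

    Capped at max_radius so rings around neighboring grid points (1 unit apart)
    do not run into each other; when the cap applies, adjacent points end up
    closer than min_gap.
    """
    if n <= 1:
        return 0.0  # a lone point sits on the center, so the radius is irrelevant
    r = min_gap / (2 * math.sin(math.pi / n))
    return min(r, max_radius)


for n in (1, 2, 3, 4, 6):
    print(n, round(ring_radius(n), 3))
```

In the loop above you would then pass `ring_radius(len(n_labels[i]))` as the radius argument to `clique`.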

And, finally, clean up the plot a little.

plt.axis('equal') # This ensures things look circular on the
                  # figure. If you want non-equal axes and a circular
                  # look, you'll need to work out the equation for
                  # plotting in "clique" as ellipses based on the
                  # figure dimensions

A = np.array([u[0] for u in unique_points])
B = np.array([u[1] for u in unique_points])
plt.xticks([min(A), max(A)], ['Low', 'High'])
plt.yticks([min(B), max(B)], ['Low', 'High'])
plt.xlabel('A')
plt.ylabel('B')
plt.show()  # display the figure when running as a script

[Figure: the resulting plot]
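The comment on `plt.axis('equal')` mentions that unequal axes require drawing each ring as an ellipse. A minimal sketch of that conversion, assuming you know the axis limits and the on-screen size of the axes in pixels (in matplotlib these could presumably come from `ax.get_xlim()`, `ax.get_ylim()` and `ax.get_window_extent()`): keep the radius in pixels fixed and express it in each axis' data units. The helper `ellipse_radii` is hypothetical:

```python
def ellipse_radii(r, xlim, ylim, width_px, height_px):
    """Convert a radius r (in x-axis data units) into (rx, ry) data-unit radii
    that render as a circle on axes that are width_px by height_px pixels."""
    x_per_px = (xlim[1] - xlim[0]) / width_px   # data units per pixel along x
    y_per_px = (ylim[1] - ylim[0]) / height_px  # data units per pixel along y
    r_px = r / x_per_px                         # the radius measured in pixels
    return r, r_px * y_per_px                   # same pixel radius in each axis' units


# Hypothetical axes: both axes span 0.5 to 5.5, drawn 600 px wide and 300 px tall.
rx, ry = ellipse_radii(0.2, (0.5, 5.5), (0.5, 5.5), 600, 300)
print(rx, ry)
```

In `clique.__init__` you would then use `rx * np.cos(self.theta)` and `ry * np.sin(self.theta)` in place of `r * ...`. Because the pixel size changes when the figure is resized or the limits change, the radii would need to be computed after the limits are final.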

User contributions licensed under: CC BY-SA