I'd like to draw 2×2 / BCG matrices. This time I have a rather large dataset (more than 50 topics, each with multiple values, e.g. A and B). How can I draw this using Python?
The result should look similar to this:
I have found a couple of questions about scatter plots, but none of them deals well with, e.g., two topics that have identical values (see topics 3,2,L,J,… above in the drawing).
The IDs should be displayed in the drawing, and IDs with the same set of values should not overlap but should stay close together.
Is there a way to do this? If not in Python, I'm also open to other suggestions.
Here is an example dataset:
```
ID  Name      value_A  value_B
A   topic_1   2        4
B   topic_2   4        2
C   topic_3   3        3
D   topic_4   3        5
E   topic_5   3        4
F   topic_6   5        1
G   topic_7   4        5
H   topic_8   1        2
I   topic_9   4        1
J   topic_10  3        3
K   topic_11  5        5
L   topic_12  5        3
M   topic_13  3        5
N   topic_14  1        5
O   topic_15  4        1
P   topic_16  4        2
Q   topic_17  1        5
R   topic_18  2        3
S   topic_19  1        2
T   topic_20  5        1
U   topic_21  3        4
V   topic_22  2        5
W   topic_23  1        3
X   topic_24  3        3
Y   topic_25  4        1
Z   topic_26  2        4
1   topic_27  2        4
2   topic_28  5        4
3   topic_29  3        3
4   topic_30  4        4
5   topic_31  3        2
6   topic_32  4        2
7   topic_33  2        3
8   topic_34  2        3
9   topic_35  2        5
10  topic_36  4        2
```
Answer
The code below should get pretty close to what you're looking for, I think. The basic idea is that each set of points sharing a location is placed on a small circle centered on that location. I defined the circle's radius in an ad hoc way, just to make it look nice for the dimensions I encountered, so you might need to adjust it for your specific task.
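The geometry behind this idea fits in a few lines. The sketch below isolates it; the function name `ring_positions` is hypothetical and not part of the answer's code. It spreads `n` points evenly on a circle of radius `r` around a shared center and leaves a lone point exactly where it is. It uses `np.linspace(..., endpoint=False)`, which always returns exactly `n` angles.

```python
import numpy as np


def ring_positions(center, n, r, phase=0.0):
    """Return x and y arrays for n points spread evenly on a circle.

    Hypothetical helper: a single point stays exactly on `center`,
    so IDs with unique values are not moved at all.
    """
    cx, cy = center
    if n == 1:
        return np.array([cx], dtype=float), np.array([cy], dtype=float)
    # endpoint=False gives exactly n angles covering one full turn
    theta = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False) + phase
    return cx + r * np.cos(theta), cy + r * np.sin(theta)


# Four IDs sharing the point (3, 3), e.g. C, J, X and 3 in the example data
xs, ys = ring_positions((3, 3), 4, 0.21)
print(np.round(xs, 2), np.round(ys, 2))
```

The optional `phase` argument rotates the whole ring, which is what the random offset in the answer's class does.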
First, this is just a copy/paste of your values put into a list.
```python
values = ['ID Name value_A value_B',
          'A topic_1 2 4', 'B topic_2 4 2', 'C topic_3 3 3', 'D topic_4 3 5',
          'E topic_5 3 4', 'F topic_6 5 1', 'G topic_7 4 5', 'H topic_8 1 2',
          'I topic_9 4 1', 'J topic_10 3 3', 'K topic_11 5 5', 'L topic_12 5 3',
          'M topic_13 3 5', 'N topic_14 1 5', 'O topic_15 4 1', 'P topic_16 4 2',
          'Q topic_17 1 5', 'R topic_18 2 3', 'S topic_19 1 2', 'T topic_20 5 1',
          'U topic_21 3 4', 'V topic_22 2 5', 'W topic_23 1 3', 'X topic_24 3 3',
          'Y topic_25 4 1', 'Z topic_26 2 4', '1 topic_27 2 4', '2 topic_28 5 4',
          '3 topic_29 3 3', '4 topic_30 4 4', '5 topic_31 3 2', '6 topic_32 4 2',
          '7 topic_33 2 3', '8 topic_34 2 3', '9 topic_35 2 5', '10 topic_36 4 2']
```
Next, take the data you provided above and organize it slightly differently: one list of IDs and another of the corresponding A and B values.
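```python
import re

# Split each row on runs of whitespace and drop the header row
values = [re.split(r'\s+', v) for v in values][1:]
points = [[int(v[2]), int(v[3])] for v in values]  # [value_A, value_B]
labels = [v[0] for v in values]                     # the IDs
```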
Now we need to find the unique (A, B) pairs and the IDs at each one. There are many ways to get there from your original list; others may have better suggestions depending on your original data structure and efficiency considerations.
```python
unique_points = []
n_labels = []
for i in range(len(points)):
    if points[i] not in unique_points:
        unique_points.append(points[i])
        n_labels.append([labels[i]])
    else:
        n_labels[unique_points.index(points[i])] += [labels[i]]
```
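As mentioned, there are other ways to build these groups. The loop above searches `unique_points` on every row (`in` and `.index`), so its cost grows quadratically with the number of rows; that is harmless for 36 rows but could matter for a much larger dataset. A minimal alternative, assuming Python 3.7+ (where dicts keep insertion order), keys a dictionary on the (A, B) pair. The helper name `group_by_point` is hypothetical.

```python
from collections import defaultdict


def group_by_point(points, labels):
    """Map each distinct (A, B) pair to the list of IDs that share it.

    Groups come out in the order their first member appears,
    matching the order produced by the list-based loop.
    """
    groups = defaultdict(list)
    for point, label in zip(points, labels):
        # Lists are unhashable, so use a tuple as the dictionary key
        groups[tuple(point)].append(label)
    return dict(groups)


demo = group_by_point([[2, 4], [4, 2], [2, 4]], ['A', 'B', 'Z'])
print(demo)  # {(2, 4): ['A', 'Z'], (4, 2): ['B']}
```

In the answer's script, `groups = group_by_point(points, labels)` followed by `unique_points = [list(p) for p in groups]` and `n_labels = list(groups.values())` would produce the same two lists as the loop.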
For another project of mine, I designed this class to do something very similar to what you're trying to do, so I'm reimplementing it here with a couple of variations. Basically, each unique point and its accompanying IDs go into their own object, which makes it easy to plot the IDs on a circle centered on that unique point.
```python
import numpy as np
from matplotlib import pyplot as plt


class clique:
    """A group of IDs sharing one (A, B) point, drawn on a small circle."""

    def __init__(self, center, labels, r):
        self.n = len(labels)
        self.x = center[0]
        self.y = center[1]
        self.labels = labels
        self.r = r
        # Exactly n evenly spaced angles (np.arange with a float step can
        # return one extra angle). The random addition just spins the points
        # about the circle so groups of the same size look different.
        self.theta = (np.linspace(0, 2 * np.pi, self.n, endpoint=False)
                      + np.random.rand() * 2 * np.pi)
        if self.n == 1:
            # A lone ID stays exactly on its point
            self.nodes_x = [self.x]
            self.nodes_y = [self.y]
        else:
            self.nodes_x = self.x + r * np.cos(self.theta)
            self.nodes_y = self.y + r * np.sin(self.theta)

    def draw_nodes(self, shape='o', color='k', markersize=12):
        # One marker per ID
        for x, y in zip(self.nodes_x, self.nodes_y):
            plt.plot(x, y, shape, color=color, markersize=markersize)

    def label_nodes(self, color='w', fs=10):
        # Write each ID centered on its marker
        for x, y, label in zip(self.nodes_x, self.nodes_y, self.labels):
            plt.text(x, y, label, va='center', ha='center',
                     color=color, fontsize=fs)
```
Now, create a `clique` object for each cluster of points and plot it.
```python
for i in range(len(unique_points)):
    # Ad hoc radius: larger groups get a slightly larger circle
    radius = 0.05 + 0.2 / 5 * len(n_labels[i])
    G = clique(unique_points[i], n_labels[i], radius)
    G.draw_nodes()
    G.label_nodes()
```
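One way to reason about the ad hoc radius is to look at the spacing it produces. For `n` points evenly spaced on a circle of radius `r`, adjacent points are `2 * r * sin(pi / n)` apart (the chord length). The sketch below evaluates this for the radius formula used in the loop above; the helper names `ring_radius` and `neighbour_gap` are hypothetical.

```python
import math


def ring_radius(n):
    # The answer's ad hoc radius: grows linearly with the group size n
    return 0.05 + 0.2 / 5 * n


def neighbour_gap(n):
    # Chord length between adjacent points on a ring of n evenly spaced
    # points; a single point has no neighbour, so return None
    if n < 2:
        return None
    return 2 * ring_radius(n) * math.sin(math.pi / n)


for n in range(1, 6):
    gap = neighbour_gap(n)
    print(n, round(ring_radius(n), 3), None if gap is None else round(gap, 3))
```

With this formula the gap between neighbouring IDs stays between roughly 0.26 and 0.30 data units for groups of two to five, so larger groups are not more crowded than smaller ones. In the example data the largest group has four IDs (r = 0.21), so rings at neighbouring grid positions, one unit apart, cannot touch. If your markers or fonts are larger, scaling both terms of the formula keeps the same proportions.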
And, finally, clean up the plot a little.
```python
# Equal axis scaling makes the groups look circular on the figure. If you
# want non-equal axes and still a circular look, you'll need to work out
# the equations for plotting in "clique" as ellipses based on the figure
# dimensions.
plt.axis('equal')
A = np.array([u[0] for u in unique_points])
B = np.array([u[1] for u in unique_points])
plt.xticks([min(A), max(A)], ['Low', 'High'])
plt.yticks([min(B), max(B)], ['Low', 'High'])
plt.xlabel('A')
plt.ylabel('B')
plt.show()
```
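Since the goal is a 2×2 matrix, it may help to draw the dividing lines explicitly. The sketch below is an assumption-laden addition, not part of the original answer: the helper `draw_quadrants` is hypothetical, and where to split "low" from "high" is up to you (the demo uses the midpoint of the question's 1 to 5 range). It only uses Matplotlib's `Axes.axvline` and `Axes.axhline`.

```python
import matplotlib.pyplot as plt


def draw_quadrants(ax, a_split, b_split, **line_kw):
    """Draw the dashed vertical and horizontal lines of a 2x2 matrix.

    a_split / b_split are the values separating "low" from "high" on
    each axis. zorder=0 keeps the lines behind the markers and labels.
    """
    style = {"color": "grey", "linestyle": "--", "linewidth": 1, "zorder": 0}
    style.update(line_kw)  # the caller can override any of the defaults
    ax.axvline(a_split, **style)
    ax.axhline(b_split, **style)


# Demo on an empty axes spanning the question's 1-5 value range,
# split at the midpoint of that range (an assumption)
fig, ax = plt.subplots()
ax.set_xlim(0.5, 5.5)
ax.set_ylim(0.5, 5.5)
draw_quadrants(ax, (1 + 5) / 2, (1 + 5) / 2)
```

In the answer's script, `draw_quadrants(plt.gca(), (min(A) + max(A)) / 2, (min(B) + max(B)) / 2)` after the axis labels are set would add the lines to the existing figure. With this dataset the midpoint, 3, coincides with a grid value, so some rings sit on the lines; a split such as 2.5 or 3.5 avoids that if you prefer.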