
Matplotlib: how to classify values/data in a scatter plot?

I’m trying to create a scatter plot in which two properties of each value can be told apart on the graph:

  1. By color. For example, if the value is negative, the color is red, and if the value is positive, the color is blue.

  2. By marker size. For example, if the value is between -0.20 and 0, the size is 100; if it is between 0 and 0.1, the size is 200; if it is between 0.5 and 1, the size is 300; and so on…

Here is some of the data I’m working on (just in case):

0    0.15
1    0.04
2    0.02
3    0.01
4   -0.03
5   -0.07
6   -0.25
7   -0.27
8   -0.30

I have tried the following:

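import os

import matplotlib.pyplot as plt
import numpy as np

fig = plt.figure(figsize=(15, 8))
ax = fig.add_subplot(1, 1, 1)

# `folder` is defined elsewhere in my script. With names=True the header row of
# residuals.csv gives the field names, so columns are accessed as res['x'] etc.
res = np.genfromtxt(os.path.join(folder, 'residuals.csv'), delimiter=',', names=True)

ax.scatter(res['x'], res['y'], s=200, c=res['residual'], cmap='jet')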

That works, but it only distinguishes the values by color: every marker has the same size, and with the 'jet' colormap I can’t tell which values are negative and which are positive. That’s why I’m looking for the two conditions described above.
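In plain Matplotlib, both conditions can be expressed as per-point arrays and passed to `ax.scatter`, since `s` (marker area in points²) and `c` both accept one entry per point. The sketch below is a minimal illustration, not the accepted answer's method: it uses the sample values from the question, while the x/y positions, treating 0 as positive, the half-open range bounds, and the fallback size of 50 for values outside every listed range are all assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to get an interactive window
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.lines import Line2D

# Sample values from the question; x/y positions are hypothetical placeholders.
values = np.array([0.15, 0.04, 0.02, 0.01, -0.03, -0.07, -0.25, -0.27, -0.30])
x = np.arange(len(values))
y = np.zeros(len(values))

# Condition 1: color by sign (0 is counted as positive here, an assumption).
colors = np.where(values < 0, "red", "blue")

# Condition 2: size by value range. The ranges follow the question; the
# half-open bounds and the fallback size 50 are assumptions.
conditions = [
    (values >= -0.20) & (values < 0.0),
    (values >= 0.0) & (values < 0.1),
    (values >= 0.5) & (values <= 1.0),
]
sizes = np.select(conditions, [100, 200, 300], default=50)

fig, ax = plt.subplots(figsize=(15, 8))
sc = ax.scatter(x, y, s=sizes, c=colors.tolist())  # one size and one color per point

# Legend entries explaining the color code.
handles = [
    Line2D([], [], marker="o", linestyle="", color="red", label="negative"),
    Line2D([], [], marker="o", linestyle="", color="blue", label="positive"),
]
ax.legend(handles=handles)
```

With real data, `values` would be the residual column and `x`/`y` the coordinate columns; call `plt.show()` (without the Agg line) to display the figure. Using `np.select` keeps each range as an explicit condition, which makes gaps between ranges visible instead of silently assigning them a size.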

Any help is greatly appreciated!


Answer

Seaborn’s scatterplot can map both the marker color (hue) and the marker size (size) to variables. Here is how it could look with your type of data.

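import matplotlib.pyplot as plt
from matplotlib.colors import TwoSlopeNorm
import numpy as np
import seaborn as sns

res = np.array([0.15, 0.04, 0.02, 0.01, -0.03, -0.07, -0.25, -0.27, -0.30])
x = np.arange(len(res))  # one x position per value
y = np.ones(len(res))    # all points on the same horizontal line

# Center the color scale on zero so that 0 maps to the middle color of the palette;
# with 'Spectral', negative values get red/orange tones and positive values green/blue tones.
hue_norm = TwoSlopeNorm(vcenter=0)
ax = sns.scatterplot(x=x, y=y,
                     hue=res, hue_norm=hue_norm, palette='Spectral',  # color from the value
                     size=res, sizes=(100, 300))                      # marker size from the value

# Write each value just above its marker.
for xi, resi in zip(x, res):
    ax.text(xi, 1.1, f'{resi:.2f}', ha='center', va='bottom')
ax.set_ylim(0.75, 2)  # leave room above the points for the labels
ax.set_yticks([])     # the y position carries no information here
plt.show()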

(Figure: seaborn scatterplot with sizes and hue)
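To see what the two mappings in this answer do numerically, the sketch below evaluates them on the same residuals. The hue part calls `TwoSlopeNorm` directly. The size part is only an approximation of seaborn's default numeric size mapping, assumed here to be a linear rescale of the data range onto `sizes=(100, 300)`; it does not call seaborn.

```python
import numpy as np
from matplotlib.colors import Normalize, TwoSlopeNorm

res = np.array([0.15, 0.04, 0.02, 0.01, -0.03, -0.07, -0.25, -0.27, -0.30])

# Hue: TwoSlopeNorm maps [min, 0] onto [0, 0.5] and [0, max] onto [0.5, 1],
# so 0 always lands on the palette's middle color.
hue_norm = TwoSlopeNorm(vcenter=0, vmin=res.min(), vmax=res.max())
hue_positions = hue_norm(res)

# For comparison: a plain linear norm would put 0 at about 0.67, off center.
lin = Normalize(vmin=res.min(), vmax=res.max())
zero_position_linear = lin(0.0)

# Size (assumed approximation of seaborn's default): linear rescale onto (100, 300).
marker_sizes = 100 + lin(res) * (300 - 100)
```

Under that assumption, marker size grows steadily from the most negative value (100) to the most positive one (300), so in this plot the sign is carried by the color rather than by the size; the range-based sizes from the question would need an explicit mapping.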

User contributions licensed under: CC BY-SA