Reproducing a 2d histogram in Python

I’m working with a very large dataset in Python, so I’m trying to use histograms instead of arrays (the arrays get way too large for saving/loading/mapping). I’m crawling over a bunch of files and pulling information from them, and I would like to then take the information and remake the histograms afterwards. I can do this with a 1D histogram as follows:

counter, bins = np.histogram(nSigmaProtonHisto, bins=1000, range=(-10000, 10000))
nSigmaProtonPico[0] += counter
nSigmaProtonPico[1] = bins[:-1]

nSigmaProtonPico is a 2D array to store the bin edges and the final count for the histogram values. nSigmaProtonHisto is a 1D array for a particular event, and I loop over millions of events. Once the script is done, it will have crawled over all the events and I’ll have a 2D array with the histogram values and positions. I can simply graph it, like so:

plt.plot(nSigmaProtonPico[1], nSigmaProtonPico[0])

When I try to do this for a 2D histogram, it falls apart. I’m missing something. Here’s what I have:

counter, bins1, bins2 = np.histogram2d(dEdX, pG, bins=1000, range=((0, 20), (-5, 5)))
dEdXpQPRIME[0] += counter[0]
dEdXpQPRIME[1] += counter[1]
dEdXpQPRIME[2] = bins1[:-1]
dEdXpQPRIME[3] = bins2[:-1]

This gets me something, but I can’t figure out how to plot it so that I reproduce the histogram I would have from all the data. I would think it would be as simple as x, y, and z coordinates, but there are 4 and not 3 coordinates.

What am I missing?

Answer

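counter is a 2D array of shape (number of x bins, number of y bins), so counter[0] and counter[1] are only its first two rows, not separate coordinates. Provided you pass the same bins to every call of histogram2d, each call returns an array of the same shape, so you can simply add the whole counter arrays together. Consider: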

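import numpy as np

# Two sample data sets drawn from normal distributions centred at 0 and 3
x1, y1 = np.random.normal(loc=0, scale=1, size=(2, 10000))
x2, y2 = np.random.normal(loc=3, scale=1, size=(2, 10000))

# Use the same bin edges in every call so the count arrays can be added
x_bins = np.linspace(-5, 5, 100)
y_bins = np.linspace(-5, 5, 100)

H1, xedges, yedges = np.histogram2d(x1, y1, bins=(x_bins, y_bins))
H2, xedges, yedges = np.histogram2d(x2, y2, bins=(x_bins, y_bins))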

H1 and H2 both have shape (99, 99): 100 edges in each dimension define 99 bins.

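import matplotlib.pyplot as plt

X, Y = np.meshgrid(xedges, yedges)
H = H1 + H2  # element-wise sum of the two count arrays

fig, axs = plt.subplots(1, 3, figsize=(9, 3))
# histogram2d puts x along the first axis, while pcolormesh expects
# rows to run along y, so plot the transposed counts
axs[0].pcolormesh(X, Y, H1.T)
axs[1].pcolormesh(X, Y, H2.T)
axs[2].pcolormesh(X, Y, H.T)
plt.show()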
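
Applied to the loop over events in the question, the same idea might look like the sketch below. It is only an illustration: the helper name accumulate_histogram2d, the list of (x, y) pairs and the random sample data are all made up for the example, and the bin edges simply mirror the 1000 bins over (0, 20) and (-5, 5) from the question. The final comparison checks that summing per-event histograms gives the same counts as histogramming all the data at once.

```python
import numpy as np

def accumulate_histogram2d(events, xedges, yedges):
    """Sum per-event 2D histograms into one running total.

    `events` is any iterable of (x, y) array pairs (hypothetical input format).
    """
    # One count per bin: N edges define N - 1 bins along each axis.
    total = np.zeros((len(xedges) - 1, len(yedges) - 1))
    for x, y in events:
        # Passing explicit edges guarantees identical binning on every call.
        counts, _, _ = np.histogram2d(x, y, bins=(xedges, yedges))
        total += counts  # add the whole 2D array, not individual rows
    return total

# Bin edges assumed from the question: 1000 bins over (0, 20) and (-5, 5).
xedges = np.linspace(0, 20, 1001)
yedges = np.linspace(-5, 5, 1001)

# Made-up stand-in for the per-event arrays (dEdX and pG in the question).
rng = np.random.default_rng(0)
events = [(rng.uniform(0, 20, 500), rng.normal(0, 1, 500)) for _ in range(20)]

total = accumulate_histogram2d(events, xedges, yedges)

# Cross-check against a single histogram of all the data at once.
all_x = np.concatenate([x for x, _ in events])
all_y = np.concatenate([y for _, y in events])
reference, _, _ = np.histogram2d(all_x, all_y, bins=(xedges, yedges))
matches = np.array_equal(total, reference)
```

Only total, xedges and yedges need to be stored, which is three arrays rather than four coordinates. The accumulated result can then be drawn with pcolormesh(xedges, yedges, total.T), as in the plotting code above.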

User contributions licensed under: CC BY-SA