I’m working with a very large dataset in Python, so I’m trying to use histograms instead of arrays (the arrays get way too large for saving/loading/mapping). I’m crawling over a bunch of files and pulling information from them, and afterwards I would like to rebuild the histograms from that information. I can do this with a 1D histogram as follows:
    counter, bins = np.histogram(nSigmaProtonHisto, bins=1000, range=(-10000, 10000))
    nSigmaProtonPico[0] += counter
    nSigmaProtonPico[1] = bins[:-1]
nSigmaProtonPico is a 2D array that stores the accumulated counts (row 0) and the left bin edges (row 1). nSigmaProtonHisto is a 1D array for a particular event, and I loop over millions of events. Once the script has crawled over all the events, I have a 2D array with the histogram values and positions, and I can simply plot it, like so:
plt.plot(nSigmaProtonPico[1], nSigmaProtonPico[0])
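The same accumulate-then-plot pattern can be written as a small self-contained loop. This is a minimal sketch, assuming each event arrives as a 1D NumPy array; the `events` list of random arrays is a hypothetical stand-in for the real per-file data. Because `bins` and `range` never change, every call returns identical edges, so the counts can be summed bin by bin.

```python
import numpy as np

# Hypothetical stand-in for the per-event arrays read from the files.
rng = np.random.default_rng(0)
events = [rng.normal(0, 2000, size=500) for _ in range(100)]

n_bins = 1000
hist_range = (-10000, 10000)

# Running total of counts; the edges are the same on every call
# because n_bins and hist_range are fixed.
total_counts = np.zeros(n_bins, dtype=np.int64)
for event in events:
    counts, edges = np.histogram(event, bins=n_bins, range=hist_range)
    total_counts += counts

# Left edge of each bin, matching nSigmaProtonPico[1] = bins[:-1].
left_edges = edges[:-1]
```

Summing per-event counts this way gives the same result as histogramming all events at once, without ever holding all the raw values in memory.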
When I try to do this for a 2D histogram, it falls apart. I’m missing something. Here’s what I have:
    counter, bins1, bins2 = np.histogram2d(dEdX, pG, bins=1000, range=((0, 20), (-5, 5)))
    dEdXpQPRIME[0] += counter[0]
    dEdXpQPRIME[1] += counter[1]
    dEdXpQPRIME[2] = bins1[:-1]
    dEdXpQPRIME[3] = bins2[:-1]
This gets me something, but I can’t figure out how to plot it so that I reproduce the histogram I would have gotten from all the data at once. I expected it to be as simple as x, y, and z coordinates, but I end up with four arrays instead of three.
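For reference, `np.histogram2d` returns three arrays, not four: a 2D grid of counts plus one edge array per axis. A quick shape check with made-up data (the bin counts 200 and 100 are arbitrary, chosen to be unequal so the axes are easy to tell apart):

```python
import numpy as np

rng = np.random.default_rng(1)
dEdX = rng.uniform(0, 20, size=1000)  # made-up sample values
pG = rng.uniform(-5, 5, size=1000)    # made-up sample values

counter, xedges, yedges = np.histogram2d(
    dEdX, pG, bins=(200, 100), range=((0, 20), (-5, 5))
)

print(counter.shape)     # (200, 100): one row per dEdX bin, one column per pG bin
print(xedges.shape)      # (201,)
print(yedges.shape)      # (101,)
print(counter[0].shape)  # (100,): only the first row, i.e. the lowest dEdX bin
```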
What am I missing?
Answer
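counter is a 2D array (one row per dEdX bin, one column per pG bin), so counter[0] and counter[1] are just its first two rows. Provided you use the same bins at each call of histogram2d, you will get an array of the same size every time. You can therefore simply add the whole counter arrays together.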
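Applied to the question's loop, that means keeping one 2D accumulator and adding each event's full grid to it. This is a sketch under assumptions: the `events` list of random (dEdX, pG) pairs is hypothetical, and the edges are fixed once as arrays (passing `bins=1000` with a fixed `range` on every call would also give identical edges).

```python
import numpy as np

rng = np.random.default_rng(2)
# Hypothetical per-event (dEdX, pG) pairs standing in for the real files.
events = [
    (rng.uniform(0, 20, size=300), rng.normal(0, 2, size=300))
    for _ in range(50)
]

# Fixing the edges once guarantees every call uses identical bins.
x_edges = np.linspace(0, 20, 201)
y_edges = np.linspace(-5, 5, 101)

# One 2D accumulator with shape (len(x_edges) - 1, len(y_edges) - 1).
total = np.zeros((len(x_edges) - 1, len(y_edges) - 1))
for dEdX, pG in events:
    counter, _, _ = np.histogram2d(dEdX, pG, bins=(x_edges, y_edges))
    total += counter  # add the whole grid, not counter[0] / counter[1]
```

Only `total`, `x_edges` and `y_edges` need to be kept, which is the 2D analogue of the question's nSigmaProtonPico.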
Consider:
    import numpy as np
    import matplotlib.pyplot as plt

    x1, y1 = np.random.normal(loc=0, scale=1, size=(2, 10000))
    x2, y2 = np.random.normal(loc=3, scale=1, size=(2, 10000))
    x_bins = np.linspace(-5, 5, 100)
    y_bins = np.linspace(-5, 5, 100)
    H1, xedges, yedges = np.histogram2d(x1, y1, bins=(x_bins, y_bins))
    H2, xedges, yedges = np.histogram2d(x2, y2, bins=(x_bins, y_bins))
H1 and H2 both have shape (99, 99) (100 edges in each dimension).
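One detail matters when plotting: `H[i, j]` counts the points whose x falls in bin i and whose y falls in bin j, so x runs along the first axis (the rows). A tiny example with hand-picked points and unit-width bins makes the orientation explicit:

```python
import numpy as np

x = np.array([0.5, 0.5, 2.5])
y = np.array([1.5, 1.5, 0.5])
# x edges: 0, 1, 2, 3; y edges: 0, 1, 2
H, xedges, yedges = np.histogram2d(x, y, bins=(3, 2), range=((0, 3), (0, 2)))

# Two points have x in bin 0 and y in bin 1; one has x in bin 2 and y in bin 0.
print(H[0, 1], H[2, 0])
```

Matplotlib's pcolormesh expects the rows of its color array to correspond to y, so the histogram grid needs to be transposed (H.T) before plotting. With square bins the untransposed call still runs, but the picture comes out mirrored across the diagonal.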
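    X, Y = np.meshgrid(xedges, yedges)
    H = H1 + H2
    fig, axs = plt.subplots(1, 3, figsize=(9, 3))
    # histogram2d puts x on the first axis; pcolormesh wants y on the rows, hence .T
    axs[0].pcolormesh(X, Y, H1.T)
    axs[1].pcolormesh(X, Y, H2.T)
    axs[2].pcolormesh(X, Y, H.T)
    plt.show()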
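Putting the pieces together, here is a sketch that accumulates batches into a single grid and plots it by passing the 1D edge arrays straight to pcolormesh, which accepts them, so meshgrid is optional. The Agg backend (so it runs without a display) and the output filename combined_hist2d.png are illustrative assumptions, not part of the original answer.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(3)
x_edges = np.linspace(-5, 5, 100)
y_edges = np.linspace(-5, 5, 100)

# Accumulate two batches into one grid, one histogram2d call per batch.
H = np.zeros((len(x_edges) - 1, len(y_edges) - 1))
for loc in (0, 3):
    x, y = rng.normal(loc=loc, scale=1, size=(2, 10000))
    counts, _, _ = np.histogram2d(x, y, bins=(x_edges, y_edges))
    H += counts

fig, ax = plt.subplots(figsize=(4, 4))
# 1D edges are accepted directly; H.T puts y on the rows.
mesh = ax.pcolormesh(x_edges, y_edges, H.T)
fig.colorbar(mesh, ax=ax, label="counts")
ax.set_xlabel("x")
ax.set_ylabel("y")
fig.savefig("combined_hist2d.png")
```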