Friday 15 July 2011

python - asymmetry in matplotlib histograms


After reading the matplotlib documentation for hist, I expected that the range would be restricted and that lower and upper outliers would be ignored:

"range : tuple, optional, default: None. The lower and upper range of the bins. Lower and upper outliers are ignored. If not provided, range is (x.min(), x.max())."

Take a look at the following example:

    import numpy as np
    import matplotlib.pyplot as plt

    numbers1 = np.arange(1., 101.)    # 1, 2, ..., 100
    numbers2 = np.arange(0.5, 100.5)  # 0.5, 1.5, ..., 99.5
    numbers3 = np.arange(0, 100)      # 0, 1, ..., 99

    plt.figure(figsize=(12, 4))

    # density=True normalizes the histogram (older matplotlib versions
    # called this argument normed=True).
    plt.subplot(1, 3, 1)
    plt.hist(numbers1, bins=25, range=(25, 75), density=True)
    plt.title('numbers1')
    plt.ylim((0, 0.035))

    plt.subplot(1, 3, 2)
    plt.hist(numbers2, bins=25, range=(25, 75), density=True)
    plt.title('numbers2')
    plt.ylim((0, 0.035))

    plt.subplot(1, 3, 3)
    plt.hist(numbers3, bins=25, range=(25, 75), density=True)
    plt.title('numbers3')
    plt.ylim((0, 0.035))

    plt.show()

Unfortunately I cannot post the resulting image (not enough reputation), but in the histograms of both numbers1 and numbers3 the final bin is higher than expected.

Why does this happen, and is it supposed to happen? I expected all of them to look like the middle one. :-(


All numbers in the range 25 to 75 are used for the histogram. For the arrays numbers1 and numbers3 that is actually 51 numbers, because both 25 and 75 are included. You force these 51 numbers into 25 bins, so 24 bins hold 2 values each (2/51 of the data) and one bin holds 3 values (3/51). Matplotlib puts 73, 74 and 75 together in the final bin, because the last bin includes its right edge while every other bin excludes it, and that makes the final bin the tallest.
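The counts can be checked without drawing anything. The following is a minimal sketch, assuming np.histogram (which plt.hist relies on for the binning) and the numbers1 array from the question; it is an illustration, not part of the original answer.

```python
import numpy as np

numbers1 = np.arange(1., 101.)

# Same binning as plt.hist(numbers1, bins=25, range=(25, 75)), without the plot.
counts, edges = np.histogram(numbers1, bins=25, range=(25, 75))

print(counts.sum())   # how many values fall inside the range
print(counts[:-1])    # counts in the first 24 bins
print(counts[-1])     # count in the final bin, which is closed on the right
```

The first 24 bins should each report 2 values and the final bin 3, for 51 values in total.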

For numbers2, the 50 numbers in the range run from 25.5 to 74.5, so every bin holds 2 values (2/50 of the data).

You can see this for numbers1 and numbers3: when you make the range (25, 74.99999) or (25.0000001, 75), the taller final bin disappears, because either 75 or 25 is excluded and only 50 numbers remain in the range.
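A quick way to try this is to compare the bin counts for the three ranges side by side. This is a hedged sketch using np.histogram directly; the results dictionary is only an illustrative name.

```python
import numpy as np

numbers1 = np.arange(1., 101.)

# Compare the original range with the two slightly shrunken ranges.
results = {}
for rng in [(25, 75), (25, 74.99999), (25.0000001, 75)]:
    counts, _ = np.histogram(numbers1, bins=25, range=rng)
    # Store total values in range, smallest bin count, largest bin count.
    results[rng] = (int(counts.sum()), int(counts.min()), int(counts.max()))
    print(rng, results[rng])
```

Only the original range should show a bin with 3 values; in the other two cases every bin holds 2.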


You can get the bin edges, because plt.hist returns the values, the bins and the patches, so if you do

    # density=True is the current name for the older normed=True argument
    (n, bins, p) = plt.hist(numbers1, bins=25, range=(25, 75), density=True)

then bins contains 26 bin edges: the starting point of every bin plus the end point of the final bin. Using it you can see exactly which bin each value went into.
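To see the assignment explicitly, the edges can be combined with np.digitize. This is a sketch under the assumption that np.histogram returns the same edges as plt.hist; the names inside and bin_index are illustrative.

```python
import numpy as np

numbers1 = np.arange(1., 101.)
counts, edges = np.histogram(numbers1, bins=25, range=(25, 75))

# Keep only the values inside the range, as the histogram does.
inside = numbers1[(numbers1 >= edges[0]) & (numbers1 <= edges[-1])]

# np.digitize returns i with edges[i-1] <= x < edges[i]; subtract 1 for a
# zero-based bin index, and clip so that x == edges[-1] lands in the last bin.
bin_index = np.clip(np.digitize(inside, edges) - 1, 0, len(edges) - 2)

print(len(edges))                # 26 edges for 25 bins
print(inside[bin_index == 0])    # values in the first bin
print(inside[bin_index == 24])   # values in the final bin
```

The clipping step mirrors the rule that only the final bin includes its right edge, which is why 75 ends up alongside 73 and 74.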

