Re: Number of histogram bins
From: Jason H. Stover
Subject: Re: Number of histogram bins
Date: Sun, 5 Dec 2004 19:14:27 -0500
User-agent: Mutt/1.4.2.1i

I found a recent preprint online suggesting a method for automatic
bin width selection:
L. Birge and Y. Rozenholc. How many bins should be put in a regular histogram.
www.proba.jussieu.fr/pageperso/rozen/preprint/Histo-030629.pdf
This paper also mentions a few other methods, including three used
by GNU R: Sturges' (number of bins approximately 1 + log_2(N)), Scott's
(the 3.49 * s * N^(-1/3) formula Ben mentions below), and one by
Freedman and Diaconis in:
D. Freedman and P. Diaconis. On the histogram as a density estimator: L_2
theory. Z. Wahrscheinlichkeitstheor. Verw. Geb. 57 (1981), 453-476.
Sturges' is the default in R.
-Jason
On Sun, Dec 05, 2004 at 12:18:56PM -0800, Ben Pfaff wrote:
> John Darrington <address@hidden> writes:
>
> > Does anyone know how spss decides the number of bins to construct a
> > histogram? Or can anyone suggest a suitable algorithm for doing so?
>
> PSPP 0.1.0 had vestigial support for plotting histograms. At the
> time, if I recall correctly, I checked out in some detail how
> SPSS/PC+ chose the number of bins. Here's the code that
> version used to decide:
>
> #define MIN_HIST_BARS 3
> #define MAX_HIST_BARS 20
> ...
> double upper = /* maximum value in data */;
> double lower = /* minimum value in data */;
> if (upper - lower >= 10)
>   {
>     double l, u;
>
>     u = round_up (upper, 5);
>     l = round_down (lower, 5);
>     nbars = (u - l) / 5;
>     if (nbars * 2 + 1 <= MAX_HIST_BARS)
>       {
>         nbars *= 2;
>         u = round_up (upper, 2.5);
>         l = round_down (lower, 2.5);
>         if (l + 1.25 <= lower && u - 1.25 >= upper)
>           nbars--, lower = l + 1.25, upper = u - 1.25;
>         else if (l + 1.25 <= lower)
>           lower = l + 1.25, upper = u + 1.25;
>         else if (u - 1.25 >= upper)
>           lower = l - 1.25, upper = u - 1.25;
>         else
>           nbars++, lower = l - 1.25, upper = u + 1.25;
>       }
>     else if (nbars < MAX_HIST_BARS)
>       {
>         if (l + 2.5 <= lower && u - 2.5 >= upper)
>           nbars--, lower = l + 2.5, upper = u - 2.5;
>         else if (l + 2.5 <= lower)
>           lower = l + 2.5, upper = u + 2.5;
>         else if (u - 2.5 >= upper)
>           lower = l - 2.5, upper = u - 2.5;
>         else
>           nbars++, lower = l - 2.5, upper = u + 2.5;
>       }
>     else
>       nbars = MAX_HIST_BARS;
>   }
> else
>   {
>     nbars = /* number of unique values in data */;
>     if (nbars > MAX_HIST_BARS)
>       nbars = MAX_HIST_BARS;
>   }
> if (nbars < MIN_HIST_BARS)
>   nbars = MIN_HIST_BARS;
> interval = (upper - lower) / nbars;
>
> It seemed to make some kind of sense at the time, but this was
> way back in 1994 or so and I didn't write as many useful comments
> then as I do now. I think that the rationale is roughly this:
> the upper and lower values should by preference be rounded to
> "round" numbers, like multiples of 5, because it makes the graph
> easier to read and data tends to be more naturally interpretable
> that way. Then it tries to recenter the actual range plotted
> based on the actual lower and upper values.
>
> I'm not sure we want any part of this anymore. The above is a
> pretty weak defense of the rationale, and I actually wrote the
> code.
>
> A search for "histogram bin width" turned up this webpage:
> http://www.fmrib.ox.ac.uk/analysis/techrep/tr00mj2/tr00mj2/node24.html
> which gives the formula
> W = 3.49 * s * N^(-1/3)
> as an "optimal bin width", where s is the sample standard deviation
> and N is the number of samples, as well as
> W = 2 * IQR * N^(-1/3)
> where IQR is the interquartile range. Either one of
> these would be pretty easy to implement, and the webpage claims
> the latter is more robust.
> --
> On Perl: "It's as if H.P. Lovecraft, returned from the dead and speaking by
> seance to Larry Wall, designed a language both elegant and terrifying for his
> Elder Things to write programs in, and forgot that the Shoggoths didn't turn
> out quite so well in the long run." --Matt Olson