Benford's Law

All:

Once upon a time I had an extension that generated random numbers that
adhered to Benford's Law.

However, neither http://extensions.libreoffice.org/ nor
http://extensions.services.openoffice.org nor
http://libreplanet.org/wiki/Group:OpenOfficeExtensions/List nor
http://www.multiracio.com/index.php?lang=en&style=eurooffice&page=eo_ext
show anything that is similar.

Does anybody know what happened to that extension?

jonathon

Do you need one? Unless I misunderstand, the formula
=10^RAND()
should create random variates in the range (1,10) following the law.
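
For readers working outside the spreadsheet, the same idea can be sketched in Python. This is only an illustration under assumptions: the function names, the sample size and the fixed seed are invented here and are not part of the suggestion above.

```python
import math
import random

def benford_variates(n, seed=None):
    """Return n values of 10**U with U uniform on [0, 1), mirroring =10^RAND()."""
    rng = random.Random(seed)
    return [10 ** rng.random() for _ in range(n)]

def leading_digit_proportions(values):
    """Proportion of values whose integer part is 1..9 (values must lie in [1, 10))."""
    counts = [0] * 10
    for v in values:
        counts[int(v)] += 1
    return [counts[d] / len(values) for d in range(1, 10)]

if __name__ == "__main__":
    observed = leading_digit_proportions(benford_variates(100_000, seed=1))
    # Columns: digit, observed proportion, log10(1 + 1/d)
    for d, p in zip(range(1, 10), observed):
        print(d, round(p, 3), round(math.log10(1 + 1 / d), 3))
```

The third printed column is the usual statement of Benford's Law, log10(1 + 1/d), so the two proportion columns can be compared by eye.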

(Here's hoping you will not use this to create plausible fake scientific data!)

I trust this helps.

Brian Barker

10^RAND generates a set of random numbers that does _not_ adhere to
Benford's Law. I need a random number generator whose output does
adhere to Benford's Law.

jonathon

>> Unless I misunderstand, the formula =10^RAND() should create random variates in the range (1,10) following the law.

> 10^RAND generates a set of random numbers that does _not_ adhere to Benford's Law.

Well, I must say you have not exactly offered much evidence for this assertion! If we knew why you thought this was so, we might be able to help.

Probabilities of initial digits according to Benford's Law and proportions from 10 000 trials of my formula:
Digit      1    2    3    4    5    6    7    8    9
Benford  .301 .176 .125 .097 .079 .067 .058 .051 .046
Trials   .310 .171 .126 .093 .080 .066 .059 .053 .043

How much closer would you expect these trials to get? Have you done a chi-squared test?
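
As a rough illustration of the chi-squared test mentioned here, the sketch below applies it to the trial proportions quoted above. It rests on assumptions: the proportions are rounded to three places, so the statistic is only approximate, and the function names and the 5% significance level are choices made for this example. The critical value 15.507 is the upper 5% point of chi-squared with 8 degrees of freedom.

```python
import math

# Proportions quoted above from 10 000 trials (digits 1..9), rounded to 3 places.
OBSERVED = [.310, .171, .126, .093, .080, .066, .059, .053, .043]
TRIALS = 10_000
# Upper 5% critical value of chi-squared with 8 degrees of freedom.
CRITICAL_5PCT_DF8 = 15.507

def benford_probabilities():
    """Benford probabilities log10(1 + 1/d) for d = 1..9."""
    return [math.log10(1 + 1 / d) for d in range(1, 10)]

def chi_squared(observed_props, trials):
    """Pearson chi-squared statistic of observed proportions against Benford."""
    stat = 0.0
    for obs, exp in zip(observed_props, benford_probabilities()):
        observed_count = obs * trials
        expected_count = exp * trials
        stat += (observed_count - expected_count) ** 2 / expected_count
    return stat

if __name__ == "__main__":
    stat = chi_squared(OBSERVED, TRIALS)
    print("chi-squared:", round(stat, 2), "critical:", CRITICAL_5PCT_DF8)
    print("consistent with Benford at 5%:", stat < CRITICAL_5PCT_DF8)
```

A statistic below the critical value means the trials give no reason, at that level, to reject the Benford distribution.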

> I need a random number generator whose output does adhere to Benford's Law.

And I wish you good luck in persuading someone to help!

Brian Barker

Hi
From http://en.wikipedia.org/wiki/Benford's_law:
"Therefore, this is the distribution expected if the mantissae of the /logarithms/ of the numbers (but not the numbers themselves) are uniformly and randomly distributed."

The mantissa of a logarithm is the part after the decimal point. To convert a logarithm back to the number, use number = 10^(logarithm).
For numbers between 1 and 10 the logarithm lies between 0 and 1, so it consists of the mantissa alone. Therefore 10^(rand()) should produce random numbers between 1 and 10 that conform to Benford's law.
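
The argument can be written out as a short derivation. Here U stands for the uniform value returned by rand() and d for a leading digit from 1 to 9; both symbols are introduced only for this sketch.

```latex
\begin{align*}
P(\text{first digit of } 10^{U} = d)
  &= P(d \le 10^{U} < d+1) \\
  &= P(\log_{10} d \le U < \log_{10}(d+1)) \\
  &= \log_{10}(d+1) - \log_{10} d \\
  &= \log_{10}\!\left(1 + \tfrac{1}{d}\right), \qquad d = 1, \dots, 9.
\end{align*}
```

For d = 1 this gives log10(2), approximately 0.301.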

=10^RAND() is the same as 10^(rand())

Steve

Uniform random number generators do not conform to Benford's law.

To get uniform digits in the range 0 to 9, try =FLOOR(10*RAND();1;1)

However, Benford's law is about the *first* digit of a wide variety of numbers.
See <http://en.wikipedia.org/wiki/Benford's_law#Mathematical_statement>.

To get the Benford distribution of digits 1 to 9,
I think you want =FLOOR(10^RAND();1;1)
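
The difference between the two formulas can be seen by tallying both side by side. The following is a minimal sketch, assuming Python's random module as a stand-in for RAND(); the function names, seed and sample size are made up for the example.

```python
import math
import random

def digit_proportions(draw, n, rng):
    """Tally floor(draw(rng)) over n trials and return proportions for 0..9."""
    counts = [0] * 10
    for _ in range(n):
        counts[math.floor(draw(rng))] += 1
    return [c / n for c in counts]

def uniform_draw(rng):
    return 10 * rng.random()      # analogue of =FLOOR(10*RAND();1;1)

def benford_draw(rng):
    return 10 ** rng.random()     # analogue of =FLOOR(10^RAND();1;1)

if __name__ == "__main__":
    rng = random.Random(2)
    uniform = digit_proportions(uniform_draw, 100_000, rng)
    benford = digit_proportions(benford_draw, 100_000, rng)
    # Columns: digit, proportion from 10*U, proportion from 10**U
    for d in range(10):
        print(d, round(uniform[d], 3), round(benford[d], 3))
```

The first tally spreads roughly evenly over the ten values it can take, while the second never produces 0 and falls away from 1 towards 9.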

What makes you think these do not have the Benford distribution? How are you testing that?

You should be able to create a histogram for the frequencies of values of 1, 2, 3, ..., 9 and show that it approaches the Benford distribution as you increase the number of samples.

While the RNG used for RAND() may not be cryptographically wonderful, I expect it would pass a reasonable test (say chi-squared) for correspondence to the Benford distribution.

- Dennis

I meant, of course, that large samples of =FLOOR(10^RAND();1;1) should pass a chi-squared test for conformance to the Benford distribution.

Hey Jonathon,

This is the distribution I got after 100 million cycles of 10^RAND:

9 - 0.045
8 - 0.051
7 - 0.057
6 - 0.066
5 - 0.079
4 - 0.096
3 - 0.124
2 - 0.176
1 - 0.300

I don't think you are going to get any closer to Benford's law distribution than that.

Here is the code I used:

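use strict;
use warnings;

# Count the leading digit of 10**rand() over many trials.
my $trials = 100_000_000;
my @Distribution = (0) x 10;    # indices 1..9 hold the digit counts

srand;

for (1 .. $trials) {
    # 10**rand() lies in [1,10), so its first character is the leading digit.
    my $digit = substr(10**rand(), 0, 1);
    ++$Distribution[$digit];
}

# Print the observed proportion for each digit, from 9 down to 1,
# truncated (not rounded) to five characters.
for my $digit (reverse 1 .. 9) {
    print "$digit - " . substr($Distribution[$digit] / $trials, 0, 5) . "\n";
}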

First three digits, not first digit.
The fourth and subsequent digits should be uniformly distributed.

jonathon

The striking thing about variates that follow Benford's Law is indeed that the initial digits are not equally distributed. But such variates don't stop being what they are after some arbitrary number of significant figures - whether it be one, three, or any other. The values come from the distribution - so all their decimal digits are part of the story.

If you want values that follow Benford's Law up to three digits, you can easily take the true values from my suggested formula, truncate (or round?) them after three digits, and add further random digits selected from a uniform distribution.
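
That recipe might look something like the sketch below. It is not the missing extension, only a guess at the behaviour described: the function name, the choice of truncation over rounding, and the number of appended digits are all assumptions.

```python
import random

def benford_first_three(rng, extra_digits=3):
    """Return a decimal string: three digits from 10**U, then uniform digits."""
    value = 10 ** rng.random()                  # in [1, 10)
    # Truncate to the first three significant digits (100..999).
    head = min(int(value * 100), 999)
    # Append further digits drawn uniformly from 0..9.
    tail = "".join(str(rng.randrange(10)) for _ in range(extra_digits))
    digits = str(head) + tail
    return digits[0] + "." + digits[1:]

if __name__ == "__main__":
    rng = random.Random(3)
    for _ in range(5):
        print(benford_first_three(rng))
```

Each result is a string such as d.ddddd, so the appended digits are kept exactly as drawn rather than passing through floating-point rounding.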

But I'm not at all sure why you would want to do this. Does the source of this suggestion mean merely that, although the early digits are not uniformly distributed, later ones are more nearly so - and that there is little point in worrying about the difference after, say, three digits? If so, there is equally no point in worrying that these later digits might be too right!

Brian Barker

> ... don't stop being what they are after some arbitrary number of
> significant figures - whether it be one, three, or any other.

At the fourth significant digit, 0 and 9 occur slightly (¿1:10,000?)
more frequently than 1, 2, 3, 4, 5, 6, 7, and 8. For most practical
purposes, the fourth digit can be treated as a uniformly random number.
At the fifth, and subsequent digits, the numbers are randomly, and
uniformly distributed.

> If you want values that follow Benford's Law up to three digits, you can
> easily take the true values from my suggested formula, truncate (or
> round?) them after three digits, and add further random digits selected
> from a uniform distribution.

And my original post was asking what happened to the macro that
automatically did that.

jonathon

>> ... don't stop being what they are after some arbitrary number of significant figures - whether it be one, three, or any other.

> At the fourth significant digit, 0 and 9 occur slightly (¿1:10,000?) more frequently than 1, 2, 3, 4, 5, 6, 7, and 8. For most practical purposes, the fourth digit can be treated as a uniformly random number. At the fifth, and subsequent digits, the numbers are randomly, and uniformly distributed.

It's surely intuitively obvious that this cannot be so. The first digits are very non-uniformly distributed, the second ones less so, and so on. What your source is telling you is that the fourth digit is very, very nearly uniformly distributed and that subsequent digits are so nearly so that they may be considered so for all practical purposes - not that they really are. (That would be wrong.)
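
One way to check this is to compute the digit probabilities directly. The sketch below assumes the usual extension of Benford's Law to later digits, in which the probability that the n-th significant digit equals k is the sum of log10(1 + 1/(10m + k)) over every possible (n-1)-digit prefix m. The function name is invented for this example.

```python
import math

def nth_digit_probabilities(n):
    """Probabilities that the n-th significant digit equals 0..9 under Benford."""
    if n == 1:
        # The leading digit cannot be 0.
        return [0.0] + [math.log10(1 + 1 / d) for d in range(1, 10)]
    probs = []
    lo, hi = 10 ** (n - 2), 10 ** (n - 1)       # range of (n-1)-digit prefixes
    for k in range(10):
        probs.append(sum(math.log10(1 + 1 / (10 * m + k)) for m in range(lo, hi)))
    return probs

if __name__ == "__main__":
    for n in range(1, 6):
        probs = nth_digit_probabilities(n)
        spread = max(probs) - min(p for p in probs if p > 0)
        print(n, [round(p, 5) for p in probs], "spread:", f"{spread:.2e}")
```

Under that assumption the gap between the most and least likely digit shrinks at each position but does not reach zero.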

In any case, if the formula I suggested works (and we've seen plenty of evidence that it does and none that it doesn't - but I'm still open to correction), then according to your theory it *will* provide such uniformly distributed digits after the fourth. You've already got what you want: the problem appears to be that you cannot believe the distribution will turn out the way that you say it will! It's irrational of you to suggest removing one set of digits that you claim are already uniformly distributed and replacing them with another also uniformly distributed set! And if there were any difference, how many variates would you have to call upon before any difference would be noticeable? Billions of billions of billions?! More than you are going to use, at any rate.

>> If you want values that follow Benford's Law up to three digits, you can easily take the true values from my suggested formula, truncate (or round?) them after three digits, and add further random digits selected from a uniform distribution.

> And my original post was asking what happened to the macro that automatically did that.

And you now have an even simpler solution: a formula that does it. But you are very welcome not to use it if you don't like it. Even if you wanted to make the irrational change, you could easily construct a formula to do this.

Brian Barker