Non-uniform shuffling

A shuffle of a finite sequence of length n of distinguishable elements refers to an algorithmic process which — modulo pseudo-randomness — can be modeled as a random variable uniformly distributed on the permutations \mathbb{S}_n.
However, most pseudo-random entropy sources provide only a pseudo-uniformly distributed realization of \mathbb{F}_2^\mathbb{N}, leading to the necessity of finding an algorithmic transformation process if one wishes to achieve a shuffle.
In the following, I will assume that a transforming process to a family of independent and uniformly on \{1..n\} distributed random variables is already present for any n\in\mathbb{N}.

One naive and seemingly correct (it is not) approach is to traverse the given sequence, uniformly swapping the current entry with another one, i.e.

void falseShuffle(uint64_t *arr, size_t len) {
    for (size_t j = 0; j < len; j++)
        swap(arr, j, unif(len)); }

as an exemplary C implementation where \texttt{unif}(n) is independent and uniformly distributed on \{0..n-1\}.

Yet, even though sensible on first sight, the above defined random variable is only in the most trivial cases uniformly distributed and — as empirical evidence suggests, see below — horrendously non-uniformly distributed otherwise.
To prove the non-uniformity postulated above, I first present the following number-theoretic result.

Claim. In only three trivial cases does the factorial of a natural number divide its tetration; formally

\forall\,n\in\mathbb{N}_{>2}:n!\nmid n^n.

Proof. Let n\in\mathbb{N}_{>2} be a natural number larger than two. By the definition of the factorial, \prod_{p<n\ \text{prime}}p\mid n!\mid n^n is evident. Adhering to the uniqueness of prime factorizations, \prod_{p<n}p\mid n follows. Observe that n-1>1 has to be prime since \forall\,p<n\ \text{prime}:n-1\equiv -1\not\equiv 0\mod p, implying n-1\mid\prod_p=n which cannot hold for n>2. QED

Now suppose, \texttt{falseShuffle} was indeed non-trivially distributed uniformly. Without loss of generality, all involved probability spaces were finite. Then there had to exist a surjection from this algorithm’s entropic state to \mathbb{S}_n with fibers of the same finite cardinality, implying n!\mid n^n. By the above proven claim, n<3 followed, making the distribution trivial.¬†QED

One possible reason for the surprising nature of this non-uniformity is the striking source code resemblance to a correct implementation, i.e.

void shuffle(uint64_t *arr, size_t len) {
    for (size_t j = 0; j < len; j++)
        swap(arr, j, j + unif(len - j)); }

as an exemplary C implementation which can be inductively shown to resemble the same structure as \mathbb{S}_n, in each step sprinkling in some uniform randomness and thus being itself uniformly distributed.

To see just how non-uniform \texttt{falseShuffle} is, I have calculated its discrete density for n=4:

[       |                ]
[   |   |||              ]
[   |   |||              ]
[   |   |||              ]
[   ||  ||||||| ||       ]
[||||| |||||||| |||    ||]
[|||||||||||||||||| || ||]
[||||||||||||||||||||||||]
[||||||||||||||||||||||||]
[||||||||||||||||||||||||]
[||||||||||||||||||||||||]
[||||||||||||||||||||||||]
[||||||||||||||||||||||||]
[||||||||||||||||||||||||]
[||||||||||||||||||||||||]
n = 4

If it was uniformly distributed, the discrete density would look like a rectangle; [||||| ... |||||]. Further plots for 0\leq n\leq 6 are shown in nonUniformity.txt.

Source code for the analysis and plotting: nonUniformity.hs. Empirical evidence of non-uniformity: nonUniformity.c.

Sudoku Generation

Over two years ago, I wrote a basic 3×3-sudoku solver which uses both fundamental rule-based elimination and guessing to arrive at the solution. Revisiting the topic of computer-aided sudoku manipulation, I wrote a generalized sudoku generator (sudoku.hs).

    | 4  
  3 |    
---------
  2 | 1  
  4 |    

./sudoku 5 2

Continue reading

Symbolic Closed-Form Fibonacci

Let V := \{(a_j)_{j\in\mathbb{N}}\subset\mathbb{C}|a_n=a_{n-1}+a_{n-2}\forall n>1\} be the two-dimensional complex vector space of sequences adhering to the Fibonacci recurrence relation with basis B := ((0,1,\dots), (1,0,\dots)).
Let furthermore f: V\to V, (a_j)_{j\in\mathbb{N}}\mapsto(a_{j+1})_{j\in\mathbb{N}} be the sequence shift endomorphism represented by the transformation matrix

A := M^B_B(f) = \begin{pmatrix}1&1\\1&0\end{pmatrix}.

By iteratively applying the sequence shift a closed-form solution for the standard Fibonacci sequence follows.

Continue reading