How to store data without arrays, or how to put a lot of information into a single variable

The question of how to generate 26 random numbers without repetition makes for a good introduction to using a single integer to store information about many unique numbers.

Another alternative is to use unique identifiers for permutations, but given that there are 26! of them, keeping track of which is which can be challenging.

So, what I'm going to demonstrate here is a method for keeping track of which of the 26 numbers we've selected thus far.

How to generate the variables:

Consider the following mapping from choices made, to binary representation, to integers

Sequence                      3                                                3 5                                               3  5 18                                              

Binary Representation    000000000000000000000001000   000000000000000000000101000   000000001000000000000101000      

Integer Value Stored       8                                                40                                               262184

How the Int is calculated 8                                                8    + 32                                       8   +  32  + 262,144

                                     2^3                                             2^3 + 2^5                                     2^3 + 2^5 + 2^18

Since we have only 26 variables with 2 values (whether each of the 26 values has been visited or not), we only need 2^26 different values, rather than the 26! values that we would need to also store information about the order in which the values are received.

How to use these variables to check whether we've used the number yet:

So in our representation, the nth bit is 1 if we've already used the number n.  To check whether we've used the number, we need merely check whether the nth bit is 1.

We can do so by the following method.

(var/ (2^n) ) % 2

Dividing the variable by 2^n shifts our number n digits to the right.

For example, when n = 3

The binary value 00001000 has integer value 8

Dividing by 2^3 (8), we get binary value 00000001, which matches 8/8=1.

The mod 2 (%2) tells us whether the last digit is 1.  All odd numbers end with a 1 in binary representation, and all even numbers end in a 0.

So the division lets us ignore all values to the right of our digit, and the modulus allows us to ignore all values to the left, giving us the value of the single bit we are interested in.

Another example, when we've chosen the values 4, 5, and 6 already, and we want to check whether we've used the number 5.

The numbers in parentheses fall out in that step, no longer being used.

val                                    val/2^n                                (val/2^n)%2

2^4+2^5+2^6                    ( 2^-1+) 2^0+2^1                        2^0 + (2^1)

16 + 32 + 64                    (1/2 + ) 1 + 2                            1 + (2)

112                                    3                                          1

01110000                        00000011 (1)                            (0000001) 1 

So, given this method of storing and accessing data, we can generate a sequence of 26 unique values without having to use an array.

As a side note, this will be one of the most space efficient methods of storing the data.

Storing an ArrayList of integers we've already selected uses 4 bytes of data per element, for 26*4*8 bits of storage.

Using an array of boolean values, though each boolean only has 2 values, generally uses a full byte per boolean in Java, so 26*8 bits.

I've implemented here the storage of the data with the primitive int type, which uses a 4 bytes, or 32 bits to store these 26 bits of data.

There is a BitSet object that fuctions similarly to the bit vector from C, which should allow even more fine grained storage, but issues with memory storage mean that a full word would likely be reserved, so the storage will be anywhere from 26 bits to 32 bits.

Thankfully we're in a day and age where constraints on space are rarely an issue for the average programmer, but when you are working with many objects, space can become an issue, and efficient packing of data will allow your program to run, or keep your application running quickly, instead of thrashing memory, falling out of RAM.

Note as well here that I'm not using the 0th bit.  The equations above can be modified to use the 0th bit, and use unsigned integers, to take advantage of all 32 bits available in an int, as opposed to the 30 values you have here.

Also, since we are using an int, instead of a bit vector, we can also record states with 3 or more values, as opposed to just 2.  By using powers of 3 ( (0,1,2) * 3^variable number) , and mod 3 (returning state 0, 1, or 2), the same style of algorithm can be applied.

It's an unusual way of thinking about variables, but can come in handy, so keep it in mind.