Saturday, 6 December 2008

Generate all combinations of chars

Generating all combinations of the characters is not a simple task and one must think before starting any coding. The problem size (example 4 represents all combinations between 'a' and 'ZZZZ') can vary and this makes the problem more complex to solve but simpler to program! There are various ways how to solve this problem. The solution discussed here describes how to convert an integer into a sequence of characters.

This article describes how to write a Java program which returns all combinations of characters for a given length. Note that even though this is a programming related article it requires a good knowledge of mathematics. The examples make use of logs and power of to carry out the calculations required.

An overview of the solution

The solution discussed here describes how to convert an integer into a sequence of characters. Let's have a simple example, the integer 0 is equivalent to ['a'], while integer 1 is equivalent to ['b'] and the integer 51 is equivalent to ['Z']. This is great for a problem of size one. What about a problem of size two? In this case 0 is still ['a'] and 1 is still ['b'] and 51 is still equivalent to ['Z']. Now 52 is (or should be) equivalent to ['a', 'a']. The array size has changed from one to two. The integer 53 is equivalent ['a', 'b'] and so on.

Not everything that shines is gold! Let's switch to numbers instead of characters for now so we appreciate the problem that this will lead us into. We have ten numbers, from 0 until 9. The leading 0s are usually ignored and are not used. This creates a problem for our solution as the 0 is representing the char 'a'. What I'm saying here? In numbers the value 00140 is equivalent to 140, but this is not the case with our solution as the combination: 'aabda' is not equivalent to 'bda'. To avoid this problem we have to build sets of combinations with unique size. Thus the value 0 for problem size 4 is 'aaaa' while the same value for a problem with size 7 will be: 'aaaaaaa'. This is covered mathematically below.

Let's start with the simplest version of the problem when the size is one, that is, all the characters available in the English alphabet.

Convert an int into a char

We need to provide map for all characters in the alphabet. The following function does so by returning the char 'a' for the integer 0 and the char 'z' for the int 25. The char 'A' is returned for the int 26 while the char 'Z' is returned for the int 51. Integer values outside the mentioned range (0 and 51 inclusive) will throw an illegal argument exception.


public static char getCharForInt(int i) 
                   throws IllegalArgumentException {
  if (i < 0 | i >= BASE) {
    throw new IllegalArgumentException(
              "Integer value out of range: " + i);
  }
  int length = (BASE) / 2;

  // Lower case char
  if (i < length) {
    return (char) (i + 'a');
  }
  // Upper case char
  return (char) (i - length + 'A');
}

This method is quite straight forward and simple to understand. This method returns just a single character which is only valid for a problem with size one. We need yet another method to generate an array of these. And that's what we're discussing next. Like here, this method will expect an integer and it will return the relative array of chars for the given integer. But before we proceed it's good to note that the method makes use of a constant named BASE which is declared at class level as illustrated below.


private static final int BASE = ('z' - 'a') + ('Z' - 'A') + 2;

Build an array of chars for a given int

If we can provide a map of chars to ints programmatically, then we can also provide a map between an integer (referred to as key in this method) to an array of chars in conjunctions with the previous method. How? Let's have a numerical example first. The number 143 (base 10) can be expressed as: 1 * 102 + 4 * 101 + 3 * 100. So to convert the number 14310 into the array of chars ['1', '4', '3'] we need to perform the following steps:


0) 143 / 102 = 1
1) 143 – (102*1) / 101
1) 143 – (100) / 101 = 4
2) 143 - ((102*1) + (101*4)) / 100
2) 143 - (140) / 100 = 3

Using the above approach we can convert the key integer value into an array of characters by changing the base from 10 to our base: 52. Before we can sing victory, we have to appreciate that this approach has a limitation. It will work well for the least significant character but it will miss one on the other characters. What am I saying? Let's have another example first:

Key Expression Value Expected Output
0 0 * 520 0 'a' 'a'
52 52 * 521
0 * 520
1
0
'aa' 'ba'

When dividing 52 by 52 we get 1, which is equivalent to 'b' not the answer we wanted! A simple way around this problem the solution and to reduce the number conditions we're going to make a small change to solution discussed before. We'll build sets of combinations with unique size. Thus the key value 0 for problem size 4 is ['a', 'a', 'a', 'a'] while the same value for a problem with size 7 will be: ['a', 'a', 'a', 'a', 'a', 'a', 'a'] .

This immediately provokes the following question, how are we going to generate all combinations? We can still obtain all combinations by first generating all combinations for size one and size two and so on.

With that out of the way, we can proceed in our adventure. The following code provides the algorithm used to generate all combination for a specific size.


private static char[] getCharsFormKey(int key, int size) {
  char[] chars = new char[size];
  for (int i = size - 1, j = 0; i >= 0; i--, j++) {
    // Get the divider for this character position
    int divider = (int) Math.pow(BASE, i);
    // Calculate the character index by dividing the 
    // key with the divider
    int c = key / divider;
    chars[j] = getCharForInt(c);
    // Deduct the processed numbers
    key -= (c * divider);
  }
  return chars;
}

Given a key integer and a size, this method returns the equivalent array of chars mapping the given key integer. Note that an illegal argument exception is thrown if the key integers requires a larger array that the specified size. This immediately brings us the following question: can we calculate the size automatically to fit the given key value? Yes and to do so we need to make use of the BigDecimal class to perform some calculations. We can determine the number of digits within a number by calculating the log to base 10 of that number incrementing it by one. In our case we have to calculate the log to base 52. There is not function ready which does that. But we can calculate the log of any base by using log to base 10 or ln as logn(X) is equivalent to log10(X) / log10(n).


private static char[] getCharsFormKey(int key) {
  int size = 1;
  if (key > 0) {
    BigDecimal keyDB = 
                 new BigDecimal(String.valueOf(Math.log(key)));
    size = (int) (keyDB.divide(
                   BASE_LOG_BD, RoundingMode.HALF_UP).doubleValue()
                  ) + 1;
  }
  return getCharsFormKey(key, size);
}

This method may be more attractive as it requires less parameters. The calculations involved here take some time and affect the performance. This method is mentioned for the sake of completeness and should be avoided if possible. This method makes use of a constant named BASE_LOG_BD which is declared at class level as illustrated below. Also note that the string version of the big decimal class is used as the double version is more prone to error as discussed at: http://www.opensourcestrategies.com/ofbiz/ofbiz_bigdecimal_cookbook.txt.


private static final BigDecimal BASE_LOG_BD = 
    new BigDecimal(String.valueOf(Math.log(52)));

Generating all combinations

To generate all combinations for a problem size 4, that is, generate all combinations starting from 'a' until 'ZZZZ'. We need two loops in order to achieve this: one to generate all combinations for a specific size or length, while the other to go through all sizes/lengths starting from 1 until the problem size, four in this case.


int size = 4;
for (int i = 1; i <= size; i++) {
  int maxKeyForSize = (int) Math.pow(BASE, i);
  for (int j = 0; j < maxKeyForSize; j++) {
    char[] chars = getCharsFormKey(j, i);
    String text = String.copyValueOf(chars);
    System.out.println(text);
  }
}

Conclusion

This article describes how to generate all combinations of characters for a given size using the mathematical functions ln and power of.