combinations thumb

Published : December 12, 2021 - Yanis MANSOUR

Algorithm to generate all combinations in a set

Introduction

This article is intended to explain the approach I took to develop PyCombs, a python package that generates all the possible combinations of length k in a given set.

Before to start, Let's make a quick reminder of what is a combination :

In mathematics, a combination is a selection of items from a set that has distinct members, such that the order of selection does not matter. Wikipedia

Another reminder I'd like to make is the number of combinations in a set

Cnk=n!k!(nk)!C^{k}_{n} = \frac{n!}{k! (n-k)!}

Note

This is a solution I thought about. It has the advantage to be non recursive, and is efficient if you are looking for combinations of a small length. It goes linearly with the length of the set. Here is the link of the full project on GitHub

Now that we have the necessary reminders, let's dive into the approach :

Representation

To simplify treatments, we will use lists to represent our sets. Let's suppose that we have the following set of 6 cities, in which we want tp look for all possible combinations of length 3 :

mySet = ["Madrid", "Paris", "Lyon", "Marseille", "Washington", "Dublin"]

To represent combinations, one approach is to have indexes that point to elements, and because we look for combinations of length 3, we will need 3 indexes that will each point to an element of the set :

mySet = ["Madrid", "Paris", "Lyon", "Marseille", "Washington", "Dublin"]
          ^             ^        ^
          i             j        k

This will give us the first combination :

 comb = ["Madrid", "Paris", "Lyon"]

To go to the next combinations, we have to increment the last index k :

mySet = ["Madrid", "Paris", "Lyon", "Marseille", "Washington", "Dublin"]
          ^             ^                ^
          i             j                k

comb = ["Madrid", "Paris", "Marseille"]

By incrementing k until it reaches the end of the list, we obtain all combinations which start with the first two elements, combined with each element amongst the rest of the list :

mySet = ["Madrid", "Paris", "Lyon", "Marseille", "Washington", "Dublin"]
          ^             ^                ^
          i             j                k
comb = ["Madrid", "Paris", "Marseille"]

mySet = ["Madrid", "Paris", "Lyon", "Marseille", "Washington", "Dublin"]
          ^             ^                             ^
          i             j                             k
comb = ["Madrid", "Paris", "Washington"]

mySet = ["Madrid", "Paris", "Lyon", "Marseille", "Washington", "Dublin"]
          ^             ^                                           ^
          i             j                                           k
comb = ["Madrid", "Paris", "Dublin"]

Here is the list of combinations at this point :

combs = [
	['Madrid', 'Paris', 'Lyon'],
	['Madrid', 'Paris', 'Marseille'],
	['Madrid', 'Paris', 'Washington'],
	['Madrid', 'Paris', 'Dublin']
]

To enumerate the remaining combinations, we have to increment j and replace k by j+1, and restart the same process of incrementing k to the end of the list :

mySet = ["Madrid", "Paris", "Lyon", "Marseille", "Washington", "Dublin"]
          ^                      ^       ^
          i                      j       k
comb = ["Madrid", "Lyon", "Marseille"]

mySet = ["Madrid", "Paris", "Lyon", "Marseille", "Washington", "Dublin"]
          ^                      ^                    ^
          i                      j                    k
comb = ["Madrid", "Lyon", "Washington"]

mySet = ["Madrid", "Paris", "Lyon", "Marseille", "Washington", "Dublin"]
          ^                      ^                                  ^
          i                      j                                  k
comb = ["Madrid", "Lyon", "Dublin"]

Let N be the length of our set. By repeating the previous process, we will reach this situation :

mySet = ["Madrid", "Paris", "Lyon", "Marseille", "Washington", "Dublin"]
          ^                                           ^             ^
          i                                           j             k
comb = ["Madrid", "Washington", "Dublin"]

Now, it's the turn of i to be incremented :

mySet = ["Madrid", "Paris", "Lyon", "Marseille", "Washington", "Dublin"]
                        ^        ^       ^
                        i        j       k
comb = ["Paris", "Lyon", "Marseille"]

Repeat the process until i=N-3 & j=N-2 & k=N-1 :

mySet = ["Madrid", "Paris", "Lyon", "Marseille", "Washington", "Dublin"]
                                         ^            ^             ^
                                         i            j             k
comb = ["Marseille", "Washington", "Dublin"]

To represent the indexes, let's store them in a list. The situation above will give the following indexes :

indexes = [3, 4, 5]

The functions

In order to implement the full generator, I divided the full process into 2 functions. Let's go :

The move function

Now, let's write the move function that will calculate the next indexes list, given the current indexes list and the length of the set we are looking for combinations in (See examples below)

Let capital I be the index of a given index in the list of indexes, maxLength be the length of the set, and length be the length of the indexes list :

length = 3
I = 0
indexes = [0,1,2]
           ^
           I

mySet = ["Madrid", "Paris", "Lyon", "Marseille", "Washington", "Dublin"]
maxLength = 6

First, by looking the pattern showed above, we notice that the index we move is determined by going through the list of indexes backwards, and looking for the the first index that doesn't verify the condition of indexes[length - 1 - i] == maxLength - 1 - i. That is, the index that is being moved is always the first (in reverse order) that doesn't point to the end of the list :

indexes = [0,1,2]
mySet = ["Madrid", "Paris", "Lyon", "Marseille", "Washington", "Dublin"]

In this example, the index to move is indexes[2] because :

  • The length of the indexes list is 3 ⇒ length == 3 and the length of the set is 6 ⇒ maxLength == 6
  • We go through the indexes list backwards. Thus, the first element is indexes[length - 1 -i] such that i=0 (the first element in reverse order)
  • The condition of indexes[length - 1 - i] == maxLength - 1 - i is not verified :
    • indexes[length - 1 - i] = indexes[3 - 1 - 0] = indexes[2] = 2
    • maxLength - 1 - i = 6 - 1 - 0 = 5

Let's take another example :

indexes = [0,1,5]
mySet = ["Madrid", "Paris", "Lyon", "Marseille", "Washington", "Dublin"]
comb = ["Madrid", "Paris", "Dublin"]

In this case, the index to move is indexes[1] because :

  • We go through the indexes list backwards. Thus, the first element is indexes[length - 1 -i] such that i=0 (the first element in reverse order)
  • The condition of indexes[length - 1 - i] == maxLength - 1 - i is verified :
    • indexes[length - 1 - i] = indexes[3 - 1 - 0] = indexes[2] = 5
    • maxLength - 1 - i = 6 - 1 - 0 = 5
  • We increment ii=1 to point to the second index in reverse order
  • The condition of indexes[length - 1 - i] == maxLength - 1 - i is not verified this time :
    • indexes[length - 1 - i] = indexes[3 - 1 - 1] = indexes[1] = 1
    • maxLength - 1 - i = 6 - 1 - 1 = 4

The index to increment is thus indexes[1] :

indexes = [0,2,5]
mySet = ["Madrid", "Paris", "Lyon", "Marseille", "Washington", "Dublin"]
comb = ["Madrid", "Lyon", "Dublin"]

But according to the logic we followed in the illustration in the beginning of the article, the element that comes after indexes=[0,1,5] is indexes=[0,2,3] and not indexes=[0,2,5]. To achieve this, we have to set all the indexes that are after indexes[1] :

indexes = [0,2,5]
indexToMove = 1
for i in range (indexToMove + 1, length) :
		indexes[i] = indexes[i - 1] + 1

The process of the move function is the following :

  • Find the index to increment : indexToIncrement
  • Increment the index : indexToIncrement += 1
  • Increment all the indexes that come after indexToIncrement

Note that this function returns True if it moved the indexes list, and False if no move is possible (the end of combinations)

Here is the full function :

def move(indexes, maxLength) :

    # Determining the indexToMove
    length = len(indexes)
    indexToMove = -1
    for i in range (length) :
        if indexes[length - 1 -i] >= maxLength :
            raise ValueError ("Indexes must be less than maxLength")
        if indexes[length - 1 - i] != maxLength - 1 - i :
            indexToMove = length - 1 - i
            break
    if indexToMove == -1 :
        return False

    # Moving the index and the indexes after it
    indexes[indexToMove] += 1
    for i in range (indexToMove + 1, length) :
        indexes[i] = indexes[i - 1] + 1
    return True

Some examples of outputs :

indexes = [0,1,2]
move(indexes, 6)
print(indexes)    # [0, 1, 3]

indexes = [0,1,5]
move(indexes, 6)
print(indexes)    # [0, 2, 3]

indexes = [0,4,5]
move(indexes, 6)
print(indexes)    # [1, 2, 3]

The find_combinations function

This function takes a set (that is represented by a list) and the length of the combinations to generate in parameter. It returns all the combinations in a list.

This is the here is the logic of the function :

  • Initialise the indexes list : indexes = [i for i in range (k)], and adds the first combination to the combinations list
  • As long as a move is possible, move and add the corresponding combination to the combinations list
def find_combinations (list, k) :
    n = len(list)
    combinations = []
    if k < 0 or k > n :
        raise ValueError ("k must be equal to 0 or positive and less than or equal to n")
    if k == 0 :
        return [[]]
    elif k == 1 :
        return [[i] for i in list]
    elif k == len (list) :
        return [list]
    else :
        indexes = [i for i in range (k)]
        combinations.append ([list[i] for i in indexes])
        while move(indexes, n) :
            combinations.append([list[i] for i in indexes])
    return combinations

Here is an example of the output :

mySet = ["Madrid", "Paris", "Lyon", "Marseille", "Washington", "Dublin"]
combs = find_combinations(mySet, 3)
print (combs)

Output

[
  ['Madrid', 'Paris', 'Lyon'], 
  ['Madrid', 'Paris', 'Marseille'], 
  ['Madrid', 'Paris', 'Washington'], 
  ['Madrid', 'Paris', 'Dublin'], 
  ['Madrid', 'Lyon', 'Marseille'], 
  ['Madrid', 'Lyon', 'Washington'], 
  ['Madrid', 'Lyon', 'Dublin'], 
  ['Madrid', 'Marseille', 'Washington'], 
  ['Madrid', 'Marseille', 'Dublin'], 
  ['Madrid', 'Washington', 'Dublin'], 
  ['Paris', 'Lyon', 'Marseille'], 
  ['Paris', 'Lyon', 'Washington'], 
  ['Paris', 'Lyon', 'Dublin'], 
  ['Paris', 'Marseille', 'Washington'], 
  ['Paris', 'Marseille', 'Dublin'], 
  ['Paris', 'Washington', 'Dublin'], 
  ['Lyon', 'Marseille', 'Washington'], 
  ['Lyon', 'Marseille', 'Dublin'], 
  ['Lyon', 'Washington', 'Dublin'], 
  ['Marseille', 'Washington', 'Dublin']
]

Conclusion

This solution has the advantage of being non recursive. It is efficient for combinations of a small length. But if you are looking for combinations of an important length, performances may drop (not significantly) since the move function is linear with k (the length of the combinations).

Implementations that optimise the process of the move function do exist (see this thread) but for combinations of small length, the solution of this article is good.

Please feel free to reach me out on my instagram for any suggestion / question.