combinations thumb

Published : December 12, 2021 - Yanis MANSOUR

Algorithm to generate all combinations in a set

Introduction

This article is intended to explain the approach I took to develop PyCombs, a python package that generates all the possible combinations of length k in a given set.

Before to start, Let's make a quick reminder of what is a combination :

In mathematics, a combination is a selection of items from a set that has distinct members, such that the order of selection does not matter. Wikipedia

Another reminder I'd like to make is the number of combinations in a set

Cnk=n!k!(nk)! C^{k}_{n} = \frac{n!}{k! (n-k)!}

Note

This is a solution I thought about. It has the advantage to be non recursive, and is efficient if you are looking for combinations of a small length. It goes linearly with the length of the set. Here is the link of the full project on GitHub

Now that we have the necessary reminders, let's dive into the approach :

Representation

To simplify treatments, we will use lists to represent our sets. Let's suppose that we have the following set of 6 cities, in which we want tp look for all possible combinations of length 3 :

mySet = ["Tizi Ouzou", "Paris", "Lyon", "Marseille", "Washington", "Dublin"]

To represent combinations, one approach is to have indexes that point to elements, and because we look for combinations of length 3, we will need 3 indexes that will each point to an element of the set :

mySet = ["Tizi Ouzou", "Paris", "Lyon", "Marseille", "Washington", "Dublin"]
          ^             ^        ^
          i             j        k

This will give us the first combination :

 comb = ["Tizi Ouzou", "Paris", "Lyon"]

To go to the next combinations, we have to increment the last index k :

mySet = ["Tizi Ouzou", "Paris", "Lyon", "Marseille", "Washington", "Dublin"]
          ^             ^                ^
          i             j                k

comb = ["Tizi Ouzou", "Paris", "Marseille"]

By incrementing k until it reaches the end of the list, we obtain all combinations which start with the first two elements, combined with each element amongst the rest of the list :

mySet = ["Tizi Ouzou", "Paris", "Lyon", "Marseille", "Washington", "Dublin"]
          ^             ^                ^
          i             j                k
comb = ["Tizi Ouzou", "Paris", "Marseille"]

mySet = ["Tizi Ouzou", "Paris", "Lyon", "Marseille", "Washington", "Dublin"]
          ^             ^                             ^
          i             j                             k
comb = ["Tizi Ouzou", "Paris", "Washington"]

mySet = ["Tizi Ouzou", "Paris", "Lyon", "Marseille", "Washington", "Dublin"]
          ^             ^                                           ^
          i             j                                           k
comb = ["Tizi Ouzou", "Paris", "Dublin"]

Here is the list of combinations at this point :

combs = [
	['Tizi Ouzou', 'Paris', 'Lyon'],
	['Tizi Ouzou', 'Paris', 'Marseille'],
	['Tizi Ouzou', 'Paris', 'Washington'],
	['Tizi Ouzou', 'Paris', 'Dublin']
]

To enumerate the remaining combinations, we have to increment j and replace k by j+1, and restart the same process of incrementing k to the end of the list :

mySet = ["Tizi Ouzou", "Paris", "Lyon", "Marseille", "Washington", "Dublin"]
          ^                      ^       ^
          i                      j       k
comb = ["Tizi Ouzou", "Lyon", "Marseille"]

mySet = ["Tizi Ouzou", "Paris", "Lyon", "Marseille", "Washington", "Dublin"]
          ^                      ^                    ^
          i                      j                    k
comb = ["Tizi Ouzou", "Lyon", "Washington"]

mySet = ["Tizi Ouzou", "Paris", "Lyon", "Marseille", "Washington", "Dublin"]
          ^                      ^                                  ^
          i                      j                                  k
comb = ["Tizi Ouzou", "Lyon", "Dublin"]

Let N be the length of our set. By repeating the previous process, we will reach this situation :

mySet = ["Tizi Ouzou", "Paris", "Lyon", "Marseille", "Washington", "Dublin"]
          ^                                           ^             ^
          i                                           j             k
comb = ["Tizi Ouzou", "Washington", "Dublin"]

Now, it's the turn of i to be incremented :

mySet = ["Tizi Ouzou", "Paris", "Lyon", "Marseille", "Washington", "Dublin"]
                        ^        ^       ^
                        i        j       k
comb = ["Paris", "Lyon", "Marseille"]

Repeat the process until i=N-3 & j=N-2 & k=N-1 :

mySet = ["Tizi Ouzou", "Paris", "Lyon", "Marseille", "Washington", "Dublin"]
                                         ^            ^             ^
                                         i            j             k
comb = ["Marseille", "Washington", "Dublin"]

To represent the indexes, let's store them in a list. The situation above will give the following indexes :

indexes = [3, 4, 5]

The functions

In order to implement the full generator, I divided the full process into 2 functions. Let's go :

The move function

Now, let's write the move function that will calculate the next indexes list, given the current indexes list and the length of the set we are looking for combinations in (See examples below)

Let capital I be the index of a given index in the list of indexes, maxLength be the length of the set, and length be the length of the indexes list :

length = 3
I = 0
indexes = [0,1,2]
           ^
           I

mySet = ["Tizi Ouzou", "Paris", "Lyon", "Marseille", "Washington", "Dublin"]
maxLength = 6

First, by looking the pattern showed above, we notice that the index we move is determined by going through the list of indexes backwards, and looking for the the first index that doesn't verify the condition of indexes[length - 1 - i] == maxLength - 1 - i. That is, the index that is being moved is always the first (in reverse order) that doesn't point to the end of the list :

indexes = [0,1,2]
mySet = ["Tizi Ouzou", "Paris", "Lyon", "Marseille", "Washington", "Dublin"]

In this example, the index to move is indexes[2] because :

Let's take another example :

indexes = [0,1,5]
mySet = ["Tizi Ouzou", "Paris", "Lyon", "Marseille", "Washington", "Dublin"]
comb = ["Tizi Ouzou", "Paris", "Dublin"]

In this case, the index to move is indexes[1] because :

The index to increment is thus indexes[1] :

indexes = [0,2,5]
mySet = ["Tizi Ouzou", "Paris", "Lyon", "Marseille", "Washington", "Dublin"]
comb = ["Tizi Ouzou", "Lyon", "Dublin"]

But according to the logic we followed in the illustration in the beginning of the article, the element that comes after indexes=[0,1,5] is indexes=[0,2,3] and not indexes=[0,2,5]. To achieve this, we have to set all the indexes that are after indexes[1] :

indexes = [0,2,5]
indexToMove = 1
for i in range (indexToMove + 1, length) :
		indexes[i] = indexes[i - 1] + 1

The process of the move function is the following :

Note that this function returns True if it moved the indexes list, and False if no move is possible (the end of combinations)

Here is the full function :

def move(indexes, maxLength) :

    # Determining the indexToMove
    length = len(indexes)
    indexToMove = -1
    for i in range (length) :
        if indexes[length - 1 -i] >= maxLength :
            raise ValueError ("Indexes must be less than maxLength")
        if indexes[length - 1 - i] != maxLength - 1 - i :
            indexToMove = length - 1 - i
            break
    if indexToMove == -1 :
        return False

    # Moving the index and the indexes after it
    indexes[indexToMove] += 1
    for i in range (indexToMove + 1, length) :
        indexes[i] = indexes[i - 1] + 1
    return True

Some examples of outputs :

indexes = [0,1,2]
move(indexes, 6)
print(indexes)    # [0, 1, 3]

indexes = [0,1,5]
move(indexes, 6)
print(indexes)    # [0, 2, 3]

indexes = [0,4,5]
move(indexes, 6)
print(indexes)    # [1, 2, 3]

The find_combinations function

This function takes a set (that is represented by a list) and the length of the combinations to generate in parameter. It returns all the combinations in a list.

This is the here is the logic of the function :

def find_combinations (list, k) :
    n = len(list)
    combinations = []
    if k < 0 or k > n :
        raise ValueError ("k must be equal to 0 or positive and less than or equal to n")
    if k == 0 :
        return [[]]
    elif k == 1 :
        return [[i] for i in list]
    elif k == len (list) :
        return [list]
    else :
        indexes = [i for i in range (k)]
        combinations.append ([list[i] for i in indexes])
        while move(indexes, n) :
            combinations.append([list[i] for i in indexes])
    return combinations

Here is an example of the output :

mySet = ["Tizi Ouzou", "Paris", "Lyon", "Marseille", "Washington", "Dublin"]
combs = find_combinations(mySet, 3)
print (combs)

Output

[
  ['Tizi Ouzou', 'Paris', 'Lyon'], 
  ['Tizi Ouzou', 'Paris', 'Marseille'], 
  ['Tizi Ouzou', 'Paris', 'Washington'], 
  ['Tizi Ouzou', 'Paris', 'Dublin'], 
  ['Tizi Ouzou', 'Lyon', 'Marseille'], 
  ['Tizi Ouzou', 'Lyon', 'Washington'], 
  ['Tizi Ouzou', 'Lyon', 'Dublin'], 
  ['Tizi Ouzou', 'Marseille', 'Washington'], 
  ['Tizi Ouzou', 'Marseille', 'Dublin'], 
  ['Tizi Ouzou', 'Washington', 'Dublin'], 
  ['Paris', 'Lyon', 'Marseille'], 
  ['Paris', 'Lyon', 'Washington'], 
  ['Paris', 'Lyon', 'Dublin'], 
  ['Paris', 'Marseille', 'Washington'], 
  ['Paris', 'Marseille', 'Dublin'], 
  ['Paris', 'Washington', 'Dublin'], 
  ['Lyon', 'Marseille', 'Washington'], 
  ['Lyon', 'Marseille', 'Dublin'], 
  ['Lyon', 'Washington', 'Dublin'], 
  ['Marseille', 'Washington', 'Dublin']
]

Conclusion

This solution has the advantage of being non recursive. It is efficient for combinations of a small length. But if you are looking for combinations of an important length, performances may drop (not significantly) since the move function is linear with k (the length of the combinations).

Implementations that optimise the process of the move function do exist (see this thread) but for combinations of small length, the solution of this article is good.

Please feel free to reach me out on my instagram for any suggestion / question.