Published : December 12, 2021 - Yanis MANSOUR
This article is intended to explain the approach I took to develop PyCombs, a python package that generates all the possible combinations of length k
in a given set.
Before to start, Let's make a quick reminder of what is a combination :
In mathematics, a combination is a selection of items from a set that has distinct members, such that the order of selection does not matter. Wikipedia
Another reminder I'd like to make is the number of combinations in a set
This is a solution I thought about. It has the advantage to be non recursive, and is efficient if you are looking for combinations of a small length. It goes linearly with the length of the set. Here is the link of the full project on GitHub
Now that we have the necessary reminders, let's dive into the approach :
To simplify treatments, we will use lists to represent our sets. Let's suppose that we have the following set of 6 cities, in which we want tp look for all possible combinations of length 3
:
mySet = ["Madrid", "Paris", "Lyon", "Marseille", "Washington", "Dublin"]
To represent combinations, one approach is to have indexes that point to elements, and because we look for combinations of length 3
, we will need 3
indexes that will each point to an element of the set :
mySet = ["Madrid", "Paris", "Lyon", "Marseille", "Washington", "Dublin"]
^ ^ ^
i j k
This will give us the first combination :
comb = ["Madrid", "Paris", "Lyon"]
To go to the next combinations, we have to increment the last index k
:
mySet = ["Madrid", "Paris", "Lyon", "Marseille", "Washington", "Dublin"]
^ ^ ^
i j k
comb = ["Madrid", "Paris", "Marseille"]
By incrementing k
until it reaches the end of the list, we obtain all combinations which start with the first two elements, combined with each element amongst the rest of the list :
mySet = ["Madrid", "Paris", "Lyon", "Marseille", "Washington", "Dublin"]
^ ^ ^
i j k
comb = ["Madrid", "Paris", "Marseille"]
mySet = ["Madrid", "Paris", "Lyon", "Marseille", "Washington", "Dublin"]
^ ^ ^
i j k
comb = ["Madrid", "Paris", "Washington"]
mySet = ["Madrid", "Paris", "Lyon", "Marseille", "Washington", "Dublin"]
^ ^ ^
i j k
comb = ["Madrid", "Paris", "Dublin"]
Here is the list of combinations at this point :
combs = [
['Madrid', 'Paris', 'Lyon'],
['Madrid', 'Paris', 'Marseille'],
['Madrid', 'Paris', 'Washington'],
['Madrid', 'Paris', 'Dublin']
]
To enumerate the remaining combinations, we have to increment j
and replace k
by j+1
, and restart the same process of incrementing k
to the end of the list :
mySet = ["Madrid", "Paris", "Lyon", "Marseille", "Washington", "Dublin"]
^ ^ ^
i j k
comb = ["Madrid", "Lyon", "Marseille"]
mySet = ["Madrid", "Paris", "Lyon", "Marseille", "Washington", "Dublin"]
^ ^ ^
i j k
comb = ["Madrid", "Lyon", "Washington"]
mySet = ["Madrid", "Paris", "Lyon", "Marseille", "Washington", "Dublin"]
^ ^ ^
i j k
comb = ["Madrid", "Lyon", "Dublin"]
Let N
be the length of our set. By repeating the previous process, we will reach this situation :
mySet = ["Madrid", "Paris", "Lyon", "Marseille", "Washington", "Dublin"]
^ ^ ^
i j k
comb = ["Madrid", "Washington", "Dublin"]
Now, it's the turn of i
to be incremented :
mySet = ["Madrid", "Paris", "Lyon", "Marseille", "Washington", "Dublin"]
^ ^ ^
i j k
comb = ["Paris", "Lyon", "Marseille"]
Repeat the process until i=N-3
& j=N-2
& k=N-1
:
mySet = ["Madrid", "Paris", "Lyon", "Marseille", "Washington", "Dublin"]
^ ^ ^
i j k
comb = ["Marseille", "Washington", "Dublin"]
To represent the indexes, let's store them in a list. The situation above will give the following indexes :
indexes = [3, 4, 5]
In order to implement the full generator, I divided the full process into 2 functions. Let's go :
move
functionNow, let's write the move
function that will calculate the next indexes
list, given the current indexes
list and the length of the set we are looking for combinations in (See examples below)
Let capital I
be the index of a given index in the list of indexes, maxLength
be the length of the set, and length
be the length of the indexes
list :
length = 3
I = 0
indexes = [0,1,2]
^
I
mySet = ["Madrid", "Paris", "Lyon", "Marseille", "Washington", "Dublin"]
maxLength = 6
First, by looking the pattern showed above, we notice that the index we move is determined by going through the list of indexes
backwards, and looking for the the first index that doesn't verify the condition of indexes[length - 1 - i] == maxLength - 1 - i
. That is, the index that is being moved is always the first (in reverse order) that doesn't point to the end of the list :
indexes = [0,1,2]
mySet = ["Madrid", "Paris", "Lyon", "Marseille", "Washington", "Dublin"]
In this example, the index to move is indexes[2]
because :
length == 3
and the length of the set is 6 ⇒ maxLength == 6
indexes
list backwards. Thus, the first element is indexes[length - 1 -i]
such that i=0
(the first element in reverse order)indexes[length - 1 - i] == maxLength - 1 - i
is not verified :
indexes[length - 1 - i] = indexes[3 - 1 - 0] = indexes[2] = 2
maxLength - 1 - i = 6 - 1 - 0 = 5
Let's take another example :
indexes = [0,1,5]
mySet = ["Madrid", "Paris", "Lyon", "Marseille", "Washington", "Dublin"]
comb = ["Madrid", "Paris", "Dublin"]
In this case, the index to move is indexes[1]
because :
indexes
list backwards. Thus, the first element is indexes[length - 1 -i]
such that i=0
(the first element in reverse order)indexes[length - 1 - i] == maxLength - 1 - i
is verified :
indexes[length - 1 - i] = indexes[3 - 1 - 0] = indexes[2] = 5
maxLength - 1 - i = 6 - 1 - 0 = 5
i
⇒ i=1
to point to the second index in reverse orderindexes[length - 1 - i] == maxLength - 1 - i
is not verified this time :
indexes[length - 1 - i] = indexes[3 - 1 - 1] = indexes[1] = 1
maxLength - 1 - i = 6 - 1 - 1 = 4
The index to increment is thus indexes[1]
:
indexes = [0,2,5]
mySet = ["Madrid", "Paris", "Lyon", "Marseille", "Washington", "Dublin"]
comb = ["Madrid", "Lyon", "Dublin"]
But according to the logic we followed in the illustration in the beginning of the article, the element that comes after indexes=[0,1,5]
is indexes=[0,2,3]
and not indexes=[0,2,5]
. To achieve this, we have to set all the indexes that are after indexes[1]
:
indexes = [0,2,5]
indexToMove = 1
for i in range (indexToMove + 1, length) :
indexes[i] = indexes[i - 1] + 1
The process of the move function is the following :
indexToIncrement
indexToIncrement += 1
indexToIncrement
Note that this function returns True
if it moved the indexes
list, and False
if no move is possible (the end of combinations)
Here is the full function :
def move(indexes, maxLength) :
# Determining the indexToMove
length = len(indexes)
indexToMove = -1
for i in range (length) :
if indexes[length - 1 -i] >= maxLength :
raise ValueError ("Indexes must be less than maxLength")
if indexes[length - 1 - i] != maxLength - 1 - i :
indexToMove = length - 1 - i
break
if indexToMove == -1 :
return False
# Moving the index and the indexes after it
indexes[indexToMove] += 1
for i in range (indexToMove + 1, length) :
indexes[i] = indexes[i - 1] + 1
return True
Some examples of outputs :
indexes = [0,1,2]
move(indexes, 6)
print(indexes) # [0, 1, 3]
indexes = [0,1,5]
move(indexes, 6)
print(indexes) # [0, 2, 3]
indexes = [0,4,5]
move(indexes, 6)
print(indexes) # [1, 2, 3]
find_combinations
functionThis function takes a set (that is represented by a list) and the length of the combinations to generate in parameter. It returns all the combinations in a list.
This is the here is the logic of the function :
indexes
list : indexes = [i for i in range (k)]
, and adds the first combination to the combinations listdef find_combinations (list, k) :
n = len(list)
combinations = []
if k < 0 or k > n :
raise ValueError ("k must be equal to 0 or positive and less than or equal to n")
if k == 0 :
return [[]]
elif k == 1 :
return [[i] for i in list]
elif k == len (list) :
return [list]
else :
indexes = [i for i in range (k)]
combinations.append ([list[i] for i in indexes])
while move(indexes, n) :
combinations.append([list[i] for i in indexes])
return combinations
Here is an example of the output :
mySet = ["Madrid", "Paris", "Lyon", "Marseille", "Washington", "Dublin"]
combs = find_combinations(mySet, 3)
print (combs)
Output
[
['Madrid', 'Paris', 'Lyon'],
['Madrid', 'Paris', 'Marseille'],
['Madrid', 'Paris', 'Washington'],
['Madrid', 'Paris', 'Dublin'],
['Madrid', 'Lyon', 'Marseille'],
['Madrid', 'Lyon', 'Washington'],
['Madrid', 'Lyon', 'Dublin'],
['Madrid', 'Marseille', 'Washington'],
['Madrid', 'Marseille', 'Dublin'],
['Madrid', 'Washington', 'Dublin'],
['Paris', 'Lyon', 'Marseille'],
['Paris', 'Lyon', 'Washington'],
['Paris', 'Lyon', 'Dublin'],
['Paris', 'Marseille', 'Washington'],
['Paris', 'Marseille', 'Dublin'],
['Paris', 'Washington', 'Dublin'],
['Lyon', 'Marseille', 'Washington'],
['Lyon', 'Marseille', 'Dublin'],
['Lyon', 'Washington', 'Dublin'],
['Marseille', 'Washington', 'Dublin']
]
This solution has the advantage of being non recursive. It is efficient for combinations of a small length. But if you are looking for combinations of an important length, performances may drop (not significantly) since the move
function is linear with k
(the length of the combinations).
Implementations that optimise the process of the move
function do exist (see this thread) but for combinations of small length, the solution of this article is good.
Please feel free to reach me out on my instagram for any suggestion / question.