Introduction

This page describes the algorithm implemented in my Python script. I have a more recent script written in Javascript, which you can see on the Interactive page. This uses similar concepts but was written independently.

The source code page will allow you to download a Python script to automate the process. This script will solve all the variants described on the variants page.

The "easy" example

    19 4 
  48  6  
75      2
 9 1 2  4
     3   
5  4 6 3 
8      73
  6  84  
 1 29    

We will start off with an easy example. This one appeared in the London Evening Standard on 31st May 2005, where it was described as "Hard". It has 27 squares filled in, the examples they describe as "Easy" have 32. This doesn't make much difference to a computer algorithm, though if you were working through this by hand the more initial numbers you are given the quicker you should be able to solve it. 

I call it "easy" as it can be solved after applying only the simpler parts of the algorithm.

The start

The algorithm works by writing out all the possible answers and then eliminating those that are incompatible with the numbers published in the puzzle. The starting position is that every cell can contain any of the numbers 1-9. We represent this by writing all the possible values into each cell, which gives us a grid that looks like this.

123456789123456789123456789123456789123456789123456789123456789123456789123456789
123456789123456789123456789123456789123456789123456789123456789123456789123456789
123456789123456789123456789123456789123456789123456789123456789123456789123456789
123456789123456789123456789123456789123456789123456789123456789123456789123456789
123456789123456789123456789123456789123456789123456789123456789123456789123456789
123456789123456789123456789123456789123456789123456789123456789123456789123456789
123456789123456789123456789123456789123456789123456789123456789123456789123456789
123456789123456789123456789123456789123456789123456789123456789123456789123456789
123456789123456789123456789123456789123456789123456789123456789123456789123456789

The overall approach is to eliminate the impossible so that what remains must be the answer.

Applying the given numbers

In the example above we have been given the values for 27 of the cells. The first thing to do is to apply these to our grid and eliminate all the values that are inconsistent with these. Working across the top row the first value we have is in the 5th cell. This is a 1. This means that there can be no other '1's on the top row or in the 5th column, or in the top-middle block of 9. Crossing out all these '1's leaves:

23456789234567892345678923456789123456789234567892345678923456789
123456789123456789123456789234567892345678923456789123456789123456789123456789
123456789123456789123456789234567892345678923456789123456789123456789123456789
12345678912345678912345678912345678923456789123456789123456789123456789123456789
12345678912345678912345678912345678923456789123456789123456789123456789123456789
12345678912345678912345678912345678923456789123456789123456789123456789123456789
12345678912345678912345678912345678923456789123456789123456789123456789123456789
12345678912345678912345678912345678923456789123456789123456789123456789123456789
12345678912345678912345678912345678923456789123456789123456789123456789123456789

Continuing this process; after entering the 9 given values on the top three rows we have:

2368236823682356719357843578
12391239482357235761357913579
7513689346346346138913892
123456891234678912356789123456792345678912345678123457891235678913456789
123456891234678912356789123456792345678912345678123457891235678913456789
123456891234678912356789123456792345678912345678123457891235678913456789
123456891234678912356789123456792345678912345678123457891235678913456789
123456891234678912356789123456792345678912345678123457891235678913456789
123456891234678912356789123456792345678912345678123457891235678913456789

This hasn't made much impact on the bottom two thirds of the grid because so far we've only had one value in each column. But we've got 18 more values to place. When we get to '6' in the 6th column of the 6th row the grid is becoming a lot sparser and we notice that the only possible value left for the sixth cell on the third row is '4'. So we have found our first missing number (shown in green).

2368236823682356719357843578
12391234823575761357913579
7513689363464138913892
36893678157823578356784
1246812467812678579578931257891256789156789
512378123784789612378912378913789
1234689123467812356789235679234567891457812345789123567891356789
1234689123467812356789235679234567891457812345789123567891356789
1234689123467812356789235679234567891457812345789123567891356789

After we've used up all the given numbers (and the one we found) we we are left with the grid.

236236823835671935784578
1239234823575761591579
7513893636413891892
369378157825785684
12462467812785795783125789125689156789
5278127847861278931789
8242595645615125973
2392376357357841259159
341357295758568568

So we move onto phase 2, "Looking for Singletons".

Looking for singletons

Phase two involves counting how often the undecided numbers appear in a set of cells (row, column or 3x3 super-cell). If we start with the first column, the undecided cells are the 1st, 2nd, 4th, 5th, 8th and 9th. The possibilities are '236', '1239', '36', '1246', '239' and '34'. If we count how often each undecided number appears (the known ones are excluded from this process) we get:

Number 1 2 3 4 5 6 7 8 9
Frequency 2 4 5 2 - 3 - - 2

Every undecided number appears at least twice, so this doesn't help us. Repeating this exercise for columns 2 and 3 doesn't produce any useful information either. When we try column 4 though, we get;

Number 1 2 3 4 5 6 7 8 9
Frequency - - 3 - 4 3 3 - 1

Now the '9' only appears once, in the 5th row, so that is where it must be. We have found our second number. We can now set the '9' and repeat the process described in phase 1 giving.

236236823835671935784578
1239234823575761591579
7513893636413891892
369378157825785684
124624678127895783125781256815678
5278127847861278931789
8242595645615125973
2392376357357841259159
341357295758568568

That didn't help very much, so we restart this phase at the beginning. The first four columns offer no additional insights, but now the frequencies in column 5 are:

Number 1 2 3 4 5 6 7 8 9
Frequency - 1 3 1 5 2 5 3 -

Which gives us two more discoveries, '2' on the second row and '4' on the 7th. Filling these in using the phase 1 algorithm sets off a chain reaction...

Action Consequences
(2,5) = 2 (2,2) = 3
(2,2) = 3 None
(7,5) = 4 (7,2) = 2
(7,2) = 2 (8,2) = 7
(8,2) = 7 (6,2) = 8
(6,2) = 8 (1,2) = 6, (6,5) = 7
(1,2) = 6 (1,1) = 2, (5,2) = 4
(6,5) = 7 None
(1,1) = 2 (1,3) = 8
(5,2) = 4 None
(1,3) = 8 None

Co-ordinates are (row, column) with the top left being (1,1). The grid now looks like this (with the new discoveries highlighted).

26835719357457
1934825761591579
75193636413891892
3693715825785684
1641279583125781256815678
5812476129319
82595641515973
39763535841259159
34135295758568568

Restarting this phase counting down the columns shows us that the bottom cell in the first column must be '4'. This doesn't lead to any further discoveries. Trying again reveals that the top cell in the 4th column must be '7'. This second discovery forces both (1,9) and (2,6) to be '5', which in turn forces three more discoveries, leaving.

268719345
1934825619179
7519363641891892
3693715825785684
164127958312578125681678
5812476129319
825956415973
3976353584125919
41352975856868

earching the columns again we see that the third cell in the fifth column must be a 6. Setting this triggers another cascade of discoveries which culminates with the solution.

268719345
134825697
759364182
397182564
642953718
581476239
825641973
976538421
413297856

For this example that is as far as we need to go, which is why I've described it as "Easy". I don't have any examples, but any patterns that can be solved only by using the first phase could be described as "Trivial".  The next sections discuss some more difficult examples of the problem.

Pairs

The first two phases are sufficient to solve many examples, but here is a more difficult example. It only has 23 given numbers, 4 less than the previous example.

      2  
 58  6   
   3   85
 1 47 6  
9 6   5 7
  7 39 4 
76   8   
   9  81 
  9      

After applying phases 1 and 2 we are still left with:

134  379  134  1578  14589  14  2  679  13469
1234  5  8  127  1249  6  3479  79  1349
6  279  124  3  1249  124  479  8  5
238  1  23  4  7  5  6  29  289
9  4  6  128  128  12  5  3  7
5  28  7  6  3  9  1  4  28
7  6  12345  125  1245  8  349  259  2349
234  23  2345  9  2456  7  8  1  2346
1248  28  9  125  12456  3  47  2567  246

Which appears to have still a long way to go. If we examine the second column we can see two things of interest.

  1. The numbers 7 and 9 only appear twice, and when the do appear they appear together.
  2. There are two cells that can only contain 2 or 8. 

The first of these rules will allow us to simplify cells (1,2) and (3,2) which currently show the options 3,7,9 and 2,7,9. Because 7 and 9 only appear in these two cells, one must contain 7 and the other 9. Though we don't know which is which, this does allow us to remove the possibilities that these cells contain a 3 or a 2.

The second of these rules applies to cells (6,2) and (9,2), both of which can only contain a 2 or an 8. Again this means that one of the cells must contain 2 and the other 8. Although we don't know which is which we do no that none of the other cells in the column can be a 2 or an 8. This allows us to remove the 2s from (3,2) and (8,2), though in this case (3,2) has already had its two removed using the previous rule.

The table now looks like:

134  79  134  1578  14589  14  2  679  13469
1234  5  8  127  1249  6  3479  79  1349
6  79  124  3  1249  124  479  8  5
238  1  23  4  7  5  6  29  289
9  4  6  128  128  12  5  3  7
5  28  7  6  3  9  1  4  28
7  6  12345  125  1245  8  349  259  2349
234  3  2345  9  2456  7  8  1  2346
1248  28  9  125  12456  3  47  2567  246

With the changed cells in green. The fact that 3 is now in a cell on it's own means that we can eliminate all the other 3's in the same block or row.

134  79  134  1578  14589  14  2  679  13469
1234  5  8  127  1249  6  3479  79  1349
6  79  124  3  1249  124  479  8  5
238  1  23  4  7  5  6  29  289
9  4  6  128  128  12  5  3  7
5  28  7  6  3  9  1  4  28
7  6  1245  125  1245  8  349  259  2349
24  3  245  9  2456  7  8  1  246
1248  28  9  125  12456  3  47  2567  246

n this case there are no more pairs that can be used to simplify the grid, so we move onto phase 4, "Eliminating bad guesses".

Eliminating bad guesses

This phase is simple to describe but harder to implement. In the previous example, after exhausting phase 3 we are still left with:

134  79  134  1578  14589  14  2  679  13469
1234  5  8  127  1249  6  3479  79  1349
6  79  124  3  1249  124  479  8  5
238  1  23  4  7  5  6  29  289
9  4  6  128  128  12  5  3  7
5  28  7  6  3  9  1  4  28
7  6  1245  125  1245  8  349  259  2349
24  3  245  9  2456  7  8  1  246
1248  28  9  125  12456  3  47  2567  246

The next phase involves making guesses and observing the consequences. There are three possible outcomes:

  1. the search runs to completion with a solution
  2. the search leads to a contradiction
  3. the search is inconclusive

This first outcome at first glance appears to be ideal, but at this moment we don't know there is only one solution so we ignore it. Instead what we look for is outcome 2, the contradiction. When we find a contradiction we know that that guess must be wrong so we can remove that number from the possibilities.

Looking at the table above there are 141 possible guesses, starting with the top left cell being 1. This part of the algorithm involves making that guess and then applying the three previous phases to see where it leads us. In this case setting cell (1,1) to 1 leads to a contradiction so we can eliminate it leaving only 3 or 4 in the top left cell. Trying 3 causes no problems but 4 leads to a contradiction, so we can eliminate that too.

Now we only have 3 left so that must be the value in the top left cell. Fixing this value and applying the previous phases leads to the solution:

3  9  1  8  5  4  2  7  6
4  5  8  7  2  6  3  9  1
6  7  2  3  9  1  4  8  5
8  1  3  4  7  5  6  2  9
9  4  6  1  8  2  5  3  7
5  2  7  6  3  9  1  4  8
7  6  4  2  1  8  9  5  3
2  3  5  9  6  7  8  1  4
1  8  9  5  4  3  7  6  2

If all else fails

If we take the example from the start of phase 3 (Pairs) and remove the '6' on the fifth row we are left with this puzzle.

      2  
 58  6   
   3   85
 1 47 6  
9     5 7
  7 39 4 
76   8   
   9  81 
  9      

After applying the first 4 phases we are left with:

13479134781589452671346
12345812712963479379134
126792631291447985
81354725623939
93434616868125237
56275639148
76123412124583493592349
2345342345924567812346
14589125612456347567246

Phase 5 works by working through the possible values recursively until it finds a solution or a contradiction. It steps through the possible guesses sequentially. The first cell (1,1) can be 1, 3 or 4. Choosing 1 gives:

1793485945267346
3458726349391
2679263191447985
81354725623939
934346168125237
56275639148
7612458349359349
23453423459456781346
458915614563475672

The next guess is to try 7 in cell (1,2). This leads to:

173895264
45872639391
296314785
81547263939
934681527
627539148
76124839539
342957816
589163472

The third guess, setting (2,7) to 3 leads to a solution:

173895264
458726391
296314785
815472639
934681527
627539148
761248953
342957816
589163472

Stepping back one guess and trying (2,7) = 9 also leads to a solution. All in all there are 32 possibilities, see here.

Other pages

(c) John Whitehouse 2013 - 2020