In the mid-1990s I was shown a snippet of C source code that I still consider the most beautiful code I have ever seen. The code was introduced to me as:
It’s a devastatingly deviously unrolled byte-copying loop, devised by Tom Duff while he was at Lucasfilm. In its “classic” form, it looks like:
    register n = (count + 7) / 8;   /* count > 0 assumed */
    switch (count % 8)
    {
    case 0: do { *to = *from++;
    case 7:      *to = *from++;
    case 6:      *to = *from++;
    case 5:      *to = *from++;
    case 4:      *to = *from++;
    case 3:      *to = *from++;
    case 2:      *to = *from++;
    case 1:      *to = *from++;
            } while (--n > 0);
    }
The code is described elsewhere on the internet as Duff’s Device.
The code implements an eight-way unrolled loop that copies a block of data to a single memory address, most likely a memory-mapped output port. Typically, an unrolled loop handles the leftover items in a separate cleanup step after the main body has executed. This code instead uses the fall-through behaviour of C’s switch statement to jump into the middle of the loop, doing the cleanup on the way in, which makes the code more compact.
    switch(count % 8) {
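For contrast, here is a rough sketch of the conventional approach, where the leftovers are handled in a cleanup loop after the unrolled body. The function name `unrolled_send`, the `short` element type, and the returned store count are my own assumptions for illustration; they are not part of the original code.

```c
/* Hypothetical helper: copies `count` values to the single address `to`
   with a conventionally unrolled loop. Returns the number of stores made,
   purely so that the behaviour can be checked. */
static int unrolled_send(volatile short *to, const short *from, int count)
{
    int stores = 0;
    int n = count / 8;      /* full groups of eight */
    int rest = count % 8;   /* leftovers, handled after the main body */

    while (n-- > 0) {
        *to = *from++; *to = *from++; *to = *from++; *to = *from++;
        *to = *from++; *to = *from++; *to = *from++; *to = *from++;
        stores += 8;
    }
    while (rest-- > 0) {    /* separate cleanup loop */
        *to = *from++;
        stores++;
    }
    return stores;
}
```

Two loops and two counters are needed here. The switch statement lets Duff's version fold the cleanup into the single unrolled loop.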
C’s post-increment operator supplies the address of the source data for the copy and then advances the pointer to the next address, all in one expression.
    *from++;
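A small sketch of what that expression does, spelled out. The helper name `take_short` and its pointer-to-pointer parameter are hypothetical, chosen only so that the effect on the pointer is visible to the caller.

```c
/* Hypothetical helper: reads one value through `*cursor` and leaves
   `*cursor` pointing at the next element. */
static short take_short(const short **cursor)
{
    const short *from = *cursor;
    short value = *from++;  /* parsed as *(from++): read *from, then advance from */

    *cursor = from;         /* hand the advanced pointer back to the caller */
    return value;
}
```

Called twice on a pointer to an array holding 10, 20, 30, it would return 10 and then 20, leaving the pointer at the third element.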
The number of passes through the body of the loop is calculated once, before entering the loop. Adding 7 before dividing rounds the result up, so that a partial first pass is counted as well.
    register n = (count + 7) / 8; /* count > 0 assumed */
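To see how the arithmetic works out, take count = 20: n is (20 + 7) / 8 = 3 and count % 8 is 4, so the first pass enters at case 4 and does four copies, and the remaining two passes do eight each, for 4 + 16 = 20 in total. The sketch below restates that bookkeeping. The helper names `passes` and `entry_copies` are my own, not part of the original.

```c
/* Hypothetical helpers restating the loop bookkeeping. Both assume count > 0. */

/* Number of passes through the loop body, rounded up. */
static int passes(int count)
{
    return (count + 7) / 8;
}

/* Number of copies made on the first pass: the switch enters at
   case (count % 8), and case 0 means a full pass of eight. */
static int entry_copies(int count)
{
    int r = count % 8;
    return r ? r : 8;
}
```

For any positive count, `entry_copies(count) + 8 * (passes(count) - 1)` should equal `count`.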
C’s pre-decrement operator then decrements and tests that count at the end of each pass through the loop.
    while (--n > 0);
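Putting the pieces together, here is a minimal sketch of the device wrapped in a complete function that compiles as modern C. The function name `duff_send`, the `short` element type, the explicit `int n` (the classic form relies on implicit `int`), the `assert`, and the returned store count are all my assumptions, added so that the sketch can be compiled and checked.

```c
#include <assert.h>

/* Sketch of Duff's device as a complete function. `to` stands in for a
   single output address such as a memory-mapped port (an assumption).
   Returns the number of stores made, purely for checking. */
static int duff_send(volatile short *to, const short *from, int count)
{
    int stores = 0;
    int n;

    assert(count > 0);          /* the classic form assumes count > 0 */
    n = (count + 7) / 8;        /* passes through the loop, rounded up */

    switch (count % 8) {        /* jump into the middle of the loop body */
    case 0: do { *to = *from++; stores++;
    case 7:      *to = *from++; stores++;
    case 6:      *to = *from++; stores++;
    case 5:      *to = *from++; stores++;
    case 4:      *to = *from++; stores++;
    case 3:      *to = *from++; stores++;
    case 2:      *to = *from++; stores++;
    case 1:      *to = *from++; stores++;
            } while (--n > 0);  /* one decrement and test per pass */
    }
    return stores;
}
```

With count = 11, for example, the switch enters at case 3 and makes three copies, the loop test brings n from 2 down to 1, and one more full pass makes the remaining eight.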
Because C maps so closely onto the hardware it is compiled to run on, the unrolling pays for only one decrement, test, and branch per eight copies rather than one per copy, so this code should be amazingly fast on most architectures.
I tip my hat to Tom Duff for his beautiful, and fast, code to copy data.