Undocumented Instructions on the Intel 8086

johnwalker · 21 July 2023 13:45

The Intel 8086 microprocessor, introduced in June 1978, was Intel’s first 16-bit microprocessor, seen as the logical upgrade to their widely-used 8080 8-bit processor. A year later, Intel introduced the 8088, which was identical to the 8086 except that it used an 8-bit external data bus. This simplified and reduced the cost of the external interface to the chip (memory and input/output), and the 8088 was chosen as the processor for the IBM Personal Computer in 1981.

Intel viewed the 8086 largely as an “interim hack” to fill the 16-bit market segment which other semiconductor manufacturers such as Texas Instruments, Motorola, Zilog, and National Semiconductor were planning to enter. (Intel’s long-term plans were based upon the iAPX 432 “micro-mainframe”, which was a technological and marketing debacle of epic proportions, but that is a story for another day.) To make it easier for users of the 8080 to migrate to the 8086, which was faster and allowed addressing 16 times as much memory, Intel carried over the bizarre architecture and curious collection of instructions from the earlier chip. The 8086 was neither electrically nor binary program compatible with the 8080, but it could run assembly language code written for the 8080 simply by rebuilding it with development tools that targeted the 8086.

Things in the real world did not, of course, go as Intel had planned, and the 8086 architecture, now called “x86”, went on to dominate the industry to the present day, carrying on all of its eccentricities and quirks through 13 generations of processors, used in everything from laptop computers to some of the fastest supercomputers in the world.

But the original 8086 was a hack, thrown together to meet an immediate need. Its designers were not building a product for the ages with all the rough edges scrubbed off—that was the rôle of the iAPX 432—and they had to carry over everything, however regrettable and short-sighted, from the 8080. Let’s look at some of those rough edges.

Ken Shirriff has been reverse-engineering the Intel 8086 at the transistor level. In a recent post, “Undocumented 8086 instructions, explained by the microcode”, he pulls back the cobwebs to explore the underside of the 8086—what Intel didn’t tell its customers.

Each instruction of a computer processor (add, subtract, shift, test, jump, etc.) is encoded as an “operation code”: a number which instructs the processor what to do. On the 8086, the operation is selected by the first byte of an instruction, in some cases together with bits of the second byte, and some instructions may be preceded by prefix bytes that modify the operation. If you plot all of these codes on a chart, you’ll discover there are “holes”: numbers for which no operation has been assigned. What if you try them anyway, as curious programmers are guaranteed to do? On most modern processors, designed in the Age of Extravagance where transistors cost essentially nothing, you’ll get an “illegal instruction” interrupt, which usually causes the operating system to kill the program. The 8086, however, had a total of only 29,000 transistors, every one of which had to be placed by hand and cut from rubylith by its designers and drafters. Trapping undefined operations was a luxury they couldn’t afford, and often these operations just did whatever fell out of the logic.

Ken Shirriff’s investigation of these details is a masterpiece of reverse engineering and explanation, showing in detail how the processor’s microcode executed the defined instructions, and what it did, and why, when presented with one that wasn’t in the book. Here is a particularly curious one.

0xD6: SALC

The opcode D6 performs a well-known but undocumented operation that is typically called SALC, for Set AL to Carry. This instruction sets the AL register to 0 if the carry flag is 0, and sets the AL register to FF if the carry flag is 1. The curious thing about this undocumented instruction is that it exists in all x86 CPUs, but Intel didn’t mention it until 2017. Intel probably put this instruction into the processor deliberately as a copyright trap. The idea is that if a company created a copy of the 8086 processor and the processor included the SALC instruction, this would prove that the company had copied Intel’s microcode and thus had potentially violated Intel’s copyright on the microcode. This came to light when NEC created improved versions of the 8086, the NEC V20 and V30 microprocessors, and was sued by Intel. Intel analyzed NEC’s microcode but was disappointed to find that NEC’s chip did not include the hidden instruction, showing that NEC hadn’t copied the microcode. Although a Federal judge ruled in 1989 that NEC hadn’t infringed Intel’s copyright, the 5-year trial ruined NEC’s market momentum.
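
As a rough illustration of the behaviour described above, here is a small Python model of SALC. It is only a sketch: the function name salc and the use of a Boolean to stand in for the carry flag are my own choices for illustration, not anything taken from Intel or from the microcode analysis.

```python
def salc(carry_flag: bool) -> int:
    """Toy model of opcode D6: return the new value of AL.

    Hypothetical helper for illustration only; the real instruction
    works directly on the processor's AL register and carry flag.
    """
    # AL becomes all ones (0xFF) when carry is set, all zeros otherwise.
    return 0xFF if carry_flag else 0x00


if __name__ == "__main__":
    for cf in (False, True):
        print(f"CF={int(cf)} -> AL={salc(cf):02X}")
```

In effect, the carry flag is copied into every bit of AL, which is presumably why programmers who knew about the instruction found it handy for building a mask from a comparison result.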

In addition to undefined instructions, there’s also the matter of what happens if you give an instruction a value beyond the range it’s defined to accept. For example, consider a circular bit shift instruction: what happens if you specify a shift count larger than the size of the register being shifted? I once used this in the 1970s to distinguish among three generations of Univac mainframes. On the 8086 there is a particularly cute example, motivated by limited microcode memory, but tempting to programmers who discovered it.

AAM: ASCII Adjust after Multiply

The AAM instruction is a fairly obscure one, designed to support binary-coded decimal arithmetic (BCD). After multiplying two BCD digits, you end up with a binary value between 0 and 81 (0×0 to 9×9). If you want a BCD result, the AAM instruction converts this binary value to BCD, for instance splitting 81 into the decimal digits 8 and 1, where the upper digit is 81 divided by 10, and the lower digit is 81 modulo 10.

The interesting thing about AAM is that the 2-byte instruction is D4 0A. You might notice that hex 0A is 10, and this is not a coincidence. There wasn’t an easy way to get the value 10 in the microcode, so instead they made the instruction provide that value in the second byte. The undocumented (but well-known) part is that if you provide a value other than 10, the instruction will convert the binary input into digits in that base. For example, if you provide 8 as the second byte, the instruction returns the value divided by 8 and the value modulo 8.
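
The arithmetic is easy to sketch in Python. This models only the behaviour described above, not the microcode itself. The function name aam and its base parameter (standing in for the second instruction byte) are assumptions of mine for illustration, as is the choice to model a second byte of zero as a Python exception.

```python
def aam(al: int, base: int = 10) -> tuple:
    """Toy model of D4 <base>: return the resulting (AH, AL) pair.

    Hypothetical helper for illustration; `base` plays the part of the
    second instruction byte, which is 0x0A in the documented form.
    """
    if not 0 <= al <= 0xFF or not 0 <= base <= 0xFF:
        raise ValueError("AL and the second byte must each fit in 8 bits")
    if base == 0:
        # Assumption: a zero divisor is modelled as a Python exception,
        # standing in for the processor's divide-error handling.
        raise ZeroDivisionError("second byte of zero")
    # AH receives the quotient and AL the remainder.
    return al // base, al % base


if __name__ == "__main__":
    print(aam(81))     # documented form, D4 0A: (8, 1)
    print(aam(81, 8))  # undocumented form, D4 08: (10, 1)
```

With the default base of 10, the product 81 splits into the digits 8 and 1, as in the example above. With 8 as the second byte, the same input yields 10 and 1, the quotient and remainder of division by 8.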

There is far, far more in the analysis. Read the whole thing.

At least, in our enlightened age, we no longer speak of “illegal instructions”, but rather “undocumented features”.