ADDENDUM: FAST FORWARD - 2020

As already discussed, compilers take semi-readable, higher-level code as input, and produce machine language as output.

Microcontrollers such as those on Arduino boards are small enough and single-purpose enough that they neither want nor need an operating system. However, the Raspberry Pi is sophisticated and capable enough to want one.

When you have an operating system, code needs to negotiate with the operating system for most tasks. The operating system is a traffic cop, hotel booking agent, and lots more. Among other things, the operating system has to:

  • tell programs when to pause and when to resume, in order to allow other programs to run,

  • keep an inventory of available disk space and allocate it appropriately,

  • watch for service requests from programs and determine which programs should answer those requests,

  • protect programs and data,

and more.

What appears to be simple input from a keyboard or output to a screen is more complex when there’s an operating system between the code and the hardware. So, in order to keep the generated machine code somewhat readable, the following C program does not actually attempt input or output, and thus, does not add a lot of operating system-dependent code. Instead, it mimics the first program shown in the Altair 8800 Operator’s Manual.

void main()
{
  char a =  5;
  char b = 10;

  a = a + b;
}

By using char instead of int, the values are constrained to 8 bits, just as on the Altair 8800.

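If the above program is saved as add.c, it can be compiled, and an assembler language / machine language listing can be produced from the object code file produced by the compiler. (With the -c option used below, that file is not yet an executable program; it has been compiled but not linked.)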

IMPORTANT DETAILS: To understand the assembler language and machine code produced, you need to have an idea of the hardware that the compiler is targeting. Typically, this will be the hardware that is running the compiler, but not always. For example, the Arduino IDE (Integrated Development Environment) is run on non-Arduino processors, but produces Arduino machine code.

To a lesser degree, it is also useful to know the version of the operating system kernel, the version of the compiler, and the “dialect” of the assembler language being used.

In the following example, those details are:

Processors:

8 × Intel® Core™ i7-2960XM CPU @ 2.70GHz

Memory:

31.3 GiB of RAM

OS Type:

Linux 64-bit

Kernel Version:

5.4.0-48-lowlatency

Compiler:

GNU C Compiler (gcc) version 9.3.0

From the Bash prompt:

$ gcc -g -c add.c
$ objdump -drwC -Mintel add.o

(There are other ways to produce an assembler language / machine language listing. For example, you can ask GCC to generate it during the initial compilation. However, the objdump method produces a more compact, less verbose output.)

Compilers take source code and produce object files. Depending upon both the source code and the way that it is compiled and linked, the result may be a non-executable library of functions and subroutines that is linked to one or more main programs, or it may be a complete, executable program. Shared object libraries (.so) on Linux and dynamic-link libraries (.dll) on Windows are examples of the former. For example, such a library may contain efficient implementations of math algorithms, or special graphics functions.

Object files can, and on most systems do, contain more than just the machine language. They can contain a “data” segment that holds constants and other data used by a program, as well as information on where in memory to put both the data and the executable code. On Linux (and other Unix-like systems) object files are stored in Executable and Linkable Format, more commonly referred to by its acronym, ELF.

objdump disassembles the binary object file, attempting to change it back from unprintable gobbledygook into a human-readable form – though some information, like the original variable names, can be lost.

objdump works out the target architecture from the object file itself (here, elf64-x86-64). The option -Mintel in the command only selects how the disassembly is printed: in Intel syntax, rather than the AT&T syntax that objdump uses by default for x86 code.

The output of the objdump is:

add.o:     file format elf64-x86-64


Disassembly of section .text:

0000000000000000 <main>:
   0:        f3 0f 1e fa             endbr64
   4:        55                      push   rbp
   5:        48 89 e5                mov    rbp,rsp
   8:        c6 45 fe 05             mov    BYTE PTR [rbp-0x2],0x5
   c:        c6 45 ff 0a             mov    BYTE PTR [rbp-0x1],0xa
  10:        0f b6 55 fe             movzx  edx,BYTE PTR [rbp-0x2]
  14:        0f b6 45 ff             movzx  eax,BYTE PTR [rbp-0x1]
  18:        01 d0                   add    eax,edx
  1a:        88 45 fe                mov    BYTE PTR [rbp-0x2],al
  1d:        90                      nop
  1e:        5d                      pop    rbp
  1f:        c3                      ret

Using Wikipedia’s X86 architecture entry, the above can be interpreted as:

Registers used

Mnemonic    Name
rbp         Base Pointer
rsp         Stack Pointer
eax         Accumulator (low 32 bits only)
edx         Data (low 32 bits only)
al          Accumulator (low 8 bits only)

  1. The endbr64 is beyond the scope of this document. The curious are referred to the StackOverflow question What does endbr64 instruction actually do? For all intents and purposes, it is a nop in this code and can be safely ignored.

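  2. Save the current value of the Base Pointer (rbp) by pushing it onto the stack. The Base Pointer marks the base of the current function's stack frame, the region of the stack where that function keeps its local data. The value being saved belongs to whatever called main.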

  3. Move the value of the Stack Pointer (rsp) into the Base Pointer (rbp). Together with the previous step, this is the standard function prologue: it sets up a stack frame for main. Local variables such as a and b normally live on the stack, at fixed offsets from rbp, rather than in the data segment, which holds global and static data. This is not an optimization; the program was compiled with optimization turned off, which is the GCC default.

  4. Move the value 5 (0x5) to the memory location pointed to by rbp - 2. This is the char a = 5; from the C program.

  5. Move the value 10 (0xa) to the location pointed to by rbp - 1. This corresponds to char b = 10; in the C program.

  6. Move a into the low 32 bits of the Data Register (edx) and b into the low 32 bits of the Accumulator (eax). The movzx instruction zero-extends each 8-bit value to 32 bits. Even though the original code specifies 8-bit quantities, C requires that char operands be promoted to int (32 bits here) before any arithmetic is performed on them. (Suppose, for example, that the values were 100 and 100 instead of 5 and 10. Both would still fit in 8 bits, but their sum, 200, would not fit in a signed 8-bit char.)

  7. Add the contents of the Data Register (edx) to the contents of the Accumulator Register (eax). The assignment hasn’t been made yet, but this is the addition a + b. The result lives in the Accumulator but has not yet been stored in memory.

  8. Move only the low 8 bits of the Accumulator (al) to rbp - 2, which is where a was stored in step 4. Now the assignment has been made (a = ...). Together with the previous step, a = a + b; has now been completed.

  9. The nop does nothing. It is not there for alignment: x86 instructions may start at any byte address, and this nop itself sits at the odd address 0x1d. GCC commonly emits such a nop at the end of a void function when optimization is turned off; it disappears when the program is compiled with optimization (for example, -O1). Byte-alignment of code can matter on some hardware, but that is not what is happening here.

  10. The pop restores rbp to the value that was saved in step 2, so that the caller of main gets its own Base Pointer back.

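  11. ret returns control to whatever called main. When you start a program by typing its name at the Bash prompt, you are, in effect, calling the program as you would a subroutine. In a fully linked executable (add.o is only an object file and cannot be run by itself), main is called by C runtime startup code, which hands the result of main to the operating system and ends the process. Control then returns to Bash (which is a constantly running program that waits to do your bidding and dispatches tasks that it cannot handle to other programs).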

A much more thorough coverage can be found in the official Intel reference documents. However, ten volumes is a bit much to tackle.

There are other ways to produce assembly language / machine language listings, but the above was the least wordy method I could find.

The two most popular assemblers on Linux are the GNU Assembler (gas a.k.a. as) and the Netwide Assembler (nasm). Both use many of the same mnemonics, and produce the same machine code, but offer slight variations in syntax (e.g. gas uses # as the comment delimiter, while nasm uses ; as the comment delimiter).

gas is tightly woven into the fabric of Linux. It is part of GNU Binutils, and the GNU Compiler Collection (gcc, née the GNU C Compiler) relies on it to assemble the output of its compilers, which include C, C++, Fortran, and Go, among others.

On Debian-like systems, gas is part of the binutils package. I would also suggest installing the gcc package if you do not already have it. nasm is its own package.

You can assemble and run a “Hello World” in the dialect of your choice (after ensuring the appropriate packages are installed) by looking at the comments in the two examples below:

“Hello World” examples

Assembler    Source code
nasm         hello.asm
gas          hello.s

See also: