ADDENDUM: FAST FORWARD - 2020
As already discussed, compilers take semi-readable, higher-level code as input, and produce machine language as output.
Microcontroller boards such as the Arduino are small enough and single-purpose enough that they neither want nor need an operating system. However, the Raspberry Pi is sophisticated and capable enough to want one.
When you have an operating system, code needs to negotiate with the operating system for most tasks. The operating system is a traffic cop, hotel booking agent, and lots more. Among other things, the operating system has to:
tell programs when to pause and when to resume, in order to allow other programs to run,
keep an inventory of available disk space and allocate it appropriately,
watch for service requests from programs and determine which programs should answer those requests,
protect programs and data,
and more.
What appears to be simple input from a keyboard or output to a screen is more complex when there’s an operating system between the code and the hardware. So, in order to keep the generated machine code somewhat readable, the following C program does not actually attempt input or output, and thus does not add a lot of operating-system-dependent code. Instead, it mimics the first program shown in the Altair 8800 Operator’s Manual.
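/* void main() is accepted by gcc but is not standard C (int main(void)
   is the portable form); it is kept because the listing below was
   generated from exactly this code. */
void main()
{
    char a = 5;     /* like the Altair example, two 8-bit values... */
    char b = 10;
    a = a + b;      /* ...are added, and the sum (15) is stored back in a */
}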
By using char instead of int, the values are constrained to 8 bits, just as on the Altair 8800.
If the Altair-style program above is saved as add.c, it can be compiled, and an assembler language / machine language listing can be produced from the object code file generated by the compiler.
IMPORTANT DETAILS: To understand the assembler language and machine code produced, you need to have an idea of the hardware that the compiler is targeting. Typically, this will be the hardware that is running the compiler, but not always. For example, the Arduino IDE (Integrated Development Environment) is run on non-Arduino processors, but produces Arduino machine code.
To a lesser degree, it is also useful to know the version of the operating system kernel, the version of the compiler, and the “dialect” of the assembler language being used.
In the following example, those details are:
Detail | Value |
---|---|
Processors | 8 × Intel® Core™ i7-2960XM CPU @ 2.70GHz |
Memory | 31.3 GiB of RAM |
OS Type | Linux 64-bit |
Kernel Version | 5.4.0-48-lowlatency |
Compiler | GNU C Compiler (gcc) version 9.3.0 |
From the Bash prompt:
$ gcc -g -c add.c
$ objdump -drwC -Mintel add.o
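Here, -c tells gcc to compile add.c into the object file add.o without linking it (so add.o is not yet a runnable program; gcc add.o -o add would link it), and -g adds debugging information. For objdump, -d disassembles the sections that contain code, -r shows relocation entries, -w selects wide output lines, and -C demangles C++ symbol names (harmless for C).
(There are other ways to produce an assembler language / machine language listing. For example, you can ask GCC to generate it during the initial compilation: gcc -S writes the assembler language to a .s file. However, the objdump method produces more compact, less verbose output.)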
Compilers take source code and produce object files. Depending upon both the source code and the way that it is compiled, the object file may be a non-executable library of functions and subroutines that are linked to one or more main programs, or it may be a complete, executable program. Shared object libraries (.so) on Linux and dynamically linked libraries (.dll) on Windows are examples of the former. For example, such a library may contain efficient implementations of math algorithms, or special graphics functions.
Object files can, and on most systems do, contain more than just the machine language. They can contain data sections that hold initialized global and static variables and read-only constants such as string literals, as well as information on where in memory to put both the data and the executable code. On Linux (and many other Unix-like systems), object files are stored in the Executable and Linkable Format, more commonly referred to by its acronym, ELF.
objdump reverse-engineers the binary object file, attempting to change it back from unprintable gobbledygook into a human-readable form – though some information, like the original variable names, can be lost.
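objdump does not need to be told which processor the code is for; it reads that from the object file itself (note the file format elf64-x86-64 line in the output below). What the option -Mintel (short for -M intel) controls is the syntax: for x86 code, objdump prints AT&T-style assembler language by default, and -Mintel tells it to translate the machine language back into Intel-style x86 mnemonics instead, matching Intel’s own documentation.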
The output of the objdump command is:
add.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <main>:
0: f3 0f 1e fa endbr64
4: 55 push rbp
5: 48 89 e5 mov rbp,rsp
8: c6 45 fe 05 mov BYTE PTR [rbp-0x2],0x5
c: c6 45 ff 0a mov BYTE PTR [rbp-0x1],0xa
10: 0f b6 55 fe movzx edx,BYTE PTR [rbp-0x2]
14: 0f b6 45 ff movzx eax,BYTE PTR [rbp-0x1]
18: 01 d0 add eax,edx
1a: 88 45 fe mov BYTE PTR [rbp-0x2],al
1d: 90 nop
1e: 5d pop rbp
1f: c3 ret
Using Wikipedia’s x86 architecture entry, the above can be interpreted as:
Mnemonic | Name |
---|---|
rbp | Base Pointer |
rsp | Stack Pointer |
eax | Accumulator (low 32 bits only) |
edx | Data (low 32 bits only) |
al | Accumulator (low 8 bits only) |
endbr64: This instruction is beyond the scope of this document. The curious are referred to the StackOverflow question "What does endbr64 instruction actually do?" For all intents and purposes, it is a nop in this code and can be safely ignored.
push rbp: Save the current value of the Base Pointer (rbp) by pushing it onto the stack, so that it can be restored just before main returns. The Base Pointer normally points into the current function’s stack frame, an area where data, rather than code, is stored.
mov rbp,rsp: Copy the value of the Stack Pointer (rsp) into the Base Pointer (rbp). This sets up main’s stack frame, so that its local variables can be found at fixed offsets from rbp. This is not a special optimization: C compilers normally keep local (automatic) variables such as a and b on the stack, and reserve the separate data area for global and static variables. In fact, no optimization was requested at all (the gcc command has no -O option), which is why the generated code follows the C source so literally.
mov BYTE PTR [rbp-0x2],0x5: Move the value 5 (0x5) to the memory location rbp - 2. This is the char a = 5; from the C program.
mov BYTE PTR [rbp-0x1],0xa: Move the value 10 (0xa) to the memory location rbp - 1. This corresponds to char b = 10; in the C program.
movzx edx,BYTE PTR [rbp-0x2] and movzx eax,BYTE PTR [rbp-0x1]: Load a into the low 32 bits of the Data register (edx) and b into the low 32 bits of the Accumulator (eax). movzx ("move with zero-extend") widens each 8-bit value to 32 bits by filling the upper bits with zeros. Even though the original code specifies 8-bit quantities, the arithmetic is done in 32 bits, because C’s integer promotion rules convert operands narrower than int to int before any arithmetic. The full sum is cut back to 8 bits only when it is stored.
add eax,edx: Add the contents of the Data register (edx) to the contents of the Accumulator (eax). The assignment hasn’t been made yet, but this is the addition a + b. The result lives in the Accumulator but has not yet been stored in memory.
mov BYTE PTR [rbp-0x2],al: Move only the low 8 bits of the Accumulator (al) to rbp - 2, which is where a was stored by the first mov BYTE PTR instruction. Now the assignment (a = ...) has been made. Together with the previous instruction, a = a + b; has now been completed.
nop: This instruction does nothing and appears unnecessary, but the compiler probably generates it so that the pop instruction falls on an even-numbered memory address. Depending upon the architecture of the hardware and features of the operating system, careful byte-alignment of code is often more efficient, and sometimes absolutely necessary (although x86 itself does not require instructions to be aligned).
pop rbp: Restore rbp to the value saved by push rbp at the start, so that the stack frame of the code that called main is intact when control returns to it.
ret: Return. When you start a program by typing its name at the Bash prompt, you are, in effect, calling the program as you would a subroutine. Strictly speaking, main is called by the C runtime start-up code that is linked into the program; ret hands control back to that code, which ends the process, and Bash (a constantly running program that waits to do your bidding and dispatches tasks that it cannot handle to other programs) then takes over again.
Much more thorough coverage can be found in the official Intel reference documents. However, ten volumes is a bit much to tackle.
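The two most popular assemblers on Linux are the GNU Assembler (gas, a.k.a. as) and the Netwide Assembler (nasm). Both use many of the same mnemonics and produce the same machine code, but they differ in syntax. For example, gas uses # as the comment delimiter, while nasm uses ;. More significantly, on x86 gas defaults to AT&T syntax (source operand first, registers prefixed with %), the same style objdump uses unless given -Mintel, although gas can be switched to Intel syntax with the .intel_syntax directive.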
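gas is tightly woven into the fabric of Linux. It is part of GNU Binutils, and it is the assembler that the GNU Compiler Collection (gcc, née the GNU C Compiler) runs behind the scenes to turn its generated assembler language into object code. GCC itself includes compilers for C, C++, Fortran, and Go, among others (and, in older releases, Java).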
On Debian-like systems, gas is part of the binutils package. I would also suggest installing the gcc package if you do not already have it. nasm is its own package.
You can assemble and run a “Hello World” in the dialect of your choice (after ensuring the appropriate packages are installed) by looking at the comments in the two examples below:
See also: