Everything you need to build your own nm and otool
Recreate the nm and otool binaries using the C programming language. You will learn about the Mach-O format, which defines the layout your operating system expects for executables.
Understand how your computer compiles and executes binaries.
Have you ever wondered how your computer decodes binaries? Lately I wanted to learn this concept, and that’s why I decided to implement the nm and otool commands. Written in C with only the most basic functions, these two programs taught me a lot about binaries and Unix. Those interested in the field might learn a few concepts here 🎓.
This article should have all the resources needed to build your own implementation. I strongly advise you to try doing this project by yourself. You will gain a lot of skills exploring the man pages and the system header files.
This implementation covers Mach-O, the current executable format for macOS. Feel free to access the complete GitHub project below.
Executables
When the operating system starts a binary, it expects the file to follow a predefined layout. Each operating system has its own conventions. In this article, we will focus on the Mach-O format, the one used by modern macOS computers. Other conventions exist: for example, Linux mainly uses ELF and Windows uses PE. You can find a complete list here.
The following document gives you a complete reference in case you want to understand this in depth.
1st step: Identify a mach-o file
The first four bytes of a file usually define its identity: they are called the magic number. By comparing them to a list of known magic numbers, we can deduce whether the file follows the Mach-O layout. To get these constants, you can include the file <mach-o/loader.h> in your project.
// Defined in <mach-o/loader.h>
#define MH_MAGIC 0xfeedface
#define MH_CIGAM 0xcefaedfe /* NXSwapInt(MH_MAGIC) */
#define MH_MAGIC_64 0xfeedfacf
#define MH_CIGAM_64 0xcffaedfe /* NXSwapInt(MH_MAGIC_64) */
Those 4 magic numbers identify Mach-O files. They differ by structure size (32 or 64 bits) and by endianness: you read a CIGAM value when the file's byte order is the opposite of your machine's.

Archives and fat binaries can also contain Mach-O data. nm and otool are able to parse them, so we’ll talk briefly about them at the end of the article.
Nm and otool
So why are we implementing nm and otool?
Those commands are great for learning about Mach-O files because they parse the file, analyse its structure and then display the data.
nm displays the list of symbols of an executable. otool displays a hex dump of the data of a specified section, which is a part of a segment. We will see what segments and sections are later.

Parsing the structure

Access the file
We will first access and read the file. I use a simple combination of open, fstat, mmap and close to get a pointer to the start of the data.
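As a rough sketch of this step, the helper below maps a whole file into memory. The name map_file and its signature are my own choices for illustration; only open, fstat, mmap and close come from the system.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: maps `path` read-only and returns a pointer to its
 * first byte, or NULL on failure. On success the file size goes in *size. */
void *map_file(const char *path, size_t *size)
{
    struct stat st;
    void *ptr;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return NULL;
    if (fstat(fd, &st) < 0 || st.st_size <= 0) {
        close(fd);
        return NULL;
    }
    ptr = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (ptr == MAP_FAILED)
        return NULL;
    *size = (size_t)st.st_size;
    return ptr;
}
```

Keep the returned size around: you need it for every bounds check later, and for munmap when you are done with the file.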

You should then check the magic number against the Mach-O magic numbers defined earlier.
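Here is one way this check could look. The struct macho_kind and the function identify are hypothetical names of mine; the four constants are copied from <mach-o/loader.h> only so that the sketch builds on any system.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Values copied from <mach-o/loader.h>; on macOS, include that header. */
#ifndef MH_MAGIC
# define MH_MAGIC    0xfeedfaceu
# define MH_CIGAM    0xcefaedfeu
# define MH_MAGIC_64 0xfeedfacfu
# define MH_CIGAM_64 0xcffaedfeu
#endif

/* Hypothetical result type: what the magic number tells us about the file. */
struct macho_kind {
    int is_macho; /* one of the four magic numbers matched */
    int is_64;    /* the file uses the 64-bit structures */
    int swapped;  /* integers must be byte-swapped before use */
};

struct macho_kind identify(const void *data, size_t size)
{
    struct macho_kind kind = {0, 0, 0};
    uint32_t magic;

    if (size < sizeof(magic))
        return kind; /* too small to hold a magic number */
    memcpy(&magic, data, sizeof(magic));
    kind.is_64 = (magic == MH_MAGIC_64 || magic == MH_CIGAM_64);
    kind.swapped = (magic == MH_CIGAM || magic == MH_CIGAM_64);
    kind.is_macho = (kind.is_64 || magic == MH_MAGIC || magic == MH_CIGAM);
    return kind;
}
```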

macOS provides many header files that define the Mach-O structures and constants for us. We will use them in the following sections. Because nm and otool need to parse the same structures, we can share common functions between them.
The header
A Mach-O file always starts with a header, the mach_header structure (mach_header_64 for 64-bit files):

It gives you a lot of information, such as the cputype (the CPU family able to run this executable), the filetype, and the number and total size of the load commands that follow (ncmds and sizeofcmds).
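To make the header concrete, here is a small sketch for the 64-bit case. The struct header64 is a local mirror of mach_header_64 from <mach-o/loader.h>, declared here only to keep the example self-contained; read_header64 and filetype_name are hypothetical helpers, and the sketch assumes the file has the same byte order as the machine.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of struct mach_header_64 (see <mach-o/loader.h>). */
struct header64 {
    uint32_t magic;      /* MH_MAGIC_64 or MH_CIGAM_64 */
    int32_t  cputype;    /* CPU family */
    int32_t  cpusubtype; /* CPU variant */
    uint32_t filetype;   /* object, executable, dylib, ... */
    uint32_t ncmds;      /* number of load commands */
    uint32_t sizeofcmds; /* total size of the load commands */
    uint32_t flags;
    uint32_t reserved;
};

/* Copies the header out of the file. Returns 0 on success, -1 when the file
 * is too small for the header or for the load commands it announces. */
int read_header64(const unsigned char *file, size_t size, struct header64 *out)
{
    if (size < sizeof(*out))
        return -1;
    memcpy(out, file, sizeof(*out));
    if (out->sizeofcmds > size - sizeof(*out))
        return -1;
    return 0;
}

/* Names a few common file types; the values are MH_OBJECT, MH_EXECUTE and
 * MH_DYLIB from <mach-o/loader.h>. */
const char *filetype_name(uint32_t filetype)
{
    switch (filetype) {
    case 0x1: return "object";
    case 0x2: return "executable";
    case 0x6: return "dylib";
    default:  return "other";
    }
}
```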
Load commands
The load commands, which directly follow the header, describe how the rest of the binary is divided up. You can get the complete list of load command types in the loader.h header, under the LC_XXX names. For this article, you’ll only need the LC_SYMTAB and LC_SEGMENT commands (LC_SEGMENT_64 in 64-bit files).

Because the load commands are placed one after another, we can iterate through them using their size (cmdsize).
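A minimal sketch of that loop is shown below. The struct load_cmd mirrors load_command from <mach-o/loader.h>; count_commands is a hypothetical function that counts the commands of one type and gives up as soon as a size looks wrong.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of struct load_command: every load command starts this way. */
struct load_cmd {
    uint32_t cmd;     /* LC_XXX type */
    uint32_t cmdsize; /* size of this command, including these 8 bytes */
};

/* Walks `ncmds` load commands stored in `cmds` (`size` bytes long).
 * Returns how many have type `wanted`, or -1 if the list is malformed. */
int count_commands(const unsigned char *cmds, size_t size,
                   uint32_t ncmds, uint32_t wanted)
{
    size_t off = 0; /* always <= size */
    int found = 0;

    for (uint32_t i = 0; i < ncmds; i++) {
        struct load_cmd lc;

        if (size - off < sizeof(lc))
            return -1; /* header of the command does not fit */
        memcpy(&lc, cmds + off, sizeof(lc));
        if (lc.cmdsize < sizeof(lc) || lc.cmdsize > size - off)
            return -1; /* would loop forever or leave the buffer */
        if (lc.cmd == wanted)
            found++;
        off += lc.cmdsize; /* jump to the next command */
    }
    return found;
}
```

In a real parser you would call a handler for each command instead of counting, but the stepping logic stays the same.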

otool prints the content of some of the sections described by the LC_SEGMENT commands. For nm, we have to match the items of the LC_SYMTAB command to their related section in an LC_SEGMENT command.
LC_SEGMENT — The segment command
Each segment command tells where to find a segment in the file and where to map it in memory, and the number of bytes to allocate for it. It also specifies the number of sections the segment contains.

The segment's data starts at lc->fileoff in the file. The segment command itself acts as a header, and it is directly followed by the list of its nsects section headers. A section is characterized by its section name (__text for example) and segment name (__TEXT for example), the address of its related data in memory, the data size, etc.
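The sketch below walks the section headers that follow a 64-bit segment command. The structs seg64 and sect64 are local mirrors of segment_command_64 and section_64 from <mach-o/loader.h>, and find_section is a hypothetical helper; it assumes the file has the same byte order as the machine.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of struct segment_command_64. */
struct seg64 {
    uint32_t cmd, cmdsize;
    char     segname[16];
    uint64_t vmaddr, vmsize, fileoff, filesize;
    int32_t  maxprot, initprot;
    uint32_t nsects, flags;
};

/* Local mirror of struct section_64. */
struct sect64 {
    char     sectname[16];
    char     segname[16];
    uint64_t addr, size;
    uint32_t offset, align, reloff, nreloc, flags;
    uint32_t reserved1, reserved2, reserved3;
};

/* `cmd` points at a segment command with `avail` readable bytes.
 * Copies the section named `sectname` into *out and returns its index
 * inside the segment, or -1 if it is missing or the data is malformed. */
int find_section(const unsigned char *cmd, size_t avail,
                 const char *sectname, struct sect64 *out)
{
    struct seg64 seg;

    if (avail < sizeof(seg))
        return -1;
    memcpy(&seg, cmd, sizeof(seg));
    if (seg.nsects > (avail - sizeof(seg)) / sizeof(*out))
        return -1; /* section headers would run past the buffer */
    for (uint32_t i = 0; i < seg.nsects; i++) {
        memcpy(out, cmd + sizeof(seg) + (size_t)i * sizeof(*out), sizeof(*out));
        /* names are 16 bytes and not always NUL-terminated */
        if (strncmp(out->sectname, sectname, sizeof(out->sectname)) == 0)
            return (int)i;
    }
    return -1;
}
```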

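With this information, we can iterate through the sections. With otool, you have to hexdump the section's data, which lives at offset in the file and is displayed with addr as its address. With nm, you must save each section to match it later with a symbol in the SYMTAB. For that we need one more value: the id of the section, so don't forget to save it. Sections are numbered in the order they appear across all segments. For example, if you count from 0, the first section in the file gets id 0; keep in mind that a symbol's n_sect field counts from 1 (0 means the symbol has no section), so id 0 matches n_sect 1.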
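For the otool side, a hex dump could be sketched like this. format_hex_line and hexdump are hypothetical helpers, and the layout (16-digit address, a tab, then up to 16 bytes) only loosely imitates otool's output for 64-bit files, so compare it with the real command before relying on it.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Writes one dump line for up to 16 bytes into `out` (`cap` bytes long).
 * Returns the line length, or -1 if the buffer is too small. */
int format_hex_line(char *out, size_t cap, uint64_t addr,
                    const unsigned char *bytes, size_t n)
{
    int len = snprintf(out, cap, "%016" PRIx64 "\t", addr);

    if (len < 0 || (size_t)len >= cap)
        return -1;
    for (size_t i = 0; i < n && i < 16; i++) {
        size_t left = cap - (size_t)len;
        int w = snprintf(out + len, left, "%02x ", bytes[i]);

        if (w < 0 || (size_t)w >= left)
            return -1;
        len += w;
    }
    return len;
}

/* Dumps `size` bytes, labelling the first one with address `addr`. */
void hexdump(const unsigned char *data, size_t size, uint64_t addr)
{
    char line[80];

    for (size_t off = 0; off < size; off += 16) {
        size_t n = size - off < 16 ? size - off : 16;

        if (format_hex_line(line, sizeof(line), addr + off, data + off, n) > 0)
            puts(line);
    }
}
```

You would call hexdump with a pointer to the section's bytes in the mapped file, the section's size, and its addr.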

LC_SYMTAB — The symbol table command

A symtab_command gives the location of a list of nsyms nlist symbols (at symoff in the file) and of the string table (at stroff).

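To get the name of a symbol, we need to look it up in the string table (strtab): the n_strx field of the nlist structure is the offset of the name inside it. The nlist structure also gives us other useful information, such as the symbol's type, section and value.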
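Here is a sketch of the lookup for 64-bit files. The structs symtab_cmd and nlist64 mirror symtab_command and nlist_64 from <mach-o/loader.h> and <mach-o/nlist.h> (the n_un union is flattened to its n_strx member); symbol_at is a hypothetical helper and assumes the file's byte order matches the machine's.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of struct symtab_command. */
struct symtab_cmd {
    uint32_t cmd, cmdsize;
    uint32_t symoff, nsyms;   /* symbol list: file offset and entry count */
    uint32_t stroff, strsize; /* string table: file offset and byte size */
};

/* Local mirror of struct nlist_64. */
struct nlist64 {
    uint32_t n_strx;  /* offset of the name inside the string table */
    uint8_t  n_type;  /* type flags */
    uint8_t  n_sect;  /* section number, counted from 1, or 0 for none */
    uint16_t n_desc;
    uint64_t n_value; /* usually the address of the symbol */
};

/* Copies symbol number `index` into *out and returns its name, or NULL when
 * anything points outside the file or the name is not NUL-terminated. */
const char *symbol_at(const unsigned char *file, size_t size,
                      const struct symtab_cmd *st, uint32_t index,
                      struct nlist64 *out)
{
    uint64_t sym_off = (uint64_t)st->symoff + (uint64_t)index * sizeof(*out);
    uint64_t str_end = (uint64_t)st->stroff + st->strsize;
    const char *name;

    if (index >= st->nsyms || sym_off + sizeof(*out) > size || str_end > size)
        return NULL;
    memcpy(out, file + sym_off, sizeof(*out));
    if (out->n_strx >= st->strsize)
        return NULL;
    name = (const char *)file + st->stroff + out->n_strx;
    if (memchr(name, '\0', st->strsize - out->n_strx) == NULL)
        return NULL;
    return name;
}
```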


What do we need to build a line for nm? The first column shows the address, and the second one gives a letter describing the symbol type: for example, T is a symbol defined in the text section and U is an undefined symbol, one that lives in another file. The last column is the symbol name. The complete list is available below.
Here is how to get the representation for a symbol.

When the symbol's type (n_type masked with N_TYPE) is N_SECT, we must find the letter based on the section the symbol belongs to, which is given by n_sect. Remember that you saved the id of each section? You can use it here 😉
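The sketch below shows how the letter could be chosen. The masks and type values are copied from <mach-o/nlist.h>; symbol_letter is a hypothetical function, and sect_names is assumed to be the table of section names you saved while parsing the segments, in file order. It compares section names only, which is a simplification.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Values copied from <mach-o/nlist.h>. */
#define N_STAB 0xe0 /* any of these bits set: debugging symbol */
#define N_TYPE 0x0e /* mask for the type bits */
#define N_EXT  0x01 /* external symbol */
#define N_UNDF 0x00
#define N_ABS  0x02
#define N_INDR 0x0a
#define N_PBUD 0x0c
#define N_SECT 0x0e

/* Returns the nm letter for a symbol. sect_names[i] is the name of the
 * section whose number (n_sect) is i + 1. */
char symbol_letter(uint8_t n_type, uint8_t n_sect, uint64_t n_value,
                   const char *const *sect_names, size_t nsects)
{
    char c;

    if (n_type & N_STAB)
        return '-';
    switch (n_type & N_TYPE) {
    case N_UNDF: c = n_value ? 'c' : 'u'; break; /* common or undefined */
    case N_PBUD: c = 'u'; break;
    case N_ABS:  c = 'a'; break;
    case N_INDR: c = 'i'; break;
    case N_SECT:
        if (n_sect == 0 || n_sect > nsects)
            return '?'; /* section number outside our table */
        if (strcmp(sect_names[n_sect - 1], "__text") == 0)
            c = 't';
        else if (strcmp(sect_names[n_sect - 1], "__data") == 0)
            c = 'd';
        else if (strcmp(sect_names[n_sect - 1], "__bss") == 0)
            c = 'b';
        else
            c = 's';
        break;
    default:
        return '?';
    }
    /* external symbols are shown in upper case */
    return (n_type & N_EXT) ? (char)toupper((unsigned char)c) : c;
}
```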

Go further
Look at you! You should now be able to build your own nm and otool 😎. But wait, if you’re serious about this project you still need to handle some edge cases. I will briefly talk about 4 of them.
Archives and fat files
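A fat binary, also called a multi-architecture binary, bundles the Mach-O files of several architectures into one file, and an archive bundles several object files.
Parsing those files is not complicated if you followed the previous steps. The headers are available at <mach-o/fat.h> and <ar.h>. The process is the same: read the container's header, then parse each Mach-O file it contains.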
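As an illustration for fat files, the sketch below locates one architecture slice. It relies on two facts from <mach-o/fat.h>: the fat header is a magic number (0xcafebabe) followed by a count, and every field is stored big-endian. The names read_be32 and fat_slice are mine, and the sketch covers the 32-bit fat_arch entries only.

```c
#include <stddef.h>
#include <stdint.h>

#define FAT_MAGIC_VALUE 0xcafebabeu /* FAT_MAGIC in <mach-o/fat.h> */

/* Reads a big-endian 32-bit integer, whatever the machine's byte order. */
static uint32_t read_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16)
         | ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Finds slice number `index` of a fat file and stores where its Mach-O data
 * starts and how long it is. Returns 0 on success, -1 on any problem. */
int fat_slice(const unsigned char *file, size_t size, uint32_t index,
              uint32_t *offset, uint32_t *length)
{
    const size_t header_size = 8; /* magic + nfat_arch */
    const size_t arch_size = 20;  /* five 32-bit fields per fat_arch */
    const unsigned char *arch;
    uint32_t narch;

    if (size < header_size || read_be32(file) != FAT_MAGIC_VALUE)
        return -1;
    narch = read_be32(file + 4);
    if (index >= narch || narch > (size - header_size) / arch_size)
        return -1;
    arch = file + header_size + (size_t)index * arch_size;
    *offset = read_be32(arch + 8);  /* skip cputype and cpusubtype */
    *length = read_be32(arch + 12);
    if (*offset > size || *length > size - *offset)
        return -1; /* slice would run past the end of the file */
    return 0;
}
```

Once you have the offset and length, you can hand that slice to the same Mach-O parser as before.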
Support for little/big endian
Values stored in the headers may differ in byte order. When the file's endianness is the opposite of your machine's (you read one of the CIGAM magic numbers), you need to reverse the byte order of every integer you read.
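A byte swap can be sketched in a few lines. The names swap32, swap64 and fix32 are hypothetical; the swapped flag is assumed to come from your magic number check.

```c
#include <stdint.h>

/* Reverses the four bytes of a 32-bit integer. */
uint32_t swap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000ff00u)
         | ((v << 8) & 0x00ff0000u) | (v << 24);
}

/* Reverses the eight bytes of a 64-bit integer. */
uint64_t swap64(uint64_t v)
{
    return ((uint64_t)swap32((uint32_t)v) << 32) | swap32((uint32_t)(v >> 32));
}

/* Returns `v` ready for use: swapped only when the file needs it. */
uint32_t fix32(uint32_t v, int swapped)
{
    return swapped ? swap32(v) : v;
}
```

Routing every integer read through a helper like fix32 keeps the rest of the parser identical for both byte orders.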
Support for 64 and 32 bits
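Depending on the magic number, the file uses the 32-bit or the 64-bit variants of the structures (mach_header_64, segment_command_64, section_64, nlist_64), whose address and size fields are 64-bit integers. Be prepared to handle both.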
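One possible way to keep a single code path is to look up the sizes that change between the two variants. The struct layout and the function layout_for are hypothetical; the numbers are the sizes of the corresponding structures in <mach-o/loader.h> and <mach-o/nlist.h>.

```c
#include <stddef.h>
#include <stdint.h>

/* Everything that differs between 32-bit and 64-bit Mach-O files. */
struct layout {
    uint32_t segment_cmd;  /* LC_SEGMENT (0x1) or LC_SEGMENT_64 (0x19) */
    size_t   header_size;  /* mach_header or mach_header_64 */
    size_t   segment_size; /* segment_command or segment_command_64 */
    size_t   section_size; /* section or section_64 */
    size_t   nlist_size;   /* nlist or nlist_64 */
    int      addr_width;   /* hex digits used to print an address */
};

struct layout layout_for(int is_64)
{
    struct layout l32 = {0x1, 28, 56, 68, 12, 8};
    struct layout l64 = {0x19, 32, 72, 80, 16, 16};

    return is_64 ? l64 : l32;
}
```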
Secure against corrupt files 🏴‍☠️
This part is a bit more complicated and requires some testing. I always consider that a program should never segfault.
If you receive a corrupted binary, the program could try to access a memory location that is not available. Every time you move a pointer based on values read from the file, I suggest you check that it never goes before the start of the file or past the end of it.
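Such a check can be as small as the sketch below. in_bounds and checked_ptr are hypothetical helpers; the comparison is written so that offset + length can never overflow.

```c
#include <stddef.h>
#include <stdint.h>

/* True when the range [offset, offset + length) lies inside the file. */
int in_bounds(size_t file_size, uint64_t offset, uint64_t length)
{
    return offset <= file_size && length <= file_size - offset;
}

/* Returns a pointer to `offset` inside the mapped file, or NULL when reading
 * `length` bytes from there would leave the file. */
const void *checked_ptr(const void *file, size_t file_size,
                        uint64_t offset, uint64_t length)
{
    if (!in_bounds(file_size, offset, length))
        return NULL;
    return (const unsigned char *)file + offset;
}
```

If every access to the mapped file goes through a helper like checked_ptr, a corrupted offset turns into a clean error instead of a crash.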
I’m starting a new website called myopen.market. It’s still at an early stage, but if you found this article useful, subscribing to its newsletter would be the best way to thank me ❤️
