Quickly code your first assembly functions (Intel x86-64 syntax)

Learn the basics of assembly programming

And understand the language your processor speaks

While C is often considered a low-level language, its syntax still abstracts away many of the instructions executed by your processor. That’s why you might want to learn a lower-level programming language, like Assembly.

This article will give you the basic tools to understand Assembly functions so that you can create your own 👷‍♂️.

Getting started

The language differs depending on your processor’s architecture. In the following sections, I will focus on x86-64 assembly written in Intel syntax, the architecture used by most desktop computers.

The tools

You can write your assembly code inside .s files and assemble them with the nasm assembler.

The -f argument lets you specify the format of the output. For example, nasm -f macho64 produces Mach-O 64-bit object files, the format of modern macOS executables.
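As a minimal sketch, a complete source file could look like the one below. The file name answer.s and the function name ft_answer are hypothetical, and the leading underscore assumes macOS symbol naming (on Linux you would drop it and use -f elf64).

```nasm
; answer.s (hypothetical file name)
; Assemble on macOS:  nasm -f macho64 answer.s -o answer.o
; Assemble on Linux:  nasm -f elf64 answer.s -o answer.o  (and drop the underscore)

        global  _ft_answer      ; export the symbol so other object files can call it
        section .text

_ft_answer:                     ; C prototype: int ft_answer(void);
        mov     eax, 42         ; the return value is placed in eax/rax
        ret                     ; go back to the caller
```

The resulting object file can then be linked with a C program that declares the prototype shown in the comment.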

Debug

Once you begin coding your own functions, I highly recommend getting familiar with debugging tools such as lldb and gdb.

Nick Desaulniers
I’m hacking on an assembly project, and wanted to document some of the tricks I was using for figuring out what was…

The language

Assembly language describes the sequence of instructions your processor will execute. One line means one instruction.

Machine code

One common mistake is to think that assembly language is the same as machine code. Machine code is the binary executed by the computer. Assembly still has to be translated into machine code by an assembler, but it is more readable by humans and better for structuring your program (with variable names, macros, sections, etc).
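To make the difference concrete, here are two instructions with the bytes an assembler emits for them. This is only an illustration and assumes a 64-bit output format such as macho64 or elf64.

```nasm
        mov     eax, 42         ; machine code bytes: B8 2A 00 00 00
        ret                     ; machine code byte:  C3
```

The left side is what you write, and the hexadecimal bytes in the comments are what the processor actually reads.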

The file structure and static data regions

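A .s file can be divided into 4 types of sections: data, rodata, bss and text. If none is given, the default is text.

The first 3 are static data regions: .data holds initialized data that can be modified, .rodata holds read-only data (constants), and .bss reserves uninitialized space that is zero-filled when the program is loaded. The .text section stores the code logic. The next sections will describe how you can build that logic.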
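A minimal sketch of a file using the four sections could look like this. All label names (counter, greeting, buffer, _get_counter) are made up for the example, and the underscore prefix assumes macOS.

```nasm
        section .data               ; initialized data, writable
counter:    dq  0                   ; one 64-bit integer with initial value 0

        section .rodata             ; read-only data (constants)
greeting:   db  "Hello", 10, 0      ; text, newline, NUL terminator

        section .bss                ; uninitialized storage, zero-filled at load time
buffer:     resb 64                 ; reserve 64 bytes

        section .text               ; the code itself
        global  _get_counter
_get_counter:                       ; C prototype: long get_counter(void);
        mov     rax, [rel counter]  ; load the 64-bit value stored at counter
        ret
```

The rel keyword asks for an address relative to the instruction pointer, which 64-bit macOS object files require for data references.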

Access memory

You have 3 kinds of operands at your disposal: registers, memory addresses (RAM) and constants. Registers are very fast, but x86-64 only gives you 16 general-purpose ones. Memory addresses can store a lot of data but are slower.
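The three kinds of operands can be sketched with the mov instruction (described further down). The labels value and _operands_demo are hypothetical names used only for this illustration.

```nasm
        section .data
value:  dq      10                  ; a 64-bit value stored in memory

        section .text
        global  _operands_demo
_operands_demo:
        mov     rax, 5              ; constant -> register
        mov     rdx, rax            ; register -> register
        mov     rcx, [rel value]    ; memory   -> register (rcx = 10)
        mov     [rel value], rdx    ; register -> memory   (value = 5)
        ret
```

Square brackets mean "the memory at this address"; without them, an operand is a register or a constant.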

General registers

rax, rbx, rcx and rdx are used by most instructions. Their names don’t really matter, but here is what the letters stand for.

  • a: Accumulator register
  • b: Base register
  • c: Counter register
  • d: Data register

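Index and pointer registers

  • RDI: Destination index register
  • RSI: Source index register
  • RBP: Stack base pointer
  • RSP: Stack pointer
  • RIP: Instruction pointer, holds the address of the next instruction

The E-prefixed names (EDI, ESI, EBP, ESP) refer to the lower 32 bits of these registers.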

Segment registers

You probably won’t need them for simple functions: CS (code segment), DS (data segment), SS (stack segment), ES, FS and GS.

Instructions

Each line represents one instruction. Usually you have the instruction name (mnemonic) followed by its arguments (operands).

CMD ARG1, ARG2

Moving values

mov — <dst> [reg, mem], <src> [reg, mem, const]

It copies the data item referred to by src into the location of dst. However, you cannot copy data directly between 2 memory addresses (mov mem0, mem1 is not valid). You can do it by using two instructions and one register.
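The two-instruction workaround can be sketched as follows. The labels src_val, dst_val and _copy_value are hypothetical, and rax is used as the temporary register.

```nasm
        section .data
src_val: dq     1234                ; value to copy
dst_val: dq     0                   ; destination, initially 0

        section .text
        global  _copy_value
_copy_value:
        mov     rax, [rel src_val]  ; step 1: memory -> register
        mov     [rel dst_val], rax  ; step 2: register -> memory
        ret                         ; dst_val now holds 1234
```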

push — <data> [reg, mem, const]

Pushes data onto the top of the stack (more about the stack).

pop — <dst> [reg, mem]

Pops the value on top of the stack into dst.
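Because the stack is last in, first out, pushing two registers and popping them in the same order swaps their values. A small sketch, with _swap_demo as a hypothetical function name:

```nasm
        global  _swap_demo
        section .text
_swap_demo:
        mov     rax, 1
        mov     rcx, 2
        push    rax             ; stack (top first): 1
        push    rcx             ; stack (top first): 2, 1
        pop     rax             ; rax = 2 (last pushed, first popped)
        pop     rcx             ; rcx = 1
        ret                     ; the function returns 2 in rax
```

Every push must be matched by a pop before ret, otherwise ret would use one of your values as the return address.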

lea — <dst> [reg], [<src>] [mem]

Stands for “load effective address”. It computes the address of src and saves it in the given register, without reading the memory at that address.
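The difference between lea and mov can be sketched like this. The names message and _lea_demo are made up for the example.

```nasm
        global  _lea_demo
        section .rodata
message: db     "hi", 0

        section .text
_lea_demo:
        lea     rsi, [rel message]  ; rsi = address of message (no memory read)
        mov     al, [rsi]           ; al  = byte stored at that address ('h')
        lea     rax, [rdi + rdi*2]  ; lea can also do arithmetic: rax = rdi * 3
        ret
```

The last line is a common trick: since lea only computes an address, it can be used for simple additions and multiplications.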

Mathematical operations

add — <dst> [reg], <src> [reg]

Adds dst and src and saves the result in dst.

sub — <dst> [reg], <src> [reg]

Subtracts src from dst and saves the result in dst.

dec — <dst> [reg]

Decreases the given register by 1.

inc — <dst> [reg]

Increases the given register by 1.
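Here is a short sketch chaining the four operations, with the value of rax after each step. The function name _math_demo is hypothetical; note that add and sub also accept a constant as src.

```nasm
        global  _math_demo
        section .text
_math_demo:
        mov     rax, 10
        add     rax, 5          ; rax = 15
        sub     rax, 3          ; rax = 12
        inc     rax             ; rax = 13
        dec     rax             ; rax = 12
        ret                     ; the function returns 12
```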

Control Flow Instructions

call — <function_name> — Call a function

jmp — <dst_location> — Jump to a label

Transfers the execution to the instruction given in the operand.

j<condition> — <dst_location> — Jump to a label if the condition is met

Same as jmp but only if the condition is true. The complete condition list is available here.
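A minimal sketch combining the three instructions: a countdown loop built with a conditional jump, an unconditional jump, and a second function that calls the first. The names _count_to_zero and _caller are hypothetical, and the stack adjustment assumes the System V convention used on Linux and macOS.

```nasm
        global  _count_to_zero
        global  _caller
        section .text

_count_to_zero:
        mov     rcx, 5
.loop:
        dec     rcx             ; sets the zero flag when rcx reaches 0
        jnz     .loop           ; conditional jump: repeat while rcx != 0
        jmp     .end            ; unconditional jump: skip the next instruction
        inc     rcx             ; never executed
.end:
        mov     rax, rcx        ; rax = 0
        ret

_caller:
        sub     rsp, 8          ; keep the stack 16-byte aligned for the call
        call    _count_to_zero  ; pushes the return address, then jumps
        add     rsp, 8          ; undo the adjustment
        ret
```

Labels starting with a dot (.loop, .end) are local to the function label above them, so the same names can be reused in other functions.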

Testing and comparing values

cmp — <reg0> [reg], <reg1> [reg] — Set condition codes for reg0 - reg1

test — <reg0> [reg], <reg1> [reg] — Set condition codes for reg0 & reg1
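Both instructions only update the flags; a conditional jump placed right after them reads those flags. A sketch with two hypothetical functions, assuming the arguments arrive in rdi and rsi (System V convention):

```nasm
        global  _ft_max
        global  _ft_is_zero
        section .text

; C prototype: long ft_max(long a, long b);
_ft_max:
        mov     rax, rdi        ; start by assuming a is the largest
        cmp     rdi, rsi        ; flags are set from rdi - rsi
        jge     .done           ; a >= b (signed comparison): keep a
        mov     rax, rsi        ; otherwise return b
.done:
        ret

; C prototype: int ft_is_zero(long a);
_ft_is_zero:
        mov     eax, 1          ; start by assuming a == 0
        test    rdi, rdi        ; rdi & rdi is zero only when rdi is zero
        jz      .done           ; zero flag set: return 1
        mov     eax, 0          ; otherwise return 0
.done:
        ret
```

mov does not modify the flags, which is why it can safely sit between a comparison and the jump that uses it.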

ft_isalnum example
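What follows is a minimal sketch of an isalnum-style function, not necessarily the implementation from the repository linked at the end of this article. It assumes the System V convention (argument in edi, result in eax) and the macOS underscore prefix.

```nasm
        global  _ft_isalnum
        section .text

; C prototype: int ft_isalnum(int c);
; Returns 1 if c is a digit or an ASCII letter, 0 otherwise.
_ft_isalnum:
        cmp     edi, '0'
        jl      .no             ; c < '0'
        cmp     edi, '9'
        jle     .yes            ; '0' <= c <= '9'
        cmp     edi, 'A'
        jl      .no             ; between '9' and 'A'
        cmp     edi, 'Z'
        jle     .yes            ; 'A' <= c <= 'Z'
        cmp     edi, 'a'
        jl      .no             ; between 'Z' and 'a'
        cmp     edi, 'z'
        jle     .yes            ; 'a' <= c <= 'z'
.no:
        mov     eax, 0          ; return 0
        ret
.yes:
        mov     eax, 1          ; return 1
        ret
```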

ret — End the function and return to the caller

The calling conventions

Many conventions exist. For example, on Linux and macOS (System V AMD64 ABI), the first integer or pointer arguments are passed in this order: rdi, rsi, rdx, rcx, r8 and r9, and the result is returned in rax. For more detail, access the link below.

What are the calling conventions for UNIX & Linux system calls (and user-space functions) on i386 and x86-64
Following links explain x86-32 system call conventions for both UNIX (BSD flavor) & Linux: http://www.int80h.org/bsdasm/#system-calls http://www.freebsd.org/doc/en/books/developers-handbook/x86-
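As a sketch of the System V convention used on Linux and macOS, a function taking three integers finds them in rdi, rsi and rdx and returns its result in rax. The function name ft_add3 is hypothetical.

```nasm
        global  _ft_add3
        section .text

; C prototype: long ft_add3(long a, long b, long c);
_ft_add3:
        mov     rax, rdi        ; rax = a (first argument)
        add     rax, rsi        ; rax = a + b (second argument)
        add     rax, rdx        ; rax = a + b + c (third argument)
        ret                     ; the caller reads the result in rax
```

From C, calling ft_add3(1, 2, 3) would place 1 in rdi, 2 in rsi and 3 in rdx before jumping to the function.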

Syscalls

The kernel gives you many system calls. The complete list is here.
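A minimal sketch of a system call, assuming macOS: the call number goes in rax, the arguments in rdi, rsi and rdx, and the syscall instruction hands control to the kernel. On macOS the number of write is 0x2000004; on Linux it is 1. The labels msg and msg_len are made up for the example.

```nasm
        global  _main
        section .rodata
msg:    db      "Hello", 10         ; the text followed by a newline
msg_len equ     $ - msg             ; length computed by the assembler

        section .text
_main:
        mov     rax, 0x2000004      ; macOS number for write (use 1 on Linux)
        mov     rdi, 1              ; first argument: file descriptor 1 (stdout)
        lea     rsi, [rel msg]      ; second argument: address of the buffer
        mov     rdx, msg_len        ; third argument: number of bytes to write
        syscall                     ; the kernel overwrites rax, rcx and r11
        mov     eax, 0              ; return 0 from main
        ret
```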

Some basic examples

ft_isascii
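Here is a minimal sketch of what an isascii-style function could look like; it is one possible implementation, not necessarily the one from the repository. It assumes the argument arrives in edi and the result is returned in eax.

```nasm
        global  _ft_isascii
        section .text

; C prototype: int ft_isascii(int c);
; Returns 1 if 0 <= c <= 127, 0 otherwise.
_ft_isascii:
        mov     eax, 0          ; start by assuming c is not ASCII
        cmp     edi, 0
        jl      .done           ; c < 0: return 0
        cmp     edi, 127
        jg      .done           ; c > 127: return 0
        mov     eax, 1          ; c is in range: return 1
.done:
        ret
```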


Memory and the stack

This subject would take an entire Medium article. I’ll update this article once I’ve written it.

Resources for development

I implemented many basic functions in Assembly. If you want to have a look, here is the repo 😊.

GitHub - jterrazz/42-libft-asm: 🖥 Basic functions implemented in Assembly using the x86 Intel syntax. A medium article is available in description.

I’m starting a new website called myopen.market. It’s still at an early stage, but if you found this article useful, subscribing to its newsletter would be the best way to encourage me ❤️

My Open Market
A platform for connecting shops and customers using Ethereum smart contracts. myopen.market