The HelloWorld application is much simpler than the Windows one. Just put parameters into registers from %eax to %edx, and trigger a 0x80 interrupt.
Windows System Call Sequence and Simulation
There are hundreds of documents describing how Windows implements its system calls, using int 2e or sysenter. But I could find no runnable code that shows exactly how it works, so I wrote my own.
The C code requires only the SDK to compile, since I have copied all DDK definitions inline. It opens the file C:\test.txt and writes Hello World! to it. Quite simple. I also tried a HelloWorld console application, but its call sequence turned out to be far more complex than I expected, as I found after some reverse engineering and reading code from the ReactOS project (Wine does not help, since it does not implement a Win32-compatible call sequence in the console case). The code is the basis of our further investigation. It invokes NtCreateFile(), NtWriteFile() and NtClose() in ntdll.dll with dynamic loading:
I found that the handle value and all three function pointers are fixed, at least on my Windows XP (SP3). This may be caused by the preferred base address of ntdll.dll. The code should work on all Windows platforms, since it has no hardcoded values.
Now, translate the C code into assembly. Error handling is omitted:
Compile the code with:
The assembly code of NtCreateFile(), NtWriteFile() and NtClose() is copied directly from ntdll.dll. For NtCreateFile(), 25h is the system service number that is used to index into the KiServiceTable (SSDT, System Service Dispatch Table) to locate the kernel function that handles the call.
System service numbers vary between Windows versions. This is why using them directly to invoke system calls is not recommended. I only demonstrate the approach here. For Windows XP, the three numbers are 25h, 112h and 19h, while for Windows 7 they are 42h, 18ch and 32h. Change them yourself if you’re running Windows 7. For a complete list of system service numbers, refer here or disassemble your ntdll.dll manually :). The output executable is tiny, only 3KB in size, since it does not use the CRT. Moreover, it has an empty list of imported functions!
At 7ffe0300h is a pointer to the following code:
NOTE: The assembly code may work only when compiled as a 32-bit application. 64-bit mode is not tested and would need modification to work.
One last point: it seems the STR_HELLO string is required to be aligned to an 8-byte boundary. Otherwise, you will get the 0x80000002 error code (STATUS_DATATYPE_MISALIGNMENT).
Compiler Intrinsic Functions
Copied from Wikipedia:
An intrinsic function is a function available for use in a given programming language whose implementation is handled specially by the compiler. Typically, it substitutes a sequence of automatically generated instructions for the original function call, similar to an inline function. Unlike an inline function though, the compiler has an intimate knowledge of the intrinsic function and can therefore better integrate it and optimize it for the situation. This is also called builtin function in many languages.
Here is a code snippet to compare the generated code with intrinsics enabled and disabled:
Generated assembly:
Only printf() appears in the code. There is no call to abs() or memcpy(), since they are intrinsics, as listed here in gcc’s online documentation.
Intrinsics can be explicitly disabled. For instance, CRT intrinsics must be disabled for kernel development. Add the -fno-builtin flag to gcc, or remove the /Oi switch in MSVC. Only the code generated by gcc is pasted here:
There _are_ abs() and memcpy() now. General MSVC intrinsics can be found here.
Intrinsics are easier to use than inline assembly. They are used to increase performance in most cases. Both gcc and MSVC provide intrinsic support for Intel’s MMX, SSE and SSE2 instruction sets. Code snippet to use MMX:
Assembly looks like:
You see MMX registers and instructions this time. The -mmmx flag is required to build with gcc. MSVC also generates similar code. Reference for these instruction sets is available on Intel’s website.
A simple benchmark that uses SSE is available here.
Jump Instructions and EFLAGS
I used to have a misunderstanding about conditional jumps: I thought they checked only the result of CMP and TEST instructions. So when one appeared after other instructions like ADD or SUB, I had no clue how it worked.
Actually, a conditional jump checks flags in the EFLAGS control register. From Intel’s manual, vol 1, 3.4.3:
The status flags (bits 0, 2, 4, 6, 7, and 11) of the EFLAGS register indicate the results of arithmetic instructions, such as the ADD, SUB, MUL, and DIV instructions. The status flag functions are:
CF (bit 0) Carry flag: Set if an arithmetic operation generates a carry or a borrow out of the most-significant bit of the result; cleared otherwise. This flag indicates an overflow condition for unsigned-integer arithmetic. It is also used in multiple-precision arithmetic.
PF (bit 2) Parity flag: Set if the least-significant byte of the result contains an even number of 1 bits; cleared otherwise.
AF (bit 4) Adjust flag: Set if an arithmetic operation generates a carry or a borrow out of bit 3 of the result; cleared otherwise. This flag is used in binary-coded decimal (BCD) arithmetic.
ZF (bit 6) Zero flag: Set if the result is zero; cleared otherwise.
SF (bit 7) Sign flag: Set equal to the most-significant bit of the result, which is the sign bit of a signed integer. (0 indicates a positive value and 1 indicates a negative value.)
OF (bit 11) Overflow flag: Set if the integer result is too large a positive number or too small a negative number (excluding the sign-bit) to fit in the destination operand; cleared otherwise. This flag indicates an overflow condition for signed-integer (two’s complement) arithmetic.
And again from vol 2a, section Jcc Jump if Condition is met, more details. I just copy content from here:
Instruction | Description | signed? | Flags | short jump opcodes | near jump opcodes |
---|---|---|---|---|---|
JO | Jump if overflow | | OF = 1 | 70 | 0F 80 |
JNO | Jump if not overflow | | OF = 0 | 71 | 0F 81 |
JS | Jump if sign | | SF = 1 | 78 | 0F 88 |
JNS | Jump if not sign | | SF = 0 | 79 | 0F 89 |
JE, JZ | Jump if equal; jump if zero | | ZF = 1 | 74 | 0F 84 |
JNE, JNZ | Jump if not equal; jump if not zero | | ZF = 0 | 75 | 0F 85 |
JB, JNAE, JC | Jump if below; jump if not above or equal; jump if carry | unsigned | CF = 1 | 72 | 0F 82 |
JNB, JAE, JNC | Jump if not below; jump if above or equal; jump if not carry | unsigned | CF = 0 | 73 | 0F 83 |
JBE, JNA | Jump if below or equal; jump if not above | unsigned | CF = 1 or ZF = 1 | 76 | 0F 86 |
JA, JNBE | Jump if above; jump if not below or equal | unsigned | CF = 0 and ZF = 0 | 77 | 0F 87 |
JL, JNGE | Jump if less; jump if not greater or equal | signed | SF <> OF | 7C | 0F 8C |
JGE, JNL | Jump if greater or equal; jump if not less | signed | SF = OF | 7D | 0F 8D |
JLE, JNG | Jump if less or equal; jump if not greater | signed | ZF = 1 or SF <> OF | 7E | 0F 8E |
JG, JNLE | Jump if greater; jump if not less or equal | signed | ZF = 0 and SF = OF | 7F | 0F 8F |
JP, JPE | Jump if parity; jump if parity even | | PF = 1 | 7A | 0F 8A |
JNP, JPO | Jump if not parity; jump if parity odd | | PF = 0 | 7B | 0F 8B |
JCXZ, JECXZ | Jump if %CX register is 0; jump if %ECX register is 0 | | %CX = 0; %ECX = 0 | E3 | |
There are signed and unsigned versions when comparing: JA vs JG, JB vs JL, etc. Let’s take JA and JG to explain the difference. For JA, it’s clear that it requires CF=0 (no borrow) and ZF=0 (not equal). For JG, when the two operands are both positive or both negative, the subtraction cannot overflow, so it requires ZF=0 and SF=OF=0. When the two operands have different signs, it requires ZF=0 and a positive first operand (the one being subtracted from). SF=OF still holds in that case: both are 0 if the subtraction does not overflow, and both are 1 if it does.
Note that the following 2 lines (AT&T syntax) are equivalent. The CPU just does the arithmetic; it does not care whether the operands are signed or unsigned. It only sets flags. It is we who make the signed or unsigned jump decision.
Last, I’d like to use ndisasm (install the nasm package to get it) to illustrate how jump instructions are encoded, including short jump, near jump and far jump: