June 12, 2016

OSx86_64 `hello world` shellcode

Shellcode is always fun to write when you've been away from assembly for a while. On OSX quite a few things differ from Linux, mostly around system calls. If this is your first time, check the various hello world examples first. There are several guides out there, e.g. Writing your own shellcode; some hello world examples are available at github, and there is even a detailed guide here.

If you're writing your assembly on OSX, you'll need to verify that your assembler supports the macho64 output format. Start by checking which version of NASM you have:

nasm -v

NASM version 0.98.40 (Apple Computer, Inc. build 11) compiled on Feb 10 2016

This is a rather outdated version that came with the Xcode tools on my system. It doesn't support macho64, so you'll need to get a recent NASM using either brew or MacPorts.

$ brew install nasm
==> Downloading https://homebrew.bintray.com/bottles/nasm-2.12.01.el_capitan.bottle.tar.gz
######################################################################## 100.0%
==> Pouring nasm-2.12.01.el_capitan.bottle.tar.gz
🍺  /usr/local/Cellar/nasm/2.12.01: 29 files, 2M
cannabissen@titanium:~/Documents/src/asm/$ nasm -v
NASM version 2.12.01 compiled on Mar 23 2016

or

$ sudo port install nasm
Password:
Warning: port definitions are more than two weeks old, consider updating them by running 'port selfupdate'.
--->  Fetching archive for nasm
--->  Attempting to fetch nasm-2.12_0.darwin_15.x86_64.tbz2 from http://osl.no.packages.macports.org/nasm
--->  Attempting to fetch nasm-2.12_0.darwin_15.x86_64.tbz2.rmd160 from http://osl.no.packages.macports.org/nasm
--->  Installing nasm @2.12_0
--->  Activating nasm @2.12_0
--->  Cleaning nasm
--->  Updating database of binaries
--->  Scanning binaries for linking errors
--->  No broken files found.                             
cannabissen@titanium:~/Documents/src/c/$ nasm -v
NASM version 2.12 compiled on Mar  5 2016

At the time of writing, NASM in both brew and MacPorts is at 2.12, with brew carrying the slightly newer 2.12.01 point release, so use whichever you prefer. All the Apple system calls are available here and you'll need to use the shift trick explained here.

The following code has been snipped from thexploit's article on the issues with assembly on OSX 10.7 and above. I just tested the hello world on El Capitan.

; OSX86 hello world assembler for El Capitan!
section .data
hello_world     db "Hello World!", 0x0a

section .text
global start

start:
mov rax, 0x2000004   ; System call write = 4
mov rdi, 1           ; Write to standard out = 1
mov rsi, hello_world ; The address of hello_world string
mov rdx, 13          ; The size to write: 12 characters plus the newline
syscall              ; Invoke the kernel
mov rax, 0x2000001   ; System call number for exit = 1
mov rdi, 0           ; Exit success = 0
syscall              ; Invoke the kernel

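There is not much to this example. It makes two system calls: the first, number 4, is write and the second, number 1, is exit. If you wonder why 0x2000000 is added to each number, the explanation is that 64-bit Darwin encodes a system call class in the upper bits of rax: the class (2 for Unix/BSD calls) is shifted left by 24 bits and combined with the call number, so write becomes 0x2000004 and exit becomes 0x2000001.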
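As a rough illustration of that encoding, the C sketch below composes a system call number from a class and a call number. The function and constant names are my own for this example and are not taken from any header; the values 2 and 24 are the ones used in the assembly above.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative names: the Unix/BSD class and the class shift on 64-bit OSX. */
enum { CLASS_UNIX = 2, CLASS_SHIFT = 24 };

/* Combine a system call class and number into the value loaded into rax. */
uint32_t darwin_syscall(uint32_t syscall_class, uint32_t number) {
    /* The call number has to fit below the class bits. */
    assert(number < (1u << CLASS_SHIFT));
    return (syscall_class << CLASS_SHIFT) | number;
}
```

With these definitions, `darwin_syscall(CLASS_UNIX, 4)` gives 0x2000004 (write) and `darwin_syscall(CLASS_UNIX, 1)` gives 0x2000001 (exit).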

As you may know, zero bytes cannot be used in shellcode. This is because the functions used to inject the shellcode are usually string functions, and when such a function meets a 0x00 byte it treats it as the end of the string. In the case above, the immediate 0x2000004 is encoded with zero bytes in it, so a function like strcpy stops copying at the first of them and the truncated shellcode won't work as expected.
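To make the truncation concrete, here is a minimal C sketch of what a string copy does to a payload that contains a zero byte. The helper name is hypothetical, and the caller is assumed to provide a destination buffer that is large enough.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy src into dst the way a naive injection point would, and report
   how many payload bytes actually arrived (excluding the terminator).
   Assumption: dst is at least as large as the source buffer. */
size_t bytes_copied_by_strcpy(char *dst, const char *src) {
    assert(dst != NULL && src != NULL);
    strcpy(dst, src);   /* stops at the first 0x00 byte in src */
    return strlen(dst);
}
```

For a payload that starts with b8 04 00 00 02 (the bytes of `mov eax, 0x2000004`) followed by 0f 05 (`syscall`), the function returns 2: only b8 04 make it across and the rest of the payload is lost.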

So to produce injectable shellcode you'll have to remove all the zeros from the code. This means that you could write the first part of the assembly like this:

global start
section .data
hello_world: db "Hello World!", 0x0a ; Define the string to output
.len: equ $ - hello_world            ; Length of the string: $ is the current
                                     ; address, so subtracting hello_world
                                     ; gives the size of "Hello World!" + 0x0a
section .text

start:
mov r8b, 0x02                        ; Unix system call class = 2 (only the low
                                     ; byte is written; r8 is assumed to be 0)
shl r8, 24                           ; Shift left by 24 bits: 0x02 -> 0x2000000
or r8, 0x04                          ; Add the system call number to the class:
                                     ; 0x2000000 -> 0x2000004
                                     ; The 3 instructions above build the
                                     ; 0x2000004 system call number in r8
                                     ; without encoding any zero bytes
mov rax, r8                          ; Set the system call (write)
mov rdi, 0x1                         ; Write to stdout
mov rsi, hello_world                 ; Address of the string to write
mov rdx, hello_world.len             ; Length of the string to write, using
                                     ; the .len definition above
syscall                              ; Invoke the kernel

sub r8, 0x3                          ; r8 still holds 0x2000004, so subtracting
                                     ; 3 gives 0x2000001, the exit system call
mov rax, r8                          ; Set the system call (exit)
xor rdi, rdi                         ; Zero rdi, the exit status
syscall

$ nasm -f macho64 helloworld_shell.s 
$ ld helloworld_shell.o
$ ./a.out 
Hello World!

Seems like the code runs smoothly. Let's look at the machine code bytes we would put in the test shellcode program:

$ gobjdump -sd a.out

a.out:     file format mach-o-x86-64

Contents of section .text:
 1fd0 41b00249 c1e01849 83c8044c 89c0bf01  A..I...I...L....
 1fe0 00000048 be002000 00000000 00ba0d00  ...H.. .........
 1ff0 00000f05 4983e803 4c89c048 31ff0f05  ....I...L..H1...
Contents of section .data:
 2000 48656c6c 6f20576f 726c6421 0a        Hello World!.   
Contents of section LC_THREAD.x86_THREAD_STATE64.0:
 0000 00000000 00000000 00000000 00000000  ................
 0010 00000000 00000000 00000000 00000000  ................
 0020 00000000 00000000 00000000 00000000  ................
 0030 00000000 00000000 00000000 00000000  ................
 0040 00000000 00000000 00000000 00000000  ................
 0050 00000000 00000000 00000000 00000000  ................
 0060 00000000 00000000 00000000 00000000  ................
 0070 00000000 00000000 00000000 00000000  ................
 0080 d01f0000 00000000 00000000 00000000  ................
 0090 00000000 00000000 00000000 00000000  ................
 00a0 00000000 00000000                    ........        

Disassembly of section .text:

0000000000001fd0 <start>:
    1fd0: 41 b0 02             mov    $0x2,%r8b
    1fd3: 49 c1 e0 18          shl    $0x18,%r8
    1fd7: 49 83 c8 04          or     $0x4,%r8
    1fdb: 4c 89 c0             mov    %r8,%rax
    1fde: bf 01 00 00 00       mov    $0x1,%edi
    1fe3: 48 be 00 20 00 00 00 movabs $0x2000,%rsi
    1fea: 00 00 00 
    1fed: ba 0d 00 00 00       mov    $0xd,%edx
    1ff2: 0f 05                syscall 
    1ff4: 49 83 e8 03          sub    $0x3,%r8
    1ff8: 4c 89 c0             mov    %r8,%rax
    1ffb: 48 31 ff             xor    %rdi,%rdi
    1ffe: 0f 05                syscall 

The dump shows that the zeros are gone from the system call setup as expected, but yikes, there are still several zero bytes left in the code. These lines are:

1fde: bf 01 00 00 00       mov    $0x1,%edi
1fe3: 48 be 00 20 00 00 00 movabs $0x2000,%rsi
1fea: 00 00 00 
1fed: ba 0d 00 00 00       mov    $0xd,%edx
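Spotting these by eye gets tedious as the payload grows. A small C helper along the following lines can do the counting; this is only a sketch, and the function name is made up for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Return the number of 0x00 bytes in code[0..len). */
size_t count_zero_bytes(const unsigned char *code, size_t len) {
    size_t zeros = 0;
    assert(code != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        if (code[i] == 0x00) {
            zeros++;
        }
    }
    return zeros;
}
```

Fed the five bytes bf 01 00 00 00 from line 1fde it reports 3, and for the ten bytes of the movabs at 1fe3 it reports 7. A payload that is safe to inject through a string function should report 0.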

Well, what happens here? Moving 0x1 into edi brought plenty of zero bytes along. Why? Because this form of mov encodes its operand as a full 32-bit immediate, so the assembler pads the value with the 'missing' zero bytes. Let's remove them. We already know the xor trick, so to produce 0x01 we just zero edi and increment it.

xor edi,edi
inc edi 

This sets edi to 0x01 (stdout) for the write system call and takes care of line 1fde. The next line to look at is 1fed, which has a similar issue, except that here we can write the low byte of the register (dl) instead of incrementing: the value is the length of the string, and there is no point in incrementing 13 times.

xor edx,edx 
mov dl, hello_world.len

The xor zeros out the whole edx register (and, in 64-bit mode, the upper half of rdx as well). We need to do this before writing dl, the lowest byte of the register; otherwise bits left set elsewhere in the register would produce an incorrect length. This takes care of the zeros on line 1fed.
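The effect of skipping the xor can be modelled in a few lines of C. This is a simplified model of a 32-bit register with invented helper names, not a description of how the CPU is implemented.

```c
#include <assert.h>
#include <stdint.h>

/* Model of "mov dl, value": replace only the low 8 bits of the register. */
uint32_t write_low_byte(uint32_t reg, uint8_t value) {
    return (reg & 0xFFFFFF00u) | value;
}

/* Model of "xor edx, edx" followed by "mov dl, value". */
uint32_t xor_then_write_low_byte(uint32_t reg, uint8_t value) {
    reg ^= reg;            /* xor with itself clears every bit */
    assert(reg == 0);
    return write_low_byte(reg, value);
}
```

With stale contents such as 0xdeadbeef, writing 0x0d to the low byte alone leaves 0xdeadbe0d, while zeroing first gives the intended 0x0000000d.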

Let's now address the final and much worse line, 1fe3. Here movabs loads the address of the hello_world string into the rsi register. Apparently hello_world is placed at the data address 0x2000:

Contents of section .data:
 2000 48656c6c 6f20576f 726c6421 0a        Hello World!. 

This means that we need to mangle the code a bit. There are two approaches. One is to build the string in code from its ASCII character values and point rsi at the result. Another is the jmp-call-pop address trick. Let's try the jmp-call-pop method.

; Zero byte free hello world assembly
global start
section .data
hello_world: db "Hello World!", 0x0a ; Define the string to output
.len: equ $ - hello_world            ; Length of the string: $ is the current
                                     ; address, so subtracting hello_world
                                     ; gives the size of "Hello World!" + 0x0a
section .text

start:
mov r8b, 0x02                        ; Unix system call class = 2 (only the low
                                     ; byte is written; r8 is assumed to be 0)
shl r8, 24                           ; Shift left by 24 bits: 0x02 -> 0x2000000
or r8, 0x04                          ; Add the system call number to the class:
                                     ; 0x2000000 -> 0x2000004, built in r8
                                     ; without encoding any zero bytes

jmp in                               ; Jump forward to the call below, so the
                                     ; string address comes from the stack
                                     ; instead of the absolute address 0x2000
out:
pop rdi                              ; Pop the return address pushed by call,
                                     ; i.e. the address of the string
mov rsi, rdi                         ; Move it to rsi, the buffer argument
mov rax, r8                          ; Set the system call (write)
xor rdi, rdi                         ; Zero rdi before setting the file handle
inc rdi                              ; rdi = 1 (stdout), no zero bytes encoded
xor edx, edx                         ; Zero edx before setting the length
mov dl, hello_world.len              ; Length of the string to write, using
                                     ; the .len definition above
syscall                              ; Invoke the kernel
sub r8, 0x3                          ; r8 still holds 0x2000004, so subtracting
                                     ; 3 gives 0x2000001, the exit system call
mov rax, r8                          ; Set the system call (exit)
xor rdi, rdi                         ; Zero rdi, the exit status
syscall

in:
call out                             ; Call back to out; this pushes the
                                     ; address of the bytes that follow
db "Hello World!", 0x0a              ; The string itself, placed inline so the
                                     ; pushed return address points at it

Notice the 3 changes to the final program: the two xor tricks that set rdi and edx without encoding zero bytes, and the `jmp, call-n-pop` method for getting the address of a constant (the string). The call pushes the address of the bytes that follow it onto the stack as a return address, and the pop retrieves it into rdi, so the language constructs replace the static memory address. There you go, the zero byte free hello world shellcode. Let's check that we have excluded all zeros:

$ gobjdump -sd a.out
a.out:     file format mach-o-x86-64

Contents of section .text:
 1fc2 41b00249 c1e01849 83c804eb 1f5f4889  A..I...I....._H.
 1fd2 fe4c89c0 4831ff48 ffc731d2 b20d0f05  .L..H1.H..1.....
 1fe2 4983e803 4c89c048 31ff0f05 e8dcffff  I...L..H1.......
 1ff2 ff48656c 6c6f2057 6f726c64 210a      .Hello World!.  
Contents of section .data:
 2000 48656c6c 6f20576f 726c6421 0a        Hello World!.   
Contents of section LC_THREAD.x86_THREAD_STATE64.0:
 0000 00000000 00000000 00000000 00000000  ................
 0010 00000000 00000000 00000000 00000000  ................
 0020 00000000 00000000 00000000 00000000  ................
 0030 00000000 00000000 00000000 00000000  ................
 0040 00000000 00000000 00000000 00000000  ................
 0050 00000000 00000000 00000000 00000000  ................
 0060 00000000 00000000 00000000 00000000  ................
 0070 00000000 00000000 00000000 00000000  ................
 0080 c21f0000 00000000 00000000 00000000  ................
 0090 00000000 00000000 00000000 00000000  ................
 00a0 00000000 00000000                    ........        

Disassembly of section .text:

0000000000001fc2 <start>:
    1fc2: 41 b0 02             mov    $0x2,%r8b
    1fc5: 49 c1 e0 18          shl    $0x18,%r8
    1fc9: 49 83 c8 04          or     $0x4,%r8
    1fcd: eb 1f                jmp    1fee <in>

0000000000001fcf <out>:
    1fcf: 5f                   pop    %rdi
    1fd0: 48 89 fe             mov    %rdi,%rsi
    1fd3: 4c 89 c0             mov    %r8,%rax
    1fd6: 48 31 ff             xor    %rdi,%rdi
    1fd9: 48 ff c7             inc    %rdi
    1fdc: 31 d2                xor    %edx,%edx
    1fde: b2 0d                mov    $0xd,%dl
    1fe0: 0f 05                syscall 
    1fe2: 49 83 e8 03          sub    $0x3,%r8
    1fe6: 4c 89 c0             mov    %r8,%rax
    1fe9: 48 31 ff             xor    %rdi,%rdi
    1fec: 0f 05                syscall 

0000000000001fee <in>:
    1fee: e8 dc ff ff ff       callq  1fcf <out>
    1ff3: 48                   rex.W
    1ff4: 65 6c                gs insb (%dx),%es:(%rdi)
    1ff6: 6c                   insb   (%dx),%es:(%rdi)
    1ff7: 6f                   outsl  %ds:(%rsi),(%dx)
    1ff8: 20 57 6f             and    %dl,0x6f(%rdi)
    1ffb: 72 6c                jb     2069 <hello_world+0x69>
    1ffd: 64 21 0a             and    %ecx,%fs:(%rdx)
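The call at 1fee also shows why the code jumps forward first and then calls backward. A call carries a 32-bit displacement relative to the next instruction; calling backward makes that displacement negative, so its upper bytes are ff instead of 00. The C sketch below (the function name is my own) redoes that arithmetic for the bytes in the dump.

```c
#include <assert.h>
#include <stdint.h>

/* Compute the target of a 5-byte "call rel32" located at insn_addr.
   rel32 points at the 4 little-endian displacement bytes after the e8 opcode. */
uint64_t call_target(uint64_t insn_addr, const unsigned char rel32[4]) {
    assert(rel32 != 0);
    uint32_t raw = (uint32_t)rel32[0]
                 | ((uint32_t)rel32[1] << 8)
                 | ((uint32_t)rel32[2] << 16)
                 | ((uint32_t)rel32[3] << 24);
    /* Sign-extend the 32-bit displacement to 64 bits. */
    int64_t disp = (raw & 0x80000000u) ? (int64_t)raw - 0x100000000LL
                                       : (int64_t)raw;
    /* The displacement is relative to the address of the next instruction. */
    return (uint64_t)((int64_t)(insn_addr + 5) + disp);
}
```

For the call at 0x1fee with displacement bytes dc ff ff ff this gives 0x1ff3 - 0x24 = 0x1fcf, the address of out. A forward call over the same distance would have the displacement bytes 24 00 00 00 and bring the zeros back.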

It sure seems like this is the case. To extract the machine code bytes from the binary we can use a tool very similar to objdump: otool.

otool -t a.out
a.out:
(__TEXT,__text) section
0000000000001fc2 41 b0 02 49 c1 e0 18 49 83 c8 04 eb 1f 5f 48 89 
0000000000001fd2 fe 4c 89 c0 48 31 ff 48 ff c7 31 d2 b2 0d 0f 05 
0000000000001fe2 49 83 e8 03 4c 89 c0 48 31 ff 0f 05 e8 dc ff ff 
0000000000001ff2 ff 48 65 6c 6c 6f 20 57 6f 72 6c 64 21 0a

Notice that it dumps only the raw bytes in this case. otool can show the disassembly as well, but right now we are interested in the raw machine bytes only.

$ otool -t a.out | cut -d ' ' -f 2- | grep ' ' 
41 b0 02 49 c1 e0 18 49 83 c8 04 eb 1f 5f 48 89 
fe 4c 89 c0 48 31 ff 48 ff c7 31 d2 b2 0d 0f 05 
49 83 e8 03 4c 89 c0 48 31 ff 0f 05 e8 dc ff ff 
ff 48 65 6c 6c 6f 20 57 6f 72 6c 64 21 0a

This produces only the machine code bytes. You can count them using `wc -w`:

$ otool -t a.out | cut -d ' ' -f 2- | grep ' ' | wc -w
      62

A 62 byte hello world shellcode. We had better run it to check that it works:

$ nasm -f macho64 helloworld_zero.s 
$ ld helloworld_zero.o 
$ ./a.out 
Hello World!
$ otool -tv a.out 
a.out:
(__TEXT,__text) section
start:
0000000000001fc2 movb $0x2, %r8b
0000000000001fc5 shlq $0x18, %r8
0000000000001fc9 orq $0x4, %r8
0000000000001fcd jmp 0x1fee
out:
0000000000001fcf popq %rdi
0000000000001fd0 movq %rdi, %rsi
0000000000001fd3 movq %r8, %rax
0000000000001fd6 xorq %rdi, %rdi
0000000000001fd9 incq %rdi
0000000000001fdc xorl %edx, %edx
0000000000001fde movb $0xd, %dl
0000000000001fe0 syscall
0000000000001fe2 subq $0x3, %r8
0000000000001fe6 movq %r8, %rax
0000000000001fe9 xorq %rdi, %rdi
0000000000001fec syscall
in:
0000000000001fee callq 0x1fcf
0000000000001ff3 gs
0000000000001ff5 insb %dx, %es:(%rdi)
0000000000001ff6 insb %dx, %es:(%rdi)
0000000000001ff7 outsl (%rsi), %dx
0000000000001ff8 andb %dl, 0x6f(%rdi)
0000000000001ffb jb 0x2069
0000000000001ffd andl %ecx, %fs:(%rdx)

Seems to do the job just fine, and the executable is the program we expected. To extract the machine code using command line tools, you can experiment with this expression:

$ echo "char shellcode[] = $(otool -t a.out | cut -d ' ' -f 2- | grep ' ' | sed -e "s|[ ]|\\\x|g" -e "s|^|\"\\\x|g" -e "s|[\\]x$|\"|g")"
char shellcode[] = "\x41\xb0\x02\x49\xc1\xe0\x18\x49\x83\xc8\x04\xeb\x1f\x5f\x48\x89"
"\xfe\x4c\x89\xc0\x48\x31\xff\x48\xff\xc7\x31\xd2\xb2\x0d\x0f\x05"
"\x49\x83\xe8\x03\x4c\x89\xc0\x48\x31\xff\x0f\x05\xe8\xdc\xff\xff"
"\xff\x48\x65\x6c\x6c\x6f\x20\x57\x6f\x72\x6c\x64\x21\x0a"

This makes it a bit easier to copy and paste the shellcode into your test C program. The command just uses sed to format the bytes in C character array syntax (add the closing semicolon yourself). Or use even more sed magic to replace your current test shellcode ;) That last part I leave to you. Cheers!
