쿠...sal: hack


[Comp][Android] Useful sites about smali

Sites that help when working with smali

Source: ref. 1

Dalvik opcodes
Understanding the Dalvik bytecode with the Dedexer tool, December 2, 2009
Dalvik Notes
JASMIN USER GUIDE, July 1996
Another disassembler, Dedexer: http://dedexer.sourceforge.net/

An IntelliJ plugin that converts Java code to smali

ollide/intellij-java2smali

References

What's the best way to learn Smali (and how/when to use Dalvik VM opcodes)?, March 11, 2011

[Comp][Debug] Attaching to a process for debugging on Windows - WaitForDebugEvent

Attach to the process

Get a handle with OpenProcess, then attach to the process with DebugActiveProcess(pid).
WaitForDebugEvent() returns when a debug event (such as a break) occurs; the process is suspended and the event is passed back through the lpDebugEvent parameter.
DebugActiveProcessStop() is used to detach.

Below is source code based on PyDbg. Look up the PID of an already running process in Task Manager and enter it; the code then attaches to that process.

# Excerpt from the debugger class (Python 2 + ctypes).

# attach: DebugActiveProcess returns nonzero on success
if kernel32.DebugActiveProcess(pid):
    self.debugger_active = True
    self.pid = int(pid)
    self.run()

# run: body of the event loop
while self.debugger_active == True:

    debug_event = DEBUG_EVENT()
    continue_status = DBG_CONTINUE

    # block until the debuggee raises a debug event
    if kernel32.WaitForDebugEvent(byref(debug_event), INFINITE):
        raw_input("press a key to continue...")
        self.debugger_active = False
        # let the suspended thread continue
        kernel32.ContinueDebugEvent(
            debug_event.dwProcessId,
            debug_event.dwThreadId,
            continue_status)

# detach
if kernel32.DebugActiveProcessStop(self.pid):
    print "[*] Finished debugging. Exiting..."
    return True
else:
    print "There was an error"
    return False

HANDLE WINAPI OpenProcess(
  _In_  DWORD dwDesiredAccess,
  _In_  BOOL bInheritHandle,
  _In_  DWORD dwProcessId
);

BOOL WINAPI DebugActiveProcess(
  _In_  DWORD dwProcessId
);

BOOL WINAPI WaitForDebugEvent(
  _Out_  LPDEBUG_EVENT lpDebugEvent,
  _In_   DWORD dwMilliseconds // INFINITE or maximum wait time
);

BOOL WINAPI ContinueDebugEvent(
  _In_  DWORD dwProcessId,
  _In_  DWORD dwThreadId,
  _In_  DWORD dwContinueStatus
);

BOOL WINAPI DebugActiveProcessStop(
  _In_  DWORD dwProcessId
);

DEBUG_EVENT structure

typedef struct _DEBUG_EVENT {
  DWORD dwDebugEventCode;
  DWORD dwProcessId;
  DWORD dwThreadId;
  union {
    EXCEPTION_DEBUG_INFO      Exception;
    CREATE_THREAD_DEBUG_INFO  CreateThread;
    CREATE_PROCESS_DEBUG_INFO CreateProcessInfo;
    EXIT_THREAD_DEBUG_INFO    ExitThread;
    EXIT_PROCESS_DEBUG_INFO   ExitProcess;
    LOAD_DLL_DEBUG_INFO       LoadDll;
    UNLOAD_DLL_DEBUG_INFO     UnloadDll;
    OUTPUT_DEBUG_STRING_INFO  DebugString;
    RIP_INFO                  RipInfo;
  } u;
} DEBUG_EVENT, *LPDEBUG_EVENT;

dwDebugEventCode: tells us which kind of event caused the stop.

The value of dwDebugEventCode also determines which member of the union is valid. For example:

EXCEPTION_DEBUG_EVENT (0x1) -> u.Exception, an EXCEPTION_DEBUG_INFO structure.
CREATE_THREAD_DEBUG_EVENT (0x2) -> u.CreateThread, a CREATE_THREAD_DEBUG_INFO structure.
…

and so on.
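The mapping from event code to union member can be written down as a small lookup table. This is only a sketch: the numeric codes are the ones defined in the Windows SDK, the member names are those of the DEBUG_EVENT union shown above, and the helper name `union_member_for` is made up for illustration.

```python
# Debug event codes (Windows SDK values) mapped to the DEBUG_EVENT.u member
# that is valid for each code. The table and helper name are illustrative.
UNION_MEMBER = {
    1: "Exception",          # EXCEPTION_DEBUG_EVENT
    2: "CreateThread",       # CREATE_THREAD_DEBUG_EVENT
    3: "CreateProcessInfo",  # CREATE_PROCESS_DEBUG_EVENT
    4: "ExitThread",         # EXIT_THREAD_DEBUG_EVENT
    5: "ExitProcess",        # EXIT_PROCESS_DEBUG_EVENT
    6: "LoadDll",            # LOAD_DLL_DEBUG_EVENT
    7: "UnloadDll",          # UNLOAD_DLL_DEBUG_EVENT
    8: "DebugString",        # OUTPUT_DEBUG_STRING_EVENT
    9: "RipInfo",            # RIP_EVENT
}


def union_member_for(event_code):
    """Return the name of the union member that is valid for event_code."""
    return UNION_MEMBER.get(event_code, "unknown")
```

In a real debugger loop the returned name would be used with something like `getattr(debug_event.u, name)` to pick the right structure.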

Test

While debuggee_process.py below is running, run debugger.py and attach to the debuggee process; the system then starts sending debug events. The debugger prints these events, and when the printf function is executed it prints the address of printf.

LOAD_DLL_DEBUG_EVENT

Right after attaching, LOAD_DLL_DEBUG_EVENT shows up on screen. I wondered why this event appears when the DLLs have already finished loading, and ref. 1 explains it as follows.

This event is raised when the debuggee process (the process being debugged) calls LoadLibrary(), but also when the PE loader resolves links to DLLs.

So presumably attaching involves a step that resolves the DLL links, which raises the event.

See ref. 1 for descriptions of the debug events.

debugger.py

# source: Gray Hat Python (Python 2 + ctypes)
from ctypes import *
# DEBUG_EVENT, CONTEXT, MEMORY_BASIC_INFORMATION and the constants used below
# are assumed to come from the book's my_debugger_defines module.
from my_debugger_defines import *

memory_breakpoints = {}
kernel32 = windll.kernel32
pid = raw_input("Enter the PID of the process to attach to: ")

# attach
kernel32.DebugActiveProcess(int(pid))

# func_resolve: look up the address of printf in msvcrt.dll
dll = "msvcrt.dll"
function = "printf"

# a handle from GetModuleHandleA is not reference counted,
# so it is not passed to CloseHandle
handle  = kernel32.GetModuleHandleA(dll)
address = kernel32.GetProcAddress(handle, function)

# bp_set_mem: remember the address we want to watch
mbi = MEMORY_BASIC_INFORMATION()
size = 10
memory_breakpoints[address] = (address, size, mbi)

# run
debug_event     = DEBUG_EVENT()
continue_status = DBG_CONTINUE

while True:

    if kernel32.WaitForDebugEvent(byref(debug_event), 100):
        # grab various information with regards to the current exception.
        thread_id = debug_event.dwThreadId
        h_thread = kernel32.OpenThread(THREAD_ALL_ACCESS, None, thread_id)

        context = CONTEXT()
        context.ContextFlags = CONTEXT_FULL | CONTEXT_DEBUG_REGISTERS
        # GetThreadContext fills `context` in place and returns a BOOL
        kernel32.GetThreadContext(h_thread, byref(context))
        kernel32.CloseHandle(h_thread)

        print "Event Code: %d Thread ID: %d" % \
            (debug_event.dwDebugEventCode, debug_event.dwThreadId)

        if debug_event.dwDebugEventCode == EXCEPTION_DEBUG_EVENT:
            exception = debug_event.u.Exception.ExceptionRecord.ExceptionCode
            exception_address = debug_event.u.Exception.ExceptionRecord.ExceptionAddress

            # dispatch on the exception that just occurred.
            if exception == EXCEPTION_ACCESS_VIOLATION:
                print "Access Violation Detected."

            elif exception == EXCEPTION_BREAKPOINT:
                print "[*] Exception address: 0x%08x" % exception_address

                # check if the breakpoint is one that we set
                if exception_address not in memory_breakpoints:
                    continue_status = DBG_CONTINUE

            elif exception == EXCEPTION_GUARD_PAGE:
                print "Guard Page Access Detected."

            elif exception == EXCEPTION_SINGLE_STEP:
                print "Single Stepping."

        kernel32.ContinueDebugEvent(debug_event.dwProcessId,
                                    debug_event.dwThreadId, continue_status)

debuggee_process.py

from ctypes import *
import time

msvcrt = cdll.msvcrt
counter = 0

while 1:
    msvcrt.printf("Loop iteration %d!\n",counter)
    time.sleep(2)
    counter += 1

How INT3 works

The page below should help with understanding how control reaches WaitForDebugEvent() once an INT3 interrupt has been raised.

http://i5on9i.blogspot.kr/2013/04/int-3.html

References

[Comp][Linux] Simple hooking - how to use wrapper functions

Hooking with the dynamic linker

Let's do some hooking on Linux. This post explains how to make an application call your own wrapper function instead of a function in a shared library such as libc. Only the calls the application itself makes into the shared library are replaced; this is not a way to replace functions that the shared library calls internally.

Overview

In outline, a program normally runs through the path

user program >> library >> system call

and hooking changes that path as follows [ref. 1, figure 2]:

user program >> hook >> library >> system call

This relies on the fact that the dynamic linker resolves symbols after the executable has started. The executable contains undefined symbols, and resolving them normally binds them to functions in the usual shared libraries. If we make the linker search our own shared library first, our functions get called instead.

When a program that uses dynamic libraries is compiled, the binary contains the following two lists:

the list of libraries the program uses
the list of undefined symbols

The dynamic linker simply walks through those libraries and uses the first library that provides each symbol [ref. 6].

So if we get the program to load a library containing our wrapper functions ahead of the others, the program's undefined symbols are resolved to our wrapper functions.
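The "first library that provides the symbol wins" rule can be modeled in a few lines. This is a toy model only, not how the dynamic linker is implemented; the library names and symbol sets are made up for illustration.

```python
def resolve(symbol, search_order):
    """Return the name of the first library in search_order defining symbol."""
    for name, symbols in search_order:
        if symbol in symbols:
            return name
    return None


# Illustrative libraries: (name, set of exported symbols).
libc = ("libc.so.6", {"malloc", "free", "printf"})
libmine = ("libmine.so", {"malloc"})

normal_order = [libc]             # only the usual library is searched
hooked_order = [libmine, libc]    # our wrapper library is searched first
```

With `hooked_order`, `malloc` resolves to the wrapper library while `free`, which the wrapper library does not define, still resolves to libc.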

Unfortunately, this technique cannot interpose calls made inside a library itself (for example, libc's internal calls), because the symbols for those internal functions are resolved before runtime [ref. 1].

For the implementation, ref. 1 is probably the easier place to look.

Test program

Now let's write a simple program that uses malloc.

#include <stdlib.h>

int main(void)
{
    int *p = (int *)malloc(10);

    free(p);
    return 0;
}

It is compiled just like any other program.

$> gcc main.c -o app

Now let's make the malloc that app calls be the one we wrote.

LD_PRELOAD

When a program starts, the loader (here, the dynamic linker) is responsible for loading its libraries. So how do we make the loader load our library? This is what the LD_PRELOAD environment variable is for: the loader loads the libraries listed in LD_PRELOAD before any others.

However, LD_PRELOAD is ignored when the SUID permission bit is set on the executable. Since there is no telling what a preloaded library might do, this is blocked for security reasons so that other users or groups cannot abuse it.

Suppose app calls malloc(). Normally this resolves to the function in libc. Now suppose we have written a wrapper for malloc and want it to be called in place of the original.

We will cover how to build the malloc wrapper later; for now, assume the library containing it is libmine.so. Running the program as follows makes our malloc wrapper run instead of the original malloc:

$> LD_PRELOAD=/home/libmine.so ./app

To preload two or more libraries, separate them with a colon (a semicolon would end the shell command):

$> LD_PRELOAD=/home/libmine.so:/home/libmin2.so ./app

Calling the original function

What if the wrapper needs to use the original function? In other words, having written a malloc wrapper, how do we call the system's malloc (libc's malloc) from inside it?

The wrapper cannot simply call malloc to reach libc's version, because the call would resolve to the wrapper itself (that is, it would recurse). More precisely, the lookup iterates over the libraries, finds malloc in libmine.so first, and stops searching.

So we need another approach: fetch the address of the second malloc found rather than the first, and call malloc through that address. This is what dlsym() is for.

dlsym(RTLD_NEXT, "malloc");

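dlsym()

dlsym: the dynamic linker's symbol lookup function

return value: the address of the symbol

With the RTLD_NEXT option, the lookup continues in the libraries that come after the current one in the search order, so it picks the second library that provides malloc.

RTLD_NEXT is a GNU extension, so

#define _GNU_SOURCE

is required before including <dlfcn.h> [ref. 2].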
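As a rough model of what RTLD_NEXT does, the lookup can be pictured as the same first-match search, but starting just after the library that made the call. The function and library names below are hypothetical and only illustrate the search order; the real lookup happens inside the dynamic linker.

```python
def lookup_next(symbol, search_order, caller):
    """Toy model of dlsym(RTLD_NEXT, symbol): skip everything up to caller."""
    names = [name for name, _ in search_order]
    start = names.index(caller) + 1
    for name, symbols in search_order[start:]:
        if symbol in symbols:
            return name
    return None


# Illustrative search order with the wrapper library preloaded.
search_order = [
    ("libmine.so", {"malloc"}),
    ("libc.so.6", {"malloc", "free"}),
]
```

Called from the wrapper library, the lookup skips the wrapper's own `malloc` and lands on the next provider, which is how the wrapper avoids calling itself.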

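Now let's build libmine.so, which contains the wrapper function.

Building libmine.so

#define _GNU_SOURCE   /* required for RTLD_NEXT */
#include <dlfcn.h>
#include <stddef.h>

void* malloc(size_t size){
 ...
 /* pointer to the next malloc in the search order (normally libc's) */
 static void* (*my_malloc)(size_t) = NULL;
 ...
 if (my_malloc == NULL)
  my_malloc = (void* (*)(size_t))dlsym(RTLD_NEXT, "malloc");
 ...
 return my_malloc(size);
}

The full code is available in ref. 2; only an outline is given here.

To build this code as a shared object, ref. 2 compiles it as follows:

gcc -shared -fPIC -ldl libmine.c -o libmine.so

For the -shared, -fPIC, and -l options, see ref. 3.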

In this setup, however, the following error occurs:

symbol lookup error : … undefined symbol: dlsym

According to ref. 4, the default behavior of ld changed starting with Ubuntu 11.10, so the command has to be given as follows:

gcc -shared -Wl,--no-as-needed -ldl -fPIC libmine.c -o libmine.so

-Wl,--no-as-needed is the gcc option that passes --no-as-needed to the linker [ref. 3].

--no-as-needed

--no-as-needed undoes --as-needed, and --as-needed behaves as follows.

--as-needed is followed by the names of dynamic libraries, and it affects the ELF DT_NEEDED tags for the dynamic libraries listed on the command line after it [ref. 5].

Normally the linker adds a DT_NEEDED tag for every dynamic library named on the command line, whether or not the library is actually needed [ref. 5].

With --as-needed this changes: a DT_NEEDED tag is emitted only for a library that satisfies "an undefined symbol reference from a regular object file" or "an undefined symbol reference from another dynamic library" [ref. 5].

In other words, the linker works out which libraries are really needed and puts only those in DT_NEEDED. A library added with an option such as -lm is left out of DT_NEEDED if nothing actually uses it [ref. 9].

A DT_NEEDED tag can be thought of as an entry written in the DT_NEEDED area; it names a shared object (such as a shared library) that the dynamic linker will use [ref. 8].

DT_NEEDED

To support dynamic linking on Linux, ELF files carry DT_NEEDED entries in their dynamic section. Only when the executable is run does the dynamic linker read DT_NEEDED, load the listed libraries, and search them to resolve undefined symbols. Nothing here records which symbol lives in which library, so the linker has to search all of them to find each symbol, which takes a lot of time [ref. 6, 7].

As a solution to this problem, Solaris provides direct binding. With direct binding, a section is added to the ELF file that stores a list of pointers. Each pointer in the list corresponds to a symbol of the object and points to a DT_NEEDED entry, which establishes the relationship between a symbol and a DT_NEEDED entry [ref. 6].

References

[Comp][Debug] Getting a thread's context on Windows - snapshot

WinAPI for debugging / getting a thread context on Windows / reading thread registers / how to retrieve thread context

Source walkthrough

Take a snapshot using the process ID.
Get the thread IDs from the snapshot.
Get a handle using a thread ID.
Get the context using this handle.

source from : Gray hat python
edited by namh

thread_entry = THREADENTRY32()
thread_list = []

# http://msdn.microsoft.com/en-us/library/windows/desktop/ms682489(v=vs.85).aspx
snapshot = kernel32.CreateToolhelp32Snapshot(TH32CS_SNAPTHREAD, self.pid)
# on failure the API returns INVALID_HANDLE_VALUE (-1), not None
if snapshot is None or snapshot == -1:
    return False

# dwSize must be set before calling Thread32First
thread_entry.dwSize = sizeof(thread_entry)
success = kernel32.Thread32First(snapshot, byref(thread_entry))

while success:

    # the snapshot covers every thread in the system, so filter by owner
    if thread_entry.th32OwnerProcessID == self.pid:
        thread_id = thread_entry.th32ThreadID
        thread_list.append(thread_id)

        # Get a thread context
        context = CONTEXT()
        context.ContextFlags = CONTEXT_FULL | CONTEXT_DEBUG_REGISTERS

        h_thread = kernel32.OpenThread(THREAD_ALL_ACCESS, None, thread_id)
        got_context = kernel32.GetThreadContext(h_thread, byref(context))
        # close both handles on every path before returning
        kernel32.CloseHandle(h_thread)
        kernel32.CloseHandle(snapshot)
        if got_context:
            return context
        else:
            return False

    success = kernel32.Thread32Next(snapshot, byref(thread_entry))

kernel32.CloseHandle(snapshot)
return thread_list

Process

CreateProcess()

BOOL WINAPI CreateProcess(
  _In_opt_     LPCTSTR lpApplicationName,
  _Inout_opt_  LPTSTR lpCommandLine,
  _In_opt_     LPSECURITY_ATTRIBUTES lpProcessAttributes,
  _In_opt_     LPSECURITY_ATTRIBUTES lpThreadAttributes,
  _In_         BOOL bInheritHandles,
  _In_         DWORD dwCreationFlags,
  _In_opt_     LPVOID lpEnvironment,
  _In_opt_     LPCTSTR lpCurrentDirectory,
  _In_         LPSTARTUPINFO lpStartupInfo,
  _Out_        LPPROCESS_INFORMATION lpProcessInformation
);

The highlighted parameters are the ones we need to care about.

lpApplicationName: the application to run, that is, the path of the executable
lpCommandLine: the command line to execute
dwCreationFlags: this flag controls the priority class and how the process is created.
It can also mark the new process as a debugged process (DEBUG_PROCESS 0x00000001).

OpenProcess

HANDLE WINAPI OpenProcess(
  _In_  DWORD dwDesiredAccess,
  _In_  BOOL bInheritHandle,
  _In_  DWORD dwProcessId
);

OpenProcess requests access to an existing process with specific rights and returns the corresponding handle.

dwDesiredAccess: the access rights you want on the process being opened.
For debugging, set it to PROCESS_ALL_ACCESS.
dwProcessId: the PID of the process whose handle we want

HANDLE is defined in WinNT.h as follows [ref. 3].

typedef PVOID HANDLE;

thread

Reading Traversing the Thread List makes it easier to understand how the functions below are used.

Getting the thread context

Get the thread_id: thread IDs are obtained while traversing the threads.
Get the thread handle: calling OpenThread with that thread_id yields the thread's handle.
Get the context: calling GetThreadContext() with this handle yields the values associated with the thread (its register values).

OpenThread

HANDLE WINAPI OpenThread(
  _In_  DWORD dwDesiredAccess,
  _In_  BOOL bInheritHandle,
  _In_  DWORD dwThreadId
);

CreateToolhelp32Snapshot function

HANDLE WINAPI CreateToolhelp32Snapshot(
  _In_  DWORD dwFlags,
  _In_  DWORD th32ProcessID
);

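Setting dwFlags to TH32CS_SNAPTHREAD (0x00000004) returns a handle to a snapshot of the threads in the system. This handle is passed as a parameter to Thread32First().

th32ProcessID, the ID of the process to snapshot, is used only with the flags below. With TH32CS_SNAPTHREAD it is ignored, which is why the code above compares th32OwnerProcessID against the PID.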

TH32CS_SNAPMODULE
TH32CS_SNAPMODULE32
TH32CS_SNAPHEAPLIST
TH32CS_SNAPALL

Thread32First()

BOOL WINAPI Thread32First(
  _In_     HANDLE hSnapshot,
  _Inout_  LPTHREADENTRY32 lpte
);

This function is used to enumerate threads.
lpte is filled in when the function succeeds.

THREADENTRY32

typedef struct tagTHREADENTRY32 {
  DWORD dwSize;
  DWORD cntUsage;
  DWORD th32ThreadID;
  DWORD th32OwnerProcessID;
  LONG  tpBasePri;
  LONG  tpDeltaPri;
  DWORD dwFlags;
} THREADENTRY32, *PTHREADENTRY32;

parameters

dwSize: must be initialized to sizeof(THREADENTRY32) before calling Thread32First().
th32ThreadID: the thread's ID; pass it to OpenThread() to get a handle to the thread.
th32OwnerProcessID: the ID of the process that owns the thread

GetThreadContext

BOOL WINAPI GetThreadContext(
  _In_     HANDLE hThread,
  _Inout_  LPCONTEXT lpContext
);

SetThreadContext

BOOL WINAPI SetThreadContext(
  _In_  HANDLE hThread,
  _In_  const CONTEXT *lpContext
);

Context structure

typedef struct _CONTEXT {

//
// The flags values within this flag control the contents of
// a CONTEXT record.
//
// If the context record is used as an input parameter, then
// for each portion of the context record controlled by a flag
// whose value is set, it is assumed that that portion of the
// context record contains valid context. If the context record
// is being used to modify a threads context, then only that
// portion of the threads context will be modified.
//
// If the context record is used as an IN OUT parameter to capture
// the context of a thread, then only those portions of the thread's
// context corresponding to set flags will be returned.
//
// The context record is never used as an OUT only parameter.
//
DWORD ContextFlags;

//
// This section is specified/returned if CONTEXT_DEBUG_REGISTERS is
// set in ContextFlags. Note that CONTEXT_DEBUG_REGISTERS is NOT
// included in CONTEXT_FULL.
//
DWORD Dr0;
DWORD Dr1;
DWORD Dr2;
DWORD Dr3;
DWORD Dr6;
DWORD Dr7;

//
// This section is specified/returned if the
// ContextFlags word contians the flag CONTEXT_FLOATING_POINT.
//
FLOATING_SAVE_AREA FloatSave;

//
// This section is specified/returned if the
// ContextFlags word contians the flag CONTEXT_SEGMENTS.
//
DWORD SegGs;
DWORD SegFs;
DWORD SegEs;
DWORD SegDs;

//
// This section is specified/returned if the
// ContextFlags word contians the flag CONTEXT_INTEGER.
//
DWORD Edi;
DWORD Esi;
DWORD Ebx;
DWORD Edx;
DWORD Ecx;
DWORD Eax;

//
// This section is specified/returned if the
// ContextFlags word contians the flag CONTEXT_CONTROL.
//
DWORD Ebp;
DWORD Eip;
DWORD SegCs; // MUST BE SANITIZED
DWORD EFlags; // MUST BE SANITIZED
DWORD Esp;
DWORD SegSs;

//
// This section is specified/returned if the ContextFlags word
// contains the flag CONTEXT_EXTENDED_REGISTERS.
// The format and contexts are processor specific
//
BYTE ExtendedRegisters[MAXIMUM_SUPPORTED_EXTENSION];

} CONTEXT;

References

Windows Data Types, MSDN
OpenThread function, MSDN

[Comp][Debug] How INT 3 works

int3 in linux / how the INT3 interrupt works / interrupt handler

Here we take a rough look at what actually happens when an INT3 interrupt is raised.

When INT3 is raised, a predefined handler (the interrupt handler) runs.

Let's trace the path by which the interrupt handler gets called. For the OS part I chose Linux, whose source code can be inspected.

Look at <Figure. interrupt call> first; the discussion is built around that figure.

For the overall flow, see "Summary of INT3 handling on Linux" below.

How interrupts work

From Operating System Concepts, 10th edition

To start an I/O operation, the device driver loads the appropriate registers in the device controller.
The device controller examines the contents of these registers to decide what action to take.
The controller starts transferring data from the device to its local buffer.
When the transfer is complete, the device controller informs the device driver that it has finished.
The device driver then hands control to other parts of the OS.
For a read, it may return the data, for example by passing a pointer to it.
For other operations, it returns status information.
The controller uses an interrupt to tell the device driver that the operation has finished.

The CPU has a wire called the interrupt-request line, which it senses after executing every instruction.
When a controller has asserted a signal on the interrupt-request line,
the CPU detects it, reads the interrupt number, and jumps to the interrupt-handler routine, using the interrupt number as an index into the interrupt vector.
The interrupt handler saves any state it will change during its work,
determines the cause of the interrupt, performs the necessary processing, and restores the state.
It then executes a return_from_interrupt instruction, which returns the CPU to the execution state it was in before the interrupt.

IDT

Let's see what the IDT is.

interrupt vector

An interrupt vector is the memory address of an interrupt handler [ref. 4].
A table that collects these interrupt vectors is an interrupt vector table.

IDT

The Interrupt Descriptor Table (IDT) is the x86 architecture's implementation of an interrupt vector table. The CPU uses it when an interrupt or exception occurs. The IDT consists of up to 256 interrupt vectors.

Kinds of exceptions

Exceptions fall into two categories.

Hardware generated exceptions: produced by the processor (faults, traps, aborts)
Software generated exceptions: programmed exceptions such as int and int3

Interrupt kinds by position in the IDT

The IDT is used in the following three cases.

software interrupts (software exceptions)
hardware interrupts (hardware exceptions)
processor exceptions

The first 32 entries are reserved for processor exceptions.

0~31: hardware generated exceptions [ref. 3]
32~47: maskable interrupts (such as IRQs)
48~255: software interrupts

As its name says, the IDT is a table. So what does a single entry of this table look like? Intel seems to call these entries gates, so let's look at gates next.

intel gates

Modes provided by Intel CPUs

real mode
protected mode

Operating systems use protected mode to keep user processes from accessing critical registers.

Four privilege levels

ring0: the kernel runs at this level, so it can access every CPU register, all hardware, and all memory.
ring1
ring2
ring3: the level at which user applications normally run

The privilege level of the currently running program is the CPL (Current Privilege Level), which the CPU keeps in the low two bits of the CS register.

gates

In protected mode the IDT is an array of 8-byte descriptors; each 8-byte descriptor is one IDT entry.
Each descriptor is one of three kinds.

interrupt gates
trap gates
task gates

A gate is simply a structure defined by Intel that holds things like the address of the procedure to call and privilege-level information [ref. 5].
Kinds 1 and 2 (interrupt gates and trap gates) point to a memory location that contains code.
The difference between the two is that

interrupt gates are meant for hardware interrupts: entering through one clears the IF flag, so no other maskable interrupt can get in while the handler runs.
trap gates are used to handle software interrupts and exceptions, and leave IF unchanged.

A task gate causes a switch of the currently active task-state segment (TSS).

This task switch uses the hardware task switching mechanism, which lets the processor be handed over efficiently to another program, thread, or process. Note that Linux does not use task gates [ref. 6].

The protected-mode IDT resides somewhere in physical memory, but not at a fixed location, so its address has to be stored somewhere. That place is the IDTR, and the instruction that loads the IDT's address into this register is LIDT.

< IDT Gate descriptor / from: ref.2 Figure 6-2 >

IDTR

The CPU has one register for the IDT, the IDTR, which holds the table's physical base address and its length.

The IDTR consists of a 4-byte field for the base address and a 2-byte field for the length (limit). The limit is used to find the address of the last byte of the IDT, so it is computed as 8N-1, where N is the number of descriptors. With a single interrupt vector, the last byte is at offset 7.

When an interrupt occurs, its number is multiplied by 8 and added to the base address; the descriptor is located at the resulting address (call it A).

The length is used to check that address A actually lies inside the table. If A is too large an exception is raised; otherwise the descriptor at A is fetched, and the action taken depends on its type and contents.

A full IDT is 2 KB (256 entries of 8 bytes). The IDT may hold fewer descriptors, however, because descriptors are needed only for the interrupts and exceptions that can actually occur. Empty slots must have their 'P' (present) flag set to '0' [ref. 2].
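The address arithmetic described above can be checked with a short sketch. The function names and the sample base address are made up for illustration; only the 8-byte descriptor size, the 8N-1 limit, and the base + 8 * vector rule come from the text.

```python
GATE_SIZE = 8  # bytes per IDT descriptor in protected mode


def idt_limit(num_descriptors):
    """Limit stored in the IDTR: offset of the last valid byte (8N - 1)."""
    return GATE_SIZE * num_descriptors - 1


def descriptor_address(base, limit, vector):
    """Return address A of the descriptor for vector, checking the limit."""
    offset = vector * GATE_SIZE
    # the whole 8-byte descriptor has to fit below the limit
    if offset + GATE_SIZE - 1 > limit:
        raise ValueError("vector %d lies beyond the IDT limit" % vector)
    return base + offset
```

For a full 256-entry table the limit is 2047, so vector 3 (INT3) at a hypothetical base of 0x1000 lands at 0x1018, while vector 256 is rejected.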

IDT instructions

There are two instructions for the IDT.

LIDT: load IDT register. Loads the IDT's base address and limit into the IDT register. It can be executed only when the CPL is 0, and is normally called once while the OS initializes.
SIDT: store IDT register. Copies the contents of the IDT register to memory. It can be executed at any CPL.

< IDT and IDTR, ref. 5 >

Summary of INT3 handling on Linux

The interrupt vector is looked up through the IDTR. (< Figure. IDT and IDTR >)
Using this interrupt vector, the trap gate in the IDT is read and the address of the interrupt procedure is computed. (< Figure. interrupt call >)
On Linux this interrupt procedure address is the address of the intermediate handler, through which ENTRY(int3) gets executed.
When ENTRY(int3) runs, do_int3() is called by way of error_code.

Now let's look at the parts implemented in the source code and confirm how this works.

In Source code

Event

An event is an electrical signal detected in hardware. On receiving the signal, the CPU stops the instruction stream it was executing and executes other instructions; in other words, the order of instructions changes [ref. 6].

< interrupt call / from : ref. 2, figure 6-3 >

Below is the Linux source related to building the IDT. The IDT is first set up by BIOS routines, but Linux builds it once more, and the code below is what does that [ref. 6].

/linux/include/asm/system.h

#define _set_gate(gate_addr,type,dpl,addr) \
__asm__ __volatile__ ("movw %%dx,%%ax\n\t" \
 "movw %2,%%dx\n\t" \
 "movl %%eax,%0\n\t" \
 "movl %%edx,%1" \
 :"=m" (*((long *) (gate_addr))), \
  "=m" (*(1+(long *) (gate_addr))) \
 :"i" ((short) (0x8000+(dpl<<13)+(type<<8))), \
  "d" ((char *) (addr)),"a" (KERNEL_CS << 16) \
 :"ax","dx")

#define set_intr_gate(n,addr) \
 _set_gate(&idt[n],14,0,addr)

#define set_trap_gate(n,addr) \
 _set_gate(&idt[n],15,0,addr)

#define set_system_gate(n,addr) \
 _set_gate(&idt[n],15,3,addr)

#define set_call_gate(a,addr) \
 _set_gate(a,12,3,addr)

Software interrupts have their DPL field set to 3, so the DPL for INT3 is also 3.
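To see what the _set_gate macro above writes, here is a sketch that packs the same two 32-bit words. The handler address and the KERNEL_CS value of 0x10 are assumptions chosen for illustration, and `make_gate` and `gate_dpl` are hypothetical helper names.

```python
import struct

KERNEL_CS = 0x10  # assumed kernel code segment selector, illustration only


def make_gate(gate_type, dpl, handler_addr, selector=KERNEL_CS):
    """Pack an 8-byte gate descriptor the way _set_gate lays it out."""
    # first dword: selector in the high half, handler offset bits 0-15 below
    low = ((selector & 0xFFFF) << 16) | (handler_addr & 0xFFFF)
    # flags: present bit (0x8000) + DPL in bits 13-14 + type in bits 8-11
    flags = 0x8000 + (dpl << 13) + (gate_type << 8)
    # second dword: handler offset bits 16-31 in the high half, flags below
    high = (handler_addr & 0xFFFF0000) | flags
    return struct.pack("<II", low, high)


def gate_dpl(gate):
    """Extract the DPL field from a packed gate descriptor."""
    _, high = struct.unpack("<II", gate)
    return (high >> 13) & 0x3
```

With these helpers, `make_gate(15, 3, addr)` corresponds to set_system_gate (DPL 3, callable from ring 3) and `make_gate(15, 0, addr)` to set_trap_gate.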

linux/kernel/linux/arch/i386/kernel/traps.c

void trap_init(void)
{
 ...
 set_call_gate(&default_ldt,lcall7);
 set_trap_gate(0,&divide_error);
 set_trap_gate(1,&debug);
 set_trap_gate(2,&nmi);
 set_system_gate(3,&int3); /* int3-5 can be called from all */
 set_system_gate(4,&overflow);
 set_system_gate(5,&bounds);
 set_trap_gate(6,&invalid_op);
 set_trap_gate(7,&device_not_available);
 set_trap_gate(8,&double_fault);
 set_trap_gate(9,&coprocessor_segment_overrun);
 set_trap_gate(10,&invalid_TSS);
 set_trap_gate(11,&segment_not_present);
 set_trap_gate(12,&stack_segment);
 set_trap_gate(13,&general_protection);
 set_trap_gate(14,&page_fault);
 set_trap_gate(15,&spurious_interrupt_bug);
 set_trap_gate(16,&coprocessor_error);
 set_trap_gate(17,&alignment_check);
 ...
}

DO_VM86_ERROR( 3, SIGTRAP, "int3", int3, current)

#define DO_VM86_ERROR(trapnr, signr, str, name, tsk) \
asmlinkage void do_##name(struct pt_regs * regs, long error_code) \
{ \
 if (regs->eflags & VM_MASK) { \
  if (!handle_vm86_trap((struct vm86_regs *) regs, error_code, trapnr)) \
   return; \
  /* else fall through */ \
 } \
 tsk->tss.error_code = error_code; \
 tsk->tss.trap_no = trapnr; \
 force_sig(signr, tsk); \
 die_if_kernel(str,regs,error_code); \
}

From the interrupt to the call of the interrupt procedure

When an interrupt occurs, the address of the interrupt procedure is looked up and the procedure is executed.

exception -----> intermediate Handler -----> Real Handler

On Linux, most intermediate handlers are defined in entry.S.

The intermediate handler for int3 defined in entry.S is shown below.

#define GET_CURRENT(reg) \
 movl $-8192, reg; \
 andl %esp, reg

ENTRY(int3)
 pushl $0
 pushl $ SYMBOL_NAME(do_int3)
 jmp error_code


error_code:
 pushl %ds
 pushl %eax
 xorl %eax,%eax
 pushl %ebp
 pushl %edi
 pushl %esi
 pushl %edx
 decl %eax                       # eax = -1
 pushl %ecx
 pushl %ebx
 cld
 movl %es,%ecx
 movl ORIG_EAX(%esp), %esi       # get the error code
 movl ES(%esp), %edi             # get the function address
 movl %eax, ORIG_EAX(%esp)
 movl %ecx, ES(%esp)
 movl %esp,%edx
 pushl %esi                      # push the error code
 pushl %edx                      # push the pt_regs pointer
 movl $(__KERNEL_DS),%edx
 movl %edx,%ds
 movl %edx,%es
 GET_CURRENT(%ebx)
 call *%edi                      # call do_int3
 addl $8,%esp
 jmp ret_from_exception

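What error_code does before call *%edi (call do_int3)

It pushes the register values to be passed to do_int3 onto the stack.
It stores %es and -1 into the lower stack slots (ES and ORIG_EAX).
It additionally pushes the error code and the final esp (the pt_regs pointer) onto the stack.
It loads the kernel data segment selector into %ds and %es,
and loads the current process descriptor's address into %ebx.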
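The GET_CURRENT macro used for the last step is just a mask: it ANDs %esp with -8192. A minimal sketch of that arithmetic follows, on the assumption that the process descriptor sits at the start of the 8 KB aligned kernel stack area, which is what the mask implies; the function name and the sample stack pointer are made up.

```python
STACK_AREA = 8192  # bytes, matching the $-8192 constant in GET_CURRENT


def current_from_esp(esp):
    """Model of GET_CURRENT: clear the low 13 bits of the stack pointer."""
    mask = (-STACK_AREA) & 0xFFFFFFFF  # -8192 as a 32-bit value: 0xFFFFE000
    return esp & mask
```

Any stack pointer inside the same 8 KB area yields the same base address, so the kernel can find the current process descriptor without a global variable.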

What is still missing here is

the point at which LIDT is executed
the step that obtains the interrupt vector after the INT3 instruction is encountered

These will be filled in later.

References

http://forums.codeguru.com/showthread.php?370029-What-is-INT-3
Intel® 64 and IA-32 Architectures Developer's Manual: Vol. 3A
http://en.wikipedia.org/wiki/Interrupt_descriptor_table
http://en.wikipedia.org/wiki/Interrupt_vector
Chapter 1. The interrupt mechanism of the Pentium processor, Interrupt Mechanism and Application of Intel IA32 Architecture
Handling the Interrupt Descriptor Table
http://en.wikipedia.org/wiki/Direction_flag

[Comp][Debug] Types of breakpoints and how they work

What you can inspect by halting a process

variables
stack
arguments
memory locations

Breakpoints

There are three kinds of breakpoints.

soft breakpoints
1. one-shot breakpoint: used once and then removed from the breakpoint list
2. persistent breakpoint: kept in the breakpoint list and used repeatedly
hardware breakpoints
memory breakpoints

soft breakpoints

This is the breakpoint we normally use when debugging.
It is a 1-byte instruction.
It stops the process and hands control to the breakpoint exception handler.

The difference between an instruction and an opcode

~~Instructions are things like MOV, ADD, and INC. Their machine code is the opcode (operation code), which is written as numbers such as 0x8c or 0x88.~~

After being scolded in the comments (T_T), I looked into instructions and opcodes a bit more.

First, let's look at a simple picture of an instruction.

<Source: wiki, http://en.wikipedia.org/wiki/File:Mips32_addi.svg>

The number above (001000 00001 00010 0000000101011110) is what we know as machine code. What each part means is written beneath it (OP Code, Addr 1, Addr 2, Immediate value).

As the example shows, the opcode is the part of the instruction's binary encoding that identifies the operation, which is presumably why it is called the operation code.

What we usually call assembly is the part labeled mnemonic underneath.

In short: an instruction is the most abstract notion of a command, the opcode is the operation part of the instruction's concrete form (its machine code), and assembly language is the set of mnemonics for the opcode and its operand fields.
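The field split in the picture can be reproduced by masking and shifting the 32-bit word. This is a sketch for the MIPS I-type layout shown in the figure (6-bit opcode, two 5-bit register fields, 16-bit immediate); the field names follow the figure's labels and the function name is made up.

```python
def decode_itype(word):
    """Split a 32-bit MIPS I-type instruction word into its four fields."""
    return {
        "opcode": (word >> 26) & 0x3F,   # top 6 bits: the operation
        "addr1": (word >> 21) & 0x1F,    # next 5 bits: first register
        "addr2": (word >> 16) & 0x1F,    # next 5 bits: second register
        "immediate": word & 0xFFFF,      # low 16 bits: immediate value
    }


# The machine code from the figure, written field by field.
word = int("001000" "00001" "00010" "0000000101011110", 2)
fields = decode_itype(word)
```

Only the `opcode` field (here 8, the addi operation) is the opcode in the strict sense; the rest of the word carries the operands.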

INT3

To create a soft breakpoint, the first byte of the instruction has to be replaced with 0xCC.
For example, given the 2-byte instruction 0x8BC3, the single byte 0x8B is replaced with 0xCC, the interrupt 3 (INT 3) instruction. INT 3 is the instruction that halts the CPU.

0x89E5 --> 0xCCE5

Given the instruction
0x89E5 MOV EBP,ESP
it becomes
0xCCE5.

When the processor encounters 0xCC, it stops execution and raises an INT3 event.

debug-mode run

This is what happens when we set a breakpoint in a debugger and run in debug mode.

When a breakpoint is set, the debugger replaces the first byte with 0xCC and saves the original byte (the opcode) somewhere.
When the program runs and the CPU hits 0xCC, an INT3 event is raised and the debugger receives the interrupt.
The debugger then checks whether the instruction pointer (the EIP register) points to an address in its breakpoint list.
If it does, the debugger writes the saved byte (the opcode) back to that address, where 0xCC had been written.
When the user resumes, the original opcode executes intact.
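The steps above can be simulated on a plain byte buffer. This is a sketch, not a real debugger: a bytearray stands in for the debuggee's memory, the class and method names are invented, and it assumes that on x86 the reported EIP points one byte past the 0xCC, so the handler subtracts 1 to find the breakpoint address.

```python
INT3 = 0xCC


class SoftBreakpoints(object):
    """Toy model of one-shot soft breakpoints over a bytearray."""

    def __init__(self, memory):
        self.memory = memory   # stands in for the debuggee's memory
        self.saved = {}        # breakpoint address -> original byte

    def set(self, address):
        # save the original first byte, then overwrite it with INT3
        self.saved[address] = self.memory[address]
        self.memory[address] = INT3

    def on_int3(self, eip):
        """Handle an INT3 event; eip is the address just after the 0xCC."""
        address = eip - 1
        if address not in self.saved:
            return None        # not a breakpoint we set
        # restore the original byte so the instruction can run intact
        self.memory[address] = self.saved.pop(address)
        return address         # execution should resume from here


code = bytearray(b"\x89\xE5\x8B\xC3")  # MOV EBP,ESP ; MOV EAX,EBX
bps = SoftBreakpoints(code)
bps.set(0)
patched = bytes(code)          # memory as the CPU sees it while the bp is set
resume_at = bps.on_int3(1)     # the CPU hit the 0xCC at address 0
```

A persistent breakpoint would additionally single-step over the restored instruction and write 0xCC back afterwards; that part is left out here.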

Soft breakpoints and CRC checksums

One caveat with soft breakpoints: changing an opcode in memory to 0xCC also changes the cyclic redundancy check (CRC) checksum of that memory.
A program can use a CRC checksum to detect whether its packets, files, or memory have been modified.
Some malware kills itself when the CRC checksum changes, which makes it impractical to set soft breakpoints on it.

Hardware breakpoints

If a small number of breakpoints is enough, hardware breakpoints can be a good fit. Unlike soft breakpoints, they do not modify the software being debugged.
Unlike soft breakpoints, they use the INT1 event. INT1 is used for single-step events (what happens when you press "next step" in a debugger) and for hardware breakpoints.
Hardware breakpoints are set through the debug registers.
An x86 CPU has eight debug registers (DR0-DR7).

DR0-DR3: store the addresses of the breakpoints.

DR4 and DR5 are reserved.

DR6 is the status register; it reports which type of breakpoint event was hit.

DR7 acts as the on/off switch and also stores the breakpoint conditions.

Because only DR0-DR3 hold breakpoint addresses, at most four hardware breakpoints can be set at a time.

DR7

By setting specific flags in DR7 you can create breakpoints that fire on the following conditions (the values are 2-bit binary fields):

when the instruction at a given address is executed (00)
when data is written to a given address (01)
when a given address is read or written, but not executed (11)

Bits 0-7 are the on/off switches for DR0-DR3; the L and G fields give the scope, local or global.
Bit 0 is DR0-local,
bit 1 is DR0-global, and so on for the other registers.
Turning a switch on enables the breakpoint.
Bits 8-15 are not used for ordinary debugging purposes.
Bits (16, 17) - (30, 31) hold the type and length of the breakpoints set in DR0-DR3.
Bits 16-17 are the type for DR0,
bits 18-19 are the length for DR0, and so on.

See the figure below for details.

Source: Gray Hat Python, Chapter 2, Figure 2-4
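The DR7 layout can be turned into a small bit-twiddling helper. This is a sketch of the encoding only (it does not touch any real register); the function and constant names are invented, and the length encoding 00 = 1 byte, 01 = 2 bytes, 11 = 4 bytes is the standard x86 one rather than something stated in the post.

```python
HW_EXECUTE = 0b00  # break when the instruction at the address executes
HW_WRITE = 0b01    # break on data writes
HW_ACCESS = 0b11   # break on reads or writes, but not execution

LENGTH_BITS = {1: 0b00, 2: 0b01, 4: 0b11}  # watched length in bytes -> field


def dr7_enable(dr7, slot, condition, length, local=True):
    """Return dr7 with the breakpoint in DR<slot> enabled and configured."""
    if slot not in (0, 1, 2, 3):
        raise ValueError("only DR0-DR3 hold breakpoint addresses")
    # bits 0-7: local/global enable switches, two bits per slot
    dr7 |= 1 << (slot * 2 + (0 if local else 1))
    # bits 16-31: four bits per slot, type in the low pair, length above it
    dr7 &= ~(0xF << (16 + slot * 4))
    dr7 |= condition << (16 + slot * 4)
    dr7 |= LENGTH_BITS[length] << (18 + slot * 4)
    return dr7
```

For example, a 4-byte write watch in DR0 sets bit 0, puts 01 in bits 16-17 and 11 in bits 18-19, giving 0xD0001.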

작동방식

CPU 가 instruction 을 execution 하기 전에 이 instruction address에 hardware breakpoint 가 걸려 있는지 확인한다.
그리고 operator 가 접근하려는 memory 가 hardware breakpoint 에 걸려있는지 확인한다.

제한

hardware breakpoint 의 제한은 4개밖에 없다는 것 이외에, read/write 에 대한 검사가 최대 4-byte 에서만 가능하다는 것이다.
이것을 극복하기 위해 memory breakpoint 가 존재한다.

Memory breakpoint

memory breakpoint 는 실제로 breakpoint 는 아니다. 개인적인 생각에는 우리가 흔히 생각하는 Page Fault 같은 것 같다.
memory 의 가장 작은 단위가 memory page 이다. 이 memory page 가 할당될 때, 이 page 는 특정 permission set 을 갖게 된다. 이런 permission 의 종류에는 아래 4가지가 있다.

Page execution
Page read
Page write
Guard page

OS 에서 이 permission 들을 page 에 할당하게 된다. page 는 이 permission 을 여러 개 가질 수 있다.
그리고, OS 에서 제공하는 함수를 통해서 이 page 가 어떤 permission 을 가졌는지 query 할 수 있고, permission 을 변경할 수도 있다.

Guard Page

The one we care about here is the guard page.

It is useful for separating the heap from the stack, and for keeping memory from growing past a certain boundary.
It is also useful for halting a process when a particular memory region is accessed.

For example, when reverse engineering a networked application, we can set a memory breakpoint (guard page) on the memory region where received packets are stored.

This breakpoint lets us find out when and how the application uses the contents of the received packet.

Any access to that memory page halts the CPU and raises a guard page debug exception.

We can then inspect which instruction accessed the buffer and figure out what it is doing.
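
Because permissions apply to whole pages, guarding a buffer means guarding every page the buffer touches. The sketch below works out which pages those are and what their new protection values would be. It is only the bookkeeping: the 4096-byte page size, the function names and the dictionary of current protections are assumptions for illustration, and a real debugger would apply the result with an OS call such as VirtualProtectEx.

```python
PAGE_GUARD = 0x100   # Windows guard-page modifier
PAGE_SIZE = 4096     # assumed page size


def pages_covering(address, size, page_size=PAGE_SIZE):
    """Return the base address of every page touched by [address, address+size)."""
    if size <= 0:
        raise ValueError("size must be positive")
    first = address - (address % page_size)
    last_byte = address + size - 1
    last = last_byte - (last_byte % page_size)
    return list(range(first, last + 1, page_size))


def guard_protections(address, size, current):
    """Map each covered page to its current protection with PAGE_GUARD added."""
    return {page: current[page] | PAGE_GUARD
            for page in pages_covering(address, size)}


# A 32-byte packet buffer that straddles a page boundary needs two pages guarded.
current = {0x1000: 0x04, 0x2000: 0x04}   # both pages start as PAGE_READWRITE
print([hex(p) for p in pages_covering(0x1FF0, 0x20)])   # ['0x1000', '0x2000']
print(guard_protections(0x1FF0, 0x20, current))
```

Note the side effect: any access to those pages trips the guard, not only accesses to the buffer itself, so the debugger has to check the faulting address.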

Soft breakpoints and guard pages

With this approach we can halt execution without modifying the software, unlike a soft breakpoint, which has to patch the code.

References

Gray Hat Python, Chapter 2, 2009

[Comp][Debug] Types of debuggers

Two subclasses of black-box debuggers

User-mode Debugger

user mode (ring 3): the processor mode in which user applications run.
Examples

Windows : WinDbg, OllyDbg
Linux : gdb

kernel mode (ring 0)
Has the highest privilege level.
This is where the OS core, drivers, and low-level components run.

A user-mode debugger can debug applications running in user mode,
and a kernel-mode debugger can debug code running in kernel mode.

Intelligent Debugger

A debugger that is more extensible than these traditional debuggers and is scriptable.

PyDbg
Immunity Debugger

References

Gray Hat Python, 2009, Justin Seitz

[Comp][Debug] How to use Immunity Debugger 7 - PyCommand API documentation

http://i5on9i.blogspot.kr/2013/06/immunity-debugger.html

The PyCommand API documentation no longer seems to be available.
So there is no choice but to read the source code.

immlib.py is located at

<Immunity Debugger Installed Directory>\Libs\immlib.py

This file shows the APIs you can use in Immunity Debugger.
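
Since reading the source is the only option, a quick way to get an overview is to pull out every function name defined in the file. The sketch below does this with a regular expression rather than importing the module, so it can run outside the debugger. The helper name list_api_names is mine, and the sample text is a made-up stand-in, not the real content of immlib.py.

```python
import re

# Matches "def name(" at the start of a line, with any indentation.
DEF_PATTERN = re.compile(r"^\s*def\s+([A-Za-z_]\w*)\s*\(", re.MULTILINE)


def list_api_names(source_text):
    """Return the sorted names of public functions and methods in source_text."""
    names = DEF_PATTERN.findall(source_text)
    return sorted({name for name in names if not name.startswith("_")})


# Hypothetical stand-in for the file content, for demonstration only.
sample = '''
class Debugger:
    def __init__(self):
        pass
    def log(self, msg):
        pass
    def readMemory(self, address, size):
        pass
'''
print(list_api_names(sample))   # ['log', 'readMemory']
```

To use it on the real file, read the text of immlib.py from the Libs directory of your own installation and pass it to list_api_names().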

Hook

The hook-related definitions can be found in

<Immunity Debugger Installed Directory>\Libs\libhook.py

[Comp] What is DEP (Data Execution Prevention)?

DEP

A security feature built into the OS. Mac OS X, iOS, MS Windows, Linux, and Android all provide it.

It protects applications and services from code executed out of non-executable memory regions (such as the heap or the stack).

It helps prevent certain exploits that store code through a buffer overflow.

Most exploits keep their shellcode on the heap or the stack until it runs, so DEP keeps the exploit from executing the shellcode properly. [ref. 3]

DEP was introduced to Linux in 2004. Windows adopted it in Windows XP Service Pack 2, and Mac OS adopted it when it moved to x86.

DEP works in two modes.

hardware-enforced DEP
software-enforced DEP

hardware-enforced DEP

hardware-enforced DEP relies on the CPU: the CPU can mark memory pages as non-executable. (Intel x86 CPUs support this; see ref. 1 for details.)

Hardware-enforced DEP is implemented with the NX bit or the XD bit.
To use this processor feature, the processor has to be running in PAE (Physical Address Extension) mode, but Windows reportedly enables PAE mode automatically, so this is not a problem. [ref. 1]
Windows Vista DEP marks certain regions of memory as holding data only. A processor that provides the NX or XD bit treats those regions as non-executable.
Starting with Vista, Windows Task Manager shows whether a given process is using DEP.

< Task Manager, DEP >

software-enforced DEP

It provides limited protection on CPUs that lack the hardware feature.
software-enforced DEP does not prevent execution of code in data pages; instead it prevents SEH overwrites.

References

1. http://support.microsoft.com/kb/875352
2. http://en.wikipedia.org/wiki/Data_Execution_Prevention
3. Gray Hat Python

[Comp][Debug] How to use Immunity Debugger 6 - PyCommand

http://i5on9i.blogspot.kr/2013/06/immunity-debugger.html

Once Immunity Debugger is installed, you can use something called PyCommand. It is just Python, but the debugger's features are exposed as a Python module, so you can script all kinds of batch work.

Detailed usage is a topic for later. Here we simply run a PyCommand at the Hello World level.

A PyCommand can be run easily from the command box shown in the figure below. It can also be run from the Python icon in the menu bar.

How to run

In the command line shown above, type the name of the .py file prefixed with ! (an exclamation mark) to run it.

!<script_name>

For example
>> !command

Entering this runs command.py. Do not add the .py extension.

Path

So command.py has to live somewhere. Put it in the path below.

The .py file must be inside the <Immunity Debugger Installed Path>\PyCommands folder.

format

So what should command.py look like? The simplest form is shown below.

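from immlib import *

def main(args):
    # Immunity Debugger calls main() with the arguments typed after the command.
    # The returned string is what gets displayed when the command finishes.
    return "return"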

Summary

To sum up:

Requirements for a PyCommand script

It must define def main(args).

It must return a string.

The .py file must be inside the <Immunity Debugger Installed Path>\PyCommands folder.

As shown below, the return value is displayed in the status bar under the command box. You can also press Alt+L to open the Log window and check it there.

from immlib import * 
def main(args):
    
    
    return "return"

That covers the basics. Next time, let's try some more advanced PyCommand programming.

References

[Comp][Debug] How to use Immunity Debugger 5 - Finding functions

How to use Immunity Debugger - Table of contents

Functions are found by name.
You can find a function in a specific module as follows.

context menu >> Search for >> Name(label) in current module
context menu >> Search for >> Name in all modules

[Comp][OS] Threads explained through the Mac OS X source code

I wanted to approach threads from the code side and went looking for source code. The Mac source happened to be easy to get at, so I decided to analyze that. ^^;;;;

First, let me write down what I know about threads.

Each process has its own address space. A thread is created inside a process, so several threads exist in the same address space and naturally share it. [ref. 1]

A child process created with fork() and its parent each have independent memory (data, heap, stack, and so on), whereas threads share the rest of memory and only have their own independent stacks. [ref. 11]
A thread is also what actually carries out execution (it is an execution unit).
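
The sharing described above is easy to see from a high-level language. The sketch below uses Python's threading module rather than pthreads, purely as an illustration: the list shared lives in the process's memory and is visible to every thread, while local_value lives in each thread's own call frame. The names worker, shared and local_value are my own.

```python
import threading

shared = []                 # one object in process memory, seen by all threads
lock = threading.Lock()     # protects the shared list from concurrent appends


def worker(tag):
    local_value = tag * 10  # local variable, private to this thread's stack
    with lock:
        shared.append(local_value)


threads = [threading.Thread(target=worker, args=(i,)) for i in range(1, 4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(shared))       # [10, 20, 30]
```

Doing the same with separate processes would need explicit shared memory or message passing, because each process gets its own copy of the list.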

So what does it mean to have an address space? When a process is created, it is given virtual address memory, presumably 0 to 4 GB on a 32-bit CPU (not certain). Within that range, a portion is then set aside as the memory a thread will use. That portion holds the call stack and the thread structure, and you could say that this is the thread. As an analogy purely from the memory point of view: if a process is a fragment of memory as seen by the OS, then a thread is a fragment of memory as seen by the process.

For a thread to carry out an action it needs a pointer to a function, and there also has to be a step that lets the thread use a processor.

Let's look up some code to check the points above.

Implementation

Here we use the Mac OS pthread source. Reading the source directly helps the most, so please refer to the linked sources. Also note that I wrote some parts without fully understanding them, so corrections are very welcome. ^0^;;

For reference, Mac OS is Unix-like but is built on top of Mach, so Mach threads and Unix threads coexist. See ref. 7 for both.

function pointer

A thread runs a particular action. So how do we express that action? A single statement, or going further a single instruction, is one action. A more complex action is built from a set of statements and instructions, and that is a function. For a thread to run the function() we want, it needs to hold a function pointer.

This function pointer is the start_routine passed as a parameter to pthread_create().
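
The same idea can be sketched in Python, where passing the function object itself plays the role of the function pointer. This is only an analogy using the threading module, not the pthread API: target corresponds to start_routine and args to the arg parameter. The results dictionary is a name I introduced.

```python
import threading

results = {}


def start_routine(arg):
    # Body of the thread: runs on the new thread once it has been started.
    results[arg] = "f%d: %d" % (arg, arg)


# start_routine is passed without parentheses: we hand over the function
# itself, much like the function pointer given to pthread_create().
threads = [threading.Thread(target=start_routine, args=(n,)) for n in (1, 2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results[1], results[2])   # f1: 1 f2: 2
```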

call stack

A thread needs a call stack. For the benefits a call stack brings, see ref. 4.

_pthread_create_pthread_onstack(attrs, &stack, &t) appears to map a portion of virtual memory and use it as the stack. vm_allocate() does not really allocate anything itself; it adjusts the address and then calls vm_map_enter().

If the stack is not created here, it is created inside bsdthread_create() instead.

struct thread_t

A thread also has a data structure. To manage a thread you need a variable for it, so there has to be a struct for threads, thread_t.

type thread_t = mach_port_t(osfmk/mach/mach_types.defs)

This thread_t is the Mach thread; the BSD thread uses a struct called uthread. [ref. 7]

In the end, the characteristics of a thread are determined by this uthread. (Mach's thread struct, thread_t, has fewer features. [ref. 7])

The bound_processor field of the thread seems to be one way of reducing cache misses. [ref. 10]

scheduling

This part needs more analysis. My guess is that scheduling happens where thread_setrun() hands the thread over to a processor (not certain).

Source codes

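// Source: ref. 2

#include <stdio.h>
#include <pthread.h>
#include <unistd.h>   /* sleep() */

/* Thread start routines: pthread_create() expects void *(*)(void *). */
void *f1(void *x);
void *f2(void *x);

int main(void) {
  pthread_t f2_thread, f1_thread;
  int i1 = 1;
  int i2 = 2;
  /* Each thread gets a function pointer and a pointer to its argument. */
  pthread_create(&f1_thread, NULL, f1, &i1);
  pthread_create(&f2_thread, NULL, f2, &i2);
  /* Wait for both threads to finish before main() returns. */
  pthread_join(f1_thread, NULL);
  pthread_join(f2_thread, NULL);
  return 0;
}

void *f1(void *x) {
  int i = *(int *)x;
  sleep(1);
  printf("f1: %d\n", i);
  pthread_exit(NULL);
}

void *f2(void *x) {
  int i = *(int *)x;
  sleep(1);
  printf("f2: %d\n", i);
  pthread_exit(NULL);
}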

#ifndef PTHREAD_MACH_CALL
#define PTHREAD_MACH_CALL(expr, ret) (ret) = (expr)
#endif

…
pthread_create()
_new_pthread_create_suspended(thread, attr, start_routine, arg, 0);
 mach_port_t kernel_thread = MACH_PORT_NULL;


 flags |= PTHREAD_START_CUSTOM;
 _pthread_create_pthread_onstack(attrs, &stack, &t)
  kr = vm_map(mach_task_self(), &stackaddr,
     attrs->stacksize + guardsize,
     vm_page_size-1,
     VM_MAKE_TAG(VM_MEMORY_STACK)| VM_FLAGS_ANYWHERE , MEMORY_OBJECT_NULL,
     0, FALSE, VM_PROT_DEFAULT, VM_PROT_ALL,
     VM_INHERIT_DEFAULT);
   kr = mach_vm_map(target_map, &map_addr, map_size, map_mask, flags,
        port, obj_offset, copy,
        cur_protection, max_protection, inheritance);
    vm_map_enter(target_map,
        &map_addr, map_size,
        (vm_map_offset_t)mask,
        flags,
        object, offset,
        copy,
        cur_protection, max_protection,
        inheritance);
        
  if (kr != KERN_SUCCESS)
   kr = vm_allocate(mach_task_self(),
       &stackaddr, attrs->stacksize + guardsize,
       VM_MAKE_TAG(VM_MEMORY_STACK)| VM_FLAGS_ANYWHERE);
 

 t->fun = start_routine; 
 __bsdthread_create(start_routine, arg, stack, t, flags))
  kret = thread_create(ctask, &th);
   thread_create_internal2(task, new_thread, FALSE);
    thread_create_internal(task, -1, (thread_continue_t)thread_bootstrap_return, TH_OPTION_NONE, &thread);
     machine_thread_create(new_thread, parent_task)
     ...
     ipc_thread_init(new_thread);
     queue_init(&new_thread->held_ulocks);


  if ((flags & PTHREAD_START_CUSTOM) == 0) {
   th_stacksize = (mach_vm_size_t)user_stack;  /* if it is custom them it is stacksize */
   th_allocsize = th_stacksize + PTH_DEFAULT_GUARDSIZE + p->p_pthsize;

   kret = mach_vm_map(vmap, &stackaddr,
         th_allocsize,
         page_size-1,
         VM_MAKE_TAG(VM_MEMORY_STACK)| VM_FLAGS_ANYWHERE , NULL,
         0, FALSE, VM_PROT_DEFAULT, VM_PROT_ALL,
         VM_INHERIT_DEFAULT);
       if (kret != KERN_SUCCESS)
        kret = mach_vm_allocate(vmap,
          &stackaddr, th_allocsize,
          VM_MAKE_TAG(VM_MEMORY_STACK)| VM_FLAGS_ANYWHERE);
   ...
  }
  
  if (kret != KERN_SUCCESS)
   return(ENOMEM);
  thread_reference(th);
  ...
  thread_set_wq_state64(th, (thread_state_t)ts64);
  ...
  kret = thread_resume(th);
   thread_mtx_lock(thread);
   ...
   if (thread->started)
     thread_wakeup_one(&thread->suspend_count);
    else {
     thread_start_internal(thread);
      
    }
   ...
   thread_mtx_unlock(thread);

thread_create_internal() calls machine_thread_create(), which initializes the CPU-specific state.
After that, thread_start_internal() clears the waiting state, and thread_setrun() dispatches the thread onto the run queue of the thread's bound_processor. [ref. 9]

thread_start_internal()

 thread_start_internal(thread_t thread);
 

  clear_wait(thread, THREAD_AWAKENED);
   ret = clear_wait_internal(thread, result);
    thread_go(thread, wresult)
     thread_setrun(thread, SCHED_PREEMPT | SCHED_TAILQ);
  
  thread->started = TRUE;

Reference

1. http://www.programmerinterview.com/index.php/operating-systems/thread-vs-process/
2. http://en.wikipedia.org/wiki/Mach_(kernel)
3. https://github.com/KyleBenson/scripts/blob/master/teaching/pthread.c
4. http://en.wikipedia.org/wiki/Call_stack
5. cross reference, http://code.metager.de/source/xref/apple/xnu/osfmk/vm/vm_user.c
6. vm_map manpage
7. Mac OS X and iOS Internals: To the Apple's Core, Jonathan Levin, Wrox
8. Mac OS X Internals: A Systems Approach, 2006
9. thread_setrun(), Mac OS X and iOS Internals: To the Apple's Core, Jonathan Levin, Wrox
10. Software techniques for shared caches
11. A brief summary of integrity, threads, mutexes, and semaphores