现在的位置: 首页 > 综合 > 正文

Three Ways to Inject Your Code into Another Process

2013年08月24日 ⁄ 综合 ⁄ 共 22285字 ⁄ 字号 评论关闭
文章目录

文章不错 原文地址http://www.codeproject.com/KB/threads/winspy.aspx

第二种方法是我想要的,第二种方法应该可以实现第三种类似的结果的。

Contents

Introduction

 WinSpy

Several password spy tutorials have been posted to The Code Project, but all of them rely on Windows hooks. Is there any other way to make such a utility? Yes, there is. But first, let me review the problem briefly, just to make sure we're all on the same page.

To "read" the contents of any control - either belonging to your application or not - you generally send the WM_GETTEXT message to it. This also applies to edit controls, except in one special case. If the edit control belongs to another process and the ES_PASSWORD style is set, this approach fails. Only the process that "owns" the password control can get its contents via WM_GETTEXT. So, our problem reduces to the following: How to get

Collapse | Copy Code
::SendMessage( hPwdEdit, WM_GETTEXT, nMaxChars, psBuffer );

executed in the address space of another process.

In general, there are three possibilities to solve this problem:

  1. Put your code into a DLL; then, map the DLL to the remote process via windows hooks.
  2. Put your code into a DLL and map the DLL to the remote process using the CreateRemoteThread & LoadLibrary technique.
  3. Instead of writing a separate DLL, copy your code to the remote process directly - via WriteProcessMemory - and start its execution with CreateRemoteThread. A detailed description of this technique can be found here.

I. Windows Hooks

Demo applications: HookSpy and HookInjEx

The primary role of windows hooks is to monitor the message traffic of some thread. In general there are:

  1. Local hooks, where you monitor the message traffic of any thread belonging to your process.
  2. Remote hooks, which can be:
    1. thread-specific, to monitor the message traffic of a thread belonging to another process;
    2. system-wide, to monitor the message traffic for all threads currently running on the system.

If the hooked thread belongs to another process (cases 2a & 2b), your hook procedure must reside in a dynamic-link library (DLL). The system then maps the DLL containing the hook procedure into the address space of the hooked thread. Windows will map the entire DLL, not just the hook procedure. That is why Windows hooks can be used to inject code into another process's address space.

While I won't discuss hooks in this article further (take a look at the SetWindowHookEx API in MSDN for more details), let me give you two more hints that you won't find in the documentation, but might still be useful:

  1. After a successful call to SetWindowsHookEx, the system maps the DLL into the address space of the hooked thread automatically, but not necessary immediately. Because windows hooks are all about messages, the DLL isn't really mapped until an adequate event happens. For example:

    If you install a hook that monitors all nonqueued messages of some thread (WH_CALLWNDPROC), the DLL won't be mapped into the remote process until a message is actually sent to (some window of) the hooked thread. In other words, if UnhookWindowsHookEx is called before a message was sent to the hooked thread, the DLL will never be mapped into the remote process (although the call to SetWindowsHookEx itself succeeded). To force an immediate mapping, send an appropriate event to the concerned thread right after the call to SetWindowsHookEx.

    The same is true for unmapping the DLL after calling UnhookWindowsHookEx. The DLL isn't really unmapped until an adequate event happens.

  2. When you install hooks, they can affect the overall system performance (especially system-wide hooks). However, you can easily overcome this shortcoming if you use thread-specific hooks solely as a DLL mapping mechanism, and not to trap messages. Consider the following code snippet:
    Collapse | Copy Code
    BOOL APIENTRY DllMain( HANDLE hModule,
                           DWORD  ul_reason_for_call,
                           LPVOID lpReserved )
    {
        if( ul_reason_for_call == DLL_PROCESS_ATTACH )
        {
            // Increase reference count via LoadLibrary
            char lib_name[MAX_PATH]; 
            ::GetModuleFileName( hModule, lib_name, MAX_PATH );
            ::LoadLibrary( lib_name );
    
            // Safely remove hook
            ::UnhookWindowsHookEx( g_hHook );
        }    
        return TRUE;
    }
    

    So, what happens? First we map the DLL to the remote process via Windows hooks. Then, right after the DLL has actually been mapped, we unhook it. Normally, the DLL would be unmapped now, too, as soon as the first message to the hooked thread would arrive. The dodgy thing is we prevent this unmapping by increasing the DLLs reference count via LoadLibrary.

    The question that remains is: How to unload the DLL now, once we are finished? UnhookWindowsHookEx won't do it because we unhooked the thread already. You could do it this way:

    • Install another hook, just before you want to unmap the DLL;
    • Send a "special" message to the remote thread;
    • Catch this message in your hook procedure; in response, call FreeLibrary & UnhookWindowsHookEx.

    Now, hooks are used only while mapping/unmapping the DLL to/from the remote process; there is no influence on the performance of the "hooked" thread in the meantime. Put anohter way: We get a DLL mapping mechanism that doesn't interfere the target process more than the LoadLibrary technique discussed below does (see Section II.). However, opposed to the LoadLibrary technique, this solution works on both: WinNT and Win9x.

    But, when should one use this trick? Always when the DLL has to be present in the remote process for a longer period of time (i.e. if you subclass a control belonging to another process) and you want to interfere the target process as little as possible. I didn't use it in HookSpy because the DLL there is injected just for a moment - just long enough to get the password. I rather provided another example - HookInjEx - to demonstrate it. HookInjEx maps/unmaps a DLL into "explorer.exe", where it subclasses the Start button. More precisely: It swaps the left and right mouse clicks for the Start button.

You will find HookSpy and HookInjEx as well as their sources in the download package at the beginning of the article.

II. The CreateRemoteThread & LoadLibrary Technique

Demo application: LibSpy

In general, any process can load a DLL dynamically by using the LoadLibrary API. But, how do we force an external process to call this function? The answer is CreateRemoteThread.

Let's take a look at the declaration of the LoadLibrary and FreeLibrary APIs first:

Collapse | Copy Code
HINSTANCE LoadLibrary(
  LPCTSTR lpLibFileName   // address of filename of library module
);

BOOL FreeLibrary(
  HMODULE hLibModule      // handle to loaded library module
);

Now, compare them with the declaration of ThreadProc - the thread routine - passed to CreateRemoteThread:

Collapse | Copy Code
DWORD WINAPI ThreadProc(
  LPVOID lpParameter   // thread data
);

As you can see, all functions use the same calling convention and all accept a 32-bit parameter. Also, the size of the returned value is the same. In other words: We may pass a pointer to LoadLibrary/FreeLibrary as the thread routine to CreateRemoteThread.

However, there are two problems (see the description for CreateRemoteThread below):

  1. The lpStartAddress parameter in CreateRemoteThread must represent the starting address of the thread routine in the remote process.
  2. If lpParameter - the parameter passed to ThreadFunc - is interpreted as an ordinary 32-bit value (FreeLibrary interprets it as an HMODULE), everything is fine. However, if lpParameter is interpreted as a pointer (LoadLibraryA interprets it as a pointer to a char string), it must point to some data in the remote process.

The first problem is actually solved by itself. Both LoadLibrary and FreeLibray are functions residing in kernel32.dll. Because kernel32 is guaranteed to be present and at the same load address in every "normal" process (see Appendix A), the address of LoadLibrary/FreeLibray is the same in every process too. This ensures that a valid pointer is passed to the remote process.

The second problem is also easy to solve: Simply copy the DLL module name (needed by LoadLibrary) to the remote process via WriteProcessMemory.

So, to use the CreateRemoteThread & LoadLibrary technique, follow these steps:

  1. Retrieve a HANDLE to the remote process (OpenProcess).
  2. Allocate memory for the DLL name in the remote process (VirtualAllocEx).
  3. Write the DLL name, including full path, to the allocated memory (WriteProcessMemory).
  4. Map your DLL to the remote process via CreateRemoteThread & LoadLibrary.
  5. Wait until the remote thread terminates (WaitForSingleObject); this is until the call to LoadLibrary returns. Put another way, the thread will terminate as soon as our DllMain (called with reason DLL_PROCESS_ATTACH) returns.
  6. Retrieve the exit code of the remote thread (GetExitCodeThread). Note that this is the value returned by LoadLibrary, thus the base address (HMODULE) of our mapped DLL.
  7. Free the memory allocated in Step #2 (VirtualFreeEx).
  8. Unload the DLL from the remote process via CreateRemoteThread & FreeLibrary. Pass the HMODULE handle retreived in Step #6 to FreeLibrary (via lpParameter in CreateRemoteThread).
    Note: If your injected DLL spawns any new threads, be sure they are all terminated before unloading it.
  9. Wait until the thread terminates (WaitForSingleObject).

Also, don't forget to close all the handles once you are finished: To both threads, created in Steps #4 and #8; and the handle to the remote process, retrieved in Step #1.

Let's examine some parts of LibSpy's sources now, to see how the above steps are implemented in reality. For the sake of simplicity, error handling and unicode support are removed.

Collapse | Copy Code
HANDLE hThread;
char    szLibPath[_MAX_PATH];  // The name of our "LibSpy.dll" module
                               // (including full path!);
void*   pLibRemote;   // The address (in the remote process) where 
                      // szLibPath will be copied to;
DWORD   hLibModule;   // Base address of loaded module (==HMODULE);
HMODULE hKernel32 = ::GetModuleHandle("Kernel32");

// initialize szLibPath
//...

// 1. Allocate memory in the remote process for szLibPath
// 2. Write szLibPath to the allocated memory
pLibRemote = ::VirtualAllocEx( hProcess, NULL, sizeof(szLibPath),
                               MEM_COMMIT, PAGE_READWRITE );
::WriteProcessMemory( hProcess, pLibRemote, (void*)szLibPath,
                      sizeof(szLibPath), NULL );


// Load "LibSpy.dll" into the remote process
// (via CreateRemoteThread & LoadLibrary)
hThread = ::CreateRemoteThread( hProcess, NULL, 0,
            (LPTHREAD_START_ROUTINE) ::GetProcAddress( hKernel32,
                                       "LoadLibraryA" ),
             pLibRemote, 0, NULL );
::WaitForSingleObject( hThread, INFINITE );

// Get handle of the loaded module
::GetExitCodeThread( hThread, &hLibModule );

// Clean up
::CloseHandle( hThread );
::VirtualFreeEx( hProcess, pLibRemote, sizeof(szLibPath), MEM_RELEASE );

Assume our SendMessage - the code that we actually wanted to inject - was placed in DllMain (DLL_PROCESS_ATTACH), so it has already been executed by now. Then, it is time to unload the DLL from the target process:

Collapse | Copy Code
// Unload "LibSpy.dll" from the target process
// (via CreateRemoteThread & FreeLibrary)
hThread = ::CreateRemoteThread( hProcess, NULL, 0,
            (LPTHREAD_START_ROUTINE) ::GetProcAddress( hKernel32,
                                       "FreeLibrary" ),
            (void*)hLibModule, 0, NULL );
::WaitForSingleObject( hThread, INFINITE );

// Clean up
::CloseHandle( hThread );

Interprocess Communications

Until now, we only talked about how to inject the DLL to the remote process. However, in most situations the injected DLL will need to communicate with your original application in some way (recall that the DLL is mapped into some remote process now, not to our local application!). Take our Password Spy: The DLL has to know the handle to the control that actually contains the password. Obviously, this value can't be hardcoded into it at compile time. Similarly, once the DLL gets the password, it has to send it back to our application so we can display it appropriately.

Fortunately, there are many ways to deal with this situation: File Mapping, WM_COPYDATA, the Clipboard, and the sometimes very handy #pragma data_seg, to name just a few. I won't describe these techniques here because they are all well documented either in MSDN (see Interprocess Communications) or in other tutorials. Anyway, I used solely the #pragma data_seg in the LibSpy example.

You will find LibSpy and its sources in the download package at the beginning of the article.

III. The CreateRemoteThread & WriteProcessMemory Technique

Demo application: WinSpy

Another way to copy some code to another process's address space and then execute it in the context of this process involves the use of remote threads and the WriteProcessMemory API. Instead of writing a separate DLL, you copy the code to the remote process directly now - via WriteProcessMemory - and start its execution with CreateRemoteThread.

Let's take a look at the declaration of CreateRemoteThread first:

Collapse | Copy Code
HANDLE CreateRemoteThread(
  HANDLE hProcess,        // handle to process to create thread in
  LPSECURITY_ATTRIBUTES lpThreadAttributes,  // pointer to security
                                             // attributes
  DWORD dwStackSize,      // initial thread stack size, in bytes
  LPTHREAD_START_ROUTINE lpStartAddress,     // pointer to thread
                                             // function
  LPVOID lpParameter,     // argument for new thread
  DWORD dwCreationFlags,  // creation flags
  LPDWORD lpThreadId      // pointer to returned thread identifier
);

If you compare it to the declaration of CreateThread (MSDN), you will notice the following differences:

  • The hProcess parameter is additional in CreateRemoteThread. It is the handle to the process in which the thread is to be created.
  • The lpStartAddress parameter in CreateRemoteThread represents the starting address of the thread in the remote processes address space. The function must exist in the remote process, so we can't simply pass a pointer to the local ThreadFunc. We have to copy the code to the remote process first.
  • Similarly, the data pointed to by lpParameter must exist in the remote process, so we have to copy it there, too.

Now, we can summarize this technique in the following steps:

  1. Retrieve a HANDLE to the remote process (OpenProces).
  2. Allocate memory in the remote process's address space for injected data (VirtualAllocEx).
  3. Write a copy of the initialised INJDATA structure to the allocated memory (WriteProcessMemory).
  4. Allocate memory in the remote process's address space for injected code.
  5. Write a copy of ThreadFunc to the allocated memory.
  6. Start the remote copy of ThreadFunc via CreateRemoteThread.
  7. Wait until the remote thread terminates (WaitForSingleObject).
  8. Retrieve the result from the remote process (ReadProcessMemory or GetExitCodeThread).
  9. Free the memory allocated in Steps #2 and #4 (VirtualFreeEx).
  10. Close the handles retrieved in Steps #6 and #1 (CloseHandle).
  1. Additional caveats that ThreadFunc has to obey:

    1. ThreadFunc should not call any functions besides those in kernel32.dll and user32.dll; only kernel32 and user32 are, if present (note that user32 isn't mapped into every Win32 process!), guaranteed to be at the same load address in both the local and the target process (see

    Appendix A). If you need functions from other libraries, pass the addresses of LoadLibrary and GetProcAddress to the injected code, and let it go and get the rest itself. You could also use GetModuleHandle instead of LoadLibrary, if for one or another reason the debatable DLL is already mapped into the target process.
    Similarly, if you want to call your own subroutines from within ThreadFunc, copy each routine to the remote process individually and supply their addresses to ThreadFunc via INJDATA.

  2. Don't use any static strings. Rather pass all strings to ThreadFunc via INJDATA.
    Why? The compiler puts all static strings into the ".data" section of an executable and only references (=pointers) remain in the code. Then, the copy of ThreadFunc in the remote process would point to something that doesn't exist (at least not in its address space).
  3. Remove the /GZ compiler switch; it is set by default in debug builds (see Appendix B).
  4. Either declare ThreadFunc and AfterThreadFunc as static or disable incremental linking (see Appendix C).
  5. There must be less than a page-worth (4 Kb) of local variables in ThreadFunc (see Appendix D). Note that in debug builds some 10 bytes of the available 4 Kb are used for internal variables.
  6. If you have a switch block with more than three case statements, either split it up like this:

    Collapse | Copy Code
    switch( expression ) {
        case constant1: statement1; goto END;
        case constant2: statement2; goto END;
        case constant3: statement2; goto END;
    }
    switch( expression ) {
        case constant4: statement4; goto END;
        case constant5: statement5; goto END;
        case constant6: statement6; goto END;
    }
    END:
    

    or modify it into an if-else if sequence (see Appendix E).

You will almost certainly crash the target process if you don't play by those rules. Just remember: Don't assume anything in the target process is at the same address as it is in your process (see Appendix F).

GetWindowTextRemote(A/W)

All the functionality you need to get the password from a "remote" edit control is encapsulated in GetWindowTextRemot(A/W):

Collapse | Copy Code
int GetWindowTextRemoteA( HANDLE hProcess, HWND hWnd, LPSTR  lpString );
int GetWindowTextRemoteW( HANDLE hProcess, HWND hWnd, LPWSTR lpString );

Parameters

hProcess

Handle to the process the edit control belongs to.

hWnd

Handle to the edit control containing the password.

lpString

Pointer to the buffer that is to receive the text.

Return Value

The return value is the number of characters copied.

Let's examine some parts of its sources now - especially the injected data and code - to see how GetWindowTextRemote works. Again, unicode support is removed for the sake of simplicity.

INJDATA

Collapse | Copy Code
typedef LRESULT     (WINAPI *SENDMESSAGE)(HWND,UINT,WPARAM,LPARAM);

typedef struct {    
    HWND hwnd;                    // handle to edit control
    SENDMESSAGE  fnSendMessage;   // pointer to user32!SendMessageA

    char psText[128];    // buffer that is to receive the password
} INJDATA;

INJDATA is the data structure being injected into the remote process. However, before doing so the structure's pointer to SendMessageA is initialised in our application. The dodgy thing here is that user32.dll is (if present!) always mapped to the same address in every process; thus, the address of SendMessageA is always the same, too. This ensures that a valid pointer is passed to the remote process.

ThreadFunc

Collapse | Copy Code
static DWORD WINAPI ThreadFunc (INJDATA *pData) 
{
    pData->fnSendMessage( pData->hwnd, WM_GETTEXT,    // Get password
                          sizeof(pData->psText),
                          (LPARAM)pData->psText );  
    return 0;
}

// This function marks the memory address after ThreadFunc.
// int cbCodeSize = (PBYTE) AfterThreadFunc - (PBYTE) ThreadFunc.
static void AfterThreadFunc (void)
{
}

ThradFunc is the code executed by the remote thread. Point of interest:

  • Note how AfterThreadFunc is used to calculate the code size of ThreadFunc. In general this isn't the best idea, because the linker is free to change order of your functions (i.e. it could place ThreadFunc behind AfterThreadFunc). However, you can be pretty sure that in small projects, like our WinSpy is, the order of your functions will be preserved. If necessary, you could also use the /ORDER linker option to help you out; or yet better: Determine the size of ThreadFunc with a dissasembler.

How to Subclass a Remote Control With this Technique

Demo application: InjectEx

Let's explain something more complicated now: How to subclass a control belonging to another process with this technique?

First of all, note that you have to copy two functions to the remote process to accomplish this task:

  1. ThreadFunc, which actually subclasses the control in the remote process via SetWindowLong, and
  2. NewProc, the new window procedure of the subclassed control.

However, the main problem is how to pass data to the remote NewProc. Because NewProc is a callback function and thus has to conform to specific guidelines, we can't simply pass a pointer to INJDATA to it as an argument. Fortunately, there are other ways to solve this problem (I found two), but all rely on the assembly language. So, when I tried to preserve the assembly for the appendixes until now, it won't go without it this time.

Solution 1

Observe the following picture:

The virtual address space

Note that INJDATA is placed immediately before NewProc in the remote process? This way NewProc knows the memory location of INJDATA in the remote processes address space at compile time. More precisely: It knows the address of INJDATA relative to its own location, but that's actually all we need. Now NewProc might look like this:

Collapse | Copy Code
static LRESULT CALLBACK NewProc(
  HWND hwnd,       // handle to window
  UINT uMsg,       // message identifier
  WPARAM wParam,   // first message parameter
  LPARAM lParam )  // second message parameter
{
    INJDATA* pData = (INJDATA*) NewProc;  // pData points to
                                          // NewProc;
    pData--;              // now pData points to INJDATA;
                          // recall that INJDATA in the remote 
                          // process is immediately before NewProc;

    //-----------------------------
    // subclassing code goes here
    // ........
    //-----------------------------

    // call original window procedure;
    // fnOldProc (returned by SetWindowLong) was initialised
    // by (the remote) ThreadFunc and stored in (the remote) INJDATA;
    return pData->fnCallWindowProc( pData->fnOldProc, 
                                    hwnd,uMsg,wParam,lParam );
}

However, there is still a problem. Observe the first line:

Collapse | Copy Code
INJDATA* pData = (INJDATA*) NewProc;

This way, a hardcoded value (the memory location of the original NewProc in our process) will be arranged to pData. That is not quite what we want: The memory location of the "current" copy of NewProc in the remote process, regardless of to what location it is (NewProc) actually moved. In other words, we would need some kind of a "this pointer."

While there is no way to solve this in C/C++, it can be done with inline assembly. Consider the modified NewProc:

Collapse | Copy Code
static LRESULT CALLBACK NewProc(
  HWND hwnd,      // handle to window
  UINT uMsg,      // message identifier
  WPARAM wParam,  // first message parameter
  LPARAM lParam ) // second message parameter
{
    // calculate location of the INJDATA struct;
    // remember that INJDATA in the remote process
    // is placed right before NewProc;
    INJDATA* pData;
    _asm {
        call    dummy
dummy:
        pop     ecx         // <- ECX contains the current EIP
        sub     ecx, 9      // <- ECX contains the address of NewProc
        mov     pData, ecx
    }
    pData--;

    //-----------------------------
    // subclassing code goes here
    // ........
    //-----------------------------

    // call original window procedure
    return pData->fnCallWindowProc( pData->fnOldProc, 
                                    hwnd,uMsg,wParam,lParam );
}

So, what's going on? Virtually every processor has a special register that points to the memory location of the next instruction to be executed. That's the so-called instruction pointer, denoted EIP on 32-bit Intel and AMD processors. Because EIP is a special-purpose register, you can't access it programmatically as you can general purpose registers (EAX, EBX, etc). Put another way: There is no OpCode, with which you could address EIP and read or change its contents explicitly. However, EIP can still be changed (and is changed all the time) implicitly, by instructions such as JMP, CALL and RET. Let's, for example, explain how the subroutine CALL/RET mechanism works on 32-bit Intel and AMD processors:

 

When you call a subroutine (via CALL), the address of the subroutine is loaded into EIP. But, even before EIP is modified, its old value is automatically pushed onto the stack (for use later as a return instruction-pointer). At the end of a subroutine, the RET instruction automatically pops the top of the stack into EIP.

Now you know how EIP is modified via CALL and RET, but how to get its current value?
Well, remember that CALL pushes EIP onto the stack? So, in order to get its current value call a "dummy function" and pop the stack right thereafter. Let's explain the whole trick at our compiled NewProc:

Collapse | Copy Code
 Address   OpCode/Params   Decoded instruction
--------------------------------------------------
:00401000  55              push ebp            ; entry point of
                                               ; NewProc
:00401001  8BEC            mov ebp, esp
:00401003  51              push ecx
:00401004  E800000000      call 00401009       ; *a*    call dummy
:00401009  59              pop ecx             ; *b*
:0040100A  83E909          sub ecx, 00000009   ; *c*
:0040100D  894DFC          mov [ebp-04], ecx   ; mov pData, ECX
:00401010  8B45FC          mov eax, [ebp-04]
:00401013  83E814          sub eax, 00000014   ; pData--;
.....
.....
:0040102D  8BE5            mov esp, ebp
:0040102F  5D              pop ebp
:00401030  C21000          ret 0010
  1. A dummy function call; it just jumps to the next instruction and pushes EIP onto the stack.
  2. Pop the stack into ECX. ECX then holds EIP; this is exactly the address of the "pop ECX" instruction as well.
  3. Note that the "distance" between the entry point of NewProc and the "pop ECX" instruction is 9 bytes; thus, to calculate the address of NewProc, subtract 9 from ECX.

This way, NewProc can always calculate its own address, regardless of to what location it is actually moved! However, be aware that the distance between the entry point of NewProc and the "pop ECX" instruction might change as you change your compiler/linker options, and is thus different in release and debug builds, too. But, the point is that you still know the exact value at compile time:

  1. First, compile your function.
  2. Determine the correct distance with a disassembler.
  3. Finally, recompile with the correct distance.

That's the solution used in InjectEx. InjectEx, similarly as HookInjEx, swaps the left and right mouse clicks for the Start button.

Solution 2

Placing INJDATA right before NewProc in the remote processes address space isn't the only way to solve our problem. Consider the following variant of NewProc:

Collapse | Copy Code
static LRESULT CALLBACK NewProc(
HWND hwnd, // handle to window
UINT uMsg, // message identifier
WPARAM wParam, // first message parameter
LPARAM lParam ) // second message parameter
{
INJDATA* pData = 0xA0B0C0D0; // a dummy value

//-----------------------------
// subclassing code goes here
// ........
//-----------------------------

// call original window procedure
return pData->fnCallWindowProc( pData->fnOldProc,
hwnd,uMsg,wParam,lParam

抱歉!评论已关闭.