Low-level Implementation

Keyboard

Wine now needs to know about your keyboard layout. This requirement comes from the need of many applications to have the correct scancodes available, since they read these directly instead of just taking the characters returned by the X server. This means that Wine needs a mapping from X keys to the scancodes these programs expect.

On startup, Wine will try to recognize the active X layout by checking whether it matches any of the defined tables. If it does, everything is all right. If not, you need to define it.

To do this, open the file dlls/x11drv/keyboard.c and take a look at the existing tables. Make a backup copy of it, especially if you don't use CVS.

What you really need to do is find out which scancode each key needs to generate. Find it in the main_key_scan table, which looks like this:

    static const int main_key_scan[MAIN_LEN] =
    {
    /* this is my (102-key) keyboard layout, sorry if it doesn't quite match yours */
       0x29,0x02,0x03,0x04,0x05,0x06,0x07,0x08,0x09,0x0A,0x0B,0x0C,0x0D,
       0x10,0x11,0x12,0x13,0x14,0x15,0x16,0x17,0x18,0x19,0x1A,0x1B,
       0x1E,0x1F,0x20,0x21,0x22,0x23,0x24,0x25,0x26,0x27,0x28,0x2B,
       0x2C,0x2D,0x2E,0x2F,0x30,0x31,0x32,0x33,0x34,0x35,
       0x56 /* the 102nd key (actually to the right of l-shift) */
    };

Next, assign each scancode the characters imprinted on the keycaps. This was done (sort of) for the US 101-key keyboard, which you can find near the top of keyboard.c. It also shows that if there is no 102nd key, you can skip it.

However, for most international 102-key keyboards, we have made it easy for you. The scancode layout for these already pretty much matches the physical layout in main_key_scan, so all you need to do is go through all the keys that generate characters on your main keyboard (except the spacebar) and put those into an appropriate table.
The only exception is that the 102nd key, which is usually to the left of the first key of the last line (usually Z), must be placed on a separate line after the last line.

For example, my Norwegian keyboard looks like this:

    §  !  "  #  ¤  %  &  /  (  )  =  ?  `  Back-
    |  1  2@ 3£ 4$ 5  6  7{ 8[ 9] 0} +  \´ space
    Tab Q  W  E  R  T  Y  U  I  O  P  Å  ^
                                         ¨~
                                            Enter
    Caps A  S  D  F  G  H  J  K  L  Ø  Æ  *
    Lock                                  '
    Sh- > Z  X  C  V  B  N  M  ;  :  _  Shift
    ift <                      ,  .  -
    Ctrl  Alt       Spacebar       AltGr  Ctrl

Note the 102nd key, which is the <> key, to the left of Z. The character to the right of the main character is the character generated by AltGr.

This keyboard is defined as follows:

    static const char main_key_NO[MAIN_LEN][4] =
    {
     "|§","1!","2\"@","3#£","4¤$","5%","6&","7/{","8([","9)]","0=}","+?","\\´",
     "qQ","wW","eE","rR","tT","yY","uU","iI","oO","pP","åÅ","¨^~",
     "aA","sS","dD","fF","gG","hH","jJ","kK","lL","øØ","æÆ","'*",
     "zZ","xX","cC","vV","bB","nN","mM",",;",".:","-_",
     "<>"
    };

Except that " and \ need to be escaped with a backslash, and that the 102nd key is on a separate line, it's pretty straightforward.

After you have written such a table, you need to add it to the main_key_tab[] layout index table. This will look like this:

    static struct {
     WORD lang, ansi_codepage, oem_codepage;
     const char (*key)[MAIN_LEN][4];
    } main_key_tab[]={
    ...
    ...
     {MAKELANGID(LANG_NORWEGIAN,SUBLANG_DEFAULT),  1252, 865, &main_key_NO},
    ...

After you have added your table, recompile Wine and test that it works. If it fails to detect your table, try running

    WINEDEBUG=+key,+keyboard wine > key.log 2>&1

and look in the resulting key.log file to find the error messages it gives for your layout.

Note that the LANG_* and SUBLANG_* definitions are in include/winnls.h. You might need them to find out which number your language is assigned, so that you can find it in the WINEDEBUG output.
The number will be (SUBLANG * 0x400 + LANG). For example, the combination LANG_NORWEGIAN (0x14) and SUBLANG_DEFAULT (0x1) will be (in hex) 14 + 1*400 = 414, so since I'm Norwegian, I could look for 0414 in the WINEDEBUG output to find out why my keyboard isn't detected.

Once it works, submit it to the Wine project. If you use CVS, you will just have to do

    cvs -z3 diff -u dlls/x11drv/keyboard.c > layout.diff

from your main Wine directory, then submit layout.diff to wine-patches@winehq.org along with a brief note of what it is.

If you don't use CVS, you need to do

    diff -u the_backup_file_you_made dlls/x11drv/keyboard.c > layout.diff

and submit it as explained above.

If you did it right, it will be included in the next Wine release, and all the troublesome programs (especially remote-control programs) and games that use scancodes will be happily using your keyboard layout, and you won't get those annoying fixme messages either.

Undocumented APIs

Some background: On the i386 class of machines, stack entries are usually dword (4 bytes) in size, little-endian. The stack grows downward in memory. The stack pointer, maintained in the esp register, points to the last valid entry; thus, the operation of pushing a value onto the stack involves decrementing esp and then moving the value into the memory pointed to by esp (i.e., push p in assembly resembles *(--esp) = p; in C). Removing (popping) values off the stack is the reverse (i.e., pop p corresponds to p = *(esp++); in C).

In the stdcall calling convention, arguments are pushed onto the stack right-to-left. For example, the C call myfunction(40, 20, 70, 30); is expressed in Intel assembly as:

    push 30
    push 70
    push 20
    push 40
    call myfunction

The called function is responsible for removing the arguments from the stack.
Thus, before the call to myfunction, the stack would look like:

             [local variable or temporary]
             [local variable or temporary]
              30
              70
              20
    esp ->    40

After the call returns, it should look like:

             [local variable or temporary]
    esp ->   [local variable or temporary]

To restore the stack to this state, the called function must know how many arguments to remove (which is the number of arguments it takes). This is a problem if the function is undocumented.

One way to attempt to document the number of arguments each function takes is to create a wrapper around that function that detects the stack offset. Essentially, each wrapper assumes that the function will take a large number of arguments. The wrapper copies each of these arguments onto its stack, calls the actual function, and then calculates the number of arguments by checking esp before and after the call.

The main problem with this scheme is that the function must actually be called from another program. Many of these functions are seldom used. An attempt was made to aggressively query each function in a given library (ntdll.dll) by passing 64 arguments, all 0, to each function. Unfortunately, Windows NT quickly goes to a blue screen of death, even if the program is run from a non-administrator account.

Another method that has been much more successful is to attempt to figure out how many arguments each function removes from the stack by looking at its return instruction. This instruction, ret hhll (where hhll is the number of bytes to remove, i.e. the number of arguments times 4), contains the bytes 0xc2 ll hh in memory. It is a reasonable assumption that few, if any, functions take more than 16 arguments; therefore, simply searching for hh == 0 && ll < 0x40 starting from the address of a function yields the correct number of arguments most of the time.

Of course, this is not without errors.
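The byte search described above can be sketched in C as follows. This is only a minimal illustration, not code from Wine: the function name guess_stdcall_args and its parameters are made up for this example, and the caller is assumed to supply a readable buffer holding the start of the function.

```c
#include <stddef.h>

/* Hypothetical helper (not part of Wine): scan up to `len` bytes starting
 * at `code` for the first byte sequence 0xc2 ll hh with hh == 0 and
 * ll < 0x40, i.e. something that looks like "ret 00ll".
 * Returns the guessed number of dword arguments, or -1 if nothing matches. */
static int guess_stdcall_args(const unsigned char *code, size_t len)
{
    size_t i;

    for (i = 0; i + 2 < len; i++) {
        if (code[i] == 0xc2 && code[i + 2] == 0x00 && code[i + 1] < 0x40)
            return code[i + 1] / 4;   /* bytes removed / 4 bytes per argument */
    }
    return -1;
}
```

For the byte sequence 55 8b ec 5d c2 08 00 (push ebp; mov ebp,esp; pop ebp; ret 0008) this returns 2. Because the scan is purely byte-based, it has no notion of instruction boundaries or of jumps out of the function.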
ret 00ll is not the only instruction that can have the byte sequence 0xc2 ll 0x0; for example, push 0x000040c2 has the byte sequence 0x68 0xc2 0x40 0x0 0x0, which matches the above. Properly, the utility should look for this sequence only on an instruction boundary; unfortunately, finding instruction boundaries on an i386 requires implementing a full disassembler, which is quite a daunting task. Besides, the probability of having such a byte sequence that is not the actual return instruction is fairly low.

Much more troublesome is the non-linear flow of a function. For example, consider the following two functions:

    somefunction1:
            jmp  somefunction1_impl

    somefunction2:
            ret  0004

    somefunction1_impl:
            ret  0008

In this case, we would incorrectly detect both somefunction1 and somefunction2 as taking only a single argument, whereas somefunction1 really takes two arguments.

With these limitations in mind, it is possible to implement more stubs in Wine and, eventually, the functions themselves.

Accelerators

There are three differently sized accelerator structures exposed to the user:

1. Accelerators in NE resources. This is also the internal layout of the global handle HACCEL (16 and 32) in Windows 95 and Wine. Exposed to the user as Win16 global handles HACCEL16 and HACCEL32 by the Win16/Win32 API. These are 5 bytes long, with no padding:

       BYTE   fVirt;
       WORD   key;
       WORD   cmd;

2. Accelerators in PE resources. They are exposed to the user only by directly accessing PE resources. These have a size of 8 bytes:

       BYTE   fVirt;
       BYTE   pad0;
       WORD   key;
       WORD   cmd;
       WORD   pad1;

3. Accelerators in the Win32 API. These are exposed to the user by the CopyAcceleratorTable and CreateAcceleratorTable functions in the Win32 API. These have a size of 6 bytes:

       BYTE   fVirt;
       BYTE   pad0;
       WORD   key;
       WORD   cmd;

Why two types of accelerators in the Win32 API? We can only guess, but my best bet is that the Win32 resource compiler can/does not handle struct packing.
Win32 ACCEL is defined using #pragma pack(2) for the compiler but without any packing for RC, so it will assume #pragma pack(4).

Doing A Hardware Trace

The primary reason to do this is to reverse engineer a hardware device for which you don't have documentation, but which you can get to work under Wine.

This section is aimed at parallel port devices, and in particular parallel port scanners, which are now so cheap they are virtually being given away. The problem is that few manufacturers will release any programming information, which prevents drivers being written for Sane, and the traditional technique of using DOSemu to produce the traces does not work, as the scanners invariably only have drivers for Windows.

Presuming that you have compiled and installed Wine, the first thing to do is to enable direct hardware access to your parallel port. To do this, edit config (usually in ~/.wine/) and in the ports section add the following two lines:

    read=0x378,0x379,0x37a,0x37c,0x77a
    write=0x378,0x379,0x37a,0x37c,0x77a

This adds the necessary access required for an SPP/PS2/EPP/ECP parallel port on LPT1. You will need to adjust these numbers accordingly if your parallel port is on LPT2 or LPT0.

When starting Wine, use the following command line, where XXXX is the program you need to run in order to access your scanner, and YYYY is the file your trace will be stored in:

    WINEDEBUG=+io wine XXXX 2> >(sed 's/^[^:]*:io:[^ ]* //' > YYYY)

You will need large amounts of hard disk space (read hundreds of megabytes if you do a full page scan), and for reasonable performance a really fast processor and lots of RAM.

You will need to postprocess the output into a more manageable format, using the shrink program.
First you need to compile the source (which is located at the end of this section):

    cc shrink.c -o shrink

Use the shrink program to reduce the physical size of the raw log as follows:

    cat log | shrink > log2

The trace has the basic form of

    XXXX > YY @ ZZZZ:ZZZZ

where XXXX is the port in hexadecimal being accessed, YY is the data written to (or read from) the port, and ZZZZ:ZZZZ is the address in memory of the instruction that accessed the port. The direction of the arrow indicates whether the data was written to or read from the port.

    >  data was written to the port
    <  data was read from the port

My basic tip for interpreting these logs is to pay close attention to the addresses of the I/O instructions. Their grouping and sometimes proximity should reveal the presence of subroutines in the driver. By studying the different versions you should be able to work them out. For example, consider the following section of trace from my UMAX Astra 600P:

    0x378 > 55 @ 0297:01ec
    0x37a > 05 @ 0297:01f5
    0x379 < 8f @ 0297:01fa
    0x37a > 04 @ 0297:0211
    0x378 > aa @ 0297:01ec
    0x37a > 05 @ 0297:01f5
    0x379 < 8f @ 0297:01fa
    0x37a > 04 @ 0297:0211
    0x378 > 00 @ 0297:01ec
    0x37a > 05 @ 0297:01f5
    0x379 < 8f @ 0297:01fa
    0x37a > 04 @ 0297:0211
    0x378 > 00 @ 0297:01ec
    0x37a > 05 @ 0297:01f5
    0x379 < 8f @ 0297:01fa
    0x37a > 04 @ 0297:0211
    0x378 > 00 @ 0297:01ec
    0x37a > 05 @ 0297:01f5
    0x379 < 8f @ 0297:01fa
    0x37a > 04 @ 0297:0211
    0x378 > 00 @ 0297:01ec
    0x37a > 05 @ 0297:01f5
    0x379 < 8f @ 0297:01fa
    0x37a > 04 @ 0297:0211

As you can see, there is a repeating structure starting at address 0297:01ec that consists of four I/O accesses on the parallel port. The first I/O access writes a changing byte to the data port, the second always writes the byte 0x05 to the control port, then a value which always seems to be 0x8f is read from the status port, at which point a byte 0x04 is written to the control port.
By studying this and other sections of the trace we can write a C routine that emulates this, shown below with some macros to make reading/writing on the parallel port easier to read.

    #include <sys/io.h>  /* inb()/outb(); assumes Linux on x86 with glibc */

    #define r_dtr(x)        inb(x)
    #define r_str(x)        inb((x)+1)
    #define r_ctr(x)        inb((x)+2)
    #define w_dtr(x,y)      outb(y, x)
    #define w_str(x,y)      outb(y, (x)+1)
    #define w_ctr(x,y)      outb(y, (x)+2)

    /* Seems to be sending a command byte to the scanner */
    int udpp_put(int udpp_base, unsigned char command)
    {
        int loop, value;

        w_dtr(udpp_base, command);
        w_ctr(udpp_base, 0x05);

        /* Poll the status port until bit 7 is set, giving up after 10 reads. */
        for (loop = 0; loop < 10; loop++)
            if ((value = r_str(udpp_base)) & 0x80)
            {
                w_ctr(udpp_base, 0x04);
                return value & 0xf8;
            }

        /* Bit 7 never came up: return the status with bit 0 set. */
        return (value & 0xf8) | 0x01;
    }

For the UMAX Astra 600P only seven such routines exist (well, 14 really, seven for SPP and seven for EPP). Whether you choose to disassemble the driver at this point to verify the routines is your own choice. If you do, the addresses from the trace should help in locating them in the disassembly.

You will probably then find it useful to write a script/perl/C program to analyse the logfile and decode it further, as this can reveal higher level grouping of the low level routines. For example, the logs from my UMAX Astra 600P, when decoded further, reveal (this is a small snippet):

    start:
    put: 55 8f
    put: aa 8f
    put: 00 8f
    put: 00 8f
    put: 00 8f
    put: c2 8f
    wait: ff
    get: af,87
    wait: ff
    get: af,87
    end: cc
    start:
    put: 55 8f
    put: aa 8f
    put: 00 8f
    put: 03 8f
    put: 05 8f
    put: 84 8f
    wait: ff

From this it is easy to see that the put routine is often grouped together in five successive calls sending information to the scanner. Once these are understood it should be possible to process the logs further to show the higher level routines in an easy to see format. Once the highest level format that you can derive from this process is understood, you then need to produce a series of scans varying only one parameter between them, so you can discover how to set the various parameters for the scanner.
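As a starting point for the kind of log-decoding program mentioned above, here is a minimal sketch in C. It is not taken from any existing tool: the struct io_access type and the functions parse_io_line and is_put are names invented for this example. It assumes trace lines in the XXXX > YY @ ZZZZ:ZZZZ form and only recognizes the four-access "put" pattern from the UMAX Astra 600P trace.

```c
#include <stdio.h>

/* One decoded I/O access from the trace (illustrative type, not from Wine). */
struct io_access {
    unsigned int port;     /* I/O port, e.g. 0x378 */
    char         dir;      /* '>' = written to port, '<' = read from port */
    unsigned int data;     /* byte written or read */
    unsigned int seg, off; /* address of the instruction doing the access */
};

/* Parse a line such as "0x378 > 55 @ 0297:01ec".
 * Returns 1 on success, 0 if the line does not have that form. */
static int parse_io_line(const char *line, struct io_access *a)
{
    if (sscanf(line, "%x %c %x @ %x:%x",
               &a->port, &a->dir, &a->data, &a->seg, &a->off) != 5)
        return 0;
    return a->dir == '>' || a->dir == '<';
}

/* Check whether four consecutive accesses form the "put" pattern:
 * write data port, write 0x05 to control port, read status port,
 * write 0x04 to control port. `base` is the parallel port base address. */
static int is_put(const struct io_access a[4], unsigned int base)
{
    return a[0].dir == '>' && a[0].port == base &&
           a[1].dir == '>' && a[1].port == base + 2 && a[1].data == 0x05 &&
           a[2].dir == '<' && a[2].port == base + 1 &&
           a[3].dir == '>' && a[3].port == base + 2 && a[3].data == 0x04;
}
```

A decoder built on this would read the log four accesses at a time and, whenever is_put() matches with base 0x378, print a[0].data and a[2].data as one "put:" line. Lines that do not parse (such as the repeat markers written by shrink) are rejected by parse_io_line() and need separate handling.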
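The following is the shrink.c program:

    /* Copyright David Campbell <campbell@torque.net> */
    #include <stdio.h>
    #include <string.h>

    int main (void)
    {
      char buff[256], lastline[256] = "";
      int count = 0;

      /* Test the result of fgets() rather than feof(), so that end of
         file is not treated as one more repeat of the last line. */
      while (fgets (buff, sizeof (buff), stdin))
      {
        if (strcmp (buff, lastline))
        {
          /* A different line: report how often the previous one occurred. */
          if (count > 1)
            printf ("# Last line repeated %i times #\n", count);
          printf ("%s", buff);
          strcpy (lastline, buff);
          count = 1;
        }
        else count++;
      }
      /* Report repeats of the final line as well. */
      if (count > 1)
        printf ("# Last line repeated %i times #\n", count);
      return 0;
    }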