Monday, 2 January 2023

The Segmented Memory Model and How It Works in Windows x64

I created this post as part of my journey toward becoming more familiar with the Intel architecture. Segmentation is an important topic in that architecture, so here is my contribution. For my experiments I'll use an x64 Windows 10 installation running in a VM attached to a kernel debugger.

Modes of Operation

The first step is to identify the processor's mode of operation. x64 processors support several operating modes and memory models, so let's find out which one is in use. This information is stored in the CR0 control register ([1]) (64 bits wide in 64-bit mode, with the upper 32 bits reserved), in the PE (Protection Enable) flag at bit position 0 (the least significant, right-most bit). If this bit is set, the processor is running in protected mode; otherwise it is running in real-address mode. Let's use the kernel debugger to perform this check, as shown in Figure 1.

kd> .formats cr0
Evaluate expression:
  Hex:     00000000`80050031
  Decimal: 2147811377
  Decimal (unsigned) : 2147811377
  Octal:   0000000000020001200061
  Binary:  00000000 00000000 00000000 00000000 10000000 00000101 00000000 00110001
  Chars:   .......1
  Time:    ***** Invalid
  Float:   low -4.59246e-040 high 0
  Double:  1.06116e-314
Figure 1. Operation Mode Identification
The CR0.PE bit is set to 1, so we are running in protected mode, which uses a segmented memory model (notice also that the CR0.PG bit at position 31 is set, indicating that paging is enabled as well). We can also determine the sub-mode by reading the IA32_EFER Model-Specific Register (MSR) at address 0xC0000080 ([2]) and checking the LME (bit 8) and LMA (bit 10) flags. You can see the result in Figure 2.

kd> rdmsr 0xC0000080
msr[c0000080] = 00000000`00000d01
kd> .formats 00000000`00000d01
Evaluate expression:
  Hex:     00000000`00000d01
  Decimal: 3329
  Decimal (unsigned) : 3329
  Octal:   0000000000000000006401
  Binary:  00000000 00000000 00000000 00000000 00000000 00000000 00001101 00000001
  Chars:   ........
  Time:    Thu Jan  1 01:55:29 1970
  Float:   low 4.66492e-042 high 0
  Double:  1.64474e-320
Figure 2. Operation Sub-Mode Identification
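The IA32_EFER.LME and IA32_EFER.LMA bits are both set, so IA-32e mode (also known as long mode) is active. IA-32e mode has two sub-modes, 64-bit mode and compatibility mode; which one applies to the running code depends on the L flag of the current code segment, as we will see later in the text.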
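The two checks above boil down to testing individual bits. The following minimal Python sketch, assuming the raw register values from Figures 1 and 2, tests CR0.PE, CR0.PG, and the IA32_EFER flags. The helper names `bit` and `describe_mode` are hypothetical; the bit positions are the ones documented in the Intel manuals ([1], [2]). It also checks SCE (bit 0, which enables the syscall instruction discussed later) and NXE (bit 11). Virtual-8086 mode is ignored for simplicity.

```python
# Bit positions from the Intel SDM (CR0: Vol. 3A, IA32_EFER: Vol. 4).
CR0_PE = 0     # Protection Enable
CR0_PG = 31    # Paging
EFER_SCE = 0   # SYSCALL/SYSRET enable
EFER_LME = 8   # IA-32e (long) mode enable
EFER_LMA = 10  # IA-32e (long) mode active
EFER_NXE = 11  # No-execute enable


def bit(value: int, position: int) -> int:
    """Return the bit at `position` (0 = least significant bit)."""
    return (value >> position) & 1


def describe_mode(cr0: int, efer: int) -> str:
    """Classify the processor mode from CR0 and IA32_EFER (VM86 mode ignored)."""
    if not bit(cr0, CR0_PE):
        return "real-address mode"
    if bit(efer, EFER_LMA):
        # 64-bit vs compatibility sub-mode depends on CS.L, not on EFER.
        return "IA-32e mode"
    return "protected mode"


cr0 = 0x80050031   # value from Figure 1
efer = 0xD01       # value from Figure 2
print(describe_mode(cr0, efer))  # IA-32e mode
print(bit(cr0, CR0_PG), bit(efer, EFER_LME), bit(efer, EFER_SCE), bit(efer, EFER_NXE))  # 1 1 1 1
```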

Segmented Memory Model

In the segmented memory model, memory is accessed through segments. A segment provides information on how to translate a given address. Which segment is involved depends on the instruction being executed: for example, a call instruction uses the code segment, while the push and pop instructions use the stack segment. The Intel architecture defines six segment registers: CS, DS, ES, SS, FS, and GS. Let's see how this works with a practical example by considering the instruction in Figure 3.

00007FFD42C7D5C1 | E8 1A000000  | call kernelbase.7FFD42C7D5E0
Figure 3. How Segmentation Works
The call instruction is encoded as the opcode E8 followed by a 32-bit signed displacement stored in little-endian order (1A 00 00 00, that is, 0x1A). The displacement is relative to the address of the next instruction, which explains why the function address in the disassembly is 0x7FFD42C7D5E0 (0x7FFD42C7D5C1 (instruction address) + 0x05 (instruction size) + 0x1A (displacement)). In addition to this address, the value of the CS segment register is also used. The combination of a segment selector (here CS) and an offset (the function address) is called a logical address. The segment is then used to translate the logical address into a virtual address (Intel calls it a linear address); this process is described in the next section. Since our system uses paging, an additional translation step converts the virtual address into a physical address (this topic is not covered in this post). All the translation steps are represented in Figure 4.
Figure 4. Logical to Physical Address Translation
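The displacement arithmetic can be checked with a short Python sketch. It assumes the 5-byte `E8 rel32` encoding from Figure 3; `call_rel32_target` is a hypothetical helper, not a debugger command.

```python
import struct


def call_rel32_target(instr_addr: int, instr_bytes: bytes) -> int:
    """Return the target of a near `call rel32` (opcode E8) in 64-bit mode."""
    if len(instr_bytes) != 5 or instr_bytes[0] != 0xE8:
        raise ValueError("expected a 5-byte E8 rel32 call")
    # The 32-bit displacement is signed and stored little-endian.
    (disp,) = struct.unpack("<i", instr_bytes[1:5])
    next_rip = instr_addr + len(instr_bytes)
    # The sum wraps at 64 bits, like RIP arithmetic.
    return (next_rip + disp) & 0xFFFFFFFFFFFFFFFF


target = call_rel32_target(0x7FFD42C7D5C1, bytes.fromhex("E81A000000"))
print(hex(target))  # 0x7ffd42c7d5e0
```

Because the displacement is signed, the same helper also handles backward calls (a negative displacement such as FB FF FF FF).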


How Segmentation Works

The visible part of each segment register is a 16-bit segment selector, whose structure is shown in Figure 5.
Figure 5. Segment Selector Format


The Index field (bits 3-15) selects an entry in a table that describes all the available segments. The TI flag (bit 2) indicates which table must be used, and the Requested Privilege Level (RPL) field (bits 0-1) specifies the privilege level of the code requesting access to the segment. The possible privilege levels are 0, 1, 2, and 3, and they are often represented as protection rings, where ring 0 is the most privileged (where kernel-mode code executes) and ring 3 is the least privileged (where user-mode code executes).
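As a sketch of the selector layout (in Python; `Selector` and `decode_selector` are hypothetical names), the three fields can be extracted with shifts and masks. The sample values are selectors that appear later in this post.

```python
from typing import NamedTuple


class Selector(NamedTuple):
    index: int  # bits 3-15: descriptor index in the GDT or LDT
    ti: int     # bit 2: table indicator (0 = GDT, 1 = LDT)
    rpl: int    # bits 0-1: requested privilege level


def decode_selector(value: int) -> Selector:
    """Split a 16-bit segment selector into its three fields."""
    value &= 0xFFFF
    return Selector(index=value >> 3, ti=(value >> 2) & 1, rpl=value & 3)


for sel in (0x10, 0x23, 0x2B, 0x33, 0x53):
    s = decode_selector(sel)
    # index * 8 is the byte offset of the descriptor within the table.
    print(f"{sel:#06x} -> index={s.index} ti={s.ti} rpl={s.rpl} offset={s.index * 8:#x}")
```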

The two tables that contain the segment information are the Global Descriptor Table (GDT) and the Local Descriptor Table (LDT). The GDTR register holds the base address and limit of the GDT, while the LDTR register holds a selector that points to the LDT's descriptor inside the GDT. Recent Windows versions no longer use the LDT, so the TI flag is always 0. The GDT is an array of segment descriptors; code and data segment descriptors use the 64-bit (8-byte) structure shown in Figure 6, while system descriptors such as the TSS descriptor are expanded to 16 bytes in IA-32e mode.
Figure 6. Segment Descriptor Format


Given the segment descriptor definition, we can now explain how a logical address is translated into a virtual address: the descriptor's Base field is added to the offset part of the logical address. This process is described in Figure 7.
Figure 7. Segment Descriptor Usage in Address Translation
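The Base field is scattered across the descriptor (bits 16-39 hold Base 23:0 and bits 56-63 hold Base 31:24), so it has to be reassembled before the addition. The Python sketch below does this for the 32-bit calculation used outside 64-bit mode and omits limit and access checks. The helper names and the offset 0x401000 are hypothetical; the two descriptor values are taken from the GDT dump in Figure 8.

```python
def descriptor_base(desc: int) -> int:
    """Reassemble the 32-bit Base field of an 8-byte segment descriptor."""
    base_low = (desc >> 16) & 0xFFFFFF  # bits 16-39: Base 23:0
    base_high = (desc >> 56) & 0xFF     # bits 56-63: Base 31:24
    return (base_high << 24) | base_low


def logical_to_linear(desc: int, offset: int) -> int:
    """32-bit translation: linear = Base + offset (limit checks omitted)."""
    return (descriptor_base(desc) + offset) & 0xFFFFFFFF


flat_code = 0x00CFFB000000FFFF  # GDT entry 0x20 from Figure 8
tss_low = 0x38008B2F20000067    # lower 8 bytes of GDT entry 0x40 (TSS)
print(hex(descriptor_base(tss_low)))                # 0x382f2000
print(hex(logical_to_linear(flat_code, 0x401000)))  # 0x401000
```

With a Base of 0, the linear address is simply the offset, which is the situation examined in the section on the flat memory model.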


A very important field is the DPL (Descriptor Privilege Level), which specifies the privilege level of the segment. For a code segment, it is normally the privilege level at which its code runs; for example, code running in a segment with DPL 0 can execute privileged instructions such as HLT or LGDT. Another relevant field is L, which applies to code segments: when it is set to 1, the code runs in 64-bit mode; when it is 0, the code runs in compatibility mode. Figure 8 shows how to inspect the GDT and all the defined segments.
kd> rgdtr
gdtr=fffff804382f3fb0
kd> db fffff804382f3fb0 
fffff804`382f3fb0  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ................
fffff804`382f3fc0  00 00 00 00 00 9b 20 00-00 00 00 00 00 93 40 00  ...... .......@.
fffff804`382f3fd0  ff ff 00 00 00 fb cf 00-ff ff 00 00 00 f3 cf 00  ................
fffff804`382f3fe0  00 00 00 00 00 fb 20 00-00 00 00 00 00 00 00 00  ...... .........
fffff804`382f3ff0  67 00 00 20 2f 8b 00 38-04 f8 ff ff 00 00 00 00  g.. /..8........
fffff804`382f4000  00 3c 00 00 00 f3 40 00-00 00 00 00 00 00 00 00  .<....@.........
fffff804`382f4010  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ................
fffff804`382f4020  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ................
kd> dg 10 50
                                                    P Si Gr Pr Lo
Sel        Base              Limit          Type    l ze an es ng Flags
---- ----------------- ----------------- ---------- - -- -- -- -- --------
0010 00000000`00000000 00000000`00000000 Code RE Ac 0 Nb By P  Lo 0000029b
0018 00000000`00000000 00000000`00000000 Data RW Ac 0 Bg By P  Nl 00000493
0020 00000000`00000000 00000000`ffffffff Code RE Ac 3 Bg Pg P  Nl 00000cfb
0028 00000000`00000000 00000000`ffffffff Data RW Ac 3 Bg Pg P  Nl 00000cf3
0030 00000000`00000000 00000000`00000000 Code RE Ac 3 Nb By P  Lo 000002fb
0038 00000000`00000000 00000000`00000000  0 Nb By Np Nl 00000000
0040 00000000`382f2000 00000000`00000067 TSS32 Busy 0 Nb By P  Nl 0000008b
0048 00000000`0000ffff 00000000`0000f804  0 Nb By Np Nl 00000000
0050 00000000`00000000 00000000`00003c00 Data RW Ac 3 Bg By P  Nl 000004f3
Figure 8. Dumping All Segments
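The first two commands read the GDT base address from the GDTR register and dump the table's memory. The first GDT entry is always the null descriptor, and on this system the entry at offset 0x08 is also empty, so the first non-null entry is at offset 0x10. For a more readable view, we can use the dg command, which decodes the segment descriptors and shows their relevant fields. There are several Code and Data segments, with privilege level 0 (kernel mode) or 3 (user mode). The entry at 0x40 is the 16-byte TSS descriptor (full base fffff804`382f2000); its upper half occupies the slot at 0x48, which is why dg shows a meaningless entry there.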

In particular, there is a user-mode code segment that runs in 32-bit compatibility mode (Long=0, shown as Nl); dg lists it as selector 0x20, and user-mode code loads it with RPL 3, as 0x23. Similarly, there is a user-mode code segment that runs in 64-bit mode (Long=1, shown as Lo); dg lists it as selector 0x30, loaded by user-mode code as 0x33.
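The decoding that dg performs can be approximated in a few lines of Python. This is a sketch, not a reimplementation of dg: it assumes the bytes printed by db in Figure 8, decodes only 8-byte code/data descriptors, and skips null, not-present, and system entries (such as the TSS). When the G flag is set, the 20-bit limit is expressed in 4-KiB units, which is why the entries at 0x20 and 0x28 report a limit of 0xffffffff.

```python
import struct

# GDT bytes as printed by `db` in Figure 8 (offsets 0x00-0x57).
GDT_DUMP = bytes.fromhex(
    "0000000000000000" "0000000000000000"
    "00000000009b2000" "0000000000934000"
    "ffff000000fbcf00" "ffff000000f3cf00"
    "0000000000fb2000" "0000000000000000"
    "670000202f8b0038" "04f8ffff00000000"
    "003c000000f34000"
)


def decode(desc: int) -> dict:
    """Decode the fields of an 8-byte code/data segment descriptor."""
    access = (desc >> 40) & 0xFF
    flags = (desc >> 52) & 0xF
    limit = (desc & 0xFFFF) | (((desc >> 48) & 0xF) << 16)
    if flags & 0x8:  # G=1: the limit counts 4-KiB pages
        limit = (limit << 12) | 0xFFF
    return {
        "type": access & 0xF,
        "s": (access >> 4) & 1,  # 1 = code/data, 0 = system
        "dpl": (access >> 5) & 3,
        "present": (access >> 7) & 1,
        "long": (flags >> 1) & 1,
        "db": (flags >> 2) & 1,
        "limit": limit,
    }


for offset in range(0, len(GDT_DUMP), 8):
    (desc,) = struct.unpack_from("<Q", GDT_DUMP, offset)
    d = decode(desc)
    if not d["present"] or not d["s"]:
        continue  # skip null, reserved, and system (TSS) entries
    kind = "Code" if d["type"] & 0x8 else "Data"
    print(f"{offset:#06x} {kind} DPL={d['dpl']} L={d['long']} D/B={d['db']} limit={d['limit']:#x}")
```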

Windows and the Flat Memory Model

You might have heard that Windows uses a flat memory model, yet we stated above that we are running under a segmented memory model. How do the two fit together? By now, you know how a segment descriptor is used to compute the virtual address, and we have dumped all the segment descriptors defined in the system. You might have noticed that all the Code and Data segments have their Base field set to 0. Since the Base is always 0, the virtual address is always equal to the offset part of the logical address, so Windows does not really take advantage of segmentation for address translation. In other words, the segmentation mechanism is still there, but every segment spans the whole address space; this setup is known as the flat memory model. The official Intel documentation states this as well:

In 64-bit mode, segmentation is generally (but not completely) disabled, creating a flat 64-bit linear-address space. The processor treats the segment base of CS, DS, ES, SS as zero, creating a linear address that is equal to the effective address. The FS and GS segments are exceptions. These segment registers (which hold the segment base) can be used as additional base registers in linear address calculations. They facilitate addressing local data and certain operating system data structures. Note that the processor does not perform segment limit checks at runtime in 64-bit mode.
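The rule quoted above can be expressed as a small Python function. It is a sketch under stated assumptions: `linear_address_64` is a hypothetical helper, the FS and GS bases are passed in explicitly (on x64 they are typically held in the IA32_FS_BASE and IA32_GS_BASE MSRs), and the GS base used in the example is an arbitrary value, not one read from the test VM.

```python
MASK64 = (1 << 64) - 1


def linear_address_64(segment: str, effective: int, fs_base: int = 0, gs_base: int = 0) -> int:
    """Linear address of a memory access in 64-bit mode.

    CS, DS, ES and SS bases are treated as zero; FS and GS add their base.
    No limit check is performed, as stated in the Intel text.
    """
    seg = segment.lower()
    if seg in ("cs", "ds", "es", "ss"):
        base = 0
    elif seg == "fs":
        base = fs_base
    elif seg == "gs":
        base = gs_base
    else:
        raise ValueError(f"unknown segment register: {segment}")
    return (base + effective) & MASK64


# Hypothetical GS base, e.g. a per-thread structure reached through gs:[0x30].
print(hex(linear_address_64("gs", 0x30, gs_base=0x7FF000000000)))  # 0x7ff000000030
print(hex(linear_address_64("cs", 0x7FFD42C7D5E0)))               # 0x7ffd42c7d5e0
```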

Decoding a Segment Register

Let's try decoding the value stored in a segment register. Consider the CS register with value 0x33, which is 00110011b in binary. As described in Figure 5, bits 3-15 hold the index into the GDT, which in this case is 6 (110b); bit 2 (TI) is 0, selecting the GDT, and bits 0-1 (RPL) are 11b, that is, 3. To locate the segment descriptor, we multiply the index by the size of a segment descriptor (8 bytes), obtaining its byte offset from the GDT base. Figure 9 shows this operation in the kernel debugger.

kd> .formats 0x33
Evaluate expression:
  Hex:     00000000`00000033
  Decimal: 51
  Decimal (unsigned) : 51
  Octal:   0000000000000000000063
  Binary:  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00110011
  Chars:   .......3
  Time:    Thu Jan  1 01:00:51 1970
  Float:   low 7.14662e-044 high 0
  Double:  2.51973e-322
kd> dq gdtr + (6 * 8) L1
fffff804`382f3fe0  0020fb00`00000000
Figure 9. Obtain the Segment Descriptor
The segment descriptor value is 0020fb00`00000000. Now, let's use the dg and dt commands to display the segment descriptor associated with index 6, located at offset 6 * 8 = 48 (0x30). The result is reported in Figure 10.

kd> dg 30
                                                    P Si Gr Pr Lo
Sel        Base              Limit          Type    l ze an es ng Flags
---- ----------------- ----------------- ---------- - -- -- -- -- --------
0030 00000000`00000000 00000000`00000000 Code RE Ac 3 Nb By P  Lo 000002fb
kd> dt nt!_KGDTENTRY64 fffff804`382f3fe0 -b
   +0x000 LimitLow         : 0
   +0x002 BaseLow          : 0
   +0x004 Bytes            : 
      +0x000 BaseMiddle       : 0 ''
      +0x001 Flags1           : 0xfb ''
      +0x002 Flags2           : 0x20 ' '
      +0x003 BaseHigh         : 0 ''
   +0x004 Bits             : 
      +0x000 BaseMiddle       : 0y00000000 (0)
      +0x000 Type             : 0y11011 (0x1b)
      +0x000 Dpl              : 0y11
      +0x000 Present          : 0y1
      +0x000 LimitHigh        : 0y0000
      +0x000 System           : 0y0
      +0x000 LongMode         : 0y1
      +0x000 DefaultBig       : 0y0
      +0x000 Granularity      : 0y0
      +0x000 BaseHigh         : 0y00000000 (0)
   +0x008 BaseUpper        : 0
   +0x00c MustBeZero       : 0
   +0x000 DataLow          : 0n9283176673312768
   +0x008 DataHigh         : 0n0
Figure 10. Dump of a Segment Descriptor
As you can see, both commands decode the same descriptor: a code segment with DPL 3 and the L (LongMode) flag set.
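To see how the dt output maps onto the raw value, the following Python sketch extracts the same bit fields from 0020fb00`00000000. The field positions are inferred from the dt output in Figure 10: this view folds the S flag into a 5-bit Type, and the bit Windows labels System sits where Intel's manual places the AVL flag. The function name is hypothetical. As a sanity check, the raw value in decimal is 9283176673312768, exactly the DataLow value printed by dt.

```python
def kgdtentry64_bits(desc: int) -> dict:
    """Mirror the bit-field view printed by `dt nt!_KGDTENTRY64 -b`."""
    return {
        "BaseMiddle": (desc >> 32) & 0xFF,
        "Type": (desc >> 40) & 0x1F,   # 4-bit type plus the S flag
        "Dpl": (desc >> 45) & 0x3,
        "Present": (desc >> 47) & 0x1,
        "LimitHigh": (desc >> 48) & 0xF,
        "System": (desc >> 52) & 0x1,  # Intel's AVL bit
        "LongMode": (desc >> 53) & 0x1,
        "DefaultBig": (desc >> 54) & 0x1,
        "Granularity": (desc >> 55) & 0x1,
        "BaseHigh": (desc >> 56) & 0xFF,
    }


for name, value in kgdtentry64_bits(0x0020FB0000000000).items():
    print(f"{name:12} {value:#x}")
```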

Experimenting With Kernel Mode and User Mode Code

Let's use WinDbg to inspect the segment registers of code running in kernel mode (Figure 11).

kd> r
rax=0000000000000003 rbx=fffff804382fde60 rcx=fffff804382fde60
rdx=fffff804382fde10 rsi=fffff80433b731a0 rdi=fffff80433b73190
rip=fffff80435414be5 rsp=fffff804382fdde8 rbp=0000000000000000
 r8=0000000000000003  r9=fffff804382fddf8 r10=0000000000000000
r11=fffff804382fddd0 r12=fffff80433b73100 r13=0000000000000000
r14=0000000000000100 r15=00000000ffffffff
iopl=0         nv up di ng nz na po nc
cs=0010  ss=0000  ds=002b  es=002b  fs=0053  gs=002b             efl=00040086
nt!DebugService2+0x5:
fffff804`35414be5 cc              int     3
Figure 11. 64-bit Kernel Mode Process Registers
As you can see, RIP points to a kernel address, and the CS register holds 0x10, which, according to Figure 8, corresponds to a Code segment with privilege level 0 (the most privileged) and Long mode enabled. Now let's repeat the experiment with a 64-bit user-mode process (Figure 12).
Figure 12. 64-bit User Mode Process Registers


The image shows a CS value of 0x33 (GDT offset 0x30, RPL 3), which corresponds to a Code segment with privilege level 3 (the least privileged) and Long mode enabled. Finally, let's look at a 32-bit user-mode process running on a 64-bit OS (Figure 13).
Figure 13. 32-bit User Mode Process Registers


The image shows a CS value of 0x23 (GDT offset 0x20, RPL 3), which corresponds to a Code segment with privilege level 3 and Long mode disabled. Since Long mode is disabled, the process is running in compatibility mode (32-bit).
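The reasoning used in these three experiments can be summarized in a small Python sketch: with IA32_EFER.LMA set, the L and D flags of the current code segment select the sub-mode. The function name is hypothetical, and the L/D values in the example are read from the dg output in Figure 8 (the Lo/Nl and Bg/Nb columns).

```python
def code_mode(efer_lma: int, cs_l: int, cs_d: int) -> str:
    """Operating mode implied by IA32_EFER.LMA and the L/D flags of the CS descriptor."""
    if efer_lma:
        if cs_l:
            if cs_d:
                # Intel reserves the L=1, D=1 combination for code segments.
                raise ValueError("reserved combination: L=1 and D=1")
            return "64-bit mode"
        return "compatibility mode (32-bit)" if cs_d else "compatibility mode (16-bit)"
    # Outside IA-32e mode the L flag is not used.
    return "protected mode (32-bit)" if cs_d else "protected mode (16-bit)"


# (L, D) flags of the code segments observed in Figures 11-13.
observed = {0x10: (1, 0), 0x33: (1, 0), 0x23: (0, 1)}
for cs, (l_flag, d_flag) in observed.items():
    print(f"cs={cs:#04x}: {code_mode(1, l_flag, d_flag)}")
```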

Segment Transition and Syscall

We mentioned that code running in kernel mode uses a different CS value, pointing to a descriptor with DPL 0. How is the segment transition performed? There are various ways to change the code segment. One is to use instructions that load the CS register, such as retf, which pops the new CS value from the stack. However, a far return can only transfer control to a code segment at the same or a less privileged level, so user-mode code cannot use it to reach a DPL 0 segment.

An alternative is to use a call gate descriptor ([3]). However, modern Windows does not use this mechanism and relies on the syscall instruction instead. Among the various actions it performs, syscall changes the CS and SS segment selectors. But how is the correct segment chosen? The information comes from the IA32_STAR MSR (0xC0000081): bits 32-47 are extracted and used as the new CS selector (0x10 for the transition to kernel mode), and SS is loaded with that value plus 8. Let's use WinDbg to verify this (Figure 14).

kd> rdmsr 0xC0000081
msr[c0000081] = 00230010`00000000
kd> .formats 00230010`00000000
Evaluate expression:
  Hex:     00230010`00000000
  Decimal: 9851692904349696
  Decimal (unsigned) : 9851692904349696
  Octal:   0000430001000000000000
  Binary:  00000000 00100011 00000000 00010000 00000000 00000000 00000000 00000000
  Chars:   .#......
  Time:    Sun Mar 21 11:08:10.434 1632 (UTC + 1:00)
  Float:   low 0 high 3.21426e-039
  Double:  5.28462e-308
kd> .formats 0y0000000000010000
Evaluate expression:
  Hex:     00000000`00000010
  Decimal: 16
  Decimal (unsigned) : 16
  Octal:   0000000000000000000020
  Binary:  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00010000
  Chars:   ........
  Time:    Thu Jan  1 01:00:16 1970
  Float:   low 2.24208e-044 high 0
  Double:  7.90505e-323
kd> dg 10
                                                    P Si Gr Pr Lo
Sel        Base              Limit          Type    l ze an es ng Flags
---- ----------------- ----------------- ---------- - -- -- -- -- --------
0010 00000000`00000000 00000000`00000000 Code RE Ac 0 Nb By P  Lo 0000029b
Figure 14. Transition to DPL 0 Via Syscall Instruction
We first read the IA32_STAR MSR and extract the bits holding the new CS selector (bits 32-47), whose value is 00000000 00010000b. Converting this value to hex gives 0x10, which is exactly the CS value we observed in kernel mode in the previous section.
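IA32_STAR also drives the opposite transition. The Python sketch below applies the selector calculations from the SYSCALL and SYSRET pseudocode in the Intel SDM to the value in Figure 14: SYSCALL takes CS from bits 32-47 (with the low two bits cleared) and sets SS to that value plus 8, while SYSRET derives its selectors from bits 48-63 and forces RPL 3. The helper name is hypothetical. For this MSR value, the SYSRET code selectors are 0x33 and 0x23, matching the CS values seen in Figures 12 and 13, and the stack selector is 0x2b, the same value that appears in DS in Figure 11.

```python
def star_selectors(star: int) -> dict:
    """Selectors that SYSCALL/SYSRET derive from IA32_STAR."""
    syscall_base = (star >> 32) & 0xFFFF  # IA32_STAR[47:32]
    sysret_base = (star >> 48) & 0xFFFF   # IA32_STAR[63:48]
    return {
        "syscall_cs": syscall_base & 0xFFFC,
        "syscall_ss": syscall_base + 8,
        "sysret_cs_64": (sysret_base + 16) | 3,  # return to 64-bit code
        "sysret_cs_32": sysret_base | 3,         # return to compatibility mode
        "sysret_ss": (sysret_base + 8) | 3,
    }


for name, value in star_selectors(0x0023001000000000).items():  # Figure 14
    print(f"{name:13} {value:#06x}")
```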

Heaven's Gate Considerations

If you have reached this point, you have all the information needed to understand the concept behind the Heaven's Gate technique, which switches a 32-bit process between x86 code (compatibility mode) and x64 code (64-bit mode). Windows relies on this mechanism to run 32-bit binaries on a 64-bit system: a dedicated code segment descriptor with the L flag cleared is defined at GDT offset 0x20 (selector 0x23), next to the 64-bit user-mode code segment at offset 0x30 (selector 0x33). Both descriptors have the same privilege level (DPL 3), so the transition can be performed with any instruction that loads the CS register, such as retf or a far call or jump. A lot of documentation has been written on this topic; Microsoft calls the subsystem built on it WoW64 (Windows 32-bit on Windows 64-bit).
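As an illustration only, the Python sketch below builds the bytes of a `jmp far 0x33:offset` instruction (opcode EA, ptr16:32), the kind of far transfer that moves 32-bit code into the 64-bit code segment. It only produces bytes and executes nothing; `encode_jmp_far` is a hypothetical helper and the target offset 0x12345678 is arbitrary. This direct encoding is valid in 32-bit code but not in 64-bit mode, where an indirect far jump or a far return has to be used instead.

```python
import struct


def encode_jmp_far(selector: int, offset: int) -> bytes:
    """Encode `jmp far selector:offset` (opcode EA, ptr16:32).

    The 4-byte offset comes first, followed by the 2-byte selector,
    both little-endian.
    """
    return struct.pack("<BIH", 0xEA, offset & 0xFFFFFFFF, selector & 0xFFFF)


# Hypothetical target: from 32-bit code into the 64-bit code segment 0x33.
print(encode_jmp_far(0x33, 0x12345678).hex(" "))  # ea 78 56 34 12 33 00
```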

Conclusion

Modern operating systems run in protected mode (on x64, in IA-32e mode) using a flat memory model built on top of segmentation. In this post we analyzed how this model works and how changing the code segment is tied to changes of privilege level and operating mode. If you want to know more, I invite you to read the references.

References

[1] - Intel® 64 and IA-32 Architectures Software Developer’s Manual Volume 3 (3A): System Programming Guide - Chapter 2.5 CONTROL REGISTERS
[2] - Intel® 64 and IA-32 Architectures Software Developer’s Manual Volume 4: Model-Specific Registers - IA32_EFER
[3] - Intel® 64 and IA-32 Architectures Software Developer’s Manual Volume 3 (3A): System Programming Guide - Chapter 5.8.3 Call Gates
[4] - Call Gates' Ring Transitioning in IA-32 Mode
[5] - Bringing Call Gates Back
[6] - Windows Internals, Part 2, 7th Edition
[7] - Intel® 64 and IA-32 Architectures Software Developer’s Manual Volume 3A: System Programming Guide, Part 1
[8] - Intel® 64 and IA-32 Architectures Software Developer’s Manual Volume 1: Basic Architecture