Cache Missing for Fun and Profit
Abstract. Simultaneous multithreading (put simply, the sharing of the execution resources of a superscalar processor between multiple execution threads) has recently become widespread via its introduction (under the name "Hyper-Threading") into Intel Pentium 4 processors. In this implementation, for reasons of efficiency and economy of processor area, the sharing of processor resources between threads extends beyond the execution units; of particular concern is that the threads share access to the memory caches.

We demonstrate that this shared access to memory caches provides not only an easily used high-bandwidth covert channel between threads, but also permits a malicious thread (operating, in theory, with limited privileges) to monitor the execution of another thread, allowing in many cases for theft of cryptographic keys.

Finally, we provide some suggestions to processor designers, operating system vendors, and the authors of cryptographic software, of how this attack could be mitigated or eliminated entirely.

Key words and phrases: side channels, simultaneous multithreading, caching.

1. Introduction

As integrated circuit fabrication technologies have improved, providing not only faster transistors but smaller transistors, processor designers have been met with two critical challenges. First, memory latencies have increased dramatically in relative terms; and second, while it is easy to spend extra transistors on building additional execution units, many programs have fairly limited instruction-level parallelism, which limits the extent to which additional execution resources can be utilized. Caches provide a partial solution to the first problem, while out-of-order execution provides a partial solution to the second.

In 1995, simultaneous multithreading was revived^1 in order to combat these two difficulties [12].

^1 Simultaneous multithreading had existed since at least 1974 in theory [10], even if it had not yet been shown to be practically feasible.

Where out-of-order execution allows instructions to be reordered (subject to maintaining architectural semantics) within a narrow window of perhaps a hundred instructions,
simultaneous multithreading allows instructions to be reordered across threads; that is, rather than having the operating system perform context switches between two threads, it can schedule both threads simultaneously on the same processor, and instructions will be interleaved, dramatically increasing the utilization of existing execution resources.

On the 2.8 GHz Intel Pentium 4 with Hyper-Threading processor, with which the remainder of this paper is concerned^2, the two threads being executed on each processor share more than merely the execution units; of particular concern to us, they share access to the memory caches [8]. Caches have already been demonstrated to be cryptographically dangerous: many implementations of AES [9] are subject to timing attacks arising from the non-constancy of S-box lookup timings [1]. However, having caches shared between threads provides a vastly more dangerous avenue of attack.

2. Covert communication via paging

To see how shared caches can create a cryptographic side channel, we first step back for a moment to a simpler problem, covert channels [7], and one of the classic examples of such a channel: virtual memory paging.

Consider two processes, known as the Trojan process and the Spy process, operating at different privilege levels on a multilevel secure system, but both with access to some large reference file (naturally, on a multilevel secure system this access would necessarily be read-only). The Trojan process now reads a subset of pages in this reference file, resulting in page faults which load the selected pages from disk into memory. Once this is complete (or even in the middle of this operation) the Spy process reads every page of the reference file and measures the time taken for each memory access. Attempts to read pages which have been previously read by the Trojan process will complete very quickly, while those pages which have not already been read will incur the (easily measurable) cost of a disk access.
In this manner, the Trojan process can repeatedly communicate one bit of information to the Spy process in the time it takes for a page to be loaded from disk into memory, up to a total number of bits equal to the size (in pages) of the shared reference file.

^2 We examine the 2.8 GHz Intel Pentium 4 with Hyper-Threading processor for reasons of availability, but expect that the results in this paper will apply equally to all processors with the same simultaneous multithreading and memory cache design.
If the two processes do not share any reference file, this approach will not work, but instead an opposite approach may be taken: instead of faulting pages into memory, the Trojan process can fault pages out of memory. Assume that the Trojan and Spy processes each have an address space of more than half of the available system memory and the operating system uses a least-recently-used page eviction strategy. To transmit a "one" bit, the Trojan process reads its entire address space; to transmit a "zero" bit, the Trojan process spins for the same amount of time while only accessing a single page of memory. The Spy process now repeatedly measures the amount of time needed to read its entire address space. If the Trojan process was sending a "one" bit, then the operating system will have evicted pages owned by the Spy process from memory, and the necessary disk activity when those pages are accessed will provide an easily measurable time difference.

While this covert channel has far lower bandwidth than the previous channel (it operates at a fraction of a bit per second, compared to a few hundred bits per second) it demonstrates how a shared cache can be used as a covert channel, even if the two communicating processes do not have shared access to any potentially cached data.

3. L1 cache missing

The L1 data cache in the Pentium 4 consists of 128 cache lines of 64 bytes each, organized into 32 4-way associative sets. This cache is completely shared between the two execution threads; as such, each of the 32 cache sets behaves in the same manner as the paging system discussed in the previous section: the threads cannot communicate by loading data into the cache, since no data is shared between the two threads^3, but they can communicate via a timing channel by forcing each other's data out of the cache.

A covert channel can therefore be constructed as follows: The Trojan process allocates an array of 2048 bytes,
...
...