





- In a perfect world, there would be a way to store bits that's very fast and can be had in almost arbitrarily large amounts for a reasonable cost.
- In this world, a maxim from engineering (or so I hear) applies: "Good, fast, cheap: pick any two."


- The textbook discusses four basic technologies for storing (lots of) bits:
  - SRAM: pretty fast, but costly, so not feasible on a large scale.
  - DRAM: significantly less expensive, but also significantly slower.
  - "Flash memory": slower and cheaper still, but it has the problem of "wearing out".
  - Magnetic disks: cheap enough to be about as big as is needed for most general-purpose computing, but far, far slower.

## Memory Hierarchy — Recap/Review/Revisited

- So where does "hierarchy" come in? Well ...
- Programs' use of memory mainly exhibits "locality" (in both time and space).
- So it's common to design systems in terms of a hierarchy, with each level larger but slower than the one above it, with the hope that we can store (a copy of) most-frequently-used data in an upper level of the hierarchy, where it's fast to get at, and access lower levels less frequently.
- Idea is that data moves up and down in this hierarchy as needed, all in a way that's invisible to application programs, *except* for effects on performance.





## Caching — A Bit More Detail, Continued

- But wait: if the cache is smaller than what it's caching, how can this work? Each cache element could potentially contain any one of many pieces of data, so include in each cache element a "tag" that says which one it contains, plus a "valid" bit that says whether it contains anything meaningful at all.

- For writes, things are a bit more complicated. A similar idea applies, but we must decide whether to write to lower levels immediately or wait. Writing immediately is easier but slower, probably enough so that it's worth the trouble to do something more complicated. More details in textbook.
- Overall, textbook (section 5.8) presents four questions that pretty much sum it up; adding one more ...
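
The tag and valid bit can be sketched in a few lines of Python. This is only an illustration, not the textbook's design: the names `CacheEntry` and `TinyCache` are made up, each entry holds a single word, only reads are handled, and the entry for an address is assumed to be `address % num_entries` (the simplest possible placement rule).

```python
from dataclasses import dataclass


@dataclass
class CacheEntry:
    valid: bool = False  # does this entry hold real data yet?
    tag: int = 0         # which of the many possible addresses is stored here
    data: int = 0


class TinyCache:
    """Hypothetical read-only cache in front of a slower 'memory' list."""

    def __init__(self, memory, num_entries=4):
        self.memory = memory
        self.entries = [CacheEntry() for _ in range(num_entries)]
        self.hits = 0
        self.misses = 0

    def read(self, address):
        index = address % len(self.entries)  # entry this address must use
        tag = address // len(self.entries)   # tells apart addresses sharing that entry
        entry = self.entries[index]
        if entry.valid and entry.tag == tag:
            self.hits += 1
        else:
            # Miss: fetch from the slower level and remember what we fetched.
            self.misses += 1
            entry.valid, entry.tag, entry.data = True, tag, self.memory[address]
        return entry.data


memory = [10 * a for a in range(16)]
cache = TinyCache(memory)
# Addresses 3 and 7 share entry 3, so reading 7 pushes 3 out.
values = [cache.read(a) for a in (3, 3, 7, 3)]
```

Without the tag, the last read of address 3 would wrongly return the data cached for address 7; without the valid bit, the very first read could "hit" on an entry that was never filled.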



## Caching — Mapping Addresses to Cache Elements

- "Direct map" cache is simple: each memory address maps to exactly one cache element.
- "Fully associative" cache is the opposite extreme: any memory address can map to any cache element.


- "Set associative" cache is in between: each memory address maps to a set of entries, and can go in any entry of that set. Reasonable compromise between extremes?
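
One way to see that the three schemes are points on a single scale is a small Python sketch. The function `candidate_entries` and its parameters are hypothetical, and it assumes one-word cache elements and the common "address modulo number of sets" rule, which real hardware implements by selecting address bits.

```python
def candidate_entries(address, num_entries, ways):
    """Return the cache entry numbers where `address` is allowed to live.

    ways == 1            -> direct mapped (exactly one choice)
    ways == num_entries  -> fully associative (any entry)
    anything in between  -> set associative
    """
    if num_entries % ways != 0:
        raise ValueError("ways must divide num_entries evenly")
    num_sets = num_entries // ways
    set_index = address % num_sets  # which set this address belongs to
    first = set_index * ways        # sets are laid out one after another
    return list(range(first, first + ways))


direct = candidate_entries(13, num_entries=8, ways=1)
two_way = candidate_entries(13, num_entries=8, ways=2)
fully = candidate_entries(13, num_entries=8, ways=8)
```

With 8 entries, address 13 has one possible home in the direct-mapped case, two in the 2-way case, and all eight in the fully associative case. More choices means fewer forced evictions, but more tags to compare on every lookup.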







- On a "cache miss", if appropriate cache elements are all in use, must pick one to replace. For direct mapping, trivial (only one choice); for the other two not so trivial.
- How to choose? The goal should be to replace something that won't be needed again soon. Approaches are often based on temporal locality (if something has not been used recently, maybe it won't be used again soon).
- For processor caches, hardware problem, various solutions exist; for virtual memory, software (O/S) problem, and again various solutions exist ("page replacement algorithms").
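
As one concrete instance of a temporal-locality policy, here is a minimal Python sketch of least-recently-used (LRU) replacement for a single set (or a whole fully associative cache). LRU is chosen here only as a familiar example; the class name `LRUSet` is made up, and real hardware usually approximates LRU rather than tracking exact order.

```python
from collections import OrderedDict


class LRUSet:
    """Hypothetical set of cache elements with least-recently-used replacement."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()  # least recently used first, most recent last

    def access(self, block):
        """Return True on a hit, False on a miss (loading `block`, evicting if full)."""
        if block in self.blocks:
            self.blocks.move_to_end(block)   # mark as most recently used
            return True
        if len(self.blocks) >= self.capacity:
            self.blocks.popitem(last=False)  # evict the least recently used block
        self.blocks[block] = True
        return False


lru = LRUSet(capacity=2)
results = [lru.access(b) for b in ["A", "B", "A", "C", "B"]]
```

In this trace the access to "C" evicts "B" (because "A" was touched more recently), so the final access to "B" misses even though "B" was loaded earlier. The same idea, applied to pages instead of cache elements, underlies several page replacement algorithms.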

## Caching — How to Manage Writes

- One complication here is that if cache elements are more than one word, need to read old element, then change the word being written.
- And then write back immediately ("write-through"), or wait (write buffer or "write-back")? The former is easier but could be quite slow; the latter is more complicated but probably needed for acceptable performance.
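
The performance difference between the two policies can be sketched by counting how many writes actually reach the lower level. This Python example is a simplification under stated assumptions: `WriteCache` is a made-up name, elements are one word each (so the read-then-modify step for multi-word elements is not shown), placement is `address % num_entries`, and a write always brings the address into the cache.

```python
class WriteCache:
    """Hypothetical one-word-per-entry cache that counts writes reaching 'memory'."""

    def __init__(self, memory, num_entries, write_back):
        self.memory = memory
        self.num_entries = num_entries
        self.write_back = write_back
        self.entries = {}  # index -> [address, value, dirty]
        self.memory_writes = 0

    def _write_memory(self, address, value):
        self.memory[address] = value
        self.memory_writes += 1

    def write(self, address, value):
        index = address % self.num_entries
        old = self.entries.get(index)
        if self.write_back:
            # Only touch memory when a modified ("dirty") entry is displaced.
            if old is not None and old[0] != address and old[2]:
                self._write_memory(old[0], old[1])
            self.entries[index] = [address, value, True]
        else:
            # Write-through: update the cache and memory together.
            self.entries[index] = [address, value, False]
            self._write_memory(address, value)

    def flush(self):
        """Write every dirty entry to memory (needed only for write-back)."""
        for entry in self.entries.values():
            if entry[2]:
                self._write_memory(entry[0], entry[1])
                entry[2] = False


def count_memory_writes(write_back):
    memory = [0] * 16
    cache = WriteCache(memory, num_entries=4, write_back=write_back)
    for value in range(10):
        cache.write(5, value)  # the same address written repeatedly
    cache.flush()
    return cache.memory_writes, memory[5]


through = count_memory_writes(write_back=False)
back = count_memory_writes(write_back=True)
```

Both policies leave the same final value in memory, but for ten writes to one address write-through performs ten slow writes while write-back performs one. The price of write-back is the extra bookkeeping: the dirty flag, and the need to flush before anyone else reads memory.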

