Cache is a small and very fast type of memory. It means a type of buffer between the processor and the main memory. The time access for cache is at least 10 times smaller than main memory. Because it has a very expensive technology, the size is limited. To achieve high performance the addresses need to be located most of the time in cache.
1. General Aspects
The terminology used for cache (Figure 1) efficiency is “Hit Rate” (Hit Rate - represents the rapport between the number of addresses accessed from cache and the total number of addresses accessed during that time). The antonym for “Hit Rate” is named “Miss Rate” which can be determined by Miss Rate = 1 – Hit Rate formulas.
Figure 1: Data Transfer
The following parameter evaluate the Cache performance:
Access -> read/write operation
Hit -> data was found in Cache
Miss -> data wasn’t found in Cache; accessed from a slower memory
Miss_rate = #misses / #accesses; Hit_rate = #hits / #accesses
Taccess(cache) = 1ns, Taccess(main memory)=10ns,
Hit_Rate=90% (from 100 accesses, 90 will be from cache)
Taccess (with cache) = 90*1 + 10*1 = 100ns
Taccess(without cache) = 100*10 = 1000ns
Speed up = 100/10 = 10
Increase Hit Rate -> Improve Cache Performance -> Lower Power Consumption -> Improve Speed Execution.
3. Cache Architectures
Cache granularity is given by “line” which has 4 parts (figure 2):
- Valid Bit: is set to 1 when a valid data is stored in cache.
- Dirty Bit: is set to 1 when data is changed and is not updated to main memory in the same time.
- Tag: this field tells which address is in that line.
- Data: the data fetched from main memory.
Figure 2: Data block
Direct Mapped – each block from main memory is mapped only to a unique cache block, no matter if that block is empty or not. This model of cache organization doesn’t require a replacement policy.
Fully Associative – each block from main memory can be mapped to any of cache blocks. If the cache is full then a replacement policy need to decide which data will be invalidated from cache for making room of this new data block.
Set Associative – the cache is split into many sets. The new data can be mapped only to the blocks of a certain set, determined by some bits from address field. In practice is used mostly because of a very efficient ratio between implementation and efficiency.
4. Cache Management
Cache Memory has a small size compared to main memory of any system. Knowing that the time access from cache is least 10 times smaller than external memory, fetching data from cache is a big advantage. The process of eliminating a data from cache for making room of a new one is called “Cache Replacement Algorithm”.
There are a lot of these types of algorithms that are implemented in cache controller. Some of them are:
- LRU (Least Recently Used) – Replace the data that has not been used for the longest time. LRU is implemented by a linked list.
- LFU (Least Frequently Used) – Every line of cache has a counter which is incremented every time that data is accessed from cache.
- FIFO (First In First Out) – It’s very simple to implement, but has a problem when physical memory is bigger (Belady’s Anomaly).
- ARC (Adaptive Replacement Cache) – combine the LRU and LFU solutions and dynamically adjust between them.
5. Cache Optimization
Using the algorithm for cache management performances can be meaningfully increased by using the advantages of 2 more principles:
a. Spatial Locality – The requested data from cache is located near the previous data used in physical memory.
b. Temporal Locality – The requested data from cache was recently used or reused.
Some solutions very often used for these 2 methods are prefetching, loop blocking and fusion, array padding and merging.
Based on text written by RarCod