(Abstract) Procedural abstraction reduces code size by replacing repeated code fragments with call instructions to a subroutine that executes the repeated fragment. However, building a subroutine requires extra instructions to support the procedure call mechanism. In this paper, we present an operating-system-level technique that improves the space efficiency of procedural abstraction-based code compaction. The proposed technique eliminates these call-related extra instructions because operating system routines implicitly support the procedure call and return. The technique provides three execution modes, including one applicable to ROM-based systems. The experimental results show that the proposed technique reduces code size significantly while increasing execution time only slightly.
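For illustration, the following is a minimal C sketch of conventional procedural abstraction, not the paper's OS-level mechanism; the function name and the fragment are invented. A fragment duplicated at several sites is factored into a subroutine, trading duplicated bytes for call sites plus the call/return support code that the proposed technique aims to eliminate.

```c
/* Minimal sketch of conventional procedural abstraction (illustrative
 * only): a fragment repeated at several sites becomes one subroutine. */
#include <stdio.h>

static int scale_and_clamp(int x) {   /* abstracted repeated fragment */
    x = x * 3 + 7;                    /* formerly duplicated at each site */
    return (x > 255) ? 255 : x;
    /* A conventional build adds prologue/epilogue and return handling
     * here; the paper's technique lets OS routines supply call/return
     * implicitly, avoiding these extra instructions. */
}

int main(void) {
    /* Each former copy of the fragment shrinks to a single call site. */
    printf("%d\n", scale_and_clamp(10));
    printf("%d\n", scale_and_clamp(90));
    printf("%d\n", scale_and_clamp(40));
    return 0;
}
```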
(Abstract) Both the hardware cost and the power consumption of computer systems depend heavily on the size of main memory, namely DRAM. This is especially important in tiny embedded systems (e.g., micro sensors), since they are produced at large scale and have to operate for as long as possible, e.g., ten years. Although several methods have been developed to reduce program code and data size, most of them need extra hardware devices, making them unsuitable for such tiny systems. For example, a virtual memory system needs both an MMU and a TLB to execute large programs on a small memory. This paper presents a software implementation of the virtual memory system, focusing on the paging mechanism. To logically expand the physical memory space, the proposed method compacts, compresses, and swaps in/out heap memory blocks, which typically account for over half of total memory usage. A prototype implementation verifies that the proposed method can more than double the effective memory capacity. As a result, large programs run concurrently with reasonable overhead, comparable to that of hardware-based VM systems.
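As a rough illustration of the software-paging idea, here is a hedged C sketch under assumed names (Block, swap_out, and swap_in are invented, and run-length encoding stands in for whatever compressor the method actually uses): a heap block's payload is replaced by a compressed copy when "swapped out" and decompressed on demand.

```c
/* Hedged sketch of software paging of heap blocks: compress in place
 * of swapping to a device; decompress on access. Names are assumed. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct {                /* hypothetical heap-block descriptor */
    unsigned char *data;
    size_t len;                 /* current byte length of `data` */
    int compressed;             /* 1 if block holds compressed bytes */
} Block;

/* "Swap out": replace the payload with an RLE-compressed copy. */
static void swap_out(Block *b) {
    unsigned char *out = malloc(2 * b->len);   /* RLE worst case */
    size_t o = 0, i = 0;
    while (i < b->len) {
        size_t run = 1;
        while (i + run < b->len && run < 255 &&
               b->data[i + run] == b->data[i]) run++;
        out[o++] = (unsigned char)run;
        out[o++] = b->data[i];
        i += run;
    }
    free(b->data);
    b->data = out; b->len = o; b->compressed = 1;
}

/* "Swap in": restore the original bytes on demand. */
static void swap_in(Block *b, size_t orig_len) {
    unsigned char *out = malloc(orig_len);
    size_t o = 0;
    for (size_t i = 0; i + 1 < b->len; i += 2)
        for (int r = 0; r < b->data[i]; r++) out[o++] = b->data[i + 1];
    free(b->data);
    b->data = out; b->len = o; b->compressed = 0;
}

int main(void) {
    Block b = { malloc(64), 64, 0 };
    memset(b.data, 0xAB, 64);        /* a highly compressible block */
    swap_out(&b);
    printf("64 bytes compressed to %zu\n", b.len);
    swap_in(&b, 64);
    printf("restored to %zu bytes\n", b.len);
    free(b.data);
    return 0;
}
```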
(Abstract) To alleviate the ever-increasing processor-memory performance gap of high-end parallel computers, on-chip compressed caches have been developed that reduce the cache miss count and off-chip memory traffic by storing and transferring cache lines in compressed form. However, we observed that their performance gain is often limited by their coarse-grained management of compressed cache lines, which incurs internal fragmentation. In this paper, we present a fine-grained compressed cache line management scheme that addresses the fragmentation problem while avoiding any increase in metadata such as tag fields and the VM page table. Using the SimpleScalar simulator with the SPEC benchmark suite, we show that, compared with an existing compressed cache system, the proposed cache organization reduces memory traffic by 15%, as it delivers compressed cache lines at a fine granularity, and the cache miss count by 23%, as it stores up to three compressed cache lines in one physical cache line.
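The packing idea can be sketched in a few lines of C. This is an assumed illustration, not the paper's design: compressed lines of varying size are appended to a 64-byte physical line at byte granularity with small offset/length metadata, so admission fails only when bytes actually run out, not when a fixed-size slot would.

```c
/* Hedged sketch of fine-grained packing: up to three compressed cache
 * lines share one physical line at byte granularity. Names assumed. */
#include <stdio.h>
#include <string.h>

#define LINE_BYTES   64
#define MAX_SUBLINES 3          /* the paper packs up to three lines */

typedef struct {
    unsigned char bytes[LINE_BYTES];
    unsigned char off[MAX_SUBLINES]; /* start offset of each sub-line */
    unsigned char len[MAX_SUBLINES]; /* compressed size of each sub-line */
    int count;                       /* sub-lines currently stored */
    int used;                        /* bytes consumed so far */
} PhysLine;

/* Append one compressed line; fails only when bytes run out, avoiding
 * the internal fragmentation of coarse-grained fixed slots. */
static int pack(PhysLine *p, const unsigned char *cdata, int clen) {
    if (p->count == MAX_SUBLINES || p->used + clen > LINE_BYTES)
        return 0;
    memcpy(p->bytes + p->used, cdata, clen);
    p->off[p->count] = (unsigned char)p->used;
    p->len[p->count] = (unsigned char)clen;
    p->count++; p->used += clen;
    return 1;
}

int main(void) {
    PhysLine p = {0};
    unsigned char a[20] = {0}, b[25] = {0}, c[19] = {0};
    /* 20 + 25 + 19 = 64: all three fit, where 32-byte slots would not. */
    printf("%d %d %d -> used %d/%d\n",
           pack(&p, a, 20), pack(&p, b, 25), pack(&p, c, 19),
           p.used, LINE_BYTES);
    return 0;
}
```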
(Abstract) Cache and memory compression systems have been developed to improve the memory system performance of high-performance parallel computers. Cache compression systems can reduce the on-chip cache miss rate and off-chip memory traffic by storing and transferring cache lines in compressed form, while memory compression systems can expand main memory capacity by storing memory pages in compressed form. However, these systems have not been quantitatively evaluated under identical conditions, making it difficult to understand the performance of a new system relative to existing ones. In this paper, we provide an identical execution-driven simulation environment for these systems. To the best of our knowledge, no prior work has evaluated the performance of cache and memory compression systems using an execution-driven simulator. Experimental results show that cache compression systems reduce the cache miss rate by 16% and memory traffic by 30%, while expanding memory capacity by less than 160%. The results also show that memory compression systems expand memory capacity significantly, by over 270%. Based on these experimental analyses, we conclude with future research directions for compression systems.
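For context, the capacity-expansion figures are governed by the achieved compression ratio. A hedged sketch of the arithmetic, in our own notation and assuming that "expanding capacity to X%" means the effective capacity is X% of the physical capacity: the expansion factor is E = 1/r, where r = compressed size / original size, since a physical memory of M bytes holds M/r bytes of uncompressed data when pages are stored at ratio r. For example, an average page compression ratio of r ≈ 0.37 would give E ≈ 2.7, i.e., effective capacity of about 270% of physical capacity, ignoring compression metadata overhead.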