
Abstract

Recent technological advances in the development of flash memory-based devices have consolidated their leadership position as the preferred storage media in the embedded systems market and opened new vistas for deployment in enterprise-scale storage systems. Unlike hard disks, flash devices are free of mechanical moving parts, have no seek or rotational delays, and consume less power. However, the internal idiosyncrasies of flash technology make its performance highly dependent on workload characteristics. The poor performance of random writes has been a cause of major concern, which needs to be addressed to better utilize the potential of flash in enterprise-scale environments. We examine one of the important causes of this poor performance: the design of the Flash Translation Layer (FTL), which performs the virtual-to-physical address translations and hides the erase-before-write characteristics of flash. We propose a complete paradigm shift in the design of the core FTL engine from the existing techniques with our Demand-based Flash Translation Layer (DFTL), which selectively caches page-level address mappings. To evaluate FTL schemes, we develop a flash simulation framework called FlashSim. Our experimental evaluation with realistic enterprise-scale workloads endorses the utility of DFTL in enterprise-scale storage systems by demonstrating: (i) improved performance, (ii) reduced garbage collection overhead, and (iii) better overload behavior compared to state-of-the-art FTL schemes. For example, a predominantly random write-dominant I/O trace from an OLTP application running at a large financial institution shows a 78% improvement in average response time (due to a 3-fold reduction in garbage collection operations) compared to a state-of-the-art FTL scheme. Even for the well-known read-dominant TPC-H benchmark, for which DFTL introduces additional overheads, we improve system response time by 56%.

Categories and Subject Descriptors D.4.2 [Operating Systems]: Storage Management—Secondary Storage

General Terms Performance, Measurement

Keywords Flash Management, Flash Translation Layer, Storage System

1. Introduction

Hard disk drives have been the preferred media for data storage in enterprise-scale storage systems for several decades. The disk storage market totals approximately $34 billion annually and is continually on the rise [27]. However, there are several shortcomings inherent to hard disks that are becoming harder to overcome as we move into faster and denser design regimes. Hard disks are significantly faster for sequential accesses than for random accesses, and the gap continues to grow. This can severely limit the performance that hard disk based systems are able to offer to workloads with a significant random access component or a lack of locality. In an enterprise-scale system, consolidation can result in the multiplexing of unrelated workloads, imparting randomness to their aggregate [6].

Alongside improvements in disk technology, significant advances have also been made in various forms of solid-state memory such as NAND flash, magnetic RAM (MRAM), phase-change memory (PRAM), and Ferroelectric RAM (FRAM). Solid-state memory offers several advantages over hard disks: lower and more predictable access latencies for random requests, smaller form factors, lower power consumption, lack of noise, and higher robustness to vibrations and temperature. In particular, recent improvements in the design and performance of NAND flash memory (simply flash henceforth) have resulted in its adoption in many embedded and consumer devices. Small form-factor hard disks have already been replaced by flash memory in consumer devices such as music players, PDAs, and digital cameras.

Flash devices are significantly cheaper than the main memory technologies that play a crucial role in improving the performance of disk-based systems via caching and buffering. Furthermore, their price-per-byte continues to fall [21], which leads us to believe that flash devices will be an integral component of future enterprise-scale storage systems. This trend is already evident: major storage vendors have started producing flash-based large-scale storage systems, such as the RamSan-500 from Texas Memory Systems and the Symmetrix DMX-4 from EMC. In fact, International Data Corporation has estimated that over 3 million Solid State Disks (SSDs) will be shipped into enterprise applications, creating 1.2 billion dollars in revenue by 2011 [27].

Using Flash Memory in Enterprise-scale Storage. Before enterprise-scale systems can transition to employing flash-based devices at a large scale, certain challenges must be addressed. It has been reported that manufacturers are seeing return rates of 20-30% on SSD-based notebooks due to failures and lower-than-expected performance [4]. While not directly indicative of flash performance in the enterprise, this is a cause for serious concern. Some enterprise-scale applications have also seen degraded performance upon replacing hard disks with flash. For example, Lee et al. [18] recently observed that "database servers would potentially suffer serious update performance degradation if they ran on a computing platform equipped with flash memory instead of hard disks." There are at least two important reasons behind this poor performance of flash for enterprise-scale workloads. First, unlike main memory devices (SRAMs and DRAMs), flash is not always superior in performance to a disk: for sequential accesses, disks might still outperform flash [18]. This points to the need for hybrid storage devices that exploit the complementary performance properties of the two media; while part of our overall goal, this is outside the scope of this paper. The second reason, the focus of our current research, has to do with the performance of flash-based devices for workloads with random writes. Recent research has focused on improving random write performance by adding DRAM-backed buffers [21] or by buffering requests to increase their sequentiality [16]. We instead focus on an intrinsic component of the flash device, the Flash Translation Layer (FTL), to address this poor performance.

The Flash Translation Layer. The FTL is one of the core engines in flash-based SSDs. It maintains a mapping table from the virtual addresses used by upper layers (e.g., those coming from file systems) to physical addresses on the flash, and it emulates the functionality of a normal block device by exposing only read/write operations to the upper software layers while hiding the erase operations unique to flash-based systems. Flash-based systems possess an asymmetry in how they can read and write: while a flash device can read any of its pages (the unit of read/write), it may only write to a page that is in a special state called erased. Flash devices are designed to allow erases at a much coarser spatial granularity than pages, since page-level erases are extremely costly. As a typical example, a 16GB flash product from Micron [23] has 2KB pages while its erase blocks are 128KB. This results in an important idiosyncrasy of updates in flash: in-place updates would require an erase per update, causing performance to degrade. To get around this, FTLs implement out-of-place updates. An out-of-place update: (i) chooses an already erased page, (ii) writes the new data to it, (iii) invalidates the previous version of the page in question, and (iv) updates the mapping table to reflect this change. Out-of-place updates in turn require the FTL to employ a garbage collection (GC) mechanism, whose role is to reclaim invalid pages within blocks by erasing the blocks (relocating any valid pages within them to new locations first). Evidently, the FTL design crucially affects flash performance.
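
To make these mechanics concrete, the following minimal C sketch illustrates steps (i)-(iv) of an out-of-place update. It illustrates the general technique only, not the interface of any particular FTL; the data structures and the flash_program() primitive are assumptions introduced here.

    #include <stdint.h>

    #define INVALID_PPN UINT32_MAX

    enum page_state { ERASED, VALID, INVALID };

    /* Illustrative (assumed) device state: per-page status plus the
     * FTL's logical-to-physical mapping table. */
    struct flash {
        enum page_state *state;      /* one entry per physical page */
        uint32_t        *map;        /* logical page -> physical page */
        uint32_t         num_pages;
    };

    /* Assumed low-level primitive that programs one physical page. */
    void flash_program(struct flash *f, uint32_t ppn, const void *data);

    /* Returns an erased physical page, or INVALID_PPN if none remains
     * and the garbage collector must first reclaim invalid pages. */
    static uint32_t find_erased_page(struct flash *f)
    {
        for (uint32_t p = 0; p < f->num_pages; p++)
            if (f->state[p] == ERASED)
                return p;
        return INVALID_PPN;
    }

    /* Out-of-place update of logical page lpn. */
    int ftl_write(struct flash *f, uint32_t lpn, const void *data)
    {
        uint32_t new_ppn = find_erased_page(f);     /* (i)   */
        if (new_ppn == INVALID_PPN)
            return -1;                              /* run GC, retry */

        flash_program(f, new_ppn, data);            /* (ii)  */
        f->state[new_ppn] = VALID;

        uint32_t old_ppn = f->map[lpn];
        if (old_ppn != INVALID_PPN)
            f->state[old_ppn] = INVALID;            /* (iii) */

        f->map[lpn] = new_ppn;                      /* (iv)  */
        return 0;
    }

Note that the invalidated old copy is not erased immediately; it simply waits for the GC to reclaim its block, which is precisely why GC behavior dominates write-intensive performance.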

One of the main difficulties the FTL faces in ensuring high performance is the severely constrained size of the on-flash SRAM-based cache where it stores its mapping table. For example, a 16GB flash device requires at least 32MB of SRAM to be able to map all of its pages. With the growing size of SSDs, this SRAM size is unlikely to scale proportionally due to the higher price/byte of SRAM. This prohibits FTLs from keeping virtual-to-physical address mappings for all pages on flash (a page-level mapping). On the other hand, a block-level mapping can lead to increased: (i) space wastage (due to internal fragmentation) and (ii) performance degradation (due to GC-induced overheads). To counter these difficulties, state-of-the-art FTLs take the middle approach of using a hybrid of page-level and block-level mappings and are primarily based on the following main idea (we explain the intricacies of individual FTLs in Section 2): most of the blocks (called Data Blocks) are mapped at the block level, while a small number of blocks called "update" blocks are mapped at the page level and are used for recording updates to pages in the data blocks.
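
The 32MB figure follows directly from the device geometry: a 16GB device with 2KB pages contains 16GB / 2KB = 8M (2^23) pages, so a full page-level table with 4-byte entries (an entry size we assume here, sufficient to address 2^23 physical pages) occupies 8M x 4B = 32MB. A block-level table, by contrast, would need only 16GB / 128KB = 128K entries, at the cost of the fragmentation and GC overheads noted above.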

As we will argue in this paper, the various hybrid FTL variants fail to offer sufficiently good performance for enterprise-scale workloads. First, these hybrid schemes suffer from poor garbage collection behavior. Second, they often come with a number of workload-specific tunable parameters (for optimizing performance) that may be hard to set. Finally, and most importantly, they do not properly exploit the temporal locality in accesses that most enterprise-scale workloads are known to exhibit. Due to this locality, only a small, slowly changing set of mappings is in active use at any given time; even the small SRAM available on flash devices can thus effectively store the mappings currently in use, while the rest reside on the flash device itself. Our thesis in this paper is that such a page-level FTL, based purely on exploiting this temporal locality, can outperform hybrid FTL schemes while providing an easier-to-implement solution devoid of tunable parameters.
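
As a rough illustration of this demand-based idea (a sketch only: it does not reproduce DFTL's actual data structures, and all names and the CACHE_SLOTS parameter are assumptions introduced here), an SRAM-resident cache can hold the recently used page-level mappings and fetch the rest from the flash-resident table on demand:

    #include <stdint.h>

    #define CACHE_SLOTS 4096          /* assumed SRAM capacity in entries */

    struct map_entry {
        uint32_t lpn, ppn;            /* logical -> physical page mapping */
        uint64_t last_use;            /* recency stamp for LRU eviction */
        int      valid, dirty;
    };

    struct map_cache {
        struct map_entry slot[CACHE_SLOTS];
        uint64_t clock;               /* logical time for LRU */
    };

    /* Assumed primitives backed by the flash-resident mapping table. */
    uint32_t flash_map_load(uint32_t lpn);
    void     flash_map_store(uint32_t lpn, uint32_t ppn);

    /* Translate a logical page number, loading the mapping on demand.
     * Only hot mappings occupy SRAM; cold ones stay on flash. */
    uint32_t translate(struct map_cache *c, uint32_t lpn)
    {
        struct map_entry *victim = &c->slot[0];
        c->clock++;

        for (int i = 0; i < CACHE_SLOTS; i++) {
            struct map_entry *e = &c->slot[i];
            if (e->valid && e->lpn == lpn) {        /* cache hit */
                e->last_use = c->clock;
                return e->ppn;
            }
            if (!e->valid)                          /* prefer an empty slot */
                victim = e;
            else if (victim->valid && e->last_use < victim->last_use)
                victim = e;                         /* track LRU entry */
        }

        /* Miss: evict the LRU entry (writing it back if a write path,
         * e.g. one serving ftl_write, had marked it dirty), then fetch
         * the needed mapping from the flash-resident table. */
        if (victim->valid && victim->dirty)
            flash_map_store(victim->lpn, victim->ppn);

        victim->lpn      = lpn;
        victim->ppn      = flash_map_load(lpn);
        victim->valid    = 1;
        victim->dirty    = 0;
        victim->last_use = c->clock;
        return victim->ppn;
    }

The linear scan is used purely for brevity; the point is the demand-fetch-and-evict behavior, under which the SRAM footprint stays fixed regardless of device capacity while temporal locality keeps the hit rate high.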

Research Contributions. This paper makes the following specific contributions:

(i) We propose DFTL, a purely page-mapped FTL that exploits temporal locality by selectively caching page-level address mappings in the limited on-flash SRAM. (ii) We develop FlashSim, a simulation framework for evaluating flash-based storage systems. (iii) Using realistic enterprise-scale workloads, we demonstrate that DFTL offers improved performance, reduced garbage collection overhead, and better overload behavior compared to state-of-the-art hybrid FTL schemes.

