Abstract - Unlike the use of DRAM for caching or buffering, certain idiosyncrasies of NAND flash-based solid-state drives (SSDs) make their integration into existing systems non-trivial. Flash memory has limited reliability, is an order of magnitude more expensive than magnetic hard disk drives (HDDs), and can sometimes be as slow as an HDD (due to excessive garbage collection (GC) induced by a high intensity of random writes). Given these trade-offs between HDDs and SSDs in terms of cost, performance, and lifetime, the current consensus among several storage experts is to view SSDs not as a replacement for HDDs but rather as a complementary device within the high-performance storage hierarchy. We design and evaluate such a hybrid system, called HybridStore, to provide: (a) HybridPlan: an improved capacity planning technique for administrators, with the overall goal of operating within cost budgets, and (b) HybridDyn: improved performance/lifetime guarantees during episodes of deviation from expected workloads through two novel mechanisms: write regulation and fragmentation busting. As an illustrative example of HybridStore's efficacy, HybridPlan is able to find the most cost-effective storage configuration for a large-scale workload from Microsoft Research, suggesting one MLC SSD with ten 7.2K RPM HDDs instead of fourteen 7.2K RPM HDDs alone. HybridDyn is able to reduce the average response time for an enterprise-scale, random-write-dominant workload by about 71% compared to an HDD-based system.
Hard disk drives (HDDs) have been the preferred media for data storage in high-performance and enterprise-scale storage systems for several decades. However, there are several shortcomings inherent to HDDs. First, designers of HDDs are finding it increasingly difficult to further improve the RPM because of the resulting increase in power consumption and temperature. Second, any further improvement in storage density, which is another way to increase the internal data rate (IDR), is increasingly hard to achieve and requires significant technological breakthroughs such as perpendicular recording. Third, and perhaps most serious, despite a variety of techniques employing caching, pre-fetching, scheduling, write-buffering, and parallelism via replication (e.g., RAID), the mechanical movement involved in the operation of HDDs can severely limit the performance that HDD-based systems are able to offer to workloads with significant randomness and/or lack of locality.
Alongside improvements in HDD technology, significant advances have also been made in various forms of solid-state memory such as NAND flash, STT-RAM, phase-change memory (PCM), and ferroelectric RAM (FRAM). Solid-state memory offers several advantages over hard disks: lower access latencies for random requests, lower power consumption, lack of noise, and higher robustness to vibrations and temperature. In particular, recent improvements in the design and performance of NAND flash memory (simply flash henceforth) have made it popular in many embedded and consumer devices.
Table I presents a comparison of the performance, lifetime, and cost of representative DRAM, SSD, and HDD devices. There are several important implications of how these properties compare with each other. First, it is evident that there exists a huge gap between the cost/GB of HDDs and SSDs. Second, unlike HDDs, SSDs possess an asymmetry between the speeds at which reads and writes may be performed. As a result, the throughput an SSD offers for a write-dominant workload is lower than for a read-dominant workload. Third, flash technology restricts the locations on which writes may be performed (a flash location must be erased before it can be written), leading to the need for a garbage collector (GC) within an SSD. Certain workload characteristics (in particular, the presence of randomness) increase the fragmentation of data stored in flash memory, i.e., logically consecutive sectors become spread over physically non-consecutive blocks on flash. This exacerbates GC overheads, thereby significantly slowing down the SSD. Furthermore, this slowdown is non-trivial to anticipate. A given set of random writes may themselves experience good throughput but increase fragmentation, thereby degrading the performance of requests (read or write) arriving much later. Finally, to further complicate matters, unlike HDDs, SSDs have a lifetime that is limited by the number of erases performed.
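The interaction between random writes and GC overhead described above can be illustrated with a toy simulation. The sketch below is not the model used in this paper: it assumes a page-mapped flash device with greedy victim selection (the block with the fewest live pages is reclaimed first), and the class `ToyFlash`, the helper `run`, and all sizes are hypothetical. It counts how many live pages GC must relocate under sequential versus random overwrites of the same logical space.

```python
import random


class ToyFlash:
    """Toy page-mapped flash with greedy garbage collection (illustrative only)."""

    def __init__(self, blocks: int, pages_per_block: int):
        self.ppb = pages_per_block
        self.valid = [set() for _ in range(blocks)]  # live logical pages per block
        self.used = [0] * blocks                     # pages programmed since last erase
        self.where = {}                              # logical page -> block holding it
        self.free = list(range(1, blocks))           # erased blocks
        self.open = 0                                # block currently being filled
        self.erases = 0
        self.copies = 0                              # live pages relocated by GC

    def _program(self, lpn: int) -> None:
        if self.used[self.open] == self.ppb:         # open block full: take an erased one
            self.open = self.free.pop()
        old = self.where.get(lpn)
        if old is not None:
            self.valid[old].discard(lpn)             # out-of-place update invalidates old copy
        self.valid[self.open].add(lpn)
        self.used[self.open] += 1
        self.where[lpn] = self.open

    def _collect(self) -> None:
        full = [b for b in range(len(self.used))
                if b != self.open and self.used[b] == self.ppb]
        victim = min(full, key=lambda b: len(self.valid[b]))  # greedy: fewest live pages
        for lpn in list(self.valid[victim]):         # relocate live pages before erasing
            self._program(lpn)
            self.copies += 1
        self.used[victim] = 0                        # erase the victim block
        self.erases += 1
        self.free.append(victim)

    def write(self, lpn: int) -> None:
        self._program(lpn)
        while len(self.free) < 2:                    # keep a reserve of erased blocks
            self._collect()


def run(pattern: str, writes: int = 2000, blocks: int = 8,
        ppb: int = 4, logical: int = 20) -> ToyFlash:
    """Replay a hypothetical overwrite pattern; logical space is smaller than physical."""
    flash = ToyFlash(blocks, ppb)
    rng = random.Random(0)
    for i in range(writes):
        lpn = i % logical if pattern == "sequential" else rng.randrange(logical)
        flash.write(lpn)
    return flash


seq, rnd = run("sequential"), run("random")
print("sequential: copies", seq.copies, "erases", seq.erases)
print("random:     copies", rnd.copies, "erases", rnd.erases)
```

In this configuration the sequential pattern invalidates whole blocks at a time, so GC finds victims with no live pages, whereas the random pattern leaves live pages scattered across blocks and forces relocations. Real SSDs use more elaborate mapping and GC policies, so the sketch conveys only the direction of the effect.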
From the above description, it should be clear that SSDs are fairly complex devices. Their peculiar properties related to cost, performance, and lifetime make it difficult for a storage system designer to neatly fit them between HDD and DRAM. As has been observed in other recent research, under certain workload conditions an SSD can perform worse than an HDD, and in certain SSDs read throughput can be lower than write throughput for small random workload patterns. The limited lifetime of SSDs calls for careful design to use them gainfully in conjunction with HDDs in the enterprise. Lifetime that degrades with increased write intensity may result in premature replacement of these devices, adding to deployment, procurement, and administrative costs. Finally, the low throughput offered by SSDs to random-write-dominated workloads, which are frequently encountered in enterprise-scale systems, necessitates intelligent partitioning of data in such hybrid environments while ensuring that the management costs do not overwhelm the performance improvements.
This paper makes the following specific contributions.
The rest of this paper is organized as follows. Section II provides a bird's-eye view of the overall HybridStore architecture, explains how its two components, HybridPlan and HybridDyn, interact, and discusses related work. In Sections III and IV, we describe the capacity planner and the dynamic controller for HybridStore, respectively. We then extensively evaluate HybridPlan and HybridDyn in Sections V and VI, respectively. Finally, we present concluding remarks in Section VII.
Figure 1 depicts the interaction between various components of HybridStore. Besides the storage hardware (HDDs, SSDs, and I/O buses) shown in the figure, HybridStore consists of two major software components. The first of these is a long-term resource provisioner called HybridPlan. We envision HybridPlan to be a tool that would enable storage administrators to provision both kinds of devices in cost-effective ways. The decision-making of HybridPlan would occur at coarse time-scales (months to years) corresponding to when procurement and deployment decisions are made. HybridPlan employs an integer linear programming (ILP) solver engine based on mathematical formulations to make its provisioning decisions. HybridPlan is intended to cost-effectively provision devices so that HybridStore can (i) adhere to the performance needs of hosted workloads and (ii) meet useful lifetime requirements specified by the administrator, under these workload assumptions.
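To make the provisioning problem concrete, the following minimal sketch searches for the cheapest mix of SSDs and HDDs that satisfies capacity, performance, and lifetime constraints. It is not HybridPlan's formulation: it replaces the ILP solver with exhaustive enumeration over small device counts, assumes that SSDs absorb writes in proportion to their share of total IOPS, and uses invented device and workload numbers that are not taken from Table I or from the evaluation. The names `Device`, `Workload`, and `plan` are hypothetical.

```python
from dataclasses import dataclass
from itertools import product
from typing import Optional, Tuple


@dataclass(frozen=True)
class Device:
    cost: float              # purchase cost per device (arbitrary units)
    capacity_gb: float
    iops: float              # sustained random IOPS per device (assumed > 0)
    write_gb_per_day: float  # daily writes one device can absorb over the target lifetime


@dataclass(frozen=True)
class Workload:
    capacity_gb: float
    iops: float
    write_gb_per_day: float


def plan(ssd: Device, hdd: Device, wl: Workload,
         max_each: int = 32) -> Optional[Tuple[float, int, int]]:
    """Return (cost, n_ssd, n_hdd) for the cheapest feasible mix, or None."""
    best = None
    for n_ssd, n_hdd in product(range(max_each + 1), repeat=2):
        if n_ssd + n_hdd == 0:
            continue
        capacity = n_ssd * ssd.capacity_gb + n_hdd * hdd.capacity_gb
        iops = n_ssd * ssd.iops + n_hdd * hdd.iops
        if capacity < wl.capacity_gb or iops < wl.iops:
            continue  # violates the capacity or performance constraint
        # Assumption: SSDs absorb writes in proportion to their share of total IOPS.
        ssd_writes = wl.write_gb_per_day * (n_ssd * ssd.iops) / iops
        if ssd_writes > n_ssd * ssd.write_gb_per_day:
            continue  # violates the lifetime budget
        cost = n_ssd * ssd.cost + n_hdd * hdd.cost
        if best is None or cost < best[0]:
            best = (cost, n_ssd, n_hdd)
    return best


# Hypothetical devices: the HDD is given an unlimited write budget.
SSD = Device(cost=400, capacity_gb=100, iops=5000, write_gb_per_day=50)
HDD = Device(cost=100, capacity_gb=1000, iops=150, write_gb_per_day=float("inf"))

light = plan(SSD, HDD, Workload(capacity_gb=4000, iops=2000, write_gb_per_day=40))
heavy = plan(SSD, HDD, Workload(capacity_gb=4000, iops=2000, write_gb_per_day=100))
print(light, heavy)
```

With these made-up numbers, the write-light workload is served most cheaply by a single SSD plus a few HDDs, while the write-heavy workload needs a second SSD purely to stay within the lifetime budget. Enumeration is adequate only for a toy search space; an ILP solver handles many device types and constraints without visiting every combination.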
The second component of HybridStore is a dynamic controller (HybridDyn) that operates at significantly finer time-scales (milliseconds to hours). HybridDyn employs statistical performance models of the SSD and HDD to make dynamic request-partitioning decisions at request-level granularity (milliseconds to seconds). Additionally, it employs novel techniques for data management within the SSD (write regulation and fragmentation busting) that operate at the granularity of several minutes to hours. Intuitively, the components of HybridDyn operate collectively to take corrective data management decisions in HybridStore to adhere to desired performance and lifetime needs despite (i) provisioning errors made by HybridPlan and (ii) deviations in workload characteristics and device behavior.
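A minimal sketch of request-level partitioning with a write budget is given below. This section does not spell out HybridDyn's models or how write regulation works, so the sketch makes two explicit assumptions: each device's response time is predicted as queue length times a fixed mean service time, and write regulation is read as capping the volume of writes sent to the SSD within a time window. The classes `DeviceModel` and `Router` and all parameter values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DeviceModel:
    service_ms: float  # assumed mean service time per request
    queued: int = 0    # requests currently outstanding

    def predicted_ms(self) -> float:
        # Assumed model: a new request waits behind everything already queued.
        return (self.queued + 1) * self.service_ms


class Router:
    """Send each request to the device with the lower predicted response time,
    diverting writes to the HDD once the SSD's per-window write budget is spent."""

    def __init__(self, ssd: DeviceModel, hdd: DeviceModel, ssd_write_budget_kb: int):
        self.ssd = ssd
        self.hdd = hdd
        self.budget_kb = ssd_write_budget_kb
        self.ssd_written_kb = 0

    def route(self, is_write: bool, size_kb: int) -> str:
        over_budget = is_write and self.ssd_written_kb + size_kb > self.budget_kb
        use_ssd = (not over_budget
                   and self.ssd.predicted_ms() <= self.hdd.predicted_ms())
        target = self.ssd if use_ssd else self.hdd
        target.queued += 1
        if use_ssd and is_write:
            self.ssd_written_kb += size_kb
        return "ssd" if use_ssd else "hdd"

    def complete(self, device: str) -> None:
        # Called when a request finishes so the queue-length estimate stays current.
        target = self.ssd if device == "ssd" else self.hdd
        target.queued = max(0, target.queued - 1)

    def new_window(self) -> None:
        # Start a new regulation window with a fresh SSD write budget.
        self.ssd_written_kb = 0


router = Router(DeviceModel(service_ms=0.2), DeviceModel(service_ms=8.0),
                ssd_write_budget_kb=8)
decisions = [router.route(True, 4), router.route(True, 4),
             router.route(True, 4), router.route(False, 4)]
print(decisions)
```

In the usage above, the third write exceeds the window's budget and is diverted to the HDD, while the subsequent read still goes to the SSD because reads do not consume the write budget. A production controller would fit its response-time models from measurements rather than use fixed service times.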
In recent work from Microsoft Research, Narayanan et al. examined the role of SSDs in enterprise storage systems using a number of real data center traces available to them. Their work explores the cost-benefit trade-offs of various SSD and HDD capacities and configurations for these real traces. However, there are several key differences between their work and ours. First, our work, in particular HybridPlan, is much more general and can be used to target any type of device, including STT-RAM and PCM. In this work, we focus only on flash since it is the only mature technology with concrete and meaningful numbers for cost and performance. Second, we have developed a data classification strategy that can be used to decide the partitioning of workloads amongst the chosen devices. Third, while they acknowledge that flash wear-out needs to be considered when using flash as a write buffer, they do not explore any specific ways of doing so. We incorporate wear-out in the form of lifetime budgets in HybridPlan, and our dynamic workload partitioning (HybridDyn) employs a variety of techniques to adhere to these budgets. Finally, our study goes beyond capacity planning: HybridDyn employs a combination of model-driven and reactive techniques to operate our hybrid system under given performance/lifetime budgets despite varying workloads. Closest to our work is a recent paper by Guerra et al., which we consider highly complementary to ours, with similar results and insights. There are differences in our performance modeling approaches. Additionally, we consider lifetime constraints and include power costs in our formulation.