Emerging energy-aware initiatives (such as billing of power usage based on de-coupling between electricity sales and utility profits/fixed-cost recovery) render current capacity planning practices based on heavy over-provisioning of power infrastructure unprofitable for data centers. We explore a combination of statistical multiplexing techniques (including controlled under-provisioning and overbooking) to improve the utilization of the power hierarchy in a data center. Our techniques are built upon a measurement-driven profiling and prediction technique to characterize key statistical properties of the power needs of hosted workloads and their aggregates. As a representative result from our evaluation on a prototype data center, by accurately identifying the worst-case needs of hosted workloads, our technique is able to safely operate 2.5 times as many servers running copies of the e-commerce benchmark TPC-W as allowed by the prevalent practice of using face-plate ratings. Exploiting statistical multiplexing among the power usage of these servers along with controlled under-provisioning by 10% based on tails of power profiles offers a further gain of 100% over face-plate provisioning. Reactive techniques implemented in the Xen VMM running on our servers dynamically modulate CPU DVFS-states to ensure power draw below safe limits despite aggressive provisioning. Finally, information captured in our profiles also provides ways of controlling application performance degradation despite the above under-provisioning: the 95th percentile of TPC-W session response time only grew from 1.59 sec to 1.78 sec.
To accommodate modern resource-intensive high-performance applications, large-scale data centers have grown at a rapid pace in a variety of domains ranging from research labs and academic groups to industry. The fast-growing power consumption of these platforms is a major concern due to its implications for their cost and efficiency as well as for the well-being of our environment. By 2005, the energy required to power and cool data center equipment accounted for about 1.2% of total U.S. electricity consumption, according to a report released by the Lawrence Berkeley National Laboratory and sponsored by chip manufacturer AMD. Gartner, the IT research and advisory company, estimates that by 2010, about half of the Forbes Global 2000 companies will spend more on energy than on hardware such as servers. Furthermore, Gartner estimates that the manufacture, use, and disposal of IT equipment, a large share of which resides in data centers, accounts for 2% of global CO2 emissions, a share comparable to that of the aviation industry.
The rapid rise in the number of data centers and the growing size of their hardware base (especially the number of servers) are the primary causes of their increasing power needs. As an example, a New York Times article in June 2006 reported that Google had approximately 8,000 servers catering to about 70 million Web pages in 2001, with the number growing to 100,000 by 2003. The article's estimate put the total number of Google servers (spread over 25 data centers) at around 450,000 at the time. Similarly, Microsoft's Internet services were housed in around 200,000 servers, with the number expected to hit 800,000 by 2011. While per-unit power consumption of hardware has contributed a much smaller share to this growth, continuing miniaturization at multiple levels (ranging from chips and servers to racks and rooms) within the data center has necessitated the procurement of higher-capacity cooling systems to deal with the growing power densities.
These trends have severe implications for the total cost of operation (TCO) of a data center, covering deployment-time costs as well as a variety of recurring costs. The impact on TCO due to higher bills paid to the electricity provider is easy to appreciate. At a rating of around 250 Watts per server (which is on the lower end of peak power for a dual-CPU system with 1-2 GB memory, a disk, and a network card), a data center with 20K-40K servers consumes between 5 and 10 Megawatts just for powering these servers (discounting cooling costs). Assuming a cost of $0.10/kWh, this amounts to $4.38M-$8.76M expended annually on keeping these servers constantly powered in a single such data center. Higher power consumption, however, also hurts the TCO in other, less obvious ways.
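The back-of-the-envelope arithmetic above can be reproduced directly from the stated figures (250 W per server, 20K-40K servers, $0.10/kWh); a minimal sketch:

```python
# Annual electricity cost for keeping servers powered, using the figures
# from the text: 250 W per server, $0.10/kWh, 8760 hours per year.
WATTS_PER_SERVER = 250
PRICE_PER_KWH = 0.10       # dollars
HOURS_PER_YEAR = 24 * 365  # 8760

def annual_power_cost(num_servers):
    total_kw = num_servers * WATTS_PER_SERVER / 1000.0  # aggregate draw in kW
    return total_kw * HOURS_PER_YEAR * PRICE_PER_KWH    # dollars per year

for n in (20_000, 40_000):
    print(f"{n} servers: {n * WATTS_PER_SERVER / 1e6:.0f} MW, "
          f"${annual_power_cost(n) / 1e6:.2f}M/year")
```

Running this recovers the 5-10 MW aggregate draw and the $4.38M-$8.76M annual cost cited above.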
Existing practices for capacity planning of the power infrastructure within data centers employ significant degrees of over-provisioning at multiple levels of the spatial hierarchy, ranging from the power supplies within servers, to the Power Distribution Units (PDUs) supplying power to servers, storage equipment, etc., to even higher-level Uninterruptible Power Supply (UPS) sub-stations. This over-provisioning is done to ensure uninterrupted and reliable operation even during episodes of excessive power draw, as well as to accommodate future upgrades/additions to the computational/storage/networking equipment pool in the data center.
Such over-provisioning can prove unprofitable to the data center in two primary ways. The first and more significant reason emerges from ongoing efforts to promote energy-efficient operation and discourage wasteful over-provisioning of power supply via novel billing mechanisms. Currently, electricity providers make more money by selling more electricity and make less by helping their customers conserve energy. This disincentive impairs their willingness and ability to promote energy efficiency, despite its benefits for consumers' bills, electrical reliability, national security, and the environment. This realization has led to proposals to remove the disincentive via "de-coupling": a regulatory rate policy that breaks the link between electricity sales, on the one hand, and utility profits and fixed-cost recovery, on the other. According to a U.S. News article in May 2008, while California pioneered de-coupling as early as the 1970s, Connecticut, Idaho, New York, and Vermont had also adopted de-coupling by 2007, and a dozen other states are now considering it. Consequently, in the near future, a data center operating significantly below its provisioned power capacity can expect to pay higher recurring electricity costs. A second, comparatively minor, concern arises from the excess cost of procuring power infrastructure with higher capacity (including a larger number of power supply elements) than needed.
Figure 1: Illustration of the evolution of power capacity and demand in a hypothetical data center. Also shown is the evolution of provisioned capacity based on a prevalent practice such as using the face-plate ratings of devices.
Decisions related to the provisioning of power infrastructure must be made not only at installation time but on a recurring basis to cope with upgrades. Figure 1 illustrates the evolution of power demand and capacity in a data center. As shown, there are two "head-rooms" between power demand and capacity. The first head-room, H1, is intended to ensure that the data center can accommodate foreseeable additions/upgrades to its hardware base (as shown by the curve labeled "Peak Power Consumption (Faceplate)" in Figure 1). The second head-room, H2, our focus in this research, results from current capacity planning practices that significantly over-estimate the power needs of the data center. A number of recent studies on power usage in data centers provide evidence of such over-provisioning at various power elements [15, 16, 24]. Building upon these insights, we carefully characterize the power usage of data center workloads in order to develop a provisioning technique that addresses the head-room H2.
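The head-room H2 can be made concrete as the gap between capacity provisioned from face-plate ratings and the aggregate peak the servers actually draw. The sketch below uses synthetic numbers (peaks assumed at 55-75% of nameplate), not measurements from this work:

```python
# Illustration of head-room H2: face-plate-provisioned capacity minus the
# measured aggregate peak draw. All numbers here are hypothetical.
import random

random.seed(1)
FACEPLATE_W = 250  # nameplate rating per server (Watts)
NUM_SERVERS = 100

# Servers rarely reach their face-plate rating; assume observed per-server
# peaks fall between 55% and 75% of nameplate (an assumption for illustration).
observed_peaks = [FACEPLATE_W * random.uniform(0.55, 0.75)
                  for _ in range(NUM_SERVERS)]

provisioned = FACEPLATE_W * NUM_SERVERS  # capacity from face-plate ratings
actual_peak = sum(observed_peaks)        # aggregate measured peak
h2 = provisioned - actual_peak
print(f"provisioned: {provisioned} W, measured peak: {actual_peak:.0f} W, "
      f"H2 head-room: {100 * h2 / provisioned:.0f}% of capacity")
```

Under these assumed utilizations, roughly a third of the provisioned capacity is head-room that measurement-driven provisioning could reclaim.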
While provisioning closer to demand reduces both installation/upgrade and recurring costs, it does so at the risk of increased episodes of degraded performance/availability. This degradation can occur due to one or more of the following: (i) a subset of the hardware may simply not get powered up due to insufficient power supply (as happened with an ill-provisioned $2.3 Million Dell cluster at the University at Buffalo, where two-thirds of the machines could not be powered on until a $20,000 electrical system upgrade was undertaken), (ii) one or more fuses may give way during a surge in power draw, disrupting the operation of applications hosted on the associated servers, and (iii) the thermal system, faced with a constrained power supply, may trigger the shutdown or slowdown of some devices. Any improvement in power provisioning must carefully trade off the resulting cost savings against such performance degradation. Additionally, to realize such improvements, a data center must employ mechanisms that prevent (i.e., make statistically negligible) episodes of types (i)-(iii). In this paper, we develop a system that effectively provisions power while addressing these concerns.
In general terms, it is our contention that understanding the power usage behavior of hosted applications and employing techniques that are aware of these idiosyncrasies can allow a data center to make more informed provisioning decisions compared to existing techniques. To validate these ideas, we explore a combination of several complementary approaches.
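The core statistical-multiplexing observation can be illustrated numerically: because hosted workloads rarely peak simultaneously, the tail (e.g. 99th percentile) of the aggregate power draw sits well below the sum of per-server tails. The workload model below is synthetic (Gaussian draws clipped to a face-plate rating), purely a sketch of the effect:

```python
# Statistical multiplexing of power draw: compare the sum of per-server
# 99th-percentile draws (naive per-server provisioning) against the 99th
# percentile of the aggregate draw. Power traces are synthetic.
import random

random.seed(7)
NUM_SERVERS, SAMPLES = 50, 10_000

def percentile(xs, p):
    """Return (approximately) the p-th percentile of xs."""
    xs = sorted(xs)
    return xs[min(len(xs) - 1, int(p / 100.0 * len(xs)))]

# Per-server power samples: mean 150 W, sd 30 W, clipped to a 250 W face-plate.
traces = [[min(250.0, max(0.0, random.gauss(150, 30))) for _ in range(SAMPLES)]
          for _ in range(NUM_SERVERS)]

sum_of_tails = sum(percentile(t, 99) for t in traces)            # per-server tails
aggregate = [sum(t[i] for t in traces) for i in range(SAMPLES)]  # total draw per instant
tail_of_sum = percentile(aggregate, 99)                          # multiplexed tail

print(f"sum of per-server 99th percentiles: {sum_of_tails:.0f} W")
print(f"99th percentile of aggregate:       {tail_of_sum:.0f} W")
```

Because independent fluctuations partially cancel in the aggregate, the multiplexed tail is markedly lower, which is the room that overbooking and controlled under-provisioning exploit.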
Research Contributions. The contribution of our research is three-fold.
Dr. Bhuvan Urgaonkar has over 15 years of experience in the field of software engineering and computing. His work includes research in computer systems software, distributed computing (including systems such as ZooKeeper, Redis, Memcached, Cassandra, and Kafka), data centers, cloud computing, storage systems, energy efficiency of computers and data centers, and big data (including systems such as Hadoop and Spark). He serves as an expert/technical consultant with multiple firms, helping them (i) understand technical content related to state-of-the-art products in areas such as content distribution, distributed computing, and datacenter design, among others, and (ii) interpret patents in these areas and the connections between them and state-of-the-art products and services. Services are available to law firms, government agencies, schools, firms/corporations, and hospitals.