Dynamic capacity provisioning is a useful technique for handling the multi-time-scale variations seen in Internet workloads. In this article, we propose a novel dynamic provisioning technique for multi-tier Internet applications that employs (1) a flexible queuing model to determine how much of the resources to allocate to each tier of the application, and (2) a combination of predictive and reactive methods that determine when to provision these resources, both at large and small time scales. We propose a novel data center architecture based on virtual machine monitors to reduce provisioning overheads. Our experiments on a forty-machine Xen/Linux-based hosting platform demonstrate the responsiveness of our technique in handling dynamic workloads. In one scenario where a flash crowd caused the workload of a three-tier application to double, our technique was able to double the application capacity within five minutes, thus maintaining responsetime targets. Our technique also reduced the overhead of switching servers across applications from several minutes to less than a second, while meeting the performance targets of residual sessions.
An Internet hosting platform is a server farm that runs a distributed application such as an online retail store or an online brokerage site. Typical Internet applications employ a multi-tier architecture, with each tier providing a certain functionality. Such applications tend to see dynamically varying workloads that contain long-term variations such as time-of-day effects as well as short-term fluctuations due to flash crowds. Predicting the peak workload of an Internet application, and capacity provisioning based on these worst case estimates, is notoriously difficult. There are numerous documented examples of Internet applications that faced an outage due to an unexpected overload. For instance, the normally well-provisioned Amazon.com site suffered a forty-minute downtime due to an overload during the popular holiday season in November 2000 [Amazon 2000].
Given the difficulties in predicting peak Internet workloads, an application needs to employ a combination of dynamic provisioning and request policing to handle workload variations. Dynamic provisioning enables additional resources-such as servers-to be allocated to an application on-the-fly, to handle workload increases, while policing enables the application to temporarily turn away excess requests while additional resources are being provisioned.
In this article, we focus on dynamic resource provisioning of Internet applications that employ a multi-tier architecture. We argue that (1) provisioning of multi-tier applications raises new challenges not addressed by prior work on provisioning single-tier applications, and (2) agile, proactive provisioning techniques are necessary to handle both long-term and short-term workload fluctuations seen by Internet applications. To address these issues, we present a novel provisioning technique based on a combination of predictive and reactive mechanisms.
Dynamic provisioning of resources-allocation and deallocation of servers to replicated applications-has been studied in the context of single-tier applications, of which clustered HTTP servers are the most common example. The notion of hot spares and their allocation to cluster-based applications on-demand was first proposed by Fox et al. . The Muse project proposed a utility-based approach based on an economic theory for allocation and deallocation of servers to clustered Web servers [Chase and Doyle 2001]. A model-based approach for resource provisioning in single-tier Web servers was proposed by Doyle et al. . Kamra et al.  model a multitier e-commerce application as a single M/GI/1 server and present a PI-controller-based admission control for maintaining response time targets. Whereas multi-tier Internet applications have been studied in the context of SEDA [Welsh and Culler 2003; Welsh et al. 2001], the effort focused on admission control issues to maintain target response times and did not explicitly consider provisioning issues.
It is nontrivial to extend provisioning mechanisms designed for single-tier applications to multi-tier scenarios. To understand why, we consider two strawman approaches that are simple extensions of the above single-tier methods and demonstrate their limitations for multi-tier applications.
Since many single-tier provisioning mechanisms have already been proposed, a straightforward extension is to employ such an approach at each tier of the application. This enables provisioning decisions to be made independently at each tier, based on local observations. Thus, our first strawman is to provision additional servers at a tier when the incoming request rate at that tier exceeds the currently provisioned capacity; this can be inferred by monitoring queue lengths, tier-specific response times, or request drop rates. We refer to this approach as independent per-tier provisioning.
Example 1. Consider the three-tier Internet application depicted in Figure 1(a). Initially, let us assume that one server each is allocated to the three tiers, and this enables the application to service 15, 10, and 10.5 requests/second at each tier (since a user request may impose different demands at different tiers, the provisioned capacity at each tier may be different). Let the incoming request rate be 14 requests/second. Given these capacities, all requests are let in through the first tier, and 4 requests/second are dropped at the second tier. Due to these drops, the third tier sees a reduced request rate of 10 requests/second and is able to service them all. The effective good-put is therefore 10 requests/second. Since request drops are only seen at the second tier, this tier is perceived to be the bottleneck. The provisioning algorithm at that tier will allocate an additional server, doubling its effective capacity to 20 requests/second. At this point, the first two tiers are able to service all incoming requests and the third tier now sees a request rate of 14 requests/second (see Figure 1(b)). Since its capacity is only 10.5 requests/second, it drops 3.5 requests/second. Thus, the bottleneck shifts to the third tier, and the effective good-put only increases from 10 to 10.5 requests/second.
This simple example demonstrates that increasing the number of servers allocated to the bottleneck tier does not necessarily increase the effective good-put of the application. Instead, it may merely shift the bottleneck to a downstream tier. Although the provisioning mechanism at this downstream tier will subsequently increase its capacity, such shifting bottlenecks may require a number of independent provisioning steps at various tiers before the effective application capacity is actually increased. In the worst case, upto k provisioning steps, one at each tier, may be necessary in a k-tier application. Since allocation of servers to a tier entails overheads of several minutes or more [Chase and Doyle 2001], and since Internet workloads may spike suddenly, independent per-tier provisioning may be simply too slow to effectively respond to such workload dynamics.
Our second strawman models the multi-tier application as a black box and allocates additional servers whenever the observed response time exceeds a threshold.
Example 2. Consider the three-tier application from Example 1 with tierspecific capacities of 15, 20, and 10.5 requests/second as depicted in Figure 1(b). We ignore admission control issues in this example. Since the incoming request rate is 14 requests/second, the first two tiers are able to serve all requests, while the third saturates, causing request queues to buildup at this tier. This queue buildup increases the end-to-end response time of the application beyond the threshold. Thus, as in the single-tier case, a black box approach can successfully detect when additional servers need to be provisioned for the multi-tier application.
However, determining how many servers to provision, and where, is far more complex for multi-tier applications. First, since the application is treated as a black box, the provisioning mechanism can only detect an increase in end-to-end response times but cannot determine which tier is responsible for this increase. Second, for single-tier applications, an application model is used to determine how many servers are necessary to service all incoming requests with a certain response time threshold [Doyle et al. 2003]. Extending such models to multi-tier applications is nontrivial, since each tier has different characteristics. In a typical e-commerce application, for instance, this implies collectively modeling the effects of HTTP servers, Java application servers, and database servers-a complex task. Third, not all tiers of the application may be replicable. For instance, the database tier is typically difficult to replicate on-the-fly. In the previous example, if the third tier is a nonreplicable database, the black box approach, which has no knowledge of individual tiers, will incorrectly signal the need to provision additional servers, when the correct action is to trigger request policing and let no more than 10.5 requests/second into the "black box."
This example demonstrates that due to the very nature of multi-tier applications, it is not possible to treat them as a black box for provisioning purposes. Knowledge of the number of tiers, their current capacities, and constraints on the degree of replication at each tier is essential for making proper provisioning decisions.
Both examples expose the limitations of using variants of single-tier provisioning methods for multi-tier applications. This article presents a multi-tier provisioning technique that overcomes these limitations.
Dr. Bhuvan Urgaonkar, PhD has over 15 years of experience in the field of Software Engineering and Computers. His work includes research in computer systems software, distributed computing (including systems such as Zookeeper, Redis, Memcached, Cassandra, Kafka), datacenters, cloud computing, storage systems, energy efficiency of computers and datacenters, big data (including systems such as Hadoop, Spark). He serves as an expert / technical consultant with multiple firms helping them (i) understand technical content related to state of the art products in areas such as content distribution, distributed computing, datacenter design, among others and (ii) interpret patents in these areas and connections between them and state of the art products and services. Services are available to law firms, government agencies, schools, firms / corporations, and hospitals.
©Copyright - All Rights Reserved
DO NOT REPRODUCE WITHOUT WRITTEN PERMISSION BY AUTHOR.