Since many Internet applications employ a multi-tier architecture, this paper focuses on the problem of analytically modeling the behavior of such applications. We present a model based on a network of queues, where the queues represent different tiers of the application. Our model is sufficiently general to capture (i) the behavior of tiers with significantly different performance characteristics and (ii) application idiosyncrasies such as session-based workloads, concurrency limits, and caching at intermediate tiers. We validate our model using real multi-tier applications running on a Linux server cluster. Our experiments indicate that our model faithfully captures the performance of these applications for a number of workloads and configurations. For a variety of scenarios, including those with caching at one of the application tiers, the average response times predicted by our model were within the 95% confidence intervals of the observed average response times. Our experiments also demonstrate the utility of the model for dynamic capacity provisioning, performance prediction, bottleneck identification, and session policing. In one scenario, where the request arrival rate increased from less than 1500 to nearly 4200 requests/min, a dynamic provisioning technique employing our model was able to maintain response time targets by increasing the capacity of two of the application tiers by factors of 2 and 3.5, respectively.
C.4 [Performance of Systems]: Modeling Techniques
Measurement, Performance, Experimentation
Queuing model, MVA algorithm, Internet application
Internet applications such as online news, retail, and financial sites have become commonplace in recent years. Modern Internet applications are complex software systems that employ a multi-tier architecture and are replicated or distributed on a cluster of servers. Each tier provides a certain functionality to its preceding tier and makes use of the functionality provided by its successor to carry out its part of the overall request processing. For instance, a typical e-commerce application consists of three tiers: a front-end Web tier that is responsible for HTTP processing, a middle-tier Java enterprise server that implements core application functionality, and a back-end database that stores product catalogs and user orders. In this example, an incoming request undergoes HTTP processing at the Web tier and processing by the Java application server, and then triggers queries or transactions at the database.
This paper focuses on analytically modeling the behavior of multi-tier Internet applications. Such a model is important for the following reasons: (i) capacity provisioning, which enables a server farm to determine how much capacity to allocate to an application in order for it to service its peak workload; (ii) performance prediction, which enables the response time of the application to be determined for a given workload and a given hardware and software configuration; (iii) application configuration, which enables various configuration parameters of the application to be determined for a certain performance goal; (iv) bottleneck identification and tuning, which enables system bottlenecks to be identified for purposes of tuning; and (v) request policing, which enables the application to turn away excess requests during transient overloads.
Modeling of single-tier applications such as vanilla Web servers (e.g., Apache) is well studied [4, 12, 17]. In contrast, modeling of multi-tier applications is less well studied, even though this flexible architecture is widely used for constructing Internet applications and services. Extending single-tier models to multi-tier scenarios is non-trivial for the following reasons. First, various application tiers such as Web, Java, and database servers have vastly different performance characteristics, and collectively modeling their behavior is a difficult task. Further, in a multi-tier application, (i) there may be concurrency limits at one or more tiers, and (ii) caching may be employed at intermediate tiers, both of which complicate the performance modeling. Finally, modern Internet workloads are session-based, where each session comprises a sequence of requests with think times in between. For instance, a session at an online retailer comprises the sequence of user requests to browse the product catalog and to make a purchase. Sessions are stateful from the perspective of the application, an aspect that must be incorporated into the model. The design of an analytical model that can capture the impact of these factors is the focus of this paper.
This paper presents a model of a multi-tier Internet application based on a network of queues, where the queues represent different tiers of the application. Our model can handle applications with an arbitrary number of tiers, including tiers with significantly different performance characteristics. A key contribution of our work is that the complex task of modeling a multi-tier application is reduced to the modeling of request processing at individual tiers and the flow of requests across tiers. Our model is inherently designed to handle session-based workloads and can account for application idiosyncrasies such as caching effects and concurrency limits at each tier.
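Because each session alternates between issuing requests and thinking, a network of queues like this one is naturally a closed network, and the MVA algorithm listed in the keywords is a standard way to solve such networks. The following is a minimal sketch of textbook exact Mean-Value Analysis, not the authors' full model: it assumes each tier is a single queueing station with a known per-request service demand (visit ratio times mean service time), represents user think time as a pure delay, and ignores caching and concurrency limits. The function name and parameters are illustrative.

```python
from typing import List, Tuple


def mva(demands: List[float], think_time: float,
        n_sessions: int) -> Tuple[float, float, List[float]]:
    """Exact MVA for a closed network of tiers plus a think-time delay.

    demands[m]  -- assumed service demand per request at tier m, in seconds
    think_time  -- mean user think time Z between requests, in seconds
    n_sessions  -- number of concurrent sessions N (closed population)
    Returns (mean response time, throughput, mean queue length per tier).
    """
    queue = [0.0] * len(demands)  # queue lengths with zero sessions
    response, throughput = 0.0, 0.0
    for n in range(1, n_sessions + 1):
        # Arrival theorem: an arriving request sees the queues of population n-1.
        residence = [d * (1.0 + q) for d, q in zip(demands, queue)]
        response = sum(residence)
        throughput = n / (think_time + response)
        # Little's law applied to each tier gives the new queue lengths.
        queue = [throughput * r for r in residence]
    return response, throughput, queue
```

For example, `mva([0.005, 0.02, 0.01], think_time=1.0, n_sessions=100)` (hypothetical demands) returns the predicted mean response time, throughput, and per-tier queue lengths. As the number of sessions grows, throughput approaches 1/D_max, where D_max is the largest tier demand, so that tier is the bottleneck.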
We validate the model using two open-source multi-tier applications running on a Linux-based server cluster. We demonstrate the ability of our model to accurately capture the effects of a number of commonly used techniques such as query caching at the database tier and class-based service differentiation. For a variety of scenarios, including an online auction application employing query caching at its database tier, the average response times predicted by our model were within the 95% confidence intervals of the observed average response times. We conduct a detailed experimental study using our prototype to demonstrate the utility of our model for the purposes of dynamic provisioning, response time prediction, application configuration, and request policing. Our experiments demonstrate the ability of our model to correctly identify bottlenecks in the system and the shifting of bottlenecks due to variations in the Internet workload. In one scenario, where the arrival rate to an application increased from 1500 to nearly 4200 requests/min, our model was able to continue meeting response time targets by successfully identifying the two bottleneck tiers and increasing their capacity by factors of 2 and 3.5, respectively.
The remainder of this paper is structured as follows. Section 2 provides an overview of multi-tier applications and related work. We describe our model in Sections 3 and 4. Sections 6 and 7 present experimental validation of the model and an illustration of its applications, respectively. Finally, Section 8 presents our conclusions.
This section provides an overview of multi-tier applications and the underlying server platform assumed in our work. We also discuss related work in the area.
Modern Internet applications are designed using multiple tiers (the terms Internet application and service are used interchangeably in this paper). A multi-tier architecture provides a flexible, modular approach for designing such applications. Each application tier provides certain functionality to its preceding tier and uses the functionality provided by its successor to carry out its part of the overall request processing. The various tiers participate in the processing of each incoming request during its lifetime in the system. Depending on the processing demand, a tier may be replicated using clustering techniques. In such an event, a dispatcher is used at each replicated tier to distribute requests among the replicas for the purpose of load balancing. Figure 1 depicts a three-tier application where the first two tiers are replicated, while the third one is not. Such an architecture is commonly employed by e-commerce applications where a clustered Web server and a clustered Java application server constitute the first two tiers, and the third tier consists of a non-replicable database.
The workload of an Internet application is assumed to be session-based, where a session consists of a succession of requests issued by a client with think times in between. If a session is stateful, which is often the case, successive requests will need to be serviced by the same server at each tier, and the dispatcher will need to account for this server state when redirecting requests.
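A dispatcher that honors session state can be sketched as a table from session identifier to replica. The sketch below assumes new sessions are spread round-robin across the replicas of one tier and that the dispatcher sees a session identifier on every request; the class, method, and replica names are hypothetical, and real dispatchers may use other placement policies.

```python
from itertools import cycle
from typing import Dict, List


class StickyDispatcher:
    """Hypothetical dispatcher for one replicated tier with session affinity."""

    def __init__(self, replicas: List[str]) -> None:
        if not replicas:
            raise ValueError("need at least one replica")
        self._next = cycle(replicas)          # round-robin for new sessions
        self._affinity: Dict[str, str] = {}   # session id -> replica

    def route(self, session_id: str) -> str:
        # A new session is placed round-robin; later requests of the same
        # session return to the replica that holds its state.
        if session_id not in self._affinity:
            self._affinity[session_id] = next(self._next)
        return self._affinity[session_id]

    def end_session(self, session_id: str) -> None:
        # Drop the mapping once the session terminates.
        self._affinity.pop(session_id, None)
```

With `d = StickyDispatcher(["java-1", "java-2"])`, calls to `d.route("s1")`, `d.route("s2")`, and `d.route("s1")` return `"java-1"`, `"java-2"`, and `"java-1"`. Pinning sessions keeps state local, at the cost that load across replicas is only as balanced as the mix of session lengths allows.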
As shown in Figure 1, each application employs a sentry that polices incoming sessions to an application's server pool: incoming sessions are subjected to admission control at the sentry to ensure that the contracted performance guarantees are met, and excess sessions are turned away during overloads.
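Session-level admission control at the sentry can be sketched with a cap on concurrent sessions. The sketch assumes the cap `max_sessions` is supplied from outside, for instance as the largest session count for which a performance model predicts the SLA is met; the class and method names are hypothetical. Requests belonging to an already-admitted session are always let through, so policing happens only at session boundaries.

```python
from typing import Set


class Sentry:
    """Hypothetical session-based admission controller for one application."""

    def __init__(self, max_sessions: int) -> None:
        self.max_sessions = max_sessions   # assumed capacity limit
        self.active: Set[str] = set()      # ids of admitted, open sessions
        self.rejected = 0                  # sessions turned away so far

    def admit(self, session_id: str) -> bool:
        if session_id in self.active:
            return True                    # never drop an admitted session
        if len(self.active) >= self.max_sessions:
            self.rejected += 1             # overload: turn the new session away
            return False
        self.active.add(session_id)
        return True

    def close(self, session_id: str) -> None:
        # Free the slot when the session ends.
        self.active.discard(session_id)
```

Admitting whole sessions rather than individual requests avoids aborting a user halfway through a stateful interaction, such as a purchase.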
We assume that Internet applications typically run on a server cluster that is commonly referred to as a data center. In this work, we assume that each tier of an application (or each replica of a tier) runs on a separate server. This is referred to as dedicated hosting, where each application runs on a subset of the servers and a server is allocated to at most one application tier at any given time. Unlike shared hosting where multiple small applications share each server, dedicated hosting is used for running large clustered applications where server sharing is infeasible due to the workload demand imposed on each individual application.
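Under dedicated hosting, capacity is added to a tier one whole server at a time. A rough per-tier server count can be obtained from the utilization law (U = X * D). This is a simple capacity-planning rule used here only for illustration, not the model-driven provisioning technique the paper evaluates. The sketch assumes known per-request demands on one server, even load balancing across replicas, and a hypothetical 70% utilization cap; the function name is illustrative.

```python
import math
from typing import Dict


def servers_needed(arrival_rate: float, demands: Dict[str, float],
                   max_util: float = 0.7) -> Dict[str, int]:
    """Hypothetical per-tier server counts under dedicated hosting.

    arrival_rate -- requests/sec entering the application
    demands      -- assumed service demand per request at each tier, in
                    seconds on a single server
    max_util     -- assumed utilization cap per server (headroom)
    """
    plan = {}
    for tier, demand in demands.items():
        # Utilization law: total busy fraction needed = arrival_rate * demand.
        servers = arrival_rate * demand / max_util
        # Round away floating-point noise before taking the ceiling.
        plan[tier] = max(1, math.ceil(round(servers, 9)))
    return plan
```

For example, at 4200 requests/min (70 requests/s) with hypothetical demands `{"web": 0.005, "java": 0.02, "db": 0.008}`, the sketch yields one Web server, two Java servers, and one database server. A tier that cannot be replicated, such as the database in Figure 1, would need a faster server rather than more replicas if its count exceeded one.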
Given an Internet application, we assume that it specifies its desired performance requirement in the form of a service-level agreement (SLA). The SLA assumed in this work is a bound on the average response time that is acceptable to the application. For instance, the application SLA may specify that the average response time should not exceed one second regardless of the workload.
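Because the SLA bounds the average response time, compliance can be estimated from quantities that are easy to observe: the number of active sessions N, the request throughput X, and the mean think time Z, related by the interactive response time law R = N/X - Z. The sketch below assumes steady state and that every active session is either thinking or waiting for a response; the function names are illustrative.

```python
def predicted_response_time(sessions: int, throughput: float,
                            think_time: float) -> float:
    """Interactive response time law: R = N / X - Z (steady state assumed)."""
    if throughput <= 0:
        raise ValueError("throughput must be positive")
    return sessions / throughput - think_time


def meets_sla(sessions: int, throughput: float, think_time: float,
              sla_bound: float) -> bool:
    """True if the implied mean response time is within the SLA bound."""
    return predicted_response_time(sessions, throughput, think_time) <= sla_bound
```

For instance, 100 active sessions served at 80 requests/s with a 1 s think time imply a mean response time of 0.25 s, within a one-second SLA; 300 sessions at 120 requests/s imply 1.5 s, which violates it.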