To showcase the CloudScale Environment tools and the CloudScale Method, we created a fictitious company whose application we will take from a mock-up state to a fully scalable and cost-effective version, using the CloudScale project results as the tools for achieving this easily and efficiently. To that end we took the existing TPC-W definition of an electronic commerce application, which comprises detailed requirements definitions as well as benchmarking metrics and has a legacy monolithic implementation. The implementation will progress through three incremental phases on top of the legacy baseline: Phase 0 is the legacy system, Phase 1 is a modernized version using current technologies for scalability, Phase 2 will provide a fully and elastically scalable implementation, and Phase 3 will focus on reducing the operational costs of the entire system. For a full description of how the requirements evolve, see the Scalability Storyline.
These phases will rely on the CloudScale tools to monitor the application, analyse scalability bottlenecks, assist in creating scalable architectures, and project the behaviour, resource consumption and operational costs of the application under different architectural scenarios and loads.
For working versions see Results.
In the initial phase (Phase 0) we start with a legacy implementation (year 1999) of the TPC-W requirements that provides a fully functional working example but contains many architectural and scalability issues. In the next phase (Phase 1) we will fix the legacy implementation and modernize it by introducing frameworks that are commonly used today in the development of web-based systems. The deployment, however, will still be based on local infrastructure and will be very similar to the deployment used for the legacy system. In Phase 2 we will start improving scalability by deploying the system on private and public cloud providers (IaaS and PaaS), using cloud services and scalability and performance techniques. The CloudScale tools will support the cloud migration by optimizing the deployment in the cloud and by spotting scalability issues that may exist in the current system but have gone unnoticed because of the limited deployment. In the last phase (Phase 3) the focus will be on optimizing the cost of the system under various workloads by using cloud provider services: database and storage services, application services, content delivery services, etc. Cost optimization will initially rely on the tools' cost predictions (Analyser); later the best alternatives will be deployed and their real costs (based on test runs) will be compared.
Phase 0 (Legacy)
The initial version of the TPC-W application was developed by the University of Wisconsin. The implementation is based on the Java Servlets framework with support for MySQL databases. It is fully consistent with the official TPC-W specification version 1.0.1, except that it does not provide a PGE (payment gateway emulator). It contains a basic implementation of RBEs (remote browser emulators) and generation scripts for data and images based on the specification requirements.
The implementation dates back to 2000 and was written only to fulfil the specification requirements, neglecting the security, scalability and architectural issues that would arise in a real-world environment. The application is designed with no separation of presentation and data layers, which results in hard-to-manage code where HTML views, SQL queries and business logic are combined in the same components. User inputs (usernames, passwords, addresses, …) are not validated, which leaves the system vulnerable to SQL injection attacks. Connections to the database are handled internally, which creates several scalability issues in the connection pool, ranging from the application not being stateless to requests being limited on the server side without taking into account the current state of the database cluster.
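The SQL injection risk mentioned above can be illustrated with a minimal sketch. The class name, the table and column names (customer, c_id, c_uname) and the helper methods are assumptions made for illustration; they are not taken from the legacy code base. The sketch contrasts query text built by string concatenation with a parameterized query using the standard JDBC PreparedStatement.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Hypothetical illustration only: table and column names are assumptions.
final class CustomerLookup {

    // Vulnerable pattern: user input is concatenated directly into the SQL text,
    // so a crafted value can change the structure of the query.
    static String unsafeQuery(String uname) {
        return "SELECT c_id FROM customer WHERE c_uname = '" + uname + "'";
    }

    // Safer pattern: the SQL text is fixed and the input is bound as a parameter.
    static final String SAFE_SQL = "SELECT c_id FROM customer WHERE c_uname = ?";

    // Looks up a customer id with a bound parameter; returns null when no row matches.
    static Integer findCustomerId(Connection con, String uname) throws SQLException {
        try (PreparedStatement ps = con.prepareStatement(SAFE_SQL)) {
            ps.setString(1, uname);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() ? rs.getInt(1) : null;
            }
        }
    }

    public static void main(String[] args) {
        String malicious = "x' OR '1'='1";
        // The concatenated query now contains an always-true OR clause.
        System.out.println(unsafeQuery(malicious));
        // The parameterized query text never changes, whatever the input is.
        System.out.println(SAFE_SQL);
    }
}
```

Running main needs no database: it only prints the two query texts. Calling findCustomerId requires a JDBC Connection from whichever driver is in use.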
Because of all these issues, the implementation is well suited for testing basic features of the CloudScale tool-chain and making improvements based on the knowledge gained. The output of this phase will be an improved version that performs better without radical changes to the core. In the next phase we will concentrate on modernizing the application with new technologies and programming methodologies.
Issues: Scalability, Security, Modularization, Persistence layers
Legacy: A very old implementation of TPC-W that contains many architectural and scalability issues. This version is well suited to testing the tools' capability to spot a wide range of problems.
Improved: Based on the analysis, we will correct the main scalability issues in the code. The architectural design will be addressed in the modernized version.
Phase 1 (Modernization)
In Phase 1 we will completely rewrite the core parts of the legacy TPC-W implementation, modernizing the system with newer technologies and a more current approach to developing web-based applications. The aim of this phase is to design the application so that it can be migrated to cloud systems in later phases.
The core framework will be the Spring Framework, which provides a comprehensive programming and configuration model for modern Java-based enterprise applications. The architecture will be based on the Model-View-Controller (MVC) pattern, which separates business logic (Controller) from the presentation (View) and data (Model) layers. For accessing data in relational databases we will integrate Hibernate, an object-relational mapping (ORM) library for Java that maps an object-oriented domain model to a traditional relational database. To test the system with NoSQL databases as well, we will implement data access objects (DAOs) that allow transparent use of either an SQL or a NoSQL database. Supporting both is usually not recommended, especially if the domain is not fixed. In our case, however, the core business logic will not change, so we do not expect major difficulties in maintaining both versions of the DAOs.
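The DAO approach described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's actual design: the names Book, BookDao, SqlBookDao, NoSqlBookDao and CatalogService are hypothetical, and both implementations use in-memory maps as stand-ins for a real Hibernate-backed or MongoDB-backed store. The point is only that the business logic depends on the interface and is unaware of which kind of database sits behind it.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Minimal domain object; the fields are illustrative.
final class Book {
    final int id;
    final String title;
    Book(int id, String title) { this.id = id; this.title = title; }
}

// Storage-agnostic contract used by the business logic.
interface BookDao {
    void save(Book book);
    Optional<Book> findById(int id);
}

// Stand-in for a relational implementation (an ORM would back this in a real system).
final class SqlBookDao implements BookDao {
    private final Map<Integer, Book> rows = new HashMap<>();
    public void save(Book book) { rows.put(book.id, book); }
    public Optional<Book> findById(int id) { return Optional.ofNullable(rows.get(id)); }
}

// Stand-in for a document-store implementation; each book is kept as a key/value document.
final class NoSqlBookDao implements BookDao {
    private final Map<Integer, Map<String, Object>> docs = new HashMap<>();
    public void save(Book book) {
        Map<String, Object> doc = new HashMap<>();
        doc.put("_id", book.id);
        doc.put("title", book.title);
        docs.put(book.id, doc);
    }
    public Optional<Book> findById(int id) {
        Map<String, Object> doc = docs.get(id);
        if (doc == null) return Optional.empty();
        return Optional.of(new Book((Integer) doc.get("_id"), (String) doc.get("title")));
    }
}

// Business logic depends only on the BookDao interface.
final class CatalogService {
    private final BookDao dao;
    CatalogService(BookDao dao) { this.dao = dao; }
    void add(int id, String title) { dao.save(new Book(id, title)); }
    String titleOf(int id) { return dao.findById(id).map(b -> b.title).orElse("unknown"); }
}
```

Constructing the service with new CatalogService(new SqlBookDao()) or new CatalogService(new NoSqlBookDao()) changes the storage without touching the service code. The cost is the one noted above: every change to the domain model has to be made in both DAO implementations.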
The system will be deployed on the internal infrastructure, as with the legacy version. We will then measure system performance under different deployment configurations (e.g. different databases and application servers). However, since the scaling capabilities come in the next phase, we will not cluster the servers and databases in this phase.
SQL: Version based on SQL databases. This version will support a wide range of available SQL databases; the tests will be run on MySQL and PostgreSQL.
NoSQL: Version supporting document-based NoSQL databases, in particular MongoDB.
Milestones: We plan to complete this phase in year 1. All additional improvements will be carried out during the following two years.
Phase 2 (Migration)
In Phase 2 the system will be migrated to different cloud environments. The application will be configured so that it can be deployed on IaaS as well as on PaaS infrastructure. The initial deployment on IaaS will be based purely on the computing nodes of the infrastructure layer, without using any external services offered by the cloud provider, while the second version will be deployed on PaaS.
The aim of this phase is to produce versions that can be used to validate the CloudScale tools’ ability to analyse scalable systems. The IaaS version will be deployed on auto-scalable nodes, testing the CloudScale tools’ ability to adapt the system elastically, while the PaaS version is intended to validate the CloudScale tools’ ability to consider black-box runtime containers, for which only typical behaviour is known.
Public IaaS: This version will be deployed on the AWS/EC2 IaaS cloud.
Private IaaS: This version will be deployed on a private OpenStack cloud.
Public PaaS: This version will be deployed on the AWS Elastic Beanstalk PaaS service.
Milestones: We plan to complete this phase in year 2.
Phase 3 (Optimization)
After Phase 2, the showcases will run in the public and private cloud (IaaS and PaaS) without considering the cost of the system. The deployment should be scalable and able to serve high loads; however, the resources needed to accomplish this might be high. The IaaS solution will only use computing nodes (VMs) and will not integrate any of the cloud services that could lower the TCO (total cost of ownership) in the long run.
The aim of the last phase is to lower costs with the help of the CloudScale tools while keeping the quality of service unaffected. The CloudScale Environment will need to suggest appropriate improvements based on usage predictions, taking into account the services provided by public and private clouds.
Public IaaS: The focus will be on lowering the cost of the system running in the AWS cloud by using provided services for storage (e.g. AWS S3), database (e.g. DynamoDB) and load balancing (e.g. AWS Elastic Load Balancer).
Public PaaS: This version will be deployed on AWS Elastic Beanstalk.
Private IaaS: A hybrid cloud will be prepared by combining computing nodes deployed on the private cloud with cloud-based services available in public clouds (AWS).
Milestones: We plan to complete this phase in year 3.
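The goal of this phase, lowering cost while keeping quality of service unaffected, can be pictured as a constrained selection among deployment alternatives. The sketch below is only an illustration of that idea and does not describe how the CloudScale tools work internally. The class names, the option names and all cost and response-time figures are hypothetical placeholders, and using a response-time limit as the quality-of-service criterion is an assumption.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// One candidate deployment with a predicted monthly cost and a predicted response time.
// All names and figures used with this class are hypothetical placeholders.
final class DeploymentOption {
    final String name;
    final double predictedMonthlyCost;
    final double predictedResponseMs;

    DeploymentOption(String name, double predictedMonthlyCost, double predictedResponseMs) {
        this.name = name;
        this.predictedMonthlyCost = predictedMonthlyCost;
        this.predictedResponseMs = predictedResponseMs;
    }
}

final class CostOptimizer {
    // Returns the cheapest option whose predicted response time stays within the limit,
    // or an empty Optional when no option satisfies the constraint.
    static Optional<DeploymentOption> cheapestWithin(List<DeploymentOption> options,
                                                     double maxResponseMs) {
        return options.stream()
                .filter(o -> o.predictedResponseMs <= maxResponseMs)
                .min(Comparator.comparingDouble((DeploymentOption o) -> o.predictedMonthlyCost));
    }

    public static void main(String[] args) {
        // Made-up figures, for illustration only.
        List<DeploymentOption> options = Arrays.asList(
                new DeploymentOption("vm-only", 900.0, 180.0),
                new DeploymentOption("vm-plus-managed-storage", 650.0, 190.0),
                new DeploymentOption("minimal", 300.0, 450.0));
        cheapestWithin(options, 200.0).ifPresent(o -> System.out.println(o.name));
    }
}
```

In the process outlined earlier, the predicted figures would come from the Analyser, and the selected alternatives would then be deployed so that their measured costs can be compared.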