It has been a great pleasure working with the team at Eclipse whilst delivering this project.
The project aim was to create a fully automated, Amazon Web Services based infrastructure platform to host SAP Commerce e-commerce websites. The platform had to be constructed in such a way that it could later be used on other cloud service providers with little effort.
Developers and testers needed to be able to get code and features through the testing pipelines much more quickly. In addition, production websites needed to handle peak demand seamlessly.
As well as delivering a platform, the project involved helping existing teams pick up new tools, technologies and concepts so that they could support the platform on an ongoing basis.
The final solution comprised many components, outlined below.
Infrastructure as Code
Writing infrastructure as code was key to this solution. This is what enabled infrastructure to be provisioned in a reliable and repeatable way at the click of a button. By taking advantage of Terraform module sources it was possible to define a collection of infrastructure objects (such as subnets, route tables and gateways) in a single place, while allowing variables (such as names and CIDR ranges) to be passed in depending on the environment being built. As a result, all infrastructure met defined standards, human error was vastly reduced and development/production parity was achieved.
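As an illustrative sketch of this approach (the module path, variable names and CIDR ranges here are hypothetical, not the project's actual code), a single reusable network module can be instantiated once per environment with different inputs:

```hcl
# Hypothetical reusable network module: subnets, route tables and
# gateways are defined once, then parameterised per environment.
module "network_staging" {
  source = "./modules/network"

  name       = "staging"
  cidr_block = "10.1.0.0/16"
}

module "network_production" {
  source = "./modules/network"

  name       = "production"
  cidr_block = "10.2.0.0/16"
}
```

Because both environments are built from the same module, any change to the underlying definitions flows through to every environment, which is what keeps them in parity.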
By taking advantage of Amazon Web Services availability zones and infrastructure as code, all production environments were highly available and could withstand the loss of an Amazon data centre without any downtime.
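A minimal sketch of how infrastructure as code can spread resources across availability zones (resource and variable names are illustrative assumptions):

```hcl
# Hypothetical sketch: create one subnet per availability zone so the
# loss of a single AWS data centre does not take the environment down.
data "aws_availability_zones" "available" {}

resource "aws_subnet" "private" {
  count             = length(data.aws_availability_zones.available.names)
  vpc_id            = var.vpc_id
  availability_zone = data.aws_availability_zones.available.names[count.index]
  cidr_block        = cidrsubnet(var.cidr_block, 8, count.index)
}
```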
To allow software (SAP Commerce in particular) to run in a dynamically scaled environment, there were a few challenges to overcome. There needed to be a way to start SAP Commerce very quickly in ‘scale-up’ situations, and the state of any running SAP Commerce instance had to be externalised in case of scale-down.
Docker was chosen to containerise software. Containerisation enabled the application and all of its dependencies, configuration and so on to be packaged into an image that can be started very quickly. By using SAP Commerce ‘aspects’, a single Docker image can run in multiple environments and in multiple modes. A single image can be promoted all the way through the testing pipeline just by using tags.
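As a sketch of the build-once, promote-by-tag workflow (the registry name, image name and the environment variable used to select the aspect are illustrative assumptions, not the project's actual values):

```shell
# Build a single image containing the application and all dependencies.
docker build -t registry.example.com/commerce:1.4.2 .

# Hypothetical: choose the SAP Commerce aspect/mode at start-up rather
# than baking it into the image, so one image serves every role.
docker run -e ASPECT=accstorefront registry.example.com/commerce:1.4.2

# Promote the identical, already-tested image through the pipeline
# purely by re-tagging it - no rebuild, so no drift between stages.
docker tag registry.example.com/commerce:1.4.2 registry.example.com/commerce:staging
docker push registry.example.com/commerce:staging
```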
To enable service-level auto scaling, auto healing, multi-tenant clustering and service health checks, Kubernetes was implemented. This allowed the platform to meet the scaling requirements. The healing features of Kubernetes allowed the platform to be more resilient to virtual machine failures and network outages, resulting in higher service availability.
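Service-level auto scaling of this kind might be expressed as follows (a minimal sketch; the workload name, replica counts and CPU target are hypothetical):

```yaml
# Hypothetical sketch: scale the storefront Deployment between 2 and 10
# replicas based on average CPU utilisation across its pods.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: storefront
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: storefront
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```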
Technologies & Tools
The initial cloud provider chosen was Amazon. Amazon Web Services (AWS) is a mature cloud platform with a vast range of offerings, making it straightforward to build highly available and highly performant infrastructure stacks. Most tools (such as Terraform and Kops, below) offer excellent support for AWS. By using the many AWS services available, such as IAM and availability zones, it was possible to create secure and resilient infrastructure.
Terraform was chosen to build the foundation and networking infrastructure. Terraform has excellent Amazon Web Services support, and code can easily be ported to work with other cloud providers too - including OpenStack for managing resources on-premises. Terraform made it very easy to meet the project requirements of staying cloud agnostic and fully automating the infrastructure.
Containerisation was chosen to help simplify development. Rather than pushing a codebase that may have a completely different set of deployment steps depending on the target environment, containerisation produces a single image that can take the target environment as a parameter. Additionally, all dependencies and libraries required to run the application are packaged into the container, meaning the same container can be run locally, on bare metal or in the cloud with minimal effort. Docker was chosen specifically because it is well proven and mature.
Kubernetes is fast becoming an industry standard for container orchestration, and it can run on any cloud provider or even on-premises on bare metal. Native support for service-level auto-scaling, together with the ‘cluster autoscaler’ add-on for scaling the underlying virtual machines, allowed the scaling requirements to be met. The many different workload types (Deployments, StatefulSets, DaemonSets), specifications (disruption budgets, affinities) and probes (liveness, readiness) made it possible to build a platform that is resilient against hypervisor, network or even data centre failures.
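The probes and disruption budgets mentioned above might be combined along these lines (a minimal sketch; the workload name, image, port and health-check path are illustrative assumptions):

```yaml
# Hypothetical sketch: a Deployment with readiness/liveness probes plus
# a PodDisruptionBudget so node maintenance never drains all replicas.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: storefront
spec:
  replicas: 3
  selector:
    matchLabels:
      app: storefront
  template:
    metadata:
      labels:
        app: storefront
    spec:
      containers:
        - name: storefront
          image: registry.example.com/commerce:production
          readinessProbe:   # only route traffic once the app is up
            httpGet:
              path: /healthz
              port: 8080
          livenessProbe:    # restart the container if it stops responding
            httpGet:
              path: /healthz
              port: 8080
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: storefront
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: storefront
```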
The result of this project allows Eclipse to offer its customers an improved hosting service. Development environments can now be provisioned rapidly and easily decommissioned when not in use to match project demands. Automatic scaling allows customers’ websites to seamlessly handle high load during sales and events whilst running economically during quiet periods. The platform allows for zero-downtime code deployments and platform updates. High availability is achieved by always running across multiple availability zones (data centres), in addition to having automatic health checks and repairs.