Redundancy in cloud architecture ensures that any individual failure has a fallback within the architecture. That means that when IT operations are disrupted, business can continue as normal. To make sure they’re covered, businesses should look at four key areas: hardware, processes, network, and geography.
Many events can threaten a business’s IT operations, including natural disasters, power outages, sabotage, and plain old human error. These events often make high availability nearly impossible for a period, or cause something worse, such as long-lasting outages or permanent data loss. It’s vital to build redundancy into your network architecture to protect against these outcomes. Since some failure at every level of a system is inevitable, the task is to design the architecture so that any individual failure has a fallback and can be detected and mitigated without human intervention.
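To make "detected and mitigated without human intervention" concrete, here is a minimal sketch of health-check-based failover. The `Replica` type, the `pick_active` function, and the replica names are hypothetical and used only for illustration; a real deployment would rely on the health checks and failover features of its load balancer or cloud provider.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class Replica:
    """One redundant copy of a component (hypothetical type for illustration)."""
    name: str
    is_healthy: Callable[[], bool]  # health probe; may raise if the replica is unreachable


def pick_active(replicas: Sequence[Replica]) -> Optional[Replica]:
    """Return the first replica that passes its health check, in priority order.

    A probe that raises is treated the same as an unhealthy replica, so a
    failed primary is skipped automatically and the next fallback is used.
    """
    for replica in replicas:
        try:
            if replica.is_healthy():
                return replica
        except Exception:
            continue  # detection: the probe itself failed
    return None  # every replica is down; this is when a human gets paged


# Example: the primary is down, so traffic falls back to the standby.
primary = Replica("primary", lambda: False)
standby = Replica("standby", lambda: True)
active = pick_active([primary, standby])
print(active.name if active else "no healthy replica")
```

The design choice here is that detection and mitigation happen in the same loop: no operator has to notice the failure before the fallback takes over.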
As a metaphor for redundancy, we don’t need to look any further than nature. Ants, for instance, are particularly adept at ensuring redundancy. Ants create paths along branches and leaves that connect their nests with various food sources. With the pheromones secreted by other members of the colony as guideposts, ants follow the same path over and over again until an obstacle gets in the way, like, say, a hungry lizard. Without missing a beat, the ants re-route their trail along a different network of branches and leaves to carry food to their nest.
Keeping a business up and running and serving customers means planning and acting like the ants. One benefit of cloud computing is that the Cloud facilitates this type of readiness, which typically focuses on four levels of redundancy.
Hardware-Level Redundancy
Typically, when we speak about hardware redundancy, we’re talking about fault tolerance of the machinery that runs the software the business relies on. That software can include contact center applications, collaboration tools, or customer databases. Even now that cloud computing is no longer new and is ubiquitous in modern business, hardware is still involved somewhere.
How you incorporate hardware redundancy into your network architecture is primarily driven by cost. Either you co-locate your own devices with a service provider, or you pay one of the large providers, like AWS or Google, to design, build, and maintain your environment. Frequently the approach is a mix of both, based upon the cloud provider’s expertise and the needs of the business. Either way, the solution comes with an assurance of uptime. Whether your service provider manages the hardware themselves or is using one of the big Platform-as-a-Service (PaaS) providers, you need to think about process redundancy and how processes use and share resources.
Process Redundancy
Processes within a digital architecture need to be available for a business to run. Some processes are more critical to the operation of a business than others. To understand which, it’s usually best to map out each of the processes within a company and determine which require the highest availability and which are less critical. While it’s tempting to think that all processes should have the highest availability, that means providing a fully redundant solution for every process node in the architecture, which can quickly add up to significant costs. Businesses need to assess the importance of each service to the company and then determine the appropriate level of service reliability.
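The trade-off between redundancy and cost can be put into rough numbers. The sketch below assumes that replicas fail independently (a simplification that real systems often violate), in which case a process backed by n replicas, each available a fraction a of the time, is available 1 - (1 - a)^n of the time. The process names and replica counts are hypothetical.

```python
HOURS_PER_YEAR = 24 * 365  # 8760, ignoring leap years


def combined_availability(node_availability: float, replicas: int) -> float:
    """Availability of a process that is up when at least one replica is up.

    Assumes replica failures are independent, which is an idealization.
    """
    if not 0.0 <= node_availability <= 1.0:
        raise ValueError("node_availability must be between 0 and 1")
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    return 1.0 - (1.0 - node_availability) ** replicas


def expected_downtime_hours(availability: float) -> float:
    """Expected hours of downtime per year at the given availability."""
    return (1.0 - availability) * HOURS_PER_YEAR


# Hypothetical mapping of business processes to replica counts:
# critical processes get more redundancy, less critical ones get less.
process_replicas = {"order intake": 3, "customer database": 2, "internal reporting": 1}

for process, replicas in process_replicas.items():
    availability = combined_availability(0.99, replicas)
    downtime = expected_downtime_hours(availability)
    print(f"{process}: {replicas} replica(s), about {downtime:.2f} h downtime per year")
```

Under these assumptions, a single 99% node is down roughly 87.6 hours a year, and each added replica cuts that sharply while adding the full cost of another node, which is why not every process justifies the highest tier.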
Network Redundancy
One can easily compare network redundancy to the example with the ants. Are there multiple routes to the internet? If one carrier becomes unavailable for some reason, can another carrier pick up the load and handle the traffic? A business should understand the service level agreements from a provider, particularly as they pertain to bandwidth, redundancy, and availability. It’s important to note that using different providers doesn’t always mean you’re getting network redundancy. In fact, some providers may only deliver the last mile of connectivity and rely on the same backbone as every other carrier.
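The shared-backbone caveat is easy to check once you know which backbone each carrier actually rides on. The sketch below is illustrative only: the carrier and backbone names are made up, and in practice this information has to come from the providers themselves.

```python
from collections import defaultdict
from typing import Dict, List


def shared_backbones(uplinks: Dict[str, str]) -> Dict[str, List[str]]:
    """Group carriers by backbone and return only backbones used by more than one.

    `uplinks` maps a carrier name to the backbone it depends on. Any backbone
    in the result is a single point of failure hiding behind several carriers.
    """
    by_backbone = defaultdict(list)
    for carrier, backbone in uplinks.items():
        by_backbone[backbone].append(carrier)
    return {
        backbone: sorted(carriers)
        for backbone, carriers in by_backbone.items()
        if len(carriers) > 1
    }


# Hypothetical uplinks: two "different" carriers share one backbone.
uplinks = {
    "carrier-a": "backbone-x",
    "carrier-b": "backbone-x",
    "carrier-c": "backbone-y",
}
print(shared_backbones(uplinks))
```

An empty result means every carrier in the list depends on a different backbone, which is the situation network redundancy is meant to achieve.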
Geographic Redundancy
Geographic redundancy replicates data between two (or more) physically separate locations and is essential for working in the Cloud. Ideally, the data centers are far enough apart that a natural disaster or other event affecting one place will not also affect the other.
In particular, data centers should be highly available, and network traffic should be split among locations for geo-redundancy. You should talk to your service providers about what is required to continue service if a data center fails, whether due to power, network, or some physical event. The more automation, the better, but fully automatic systems have a higher cost, so it shouldn’t surprise you to learn that there are manual steps involved too. The important thing is to make sure there’s a redundancy plan in place to keep your IT operations running smoothly 24/7.
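As a rough illustration of splitting traffic among locations, the sketch below redistributes traffic shares across whichever data centers are currently healthy. The site names, weights, and the `split_traffic` function are assumptions made for this example; in a real deployment this logic usually lives in DNS or a global load balancer rather than in application code.

```python
from typing import Dict, Set


def split_traffic(weights: Dict[str, float], healthy: Set[str]) -> Dict[str, float]:
    """Return the fraction of traffic each healthy data center should receive.

    `weights` holds the planned share per site. Sites missing from `healthy`
    are dropped and their share is spread proportionally over the survivors.
    """
    live = {site: w for site, w in weights.items() if site in healthy and w > 0}
    total = sum(live.values())
    if total == 0:
        raise RuntimeError("no healthy data center available")
    return {site: w / total for site, w in live.items()}


# Hypothetical two-site setup with an even split.
planned = {"east": 50, "west": 50}
print(split_traffic(planned, {"east", "west"}))  # both sites up
print(split_traffic(planned, {"west"}))          # east has failed
```

When every site is down the function raises instead of guessing, which corresponds to the point where the manual steps in the redundancy plan take over.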