SRE may be hiding in DevOps' shadow, but it’s equally important for digitally capable organizations
By BreakFree Solutions Staff
Most modern companies that use technology to run their business know how important it is to implement DevOps, which seeks to optimize the management domains involved in digital product development. It focuses on configuration, release, and deployment management to get changes into production quickly.
But before DevOps was a household name, large-scale tech companies like Google were already practicing it. As deployment velocity increased, their technologists realized they needed an equivalent improvement in operations management to sustain that pace without sacrificing quality. DevOps left room for another discipline to operationalize quality management, incident reporting, and the other domains that matter when operating a large-scale, value-driven system.
That’s where SRE comes in. SRE (site reliability engineering) is not an opposite, antidote, or rival to DevOps. Instead, you could think of it as DevOps' little brother. It doesn’t get all the same attention and glory, but it’s just as valued in the family unit!
SRE aims to answer the question its older brother DevOps left open: How can digital operations be changed to support the scale, speed, and other qualities of a digitally capable organization?
In all seriousness, SRE is simply a sub-category of DevOps, a value-based practice that combines aspects of software engineering and systems administration. It aims to provide reliable, scalable, and highly available systems by applying best practices from the software engineering field to the challenges faced by system administrators.
Google gets the credit for defining and popularizing SRE as we know it today. The practice was developed in the early 2000s to manage Google's enormous production systems and has been evolving ever since.
SRE teams are responsible for building and maintaining the systems that power Google's businesses. These teams work to prevent outages, ensure fast response times, and recover quickly from incidents, while also improving the service overall.
The Central SRE Challenge
Because Google wrote the book on SRE, most of its principles are taught with Google in mind. But other companies that need SRE, however large, successful, and technologically advanced they may be, are not Google.
Like DevOps, SRE is not a one-size-fits-all approach. In fact, the right way to apply it varies widely from one organization to the next, so it takes discernment and deliberate effort to determine where SRE makes sense and where it doesn't.
When we operationalize SRE during client engagements, we combine the principles and ideas that Google championed with our own experience and new findings, then apply that combination pragmatically to the situation at hand. To do this, we conduct in-depth current-state discovery, using tools such as value stream mapping to determine which SRE principles will have the most impact on an organization's specific goals.
To learn more about SRE and how it can be applied in your organization, contact us today. We’d be happy to chat with you about your specific needs and see if SRE is a good fit for you.