Categories
Site Reliability Engineering

Establishing SRE Foundations: A Step-by-Step Guide to Introducing Site Reliability Engineering in Software Delivery Organizations

Improve Your Service Scalability and Reliability with SRE. “The techniques and principles of SRE are not only clearly defined here, but also the rationale behind them is explained in a way that will stick. This is not some dry definition, this is practical, usable understanding. . . . I can whole-heartedly recommend this book without […]

Categories
Site Reliability Engineering

Site Reliability Engineer (SRE) Roles and Responsibilities

This post defines the roles and responsibilities of a site reliability engineer and shows how SRE can improve the resilience of your people, processes, and technology. Software development is getting faster and more complex, frustrating IT operations teams more than ever. DevOps gained popularity to combat siloed workflows, decreased collaboration, and […]

Categories
Agile Development Site Reliability Engineering

Differences Between Site Reliability Engineer vs. Software Engineer vs. Cloud Engineer vs. DevOps Engineer

This article compares the roles of software engineers, DevOps engineers, site reliability engineers, and cloud engineers. The evolution of software engineering over the last decade has led to the emergence of numerous job roles. So, how different are a software engineer, DevOps engineer, site reliability engineer, and cloud engineer […]

Categories
Site Reliability Engineering

Developing a data-driven tool to estimate the cost of incidents

Data-drivenness is one of our core values at HelloFresh. We are proud of making decisions based on strong evidence and not on gut feelings. This includes, of course, how we assess the impact of incidents affecting our services. But what is considered an incident at HelloFresh? An incident is anything that has an impact […]

Categories
Site Reliability Engineering

When to Alert on What?

We have all been there when we go into a new company or team, looking at their stack and especially alerting, and something feels a bit … off? This is an honest attempt to categorize alerting into some bands or levels, to make it easier to reason about where things are and have a glimpse […]

Categories
Site Reliability Engineering

A New Definition of Reliability

As organizations build up their own reliability practices, a good definition of reliability itself is the one that leads them to impactful priorities and consistent results. It’s no surprise that organizations with software products are prioritizing reliability as feature #1. In the software space, when we talk about “reliability” we’re referring to site reliability engineering. Google invented […]