Our team's updates increased SLI conversion for new Enterprise users by 17% in the five months after release, compared to the previous time period.
Honeycomb's software is the property of Hound Technology, Inc. This case study is not sponsored by Honeycomb.
Honeycomb's definitive stance on Service Level Objectives (SLOs) is that they're an essential complement to traditional alerts: a keystone of great observability, with the power to align software companies around the business impact of their application's UX.
With an SLO, stakeholders come together to define a critical user journey (e.g., a retail application might choose a 'checkout' flow), identify the services that journey touches, and agree on the target reliability for those services. For example: "Over a period of 28 days, 98% of users can purchase an item without errors." For this SLO, the error budget, or tolerance for failures, is 2%. When the budget burns at a rate that would deplete it before the end of the time period, teams are notified with Burn Rate Alerts. Once notified, SREs are empowered to troubleshoot the problem, reallocate resources, halt deployments, and/or begin a discussion about service reliability expectations.
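To make the arithmetic concrete, here is a minimal Python sketch of the example above: a 98% target over 28 days leaves a 2% error budget, and the burn rate expresses how fast failures are spending it (a burn rate of 1.0 means the budget lasts exactly the full period). The `SLO` class, its method names, and the simple "alert if projected exhaustion lands before the period ends" rule are illustrative assumptions, not how Honeycomb computes Burn Rate Alerts.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class SLO:
    target: float       # fraction of events that must succeed, e.g. 0.98
    period_days: float  # evaluation window, e.g. 28

    @property
    def error_budget(self) -> float:
        # Tolerance for failures: a 98% target means 2% of events may fail.
        return 1.0 - self.target

    def burn_rate(self, observed_error_rate: float) -> float:
        # 1.0 spends the whole budget exactly by the end of the period;
        # 2.0 spends it in half the time.
        return observed_error_rate / self.error_budget

    def days_until_exhausted(self, observed_error_rate: float,
                             budget_spent: float = 0.0) -> float:
        # budget_spent is the fraction of the budget already used (0.0 to 1.0).
        rate = self.burn_rate(observed_error_rate)
        if rate <= 0:
            return math.inf
        return (1.0 - budget_spent) * self.period_days / rate

    def should_alert(self, observed_error_rate: float,
                     days_elapsed: float, budget_spent: float) -> bool:
        # Simplified, hypothetical burn alert: notify if, at the current rate,
        # the remaining budget runs out before the period ends.
        days_left = self.period_days - days_elapsed
        return self.days_until_exhausted(observed_error_rate, budget_spent) < days_left


# Hypothetical checkout SLO from the example: 98% over 28 days.
checkout = SLO(target=0.98, period_days=28)
print(round(checkout.error_budget, 4))                                 # 0.02
print(round(checkout.burn_rate(0.04), 2))                              # 2.0
print(round(checkout.days_until_exhausted(0.04), 1))                   # 14.0
print(checkout.should_alert(0.04, days_elapsed=0, budget_spent=0.0))   # True
print(checkout.should_alert(0.01, days_elapsed=0, budget_spent=0.0))   # False
```

At a 4% error rate the checkout SLO burns at roughly 2×, so a fresh budget would run out in about 14 days, well before the 28-day window closes; at 1% it burns at 0.5× and no alert fires.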
Honeycomb's SLO list view
Once SLOs are configured, customers really see their value. Our research revealed that getting to that value was the issue. During discovery, we gathered feedback from users as well as Customer Success and Sales teams, collected usage data from Honeycomb and Amplitude, and observed user behavior in Fullstory sessions. We learned that users found SLOs difficult to configure, which meant teams were missing out on their magic. Armed with boatloads of qualitative and quantitative insights, we synthesized the friction points and grouped them by theme:
Journey mapping exercise to align the team around what the current SLO creation process looked like and where pain points resided
A Service Level Indicator (SLI) is a foundational element of an SLO: the mechanism by which success or failure is evaluated. Expressed as a formula, an SLI in Honeycomb looks like: SLI = Successful events ÷ Total (qualified) events × 100.
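The formula can be sketched in Python as below. Each SLI is modeled as a function that returns True (success), False (failure), or None for an event that doesn't qualify, so unqualified events drop out of the denominator. The event fields, the `/checkout` route, and the function names are hypothetical; this illustrates the arithmetic only and is not Honeycomb's SLI syntax.

```python
from typing import Callable, Iterable, Optional

Event = dict
# An SLI maps each event to True (success), False (failure),
# or None (not qualified, so excluded from the denominator).
SLI = Callable[[Event], Optional[bool]]


def checkout_sli(event: Event) -> Optional[bool]:
    # Hypothetical fields: only checkout requests qualify, and a qualified
    # event succeeds if it did not return a 5xx status.
    if event.get("route") != "/checkout":
        return None
    return event.get("status_code", 500) < 500


def sli_percentage(events: Iterable[Event], sli: SLI) -> float:
    results = [sli(e) for e in events]
    qualified = [r for r in results if r is not None]
    if not qualified:
        raise ValueError("no qualified events")
    successful = sum(1 for r in qualified if r)
    # Successful events ÷ Total (qualified) events × 100
    return successful / len(qualified) * 100


events = [
    {"route": "/checkout", "status_code": 200},
    {"route": "/checkout", "status_code": 201},
    {"route": "/checkout", "status_code": 503},
    {"route": "/checkout", "status_code": 200},
    {"route": "/health", "status_code": 500},  # not qualified: ignored
]
print(sli_percentage(events, checkout_sli))  # 75.0
```

Three of the four qualified checkout events succeed, so the SLI is 75%; the health-check event is ignored rather than counted as a failure.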
Our discovery showed that SLIs were a major deterrent to getting started with SLOs. After an impact/effort exercise to assess each friction theme, we felt confident that making SLIs easier to understand, create, and manage would move the needle on SLO adoption.
Our ideation centered on the classic UX principle "don't make me think" (coined by Steve Krug). While this tenet applies to all of our work, it was especially important for SLIs, given their complexity and their state in the product at the time. We sought to reduce the overhead required to create an SLI, which we hypothesized would increase SLO conversion.
Over the course of H2 2024, our team conceived several MVPs that directly addressed SLI friction and shipped them to General Availability. Together, they improved SLI creation and SLO adoption:
To alleviate the pressure of creating an SLI from scratch, we introduced templates for the most common indicators.
With the new Build Query mode, users can create their SLI in the Query Builder language and interface they're already familiar with.
We added a "Format" button that automatically cleans up an expression by adding line breaks and block indentation.
We brought SLI creation and editing into a modal window that sits above the SLO creation page, so that users never have to leave their flow.
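The Format button is described only as adding line breaks and block indentation, so the following is a hypothetical sketch of one way such a formatter could work, not Honeycomb's implementation. It parses a nested function-call expression, puts each argument of a call that contains other calls on its own indented line, and keeps simple calls inline. The expression syntax, the `$`-prefixed column names, and the function names in the sample are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Call:
    name: str
    args: List["Node"]


# A node is either a function call or an atom such as $duration_ms, 500, or "text".
Node = Union[Call, str]


def tokenize(expr: str) -> List[str]:
    """Split an expression into atoms and the punctuation ( ) ,"""
    tokens: List[str] = []
    buf = ""
    i = 0
    while i < len(expr):
        ch = expr[i]
        if ch == '"':
            # Copy a double-quoted string literal verbatim (commas, parens, spaces included).
            j = i + 1
            while j < len(expr) and expr[j] != '"':
                j += 2 if expr[j] == "\\" else 1
            buf += expr[i:j + 1]
            i = j + 1
            continue
        if ch in "(),":
            if buf:
                tokens.append(buf)
            tokens.append(ch)
            buf = ""
        elif not ch.isspace():
            buf += ch  # whitespace outside string literals is dropped
        i += 1
    if buf:
        tokens.append(buf)
    return tokens


def parse(tokens: List[str]) -> Node:
    """Recursive-descent parse; assumes the expression is well formed."""
    pos = 0

    def expr() -> Node:
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if pos < len(tokens) and tokens[pos] == "(":
            pos += 1  # consume "("
            args: List[Node] = []
            if tokens[pos] != ")":
                args.append(expr())
                while tokens[pos] == ",":
                    pos += 1
                    args.append(expr())
            if tokens[pos] != ")":
                raise ValueError(f"expected ')' at token {pos}")
            pos += 1  # consume ")"
            return Call(tok, args)
        return tok

    node = expr()
    if pos != len(tokens):
        raise ValueError("unexpected trailing tokens")
    return node


def render(node: Node, depth: int = 0, indent: str = "  ") -> str:
    if isinstance(node, str):
        return node
    if not any(isinstance(a, Call) for a in node.args):
        # Leaf call: short enough to keep on one line.
        return f"{node.name}({', '.join(render(a) for a in node.args)})"
    # Call with nested calls: one argument per line, indented one block deeper.
    pad = indent * (depth + 1)
    inner = ",\n".join(pad + render(a, depth + 1, indent) for a in node.args)
    return f"{node.name}(\n{inner}\n{indent * depth})"


def format_expression(expr: str) -> str:
    return render(parse(tokenize(expr)))


# Hypothetical SLI-style expression, entered on a single line.
raw = 'IF(EQUALS($route,"/checkout"),AND(LT($status_code,500),LT($duration_ms,1000)))'
print(format_expression(raw))
```

Keeping leaf calls such as `LT($status_code, 500)` on one line stops short expressions from sprawling vertically, and because whitespace outside string literals is discarded before rendering, formatting an already formatted expression leaves it unchanged.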
We saw some fascinating results over the next several months:
What might Honeycomb SLOs look like in the future? The team continues to think about how we can make SLO creation simpler and recognizes opportunities for additional enhancements:
Features and images are conceptual and exploratory, for demonstration purposes only.
There may be value in letting users create an SLO by simply summarizing what they want.
SLO history to gather context about error budget resets, SLI updates, and target tweaks.
Critical User Journey and SLO creation from the Service Map, where users gather context about service health and interdependencies.