Cloud overspend: four common-sense approaches

In part one of our blog series on overspending in the cloud, we explored the genesis of cloud overspend and delved into why it exists. In this article, we will highlight four approaches you can take with your engineering and IT teams to bring your budgets in line and make money in the cloud.

Engineers love to improve their efficiency. If you help them understand the rubric by which they should measure efficiency, and provide continuous visibility with a goal to shoot for, they will usually do the right thing.

Habit 1: Develop stable unit-cost metrics

When different parts of your team handle cloud costs, it is easy to get spooked by raw dollar amounts. For example, imagine your e-commerce tech team, responsible for shopping carts and credit card payments in your platform, seeing its operational cost rise to $100,000 for November, up from a normal monthly cost of $90,000. This creates a highly visible incident involving the finance team, but you later find out that the increase came from additional traffic on Black Friday, and that the sales you cleared far eclipsed the additional operational cost.

From this example, it is clear that growth in operational costs is easily misinterpreted when it is viewed without business context. Additionally, it is often difficult to factor business value into an evaluation of operational costs and what they mean.

We can solve this by introducing a stable metric that bakes business context into operational costs: a unit cost. Then, you can use these unit costs as an indicator of spending health rather than focusing on the pure dollar amounts of spend generated by each project.

In our example above, consider a unit cost for the shopping cart service. Suppose the service costs $40,000 per month and typically manages 3 million carts a month. Dividing the cost by the number of carts, the unit cost is roughly $0.013 per cart.

For the payments service, suppose the service costs $30,000 per month and typically processes 2 million card transactions a month. Dividing these, the unit cost is $0.015 per credit card transaction.
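The arithmetic for the two services above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed implementation: the function name unit_cost and the variable names are hypothetical, and the figures are the ones from the examples above.

```python
def unit_cost(monthly_cost_usd: float, units: int) -> float:
    """Return the cost per business unit (cart, transaction, ...)."""
    if units <= 0:
        raise ValueError("units must be a positive count")
    return monthly_cost_usd / units


# Figures taken from the examples in the text.
cart_unit_cost = unit_cost(40_000, 3_000_000)      # dollars per cart
payment_unit_cost = unit_cost(30_000, 2_000_000)   # dollars per card transaction

print(f"Shopping cart: ${cart_unit_cost:.4f} per cart")
print(f"Payments:      ${payment_unit_cost:.4f} per transaction")
```

Tracking these two numbers month over month, rather than the $40,000 and $30,000 totals, is what lets a busy month show up as "unit cost unchanged" instead of as a cost incident.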

Developing unit costs in this way and monitoring them religiously gives you several long-term benefits:

Cost optimization goals defined in terms of unit costs are much more specific and clear than goals to cut overall costs by certain dollar amounts.
Seasonal increases in business activity will typically leave unit costs stable as the numerator and denominator scale proportionally.
Engineers gain valuable business visibility or context they most likely never had before and can factor business context into technical decisions.

Habit 2: Monitor day-by-day cost movements for your stack

A great habit to start with your team is to use your daily scrum meeting to review the daily costs for the stack and how those have moved since the previous day. If they have noticeably moved, spend five minutes with everyone in attendance to identify the reasons for the growth. Every day, you should have a consistent dashboard visible to all that helps identify deltas and drive conversation around them.
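As a rough sketch of what such a dashboard computes, the snippet below compares two days of per-service cost and flags the services whose cost moved noticeably. Everything here is an assumption made for illustration: the function name daily_deltas, the 10% threshold, the service names, and the dollar figures are hypothetical, and in practice the inputs would come from your cloud provider's billing export.

```python
def daily_deltas(today, yesterday, threshold_pct=10.0):
    """Return (service, dollar_delta, percent_delta) for noticeable moves.

    `today` and `yesterday` map a service name to that day's cost in USD.
    A service that appears or disappears entirely is always flagged.
    """
    flagged = []
    for service in sorted(set(today) | set(yesterday)):
        prev = yesterday.get(service, 0.0)
        curr = today.get(service, 0.0)
        delta = curr - prev
        if prev == 0:
            # New spend has no baseline, so treat it as an unbounded increase.
            pct = float("inf") if curr > 0 else 0.0
        else:
            pct = delta / prev * 100
        if abs(pct) >= threshold_pct:
            flagged.append((service, delta, pct))
    return flagged


# Hypothetical daily costs in USD.
yesterday_costs = {"carts": 1300.0, "payments": 1000.0}
today_costs = {"carts": 1320.0, "payments": 1250.0, "search": 40.0}

flagged = daily_deltas(today_costs, yesterday_costs)
for service, delta, pct in flagged:
    print(f"{service}: {delta:+.2f} USD ({pct:+.1f}%)")
```

With these sample numbers, the small move in carts stays below the threshold, while payments and the newly appearing search service are surfaced for the five-minute conversation.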

The goal is not necessarily to identify spending growth and jump to action. New business ventures, seasonality, architecture changes, and simple human errors can generate considerable amounts of spend. Sometimes, all your team will accomplish is to identify the source of the spending and why it occurred, and to accept it as a worthwhile expense for the business. You may also expose a greater problem or a small cleanup task that can be assigned to a team member for immediate follow-up.

By adopting this habit, you level up your FinOps readiness by routinely reviewing and understanding costs, which are otherwise likely to be ignored until a critical problem arises.

Habit 3: Identify “created and never deleted” patterns

The most insidious sources of spend growth in an application stack tend to cluster around resources that are created and never deleted. If left unchecked, this generates constant day-over-day growth in the spending on an application's resources, yet never creates an "incident" of spending suddenly spiking. So it helps to identify this pattern and respond by enacting processes and policies that make sure resources eventually get deleted and the associated spending goes away.
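One way to look for this pattern in cost data is to search for series that climb steadily without ever spiking. The sketch below is a simple heuristic under stated assumptions, not an established detection method: the function name looks_like_slow_leak and all three thresholds are hypothetical and would need tuning against your own daily cost history.

```python
def looks_like_slow_leak(daily_costs, min_total_growth_pct=15.0,
                         max_daily_jump_pct=5.0, min_rising_share=0.8):
    """Heuristic: steady growth over the window with no single-day spike."""
    if len(daily_costs) < 2 or any(cost <= 0 for cost in daily_costs):
        return False
    # Day-over-day change, as a percentage of the previous day's cost.
    changes = [(b - a) / a * 100 for a, b in zip(daily_costs, daily_costs[1:])]
    total_growth = (daily_costs[-1] - daily_costs[0]) / daily_costs[0] * 100
    rising_share = sum(1 for c in changes if c > 0) / len(changes)
    no_spike = max(changes) <= max_daily_jump_pct
    return (total_growth >= min_total_growth_pct
            and rising_share >= min_rising_share
            and no_spike)


# Hypothetical 30-day cost series in USD.
leaking = [100.0 + day for day in range(30)]   # grows by a dollar every day
spiking = [100.0] * 29 + [150.0]               # flat, then one sudden jump
flat = [100.0] * 30

print(looks_like_slow_leak(leaking))
print(looks_like_slow_leak(spiking))
print(looks_like_slow_leak(flat))
```

Only the first series is flagged. The spike would already be caught by a daily review, and the flat series has nothing to find, which is why the slow climb needs its own check.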

If you have trouble identifying some of these sources, you can go deeper by doing a top-to-bottom review of your entire application. Talk about what the system is doing and the resources required. Identify what relies on those systems and resources and ask yourself who would be affected if that part of the system were to be downsized or deleted. It’s often the case that business needs change, and parts of the system that made sense when they were created end up completely losing meaning after months or years have passed.

Habit 4: Identify opportunities to replace classic compute with managed services or serverless

Managed cloud technologies are a boon to technology companies throughout the industry. Cloud providers generally offer easy ways to implement well-known solutions to common problems, such as databases, pub/sub patterns, or simply running your applications.

For example, if your applications are running on bare compute instances, you can consider adopting container solutions that abstract away the compute resources that applications run on, such as GCP Cloud Run or AWS Fargate. Another great example is identifying that a manually operated RabbitMQ cluster could be reimplemented using GCP Cloud Pub/Sub or AWS SNS. Identifying these scenarios and calling them out is a great way to help your finance team understand the possibilities for cost optimization that they wouldn’t usually be able to detect from the outside.

Make a habit of identifying and pursuing these rearchitecting efforts to reduce both direct cloud costs and operational complexity. If you have a "keep the lights on" or "tech debt" backlog, that's a great place to put these projects, so long as you're burning down that list over time.

Ternary is here to help you build these habits

When trying to build habits around FinOps, such as the ones in this article, we often find that creating visibility and observability around cloud cost is a project that involves a lot of upfront effort before you can reap any benefits or savings. Additionally, the tools in the space aren't designed for people in product teams who just want a quick way to see the spending in their purview.

Ternary provides a single pane of glass that Finance and Engineering can use as a shared language to discuss costs. It's the dashboard that you can look at every day during your scrum meeting. It's easy to set up. You can start showing it to your team members within just an hour of activating your billing data inside our platform. And it's built by a team that has felt the pain of wanting to put FinOps into action but needing to do a huge buildout just to start having the cross-team conversations that the practice recommends.

To get started on your FinOps journey with a trusted partner who has felt this pain before, contact us.

CIO/CTO Corner: overspend in the cloud, part 2
