The FinOps Foundation has recently published a comprehensive paper titled “FinOps for AI Overview.” This paper serves as an essential guide for organizations aiming to stay ahead in the evolving landscape of AI and FinOps. It addresses the challenges and opportunities that AI, particularly generative AI (GenAI) services such as large language models (LLMs), presents for FinOps practitioners and anyone grappling with cloud financial management in the AI era.
As organizations move from pilot uses of AI into costly product implementations, it’s critical to understand and manage the associated costs proactively.
Key highlights from the paper
- Emergence of AI: The paper emphasizes that while AI adoption offers significant advantages, it introduces unique challenges for FinOps teams. These include the need to grasp new terminologies, collaborate with diverse stakeholders, and comprehend novel spending and discounting models. Additionally, managing specialized services, optimizing GPU instances, and handling specialized data ingestion requirements are highlighted as critical areas of focus.
- Managing AI costs: A significant portion of the paper is dedicated to strategies for managing costs and usage related to the adoption of text-based LLM GenAI systems. It provides insights into extending existing FinOps practices to accommodate the unique demands of AI services.
- Challenges unique to AI: The document delves into specific challenges that AI brings to the forefront, such as:
  - The rapid impact of AI costs on diverse cross-functional teams
  - The volatility in infrastructure markets due to GPU scarcity
  - The complexity of achieving FinOps goals like understanding usage and quantifying business value in the context of AI
- Best practices and recommendations: To navigate these challenges, the paper offers best practices, including:
  - Understanding AI services: Develop a foundational grasp of AI services and the personas involved in GenAI systems.
  - Measuring business value: Apply techniques to assess the business impact of AI initiatives.
  - Implementing incrementally: Adopt a “Crawl, Walk, Run” approach to managing AI costs, emphasizing gradual and structured adoption.
  - Identifying key performance indicators (KPIs): Use suggested metrics to monitor and evaluate AI-related financial performance.
This paper not only provides a thorough understanding of the financial implications of AI adoption but also equips teams with actionable strategies to manage and optimize these costs effectively. To delve deeper into these insights, we encourage you to read the full paper on the FinOps Foundation’s website.
FAQ: FinOps for AI
What is FinOps for AI, and why is it important?
FinOps for AI extends traditional FinOps practices to address the unique cost management challenges and opportunities presented by generative AI (GenAI) services. It’s crucial because GenAI introduces specialized services, GPU infrastructure considerations, unique data ingestion requirements, and broader impact across diverse teams, requiring a tailored approach to cost optimization and business value quantification.
How are AI services similar to other cloud services, from a FinOps perspective?
Many fundamental FinOps principles still apply. The basic equation Price × Quantity = Cost remains valid, and AI service costs appear in cloud billing data alongside other cloud costs. Tagging and labeling are often possible, and many AI service components are eligible for commitment-based discounts, just as with traditional cloud services. These commonalities allow FinOps practitioners to leverage existing practices as a starting point.
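To make the Price × Quantity = Cost relationship concrete for token-based billing, here is a minimal sketch in Python; the model names and per-token rates are hypothetical placeholders, not published provider prices.

```python
# A minimal sketch of the Price x Quantity = Cost relationship applied to
# token-based LLM billing. The model names and per-token rates below are
# hypothetical placeholders, not published prices.

# Hypothetical price list: cost per 1,000 tokens, split by input and output
PRICE_PER_1K_TOKENS = {
    "example-llm-small": {"input": 0.0005, "output": 0.0015},
    "example-llm-large": {"input": 0.0100, "output": 0.0300},
}

def llm_call_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the cost of a single LLM call as price * quantity."""
    rates = PRICE_PER_1K_TOKENS[model]
    return (input_tokens / 1000) * rates["input"] + (output_tokens / 1000) * rates["output"]

# Example: 100,000 calls averaging 800 input and 300 output tokens each
monthly_cost = 100_000 * llm_call_cost("example-llm-small", 800, 300)
print(f"Estimated monthly cost: ${monthly_cost:,.2f}")
```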
What makes managing AI service costs different from managing traditional cloud costs?
GenAI introduces complexities like inconsistent pricing models, rapidly changing SKUs (sometimes without tagging capabilities), unfamiliar service names and types, and token-based billing. Scarcity of AI infrastructure, particularly GPU-based resources, requires capacity management techniques. Moreover, engineering teams may be inexperienced with AI services, and understanding the total cost of ownership (TCO) for AI use cases can be challenging due to continuous training requirements.
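As a rough illustration of why TCO is harder to pin down for AI use cases, the sketch below tallies a 12-month estimate that includes recurring retraining alongside inference; every figure and cost category is an invented assumption for illustration only.

```python
# A rough TCO sketch for one GenAI use case, illustrating why continuous
# training makes AI costs harder to project than a one-off deployment.
# All figures are hypothetical assumptions for illustration only.

MONTHS = 12

monthly_costs = {
    "inference": 4_000.0,          # token / API charges for serving users
    "fine_tuning": 2_500.0,        # recurring fine-tuning or retraining runs
    "data_ingestion": 800.0,       # pipelines feeding new training data
    "vector_storage": 300.0,       # embeddings / retrieval infrastructure
    "monitoring": 400.0,           # evaluation, guardrails, observability
}

one_time_costs = {
    "initial_training": 20_000.0,  # initial model customization
    "integration_engineering": 15_000.0,
}

tco = sum(one_time_costs.values()) + MONTHS * sum(monthly_costs.values())
print(f"Estimated 12-month TCO: ${tco:,.2f}")
print(f"Recurring share: {MONTHS * sum(monthly_costs.values()) / tco:.0%}")
```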
Who are the key personas that FinOps teams need to engage with when managing AI costs?
Besides the core personas (Engineering, Finance, Line of Business owners, and Procurement), FinOps teams need to engage with data scientists, data engineers, software engineers (including prompt engineers), business analysts, DevOps engineers, product managers, and end users. Because GenAI is relatively new, these personas may require additional support from the FinOps team regarding cost awareness and management.
How does the “Crawl, Walk, Run” framework apply to FinOps for AI?
The “Crawl, Walk, Run” framework provides a phased approach to adopting AI. “Crawl” focuses on learning, prototyping, and minimal viable product (MVP) development with a “fail fast” mentality and limited cost investments. “Walk” involves integrating the solution into simple business processes with minimal nonfunctional requirements. “Run” entails powering core business processes with AI, incorporating constant cost monitoring and optimization, and implementing a higher level of nonfunctional requirements, with a focus on total ROI.
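One hypothetical way a FinOps team might encode phase-specific cost guardrails for this framework is sketched below; the phase names follow the framework, while the budgets, review cadences, and fields are invented for illustration.

```python
# A hypothetical way a FinOps team might encode phase-specific cost guardrails
# for the "Crawl, Walk, Run" approach described above. Phase names follow the
# framework; budgets, cadences, and fields are invented for illustration.
from dataclasses import dataclass

@dataclass
class PhaseGuardrail:
    monthly_budget_usd: float     # spend ceiling before escalation
    review_cadence_days: int      # how often cost/usage is reviewed
    require_roi_tracking: bool    # whether total ROI must be reported

AI_PHASE_GUARDRAILS = {
    "crawl": PhaseGuardrail(monthly_budget_usd=2_000, review_cadence_days=30, require_roi_tracking=False),
    "walk":  PhaseGuardrail(monthly_budget_usd=15_000, review_cadence_days=7, require_roi_tracking=False),
    "run":   PhaseGuardrail(monthly_budget_usd=100_000, review_cadence_days=1, require_roi_tracking=True),
}

def over_budget(phase: str, month_to_date_spend: float) -> bool:
    """Flag a workload whose month-to-date spend exceeds its phase budget."""
    return month_to_date_spend > AI_PHASE_GUARDRAILS[phase].monthly_budget_usd

print(over_budget("crawl", 2_350.0))  # True: prototype spend needs a look
```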
What are some examples of KPIs that are important when considering FinOps for AI?
Important KPIs include cost per inference, training cost efficiency, token consumption metrics, resource utilization efficiency, anomaly detection rate, return on investment (ROI) for AI initiatives, cost per API call, time to achieve business value, and time to first prompt (developer agility). These metrics help track the efficiency and value derived from AI investments.
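As a minimal sketch of how a few of these KPIs might be computed from monthly figures, consider the following; the field names and numbers are illustrative assumptions, not a standard schema.

```python
# A minimal sketch of a few of the KPIs listed above, computed from
# hypothetical monthly figures. Field names and numbers are illustrative
# assumptions, not a standard schema.

monthly = {
    "total_ai_cost": 12_000.0,       # all AI-related spend for the month
    "inference_count": 1_500_000,    # model invocations served
    "api_calls": 1_650_000,          # calls to the GenAI API (incl. retries)
    "tokens_consumed": 900_000_000,  # input + output tokens
    "business_value": 30_000.0,      # estimated value delivered (e.g., savings)
}

cost_per_inference = monthly["total_ai_cost"] / monthly["inference_count"]
cost_per_api_call = monthly["total_ai_cost"] / monthly["api_calls"]
cost_per_1m_tokens = monthly["total_ai_cost"] / (monthly["tokens_consumed"] / 1e6)
roi = (monthly["business_value"] - monthly["total_ai_cost"]) / monthly["total_ai_cost"]

print(f"Cost per inference: ${cost_per_inference:.4f}")
print(f"Cost per API call:  ${cost_per_api_call:.4f}")
print(f"Cost per 1M tokens: ${cost_per_1m_tokens:.2f}")
print(f"ROI:                {roi:.0%}")
```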
What are some of the regulatory and compliance considerations for FinOps teams managing AI costs?
Key considerations include data privacy regulations (like the European Union’s General Data Protection Regulation and California’s Consumer Privacy Act), intellectual property (IP) and licensing, AI bias mitigation and ethical compliance, sector-specific regulatory considerations (e.g., from HIPAA or FINRA), data-retention policies, environmental regulations, and emerging AI-specific regulations (like the EU AI Act). Compliance with these regulations is an important factor to consider in the total cost of AI deployment.
How do existing FinOps capabilities map to AI cost management, and what are the key differences?
While many FinOps capabilities remain relevant, AI introduces nuances. Data Ingestion carries greater uncertainty and demands more specialized skills. Allocation must account for complex architectures and multi-agent workloads. Reporting & Analytics calls for AI-specific metrics and stakeholder engagement. Anomaly Detection needs more frequent monitoring. Planning & Estimating faces challenges in estimating model outputs. These differences highlight the need to adapt existing practices.
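To illustrate what more frequent anomaly monitoring can look like in practice, here is a minimal sketch that flags a day whose AI spend deviates sharply from the recent trailing average; the daily figures and the z-score threshold are assumptions chosen for illustration.

```python
# A minimal sketch of the kind of frequent spend monitoring AI workloads call
# for: flag a day whose AI spend deviates sharply from the recent trailing
# window. The daily figures and the threshold are hypothetical assumptions.
from statistics import mean, stdev

daily_ai_spend = [410, 395, 430, 420, 415, 405, 980]  # last value is today

window, today = daily_ai_spend[:-1], daily_ai_spend[-1]
mu, sigma = mean(window), stdev(window)
z_score = (today - mu) / sigma if sigma else 0.0

if z_score > 3:  # threshold is a tunable assumption
    print(f"Anomaly: today's AI spend ${today} is {z_score:.1f} std devs above the trailing mean ${mu:.0f}")
else:
    print("AI spend within expected range")
```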