The emergence of generative AI, sparked by the launch of ChatGPT at the end of 2022, has brought new business opportunities and captured the imagination of many companies. Across various fields, there have been numerous attempts to revolutionize the customer experience using generative AI technology. Companies of all sizes are launching new services, either by leveraging the APIs of commercial models or by developing products on open-source models like Llama. New services that aim to offer novel interface experiences, such as Arc Search, Perplexity, and Liner, have also emerged.
However, most AI companies struggle financially, failing to balance revenue and costs.[1] Even before producing meaningful business results, generative AI comes at a huge cost. For instance, Stability AI, which offers image generation services, is expected to incur approximately $96 million in annual costs this year while generating only about $60 million in revenue. This financial imbalance often leads to restructuring. Startups like Inflection AI, which launched a personal AI assistant chatbot, and Tome AI[2], which automatically generates presentations, likewise face immense AI costs while earning minimal revenue, raising concerns about the sustainability of generative AI businesses. The AI industry requires massive investments in server infrastructure, data acquisition, and advanced technical talent, so it is easy to find domestic startups struggling with operational challenges even before achieving concrete business results.[3]
In this context, companies fail when they do not seriously weigh costs against uncertain revenue projections. Depending on the adoption method, user base, and scale of interactions, generative AI can incur costs that far exceed expectations. Cost should therefore be a primary consideration when assessing the sustainability of an AI business from the outset. It is crucial to manage the performance and cost of AI models through visible metrics from the beginning and to forecast expenses in line with the pace of business growth. Measures such as cost guardrails are also needed to rein in excessive spending. This article explores how to calculate and manage costs to sustain an AI business successfully.
The first essential step is understanding which AI service implementation method best suits your business. Approaches to building AI services with commercial solutions generally fall into three categories.
The first approach is using Third-Party Vendor's Closed Source Services. This involves turnkey AI solutions provided by AI companies, such as OpenAI's ChatGPT or Naver's HyperCLOVA X. These services offer models with guaranteed performance packaged as managed services, allowing quick adoption, and robust customer support is another hallmark. However, because the service is delivered in a closed-source format, customizing it to specific business needs can be challenging, and a business that becomes locked into the software grows increasingly dependent on the vendor's technology.
The second approach is Third-Party Hosted Open-Source Services. This involves using open-source models tailored to business needs but hosted by a third-party company; examples include HuggingGPT, Lambda, and KoGPT from the South Korean company Kakao. Unlike closed-source services, the structure and training methods of the models are publicly available, allowing flexible customization and development. However, the quality of the AI model can vary significantly depending on the open-source model chosen and how it is applied.
The third approach, preferred by some organizations, is a Do-It-Yourself (DIY) method of building and deploying AI models. This involves using the infrastructure and tools that hyperscalers provide, such as AWS SageMaker, GCP Vertex AI, and Azure AI, to develop in-house models. This method offers complete control over the model, but the quality, accuracy, performance, and development timeline depend on the organization's capabilities. While there is also the option to self-host workloads entirely, developing an LLM from scratch is very expensive, so that approach is typically excluded when discussing LLM-based services.
<Table 1> Pros and Cons of Each AI Service
AI service | Pros | Cons |
---|---|---|
Third-Party Vendor's Closed Source Services | Rapid implementation<br>High-quality models<br>Consistent, reliable customer support<br>Security advantage: vendors ensure their models do not allow data breaches or unauthorized access (however, if the vendor's security is inadequate, there is a risk of severe reputational damage or fines under data protection laws) | Limited customization options<br>Vendor lock-in<br>Bias/privacy risks<br>High cost due to proprietary technology<br>High dependency on the vendor for AI technology updates |
Third-Party Hosted Open-Source Services | Flexible customization/control<br>Compliance with privacy/security regulations<br>Community support<br>More economical than closed-source services | Quality, accuracy, and performance must be managed and improved by the user<br>Requires a high level of technical expertise<br>Time-consuming to achieve desired outcomes<br>Inefficient support channels |
DIY on Cloud Provider | Full control over the model<br>Compliance with privacy/security regulations is possible<br>Budget can be adjusted and managed<br>Easy to integrate with other cloud services | Quality, accuracy, and performance are the user's responsibility<br>Requires significant expertise in AI and cloud management<br>Long time to market |
Comparing the cost factors of the three approaches: the first, Closed Source Services, typically bases its costs on the number of tokens, calculated from the words or symbols processed during input/output or API calls. Token counts vary by language, and costs increase with usage. Still, for companies with limited technical personnel or deployment capabilities, this model can be more economical in the long run. Moreover, because audits of security and regulatory requirements matter in industries like finance and healthcare, the security protocols and certifications provided by closed-source services can be an efficient fit there.[4] The second approach, Third-Party Hosted Open-Source Services, is significantly cheaper than closed-source services, but costs can vary depending on how well the service is tuned and managed to fit the business. Because of its broad customization surface, it requires personnel with specialized knowledge, and once the labor costs of such skilled professionals are factored in, this approach may demand a larger budget for some companies. In the third approach, where a company hosts the service on its own infrastructure (DIY), the company pays for the computing resources needed to run AI models, especially GPUs. There may also be licensing fees for the LLM itself, and this method requires more time and skilled personnel[5] than the others.
When adopting commercial AI, choosing the right AI model and accurately predicting costs are essential. It is important to determine what capabilities are required: whether voice recognition is needed, whether image processing is required, or whether text summarization and data analysis are essential. The complexity of the AI model, whether it can be as simple as a website chatbot or requires a higher level of sophistication and intelligence, should also be factored into the budget.
Moreover, the choice of development approach is crucial. A straightforward way to calculate the total cost is the formula 'Service Usage * Usage Fee = Cost.' However, the specifics of usage and fees vary with the type of service used (e.g., SaaS-AI, PaaS, IaaS), the model size, the type of tasks the model is applied to, token-based costs, and the duration of service usage, so the cost formula will differ slightly depending on the pricing model. As a simple illustration, a token-priced API can be modeled as shown below.
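The following sketch applies the 'Service Usage * Usage Fee = Cost' formula to a token-billed API. All prices and traffic figures are hypothetical placeholders, not actual vendor rates; Python is used here purely for illustration.

```python
def monthly_api_cost(
    requests_per_month: int,
    avg_input_tokens: int,
    avg_output_tokens: int,
    price_per_1k_input: float,   # USD per 1,000 input tokens (assumed)
    price_per_1k_output: float,  # USD per 1,000 output tokens (assumed)
) -> float:
    """Estimate monthly spend for a token-billed LLM API."""
    input_cost = requests_per_month * avg_input_tokens / 1000 * price_per_1k_input
    output_cost = requests_per_month * avg_output_tokens / 1000 * price_per_1k_output
    return input_cost + output_cost

# Example: 100,000 requests/month, roughly 500 tokens in and 700 tokens out each.
estimate = monthly_api_cost(100_000, 500, 700, 0.01, 0.03)
print(f"Estimated monthly cost: ${estimate:,.2f}")  # -> $2,600.00
```

Changing any single input (traffic, prompt length, or unit price) shifts the total, which is why each factor must be tracked separately.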
Several methods influence LLM training and development costs. Prompt tuning can be optimized manually, automatically, or through guardrails, with quantities and ratios varying by the chosen method. Prompts may rely on zero-shot or few-shot learning, depending on whether static or dynamic examples are provided; supplying examples and context with each request adds token costs. Retrieval-Augmented Generation (RAG) searches for facts to include in the prompt input, adding a series of retrieval steps to ground responses in fact. Fine-tuning determines how model quality, bias, iterations, and hallucinations are handled across stages such as POC, development, and production. Reinforcement Learning from Human Feedback (RLHF) involves adjusting reward functions for specific outputs and scaling human involvement over time. Training a model from scratch means producing a custom model through repeated training iterations, weighing reuse or incremental improvement against the cost of new training and the resulting quality and accuracy. All of these performance-enhancing methods, from transformations during training to input/output handling, incur costs. It is therefore essential to determine the required level of performance and the corresponding method, and then to identify each cost factor, how heavily it is used, and its unit price to predict overall expenditure, as sketched below. Although this process is complex and cumbersome, every aspect is directly linked to costs.
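To make the 'identify each cost factor, its usage, and its unit price' step concrete, here is a minimal sketch. The factor names and unit prices are illustrative assumptions, not real vendor pricing.

```python
# Each method maps to cost factors of the form (quantity, unit price in USD).
# All figures below are hypothetical placeholders.
method_cost_factors = {
    "few_shot_prompting": {"extra_prompt_tokens_per_req": (300, 0.00001)},
    "rag": {
        "embedding_tokens_per_month": (5_000_000, 0.0000001),
        "vector_db_monthly_flat": (1, 70.0),
    },
    "fine_tuning": {
        "training_tokens": (2_000_000, 0.000008),
        "tuned_model_hosting_hours": (720, 1.5),
    },
}

def estimate_method_cost(method: str, requests_per_month: int = 0) -> float:
    """Sum quantity * unit price over every cost factor of a method."""
    total = 0.0
    for factor, (quantity, unit_price) in method_cost_factors[method].items():
        if factor.endswith("per_req"):
            # Per-request factors scale with monthly traffic.
            quantity *= requests_per_month
        total += quantity * unit_price
    return total

for method in method_cost_factors:
    cost = estimate_method_cost(method, requests_per_month=100_000)
    print(f"{method}: ${cost:,.2f} per month")
```

The point is not the specific numbers but the discipline: every method adopted should be decomposed into measurable cost factors before it reaches production.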
Once you have selected a deployment method, effective budget management requires understanding the cost factors of each solution, along with the scope of ownership and intellectual property rights.
Firstly, the primary cost driver for closed-source vendor services is usage of the application APIs. Usage is measured in 'tokens in/out,' corresponding to the amount of text processed, and additional costs may arise when using plugins or other models. Since many commercial LLMs are trained primarily on English, Korean inputs may consume more tokens, so it is important to verify what data a commercial model primarily holds and how it tokenizes the relevant language. While the base model typically incurs minimal training costs because prompt tuning is limited in scope, high-level fine-tuning can be significantly expensive. In terms of labor, less technical expertise is required than with the other approaches; management is typically limited to SKUs at the PaaS level.
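One hedged way to check tokenization per language is the open-source tiktoken library, used here as a stand-in for whatever tokenizer your vendor actually applies (encodings and counts differ per vendor and model).

```python
import tiktoken

# cl100k_base is one publicly documented encoding; your vendor's may differ.
enc = tiktoken.get_encoding("cl100k_base")

samples = {
    "English": "Hello, how can I help you today?",
    "Korean": "안녕하세요, 오늘 무엇을 도와드릴까요?",
}

for label, text in samples.items():
    tokens = enc.encode(text)
    print(f"{label}: {len(text)} characters -> {len(tokens)} tokens")
```

Running a representative sample of real user inputs through such a check yields a per-language tokens-per-request figure to feed into the cost formula above.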
Secondly, the primary cost factors for third-party hosted open-source models are the usage or rental time of GPU/RAM infrastructure. Costs accrue with model training, data processing volume, and storage capacity; however, since a pre-trained model can be further trained, the overall cost sits in the middle of the three options. Achieving optimal performance may require significant development costs and skilled personnel, and additional SKU costs may arise for elements such as models, data, and API calls. This creates a risk of budget overruns, requiring careful management of computing resources, and customization demands more specialized personnel and technical expertise than closed-source services.
Thirdly, in the DIY approach on cloud systems, the primary cost drivers are likewise the usage or rental time of GPU/RAM infrastructure. This method requires a high level of diverse technical expertise, as you must manage the stock keeping units (SKUs) for all elements, including hardware, software, modeling, model training, APIs, and licenses. While it offers the greatest freedom in managing model development costs, costs can vary significantly with the model's performance and use case, since the company must train the model itself. In return, this approach provides complete control over every aspect of the AI system, along with full ownership of data and intellectual property. By identifying and understanding these cost drivers, more informed decisions can be made to align with the project's scope, technical requirements, and budget.
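As a rough budgeting aid, a self-hosted deployment can be sketched as GPU rental plus storage. The hourly rate, GPU count, and storage price below are assumptions to replace with your cloud provider's actual quotes.

```python
def self_hosted_monthly_cost(
    gpu_hourly_rate: float,              # USD per GPU-hour (assumed)
    gpus: int,                           # GPUs kept running
    hours_per_month: float = 730.0,      # ~24/7 operation
    storage_gb: float = 0.0,
    storage_price_per_gb: float = 0.10,  # USD per GB-month (assumed)
) -> float:
    """Estimate monthly infrastructure cost for self-hosted model serving."""
    compute = gpu_hourly_rate * gpus * hours_per_month
    storage = storage_gb * storage_price_per_gb
    return compute + storage

# Example: two GPUs at a hypothetical $2.50/hour plus 500 GB of model/data storage.
print(f"${self_hosted_monthly_cost(2.50, 2, storage_gb=500):,.2f} per month")  # -> $3,700.00
```

Unlike token billing, this cost is largely fixed: it accrues whether or not traffic arrives, which is exactly why idle capacity needs guardrails.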
A strategy is needed to reduce AI costs in line with how the LLM will be used. To develop it, plan the AI system deployment strategy while considering the impact on existing spending commitments and capacity. According to FinOps practitioners, when using third-party vendor closed-source AI services, the pricing structure can fluctuate and potentially rise as demand and operational costs increase.[6] The latest versions of LLMs/plugins with longer context lengths can cost 5-20 times more than previous versions, so workload-based decisions are necessary. Capacity commitment discounts vary with timing, volume, and duration and should be examined in detail against existing pricing agreements. For third-party hosted open-source models, training and fine-tuning costs for POCs have fallen rapidly, and hardware prices have stabilized for small- and medium-scale deployments; in large-scale deployments, however, GPU resources remain scarce, requiring capacity assurance. Moreover, the unit cost of the latest CPU/GPU/TPU generations is higher than that of previous generations at the same GB-RAM/core ratios. In the DIY cloud approach, serverless and GPU-managed middleware services can reduce costs and minimize the required technical expertise and effort, and leveraging pre-built model templates and recommended compute optimization guidelines lowers baseline costs and avoids unnecessary expenses, reducing the total cost.
There are also important considerations regarding the deployment method. The cost difference between the most expensive major vendors and fully optimized platforms can range from 30 to 200 times. Particularly for enterprises, this significant cost disparity requires careful deliberation. A technology like input prompt optimization simplifies input and can lead to a 15-25% cost reduction. Furthermore, optimizing technology, platforms, configurations, scalability, and purchasing methods together has resulted in greater cost savings than using a single strategy alone. As automation continues to advance, there are additional opportunities to reduce costs associated with fine-tuning, synthetic data generation, and RAG functionalities.
The various technical layers and components of AI-based products and services can significantly impact costs. When using commercial AI models like ChatGPT or Claude, costs are typically calculated from the total number of API requests. In contrast, when deploying AI services based on open-source models on cloud systems, costs must account for GPU usage, RAM, storage, and per-unit data processing charges. Engineering teams must evaluate costs and performance across the engineering layers for each deployment option and develop efficient strategies for building and allocating resources; a simple break-even comparison of the two billing styles follows.
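This sketch contrasts the two billing models at different traffic levels. Every price in it is a placeholder; the break-even point depends entirely on actual vendor quotes and measured token usage.

```python
# Fixed monthly cost of self-hosting: two GPUs at a hypothetical $2.50/hour, 24/7.
GPU_MONTHLY = 2.50 * 2 * 730

# Per-request API cost assuming ~500 input and ~700 output tokens at
# hypothetical rates of $0.01 and $0.03 per 1,000 tokens.
PER_REQUEST_API = (500 / 1000) * 0.01 + (700 / 1000) * 0.03

for requests in (10_000, 100_000, 1_000_000):
    api_cost = requests * PER_REQUEST_API
    cheaper = "API" if api_cost < GPU_MONTHLY else "self-hosted"
    print(f"{requests:>9,} req/month: API ${api_cost:,.0f} "
          f"vs GPUs ${GPU_MONTHLY:,.0f} -> {cheaper}")
```

At low traffic the pay-per-token API wins; past some volume the fixed GPU cost amortizes better. Re-running this check as traffic grows is part of the strategy work described above.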
For more accurate AI cost predictions, create long-term plans based on the traffic the AI models will carry across the organization's business teams. Any estimate is only a projection and should be regularly compared against actual expenditures and adjusted; a simple growth-based projection is sketched below. Continuously tracking technological changes and AI vendors' pricing policies also improves accuracy.
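As a starting point for such a plan, compounding an assumed monthly traffic growth rate over a per-request cost yields a first-cut budget line. The starting volume, growth rate, and unit cost are assumptions to revisit every month against real spend.

```python
def project_costs(start_requests: int, monthly_growth: float,
                  cost_per_request: float, months: int = 12) -> list[float]:
    """Project monthly AI spend under compounding traffic growth."""
    costs, requests = [], start_requests
    for _ in range(months):
        costs.append(requests * cost_per_request)
        requests = int(requests * (1 + monthly_growth))
    return costs

# Example: 50,000 requests/month growing 15% monthly at ~$0.026 per request.
for month, cost in enumerate(project_costs(50_000, 0.15, 0.026), start=1):
    print(f"Month {month:2d}: ${cost:,.0f}")
```

Comparing each month's actual bill against this line shows early whether growth or unit costs are drifting from plan.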
To enhance the efficiency of AI system operations and manage costs effectively, it's essential to establish guardrails. These guardrails help align the organization's vision with product goals and mitigate risks. Organizations can create an environment where costs are better predicted and controlled by setting guardrails that consider budget limits, technical specifications, relationships with AI vendors, security and compliance, and operational metrics. The table below can serve as a reference for establishing guardrails to effectively manage AI costs:
<Table 2> Example of Scopes and Tasks for Cost Guardrails
Scope | Tasks |
---|---|
Budget | Fix the amount spent or the number of API operations for the POC.<br>Send real-time alerts when a set threshold is approached or exceeded (a minimal alerting sketch follows this table).<br>Distribute reports containing predicted costs, actual costs, and quantities. |
Technology | Select the appropriate deployment model and control usage.<br>Implement pre-approval and strict authorization settings.<br>Scale AI workloads carefully (GPUs can be very expensive). |
Vendor Relationship | Assess dependency on and potential lock-in with vendors.<br>Ensure sufficient trust in vendors' future strategies. |
Security and Compliance | Manage role-based permissions and data-sharing violations.<br>Conduct training for AI providers to protect the company's sensitive data. |
Testing and Operations | Monitor usage and costs against set thresholds.<br>Build pipelines to handle ethical issues such as bias and safety.<br>Track performance metrics such as accuracy, latency, and speed.<br>Strengthen cost management and optimization policies. |
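Here is a minimal sketch of the budget guardrail from the table's first row: accumulated spend is compared against a fixed cap, with an alert as the threshold is approached or exceeded. The cap, warning ratio, and send_alert function are placeholders for your own limits and notification channel.

```python
BUDGET_LIMIT = 5_000.0  # monthly cap in USD (assumed)
WARN_RATIO = 0.8        # warn at 80% of the cap

def send_alert(message: str) -> None:
    # Placeholder: wire this to email, chat, or incident tooling.
    print(message)

def check_budget(spend_to_date: float) -> None:
    """Raise guardrail alerts based on accumulated monthly spend."""
    if spend_to_date >= BUDGET_LIMIT:
        send_alert(f"BUDGET EXCEEDED: ${spend_to_date:,.2f} of ${BUDGET_LIMIT:,.2f}")
    elif spend_to_date >= BUDGET_LIMIT * WARN_RATIO:
        send_alert(f"Approaching budget: ${spend_to_date:,.2f} "
                   f"({spend_to_date / BUDGET_LIMIT:.0%} of cap)")

check_budget(4_200.0)  # -> warning at 84% of the cap
check_budget(5_300.0)  # -> exceeded
```

In practice such a check would run on a schedule against billing data, and the same pattern extends to the operational thresholds in the table's last row.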
Ultimately, defining the success of an AI service and determining the best way to achieve it requires making cost an objective metric and working to control it while striving for success. AI services are evaluated on processing speed, accuracy, and response quality, so effectively managing per-task costs, GPU costs, and other expenses is essential to balancing performance and cost. Engineering goals should be established, such as improving speed and accuracy, reducing training costs, and lowering the cost per token, and it is crucial that team members work toward the goals set for the various factors affecting AI services.
AI technology evolves continuously, with new updates arriving daily and enhancing its potential across all industries. However, as the economy worsens, AI companies must demonstrate sustainability from the early stages of their business. In a still-maturing field, proving the viability of self-sustained growth is essential for securing additional investment, yet many startups overestimate business outcomes or underestimate costs.
Proactively considering the strategies and costs needed to create and operate AI products in the early stages of the business is crucial for reducing risks and enhancing the chances of survival. Companies that fail to achieve tangible growth may be forced out of the market, yet investment in AI is expected to continue expanding. To build a robust business model that can withstand competitive market conditions and technological dependencies, it is essential to maintain technological agility while effectively controlling costs and responding proactively to changes.
References
[1] Samsung SDS Insight Report, "2023 Domestic AI Adoption and Utilization Survey," Dec 08, 2023.
[2] Deloitte, "TrustID Generative AI Analysis," Aug 2023.
[3] McKinsey, "Why AI-enabled customer service is key to scaling telco personalization," Oct 24, 2023.
[4] Businesswire, "86 Percent of Consumers Will Leave a Brand They Trusted After Only Two Poor Customer Experiences," Feb 02, 2022.
[5] HelloT, "AI Utilization in the Telecom Industry Expected to Increase Tenfold by 2032," Feb 12, 2024.
[6] McKinsey, "How AI is helping revolutionize telco service operations," Feb 25, 2022.
[7] The Economist, "Why fintech won't kill banks," Jul 17, 2015.
[8] Deloitte Insights, "The Evolving Banking Industry in the Age of AI," Aug 2023.
[9] Tech42, "'Internet Banks' on the Rise? No.1 KakaoBank and No.2 Toss Rank as Top Banking Apps," Nov 07, 2022.
[10] The Economist, "Hoping for a Fintech as Profitable as Samsung Electronics Overseas," Oct 04, 2023.
[11] ZDnet Korea, "Shinhan Bank Creates Small Business Ecosystem with Delivery App 'DDangyo'," Oct 19, 2023.
[12] Voicebot, "Bank of America's AI Assistant Erica Passes 1.5 Billion Interactions," Oct 19, 2023.
[13] Korea Economic Daily, "Is Fintech Only for Gen-Z? Senior Users Also Represent Key Customers," Jun 14, 2022.
Senior Program Manager at SAP France
Majored in computer science in Korea and worked as a developer at LG and Fujitsu Korea for seven years. Moved to Paris, France, in 1998, worked as a development manager and program manager at Business Objects, and now works as a product/program manager in the Engineering UX Division at SAP.