Costwiz: Saving cost for LinkedIn enterprise on Azure


Authors: Deven Walia, Vivek Subramaniam, Simon Desowza, and Karthik Subramanian

Cloud services have completely changed the way we approach infrastructure management. It’s now much easier to manage large infrastructure requirements that traditionally demanded close coordination among teams such as DBA, Infra-SRE, Onprem-SMEs, network managers, and access control managers. However, that ease can lead to over-provisioning and under-utilization of cloud resources, resulting in increased operating expenses. Without careful monitoring and accountability in place, organizations risk getting swept away by soaring costs, compromising their ability to enhance the member and customer experience.

That’s why we built Costwiz, a tool that allows us to reduce costs by helping teams keep an eye on budgets and over-provisioned or under-utilized resources. Costwiz provides a unified experience that helps leaders drive more accurate forecasting of Azure budgets at LinkedIn with resource ownership detection, accountability, expedited remedies, and holistic data visibility (via custom dashboards). In this blog post, we will share our progress, challenges, and lessons learned from our Costwiz journey.

How Costwiz works

Costwiz detects and stops cloud cost anomalies as they occur to avoid unpleasant billing surprises. To identify where suboptimal spending is occurring, it ingests cost-cutting recommendations from Azure Advisor, an Azure service that continuously analyzes resource utilization and other metrics to help ensure an optimized Azure deployment. Costwiz automates this process, alerting teams to cost-saving options so they can save money proactively while gaining deep visibility into their cloud costs.

Costwiz creates accountability by notifying organization owners to assign these recommendations to engineers or SREs and by tracking each recommendation through its workflow. If a recommendation is not remediated within a set timeframe, Costwiz escalates the issue to the assigned person’s team and sends a summary email to organization leaders. It also gives decision-makers the information they need in a unified UI: resource utilization details, the resource’s current cost in Azure, the recommended action, potential savings, assigned engineers, and more.
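
To make the escalation rule concrete, here is a minimal sketch in Python. It is not the Costwiz implementation: the `Recommendation` fields, the `escalate_overdue` function, and the seven-day `grace` gap between repeat escalations are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional


@dataclass
class Recommendation:
    # Hypothetical record shape; the field names are assumptions.
    resource_id: str
    assignee: Optional[str]
    due_date: date
    remediated: bool = False
    escalations: int = 0
    last_escalated: Optional[date] = None


def escalate_overdue(
    recs: List[Recommendation],
    today: date,
    grace: timedelta = timedelta(days=7),  # assumed gap between repeat escalations
) -> List[Recommendation]:
    """Mark open, overdue recommendations as escalated and return them."""
    escalated = []
    for rec in recs:
        # Skip anything already fixed or not yet past its due date.
        if rec.remediated or today <= rec.due_date:
            continue
        # Avoid re-escalating the same item more often than `grace` allows.
        if rec.last_escalated is not None and today - rec.last_escalated < grace:
            continue
        rec.escalations += 1
        rec.last_escalated = today
        escalated.append(rec)
    return escalated
```

The returned list is what a notifier could turn into the team escalation and the summary email for leaders. Keeping the escalation count and last escalation date on the record lets the UI display both.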

Our approach to building Costwiz

For dashboards and regular email alerts, we decided to use native reporting tools already available to us, such as Power BI, to help us scale quickly. These tools expose the right set of data without requiring us to build a dedicated reporting UI or a workflow for email alerts and reporting dashboards. This allowed us to concentrate on other business problems and optimization efforts that needed attention.

Costwiz application and workflow management system

For the initial rollout, we built a single-page app for users to perform actions on their recommendations (as seen in Figure 1). The landing page lists all the resource recommendations along with their metadata: resource owners (Azure security groups), the recommendation message, the current lifecycle status of the recommendation, the due date, the assigned engineer, the last action message (recorded as a comment), and a history modal that shows the timeline of actions taken. Engineers can also access subscription details, the total number of escalations, and the last escalation date.

Managers will have a custom view based on their login token, giving extended visibility into all recommendations assigned to their organization.
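
As a rough illustration of the role-based view, the sketch below filters recommendation rows by the viewer's identity. Everything here is an assumption made for the example: the `Viewer` claims (as if decoded from a login token), the `RecommendationRow` fields, the status values, and the rule that non-managers see only the rows assigned to them.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Viewer:
    # Hypothetical claims decoded from a login token.
    user: str
    org: str
    is_manager: bool = False


@dataclass
class RecommendationRow:
    # Hypothetical row shape loosely mirroring the landing-page columns.
    resource_id: str
    owner_group: str   # Azure security group that owns the resource
    org: str           # organization the recommendation is assigned to
    message: str
    status: str        # lifecycle status; the values used are assumptions
    assignee: Optional[str] = None
    history: List[str] = field(default_factory=list)


def visible_rows(rows: List[RecommendationRow], viewer: Viewer) -> List[RecommendationRow]:
    """Return the rows a viewer may see: org-wide for managers, own rows otherwise."""
    if viewer.is_manager:
        return [r for r in rows if r.org == viewer.org]
    return [r for r in rows if r.assignee == viewer.user]
```

Deriving the view from token claims, rather than from a role chosen in the UI, keeps the visibility decision on the server side.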


