Introducing Our Next Generation Infrastructure for AI


The next generation of Meta’s large-scale infrastructure is being built with AI in mind, including supporting new generative AI products, recommendation systems and advanced AI research. It’s an investment we expect will grow in the years ahead, as the compute requirements to support AI models increase alongside the models’ sophistication.

Last year, we unveiled the Meta Training and Inference Accelerator (MTIA) v1, our first-generation AI inference accelerator, designed in-house with Meta’s AI workloads in mind. It was built specifically for the deep learning recommendation models that improve a variety of experiences across our apps and technologies.

MTIA is a long-term bet to provide the most efficient architecture for Meta’s unique workloads. As AI workloads become increasingly important to our products and services, this efficiency will be central to our ability to provide the best experiences for our users around the world. MTIA v1 was an important step in improving the compute efficiency of our infrastructure and better supporting our software developers as they build AI models that will facilitate new and better user experiences. 

The next generation of MTIA is part of our broader full-stack development program for custom, domain-specific silicon that addresses our unique workloads and systems. This new version of MTIA more than doubles the compute and memory bandwidth of our previous solution while remaining closely tied to our workloads. It is designed to efficiently serve the ranking and recommendation models that deliver high-quality recommendations to users.

This chip’s architecture is fundamentally focused on providing the right balance of compute, memory bandwidth and memory capacity for serving ranking and recommendation models. 

A GIF showing the chip's architecture.

MTIA has been deployed in our data centers and is now serving models in production. We are already seeing positive results from this program, which is allowing us to invest in and dedicate more compute power to our more intensive AI workloads.

The results so far show that this MTIA chip can handle both the low-complexity and high-complexity ranking and recommendation models that are key components of Meta’s products. Because we control the whole stack, we can achieve greater efficiency than with commercially available GPUs (graphics processing units).

Meta’s Ongoing Investment in Custom Silicon

MTIA will be an important piece of our long-term roadmap to build and scale the most powerful and efficient infrastructure possible for Meta’s unique AI workloads.

We’re designing our custom silicon to work in cooperation with our existing infrastructure as well as with new, more advanced hardware (including next-generation GPUs) that we may leverage in the future. Meeting our ambitions for our custom silicon means investing not only in compute silicon but also in memory bandwidth, networking and capacity, as well as other next-generation hardware systems.

An image to show our ongoing investment in custom silicon.

We currently have several programs underway aimed at expanding the scope of MTIA, including support for GenAI workloads. And we’re only at the beginning of this journey.




