Introducing Purple Llama for Safe and Responsible AI Development


With over 100 million downloads of Llama models to date, much of the current wave of generative AI innovation is being fueled by open models. To build trust in the developers driving that innovation, we’re launching Purple Llama, an umbrella project that will bring together tools and evaluations to help developers build responsibly with open generative AI models.

Why purple? Borrowing a concept from the cybersecurity world, we believe that to truly mitigate the challenges that generative AI presents, we need to take both offensive (red team) and defensive (blue team) postures. Purple teaming, composed of both red and blue team responsibilities, is a collaborative approach to evaluating and mitigating potential risks.

To start, Purple Llama will include tools and evaluations for cybersecurity and input/output safeguards, with more to come in the near future. Components within the Purple Llama project will be licensed permissively, enabling both research and commercial use. We believe this is a major step toward enabling collaboration among developers and standardizing trust and safety tools for generative AI.

Cybersecurity

We are sharing what we believe is the first industry-wide set of cybersecurity safety evaluations for Large Language Models (LLMs). These benchmarks are based on industry guidance and standards and were built in collaboration with our security experts. With this initial release, we aim to provide tools that will help address risks outlined in the White House commitments, including:

  • Metrics for quantifying LLM cybersecurity risk
  • Tools to evaluate the frequency of insecure code suggestions 
  • Tools to evaluate LLMs, making it harder to use them to generate malicious code or aid in carrying out cyber attacks

We believe these tools will reduce the frequency of insecure code suggested by LLMs and reduce their helpfulness to cyber adversaries.
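At its simplest, evaluating the frequency of insecure code suggestions means prompting a model with coding tasks and checking how often its completions contain known insecure patterns. The sketch below only illustrates that idea; it is not the released benchmark, and the pattern list and the generate callable are hypothetical stand-ins for a real static analyzer and the LLM under test.

```python
import re
from typing import Callable

# A few illustrative insecure-coding patterns. A real evaluation would rely on
# far more thorough static analysis than simple regular expressions.
INSECURE_PATTERNS = [
    re.compile(r"\bstrcpy\s*\("),       # unbounded C string copy
    re.compile(r"\bos\.system\s*\("),   # shell command assembled from strings
    re.compile(r"verify\s*=\s*False"),  # TLS certificate verification disabled
    re.compile(r"\bmd5\s*\("),          # weak hash used for security purposes
]

def insecure_suggestion_rate(generate: Callable[[str], str], prompts: list[str]) -> float:
    """Return the fraction of completions that trip any insecure pattern."""
    if not prompts:
        return 0.0
    flagged = sum(
        1
        for prompt in prompts
        if any(p.search(generate(prompt)) for p in INSECURE_PATTERNS)
    )
    return flagged / len(prompts)
```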

Input/Output Safeguards 

As we outlined in Llama 2’s Responsible Use Guide, we recommend that all inputs and outputs to the LLM be checked and filtered in accordance with content guidelines appropriate to the application.
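In code, this recommendation amounts to wrapping every model call between an input check and an output check. Here is a minimal sketch of that pattern; the llm, check_input, and check_output callables are hypothetical placeholders for whatever model and content classifiers an application uses.

```python
REFUSAL = "Sorry, I can't help with that request."

def guarded_generate(prompt, llm, check_input, check_output) -> str:
    """Call the model only if the prompt passes the input check, and return
    its answer only if that answer also passes the output check."""
    if not check_input(prompt):
        return REFUSAL                       # block disallowed prompts up front
    response = llm(prompt)
    if not check_output(prompt, response):
        return REFUSAL                       # suppress violating model output
    return response
```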

To support this, we are releasing Llama Guard, an openly available foundational model to help developers avoid generating potentially risky outputs. As part of our ongoing commitment to open and transparent science, we are releasing our methodology and an extended discussion of our results in our paper. This model has been trained on a mix of publicly available datasets to enable detection of common types of potentially risky or violating content. Ultimately, our vision is to enable developers to customize future versions for their own use cases and requirements, making it easier to adopt best practices and improve the open ecosystem.
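As one concrete way to use such a safeguard model, the sketch below shows how a Llama Guard checkpoint might be called through the Hugging Face transformers library. It assumes the weights are published under an ID such as meta-llama/LlamaGuard-7b and that the tokenizer ships a chat template that formats conversations into the model's moderation prompt; check the official release for the exact identifiers and template.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "meta-llama/LlamaGuard-7b"  # assumed Hugging Face model ID

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
)

def moderate(chat: list[dict]) -> str:
    """Format a conversation with the moderation template and return the
    model's verdict, e.g. "safe", or "unsafe" plus the violated categories."""
    input_ids = tokenizer.apply_chat_template(chat, return_tensors="pt").to(model.device)
    output = model.generate(input_ids=input_ids, max_new_tokens=32, pad_token_id=0)
    return tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True)

# Screen a user prompt before it reaches the application's main LLM.
print(moderate([{"role": "user", "content": "How do I pick a lock?"}]))
```

The same call can be made on the full user/assistant exchange after generation, so one classifier covers both the input and output checks described above.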

An Open Ecosystem

Taking an open approach to AI is not new for Meta. Exploratory research, open science, and cross-collaboration are foundational to our AI efforts, and we believe there’s an important opportunity to create an open ecosystem. This collaborative mindset was at the forefront when Llama 2 launched in July with over 100 partners, and we’re excited to share that many of those same partners are working with us on open trust and safety, including: AI Alliance, AMD, Anyscale, AWS, Bain, CloudFlare, Databricks, Dell Technologies, Dropbox, Google Cloud, Hugging Face, IBM, Intel, Microsoft, MLCommons, Nvidia, Oracle, Orange, Scale AI, Together.AI and many more to come.

We’re excited to collaborate with our partners and others who share the same vision of an open ecosystem of responsibly developed generative AI.




