Announcing Meta’s Llama 3.1 405B, 70B, and 8B models in Amazon Bedrock | Amazon Web Services


Today we’re announcing the general availability of Llama 3.1 models on Amazon Bedrock. The Llama 3.1 models are Meta’s most advanced and capable models to date. The Llama 3.1 models are a collection of 8B, 70B, and 405B parameter size models that demonstrate state-of-the-art performance across a wide range of industry benchmarks and offer new capabilities for your generative artificial intelligence (AI) applications.

All Llama 3.1 models support a context length of 128K tokens (an increase of 120K tokens over Llama 3), which is 16 times the capacity of Llama 3 models, and offer improved reasoning for multilingual dialogue use cases in eight languages: English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai.

You can now use Meta’s three new Llama 3.1 models in Amazon Bedrock to build, experiment, and responsibly scale your generative AI ideas:

  • Llama 3.1 405B is the largest publicly available large language model (LLM) in the world, according to Meta. The model sets a new standard for AI and is ideal for enterprise-level applications and research and development (R&D). It is well suited for tasks such as synthetic data generation, where model outputs can be used to improve smaller Llama models, and model distillation, which transfers knowledge from the 405B model to smaller models. The model excels at general knowledge, long-form text generation, multilingual translation, machine translation, coding, math, tool use, enhanced contextual understanding, and advanced reasoning and decision-making. To learn more, visit the AWS Machine Learning Blog about using Llama 3.1 405B to generate synthetic data for model distillation.
  • Llama 3.1 70B is ideal for content creation, conversational AI, language understanding, R&D, and enterprise applications. The model excels at text summarization and accuracy, text classification, sentiment analysis and nuanced reasoning, language modeling, dialogue systems, code generation, and instruction following.
  • Llama 3.1 8B is best suited for environments with limited computational power and resources. The model excels at text summarization, text classification, sentiment analysis, and language translation requiring low-latency inference.

Meta measured the performance of Llama 3.1 on over 150 benchmark datasets spanning a wide range of languages, as well as extensive human evaluations. As you can see in the chart below, Llama 3.1 outperforms Llama 3 in every major benchmarking category.

To learn more about Llama 3.1 features and capabilities, visit the Llama 3.1 page from Meta and the Llama section of the AWS documentation.

You can combine the responsible AI capabilities of Llama 3.1 with the data governance and model evaluation features of Amazon Bedrock to build secure and reliable generative AI applications.

  • Guardrails for Amazon Bedrock – By creating multiple guardrails with different configurations tailored to specific use cases, you can implement safeguards customized to your use cases and responsible AI policies to support safe interactions between users and your generative AI applications. With Guardrails for Amazon Bedrock, you can continuously monitor and analyze user inputs and model responses that might violate customer-defined policies, detect hallucinations in model responses that are not grounded in enterprise data or are not relevant to the user’s query, and evaluate different models, including custom and third-party models. To get started, visit the Create a guardrail page in the AWS documentation.
  • Model evaluation on Amazon Bedrock – You can evaluate, compare, and select the best Llama models for your use case in just a few steps using automatic or human evaluation. With model evaluation on Amazon Bedrock, you can choose automatic evaluation with predefined metrics such as accuracy, robustness, and toxicity. Alternatively, you can choose human evaluation workflows for subjective or custom metrics such as relevance, style, and alignment with brand voice. Model evaluation provides built-in curated datasets, or you can bring your own datasets. To get started, visit the Get started with model evaluation page in the AWS documentation.
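
As a rough illustration of the guardrails workflow, a guardrail can also be created programmatically with the AWS SDK for Python (boto3). The configuration below is a minimal sketch: the guardrail name, description, filter types, and strengths are illustrative assumptions, not prescriptions, so check them against the Guardrails documentation before use.

```python
# A minimal guardrail configuration sketch for Guardrails for Amazon Bedrock.
# The name, filter types, and strengths below are illustrative assumptions;
# tailor them to your own responsible AI policies.
guardrail_config = {
    "name": "my-demo-guardrail",  # hypothetical name
    "description": "Blocks harmful content in user inputs and model responses.",
    "contentPolicyConfig": {
        "filtersConfig": [
            {"type": "HATE", "inputStrength": "HIGH", "outputStrength": "HIGH"},
            {"type": "INSULTS", "inputStrength": "MEDIUM", "outputStrength": "MEDIUM"},
        ]
    },
    "blockedInputMessaging": "Sorry, I can't help with that request.",
    "blockedOutputsMessaging": "Sorry, I can't provide that response.",
}

# With AWS credentials configured, the guardrail could then be created like this:
# import boto3
# bedrock = boto3.client("bedrock", region_name="us-west-2")
# response = bedrock.create_guardrail(**guardrail_config)
# guardrail_id = response["guardrailId"]
```

Once created, the guardrail identifier can be attached to inference requests so the same policy applies across models.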

To learn more about how to keep your data and applications secure and private in AWS, visit the Amazon Bedrock Security and Privacy page.

Getting started with Llama 3.1 models in Amazon Bedrock
If you are new to using Meta’s Llama models, go to the Amazon Bedrock console in the US West (Oregon) Region and choose Model access in the bottom left pane. To access the latest Llama 3.1 models from Meta, request access separately for Llama 3.1 8B Instruct, Llama 3.1 70B Instruct, or Llama 3.1 405B Instruct.

To test the Llama 3.1 models in the Amazon Bedrock console, choose Text or Chat under Playgrounds in the left menu pane. Then choose Select model, select Meta as the category, and choose Llama 3.1 8B Instruct, Llama 3.1 70B Instruct, or Llama 3.1 405B Instruct as the model.

In the following example, I selected the Llama 3.1 405B Instruct model.

By choosing View API request, you can also access the model using code examples in the AWS Command Line Interface (AWS CLI) and AWS SDKs. You can use model IDs such as meta.llama3-1-8b-instruct-v1, meta.llama3-1-70b-instruct-v1, or meta.llama3-1-405b-instruct-v1.

Here is a sample AWS CLI command:

aws bedrock-runtime invoke-model \
  --model-id meta.llama3-1-405b-instruct-v1:0 \
  --body "{\"prompt\":\"[INST]You are a very intelligent bot with exceptional critical thinking[/INST] I went to the market and bought 10 apples. I gave 2 apples to your friend and 2 to the helper. I then went and bought 5 more apples and ate 1. How many apples did I remain with? Let's think step by step.\",\"max_gen_len\":512,\"temperature\":0.5,\"top_p\":0.9}" \
  --cli-binary-format raw-in-base64-out \
  --region us-west-2 \
  invoke-model-output.txt
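
The command writes the model response to invoke-model-output.txt as a JSON document. As a minimal sketch of what to do with that output, the snippet below parses a payload shaped like the Llama response body on Amazon Bedrock; the field names (generation, prompt_token_count, generation_token_count, stop_reason) are taken from the Bedrock Llama documentation, while the sample text and token counts are made up for illustration.

```python
import json

# A hand-built sample shaped like a Llama invoke-model response body.
# In practice you would read this from invoke-model-output.txt, e.g.:
#   with open("invoke-model-output.txt") as f:
#       sample_body = f.read()
sample_body = """
{
  "generation": "Let's count: 10 - 2 - 2 + 5 - 1 = 10 apples.",
  "prompt_token_count": 58,
  "generation_token_count": 21,
  "stop_reason": "stop"
}
"""

result = json.loads(sample_body)
print(result["generation"])
print(f"tokens: {result['prompt_token_count']} in, {result['generation_token_count']} out")
```

The stop_reason field tells you whether generation ended naturally or hit the max_gen_len limit.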

You can use the code examples for Llama models in Amazon Bedrock with the AWS SDKs to build applications in different programming languages. The following Python code example shows how to send a text message to Llama using the Amazon Bedrock Converse API.

import boto3
from botocore.exceptions import ClientError

# Create a Bedrock Runtime client in the AWS Region you want to use.
client = boto3.client("bedrock-runtime", region_name="us-west-2")

# Set the model ID, e.g., Llama 3.1 405B Instruct.
model_id = "meta.llama3-1-405b-instruct-v1:0"

# Start a conversation with the user message.
user_message = "Describe the purpose of a 'hello world' program in one line."
conversation = [
    {
        "role": "user",
        "content": [{"text": user_message}],
    }
]

try:
    # Send the message to the model, using a basic inference configuration.
    response = client.converse(
        modelId=model_id,
        messages=conversation,
        inferenceConfig={"maxTokens": 512, "temperature": 0.5, "topP": 0.9},
    )

    # Extract and print the response text.
    response_text = response["output"]["message"]["content"][0]["text"]
    print(response_text)

except (ClientError, Exception) as e:
    print(f"ERROR: Can't invoke '{model_id}'. Reason: {e}")
    exit(1)
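
Beyond the reply text, the Converse API response carries useful metadata such as token usage and the stop reason. The helper below is a sketch run against a hand-built sample response; the output, usage, and stopReason field names match the Converse API response shape used in the example above, but the sample values are invented for illustration.

```python
def summarize_converse_response(response: dict) -> dict:
    """Extract the reply text and basic metadata from a Converse API response."""
    message = response["output"]["message"]
    # Concatenate all text blocks in the assistant message.
    text = "".join(block["text"] for block in message["content"] if "text" in block)
    usage = response.get("usage", {})
    return {
        "text": text,
        "input_tokens": usage.get("inputTokens"),
        "output_tokens": usage.get("outputTokens"),
        "stop_reason": response.get("stopReason"),
    }

# A hand-built sample shaped like a Converse API response, for illustration only.
sample_response = {
    "output": {
        "message": {
            "role": "assistant",
            "content": [{"text": "A 'hello world' program verifies the toolchain works."}],
        }
    },
    "usage": {"inputTokens": 15, "outputTokens": 12},
    "stopReason": "end_turn",
}

summary = summarize_converse_response(sample_response)
print(summary["text"])
```

Tracking inputTokens and outputTokens per request is a simple way to monitor usage against the per-token pricing of on-demand inference.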

You can also use all Llama 3.1 models (8B, 70B, and 405B) in Amazon SageMaker JumpStart. You can discover and deploy Llama 3.1 models with a few clicks in Amazon SageMaker Studio or programmatically through the SageMaker Python SDK. You can operate your models with SageMaker features such as SageMaker Pipelines, SageMaker Debugger, or container logs, all under your virtual private cloud (VPC) controls, to help keep your data secure.

Fine-tuning for Llama 3.1 in Amazon Bedrock and Amazon SageMaker JumpStart is coming soon. Once you build fine-tuned models in SageMaker JumpStart, you’ll also be able to import your custom models into Amazon Bedrock. To learn more, visit Meta Llama 3.1 models are now available in Amazon SageMaker JumpStart on the AWS Machine Learning Blog.

For customers who want to deploy Llama 3.1 models on AWS through self-managed machine learning workflows for greater flexibility and control of underlying resources, Amazon Elastic Compute Cloud (Amazon EC2) instances powered by AWS Trainium and AWS Inferentia enable high-performance, cost-effective deployment of Llama 3.1 models on AWS. To learn more, visit AWS AI chips deliver high performance and low cost for Meta Llama 3.1 models on AWS on the AWS Machine Learning Blog.

Customer Voices
To celebrate the launch, Parkin Kent, Business Development Manager at Meta, talks about the power of collaboration between Meta and Amazon, highlighting how Meta and Amazon are working together to push the boundaries of what’s possible with generative AI.

Learn how businesses are using Llama models in Amazon Bedrock to harness the power of generative AI. Nomura, a global financial services group spanning 30 countries and regions, is democratizing generative AI across its organization using Llama models in Amazon Bedrock.

TaskUs, a leading provider of outsourced digital services and next-generation customer experiences to the world’s most innovative companies, helps its clients represent, protect, and grow their brands using Llama models in Amazon Bedrock.

Now available
Meta’s Llama 3.1 405B, 70B, and 8B models are now generally available in Amazon Bedrock in the US West (Oregon) Region. Check the full Region list for future updates. To learn more, check out the Llama in Amazon Bedrock product page and the Amazon Bedrock pricing page.

Try Llama 3.1 in the Amazon Bedrock console today and submit your feedback to AWS re:Post for Amazon Bedrock or through your usual AWS support contacts.

Visit our community.aws page for detailed technical content and learn how our Builder communities are using Amazon Bedrock in their solutions. Let me know what you build with Llama 3.1 in Amazon Bedrock!

Channy

July 23, 2024 – Updated post to add new model access screenshot and customer video with TaskUs.
July 25, 2024 – Updated post stating that Llama 3.1 405B is now generally available.
