Amazon now has its own foundation models called ‘Amazon Nova’. The suite of multi-modal models was unveiled by Amazon CEO Andy Jassy at AWS’s Re:Invent conference in Las Vegas this week.
According to Jassy, the new models (integrated into Amazon Bedrock) are designed to enhance multi-modal capabilities, speed, and cost-efficiency for enterprise customers.
The Nova family comprises four models tailored to various workloads:
- Amazon Nova Micro: A lightweight text-only model optimised for rapid response times and cost efficiency.
- Amazon Nova Lite: A multi-modal model capable of processing text, images, and videos.
- Amazon Nova Pro: Designed for more complex multi-modal tasks.
- Amazon Nova Premier: Expected in early 2025, aimed at complex reasoning and advanced use cases.
The models also support distillation. This means they can transfer specific knowledge from larger ‘teacher’ models to smaller models that are accurate but cheaper to run.
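For readers curious about what distillation looks like in practice, here is a minimal sketch of the textbook 'soft target' approach, in which a student model is trained to match a teacher's softened output probabilities as well as the correct label. This is a generic illustration, not a description of how AWS implements distillation for Nova or Bedrock. The function names, the `temperature` and `alpha` parameters, and the example logits are all assumptions made for the example.

```python
import math


def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; a higher temperature flattens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, true_label,
                      temperature=2.0, alpha=0.5):
    """Blend a 'match the teacher' term with a 'match the label' term.

    alpha is a hypothetical weighting: 1.0 means learn only from the teacher,
    0.0 means learn only from the ground-truth label.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)

    # KL divergence: how far the student's soft predictions are from the teacher's.
    soft_term = sum(pt * math.log(pt / ps)
                    for pt, ps in zip(p_teacher, p_student) if pt > 0)

    # Ordinary cross-entropy against the correct class, at temperature 1.
    hard_term = -math.log(softmax(student_logits)[true_label])

    # temperature**2 is the conventional scaling for the soft term.
    return alpha * temperature ** 2 * soft_term + (1 - alpha) * hard_term


teacher = [4.0, 1.0, 0.5]          # made-up scores from a large 'teacher' model
close_student = [3.5, 1.2, 0.4]    # a student that roughly agrees with the teacher
far_student = [0.5, 1.0, 4.0]      # a student that disagrees with the teacher

print(distillation_loss(close_student, teacher, true_label=0))
print(distillation_loss(far_student, teacher, true_label=0))
```

The student that agrees with the teacher gets the lower loss, so training pushes the smaller model towards the larger model's behaviour.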
AWS has not yet published pricing for the models. However, the company says its Nova models are at least 75% cheaper than the most comparable models already available on Bedrock.
AWS claims Nova models perform on par with or better than leading models across industry benchmarks.
Nova Micro reportedly matches or exceeds Google Gemini 1.5, while Nova Pro rivals OpenAI GPT-4o and Google Gemini Pro in instruction-following and complex multi-modal tasks.
“Inside Amazon, we have about 1,000 generative AI applications in motion, and we’ve had a bird’s-eye view of what application builders are still grappling with,” Rohit Prasad, senior vice president and head scientist of Amazon Artificial General Intelligence, said.
“Our new Amazon Nova models are intended to help with these challenges for internal and external builders.”
The Nova suite also offers:
- Nova Canvas: Generates high-quality images with features like text-based editing and adjustable layouts.
- Nova Reel: Focuses on video generation, enabling users to produce six-second video clips with customisable styles and pacing.
AWS plans to extend Reel’s capabilities to handle videos up to two minutes in the coming months.
Both models incorporate safeguards such as watermarking and content moderation to promote responsible AI use.
Multimodal-to-multimodal models are also on the cards
AWS announced plans to further enhance Nova’s capabilities in 2025. This will include:
- Speech-to-speech model: Capable of handling natural, human-like conversations with contextual understanding.
- Multimodal-to-multimodal model: Intended to process and generate text, images, audio, and video seamlessly.
AWS’s broader announcements
Beyond Amazon Nova, AWS announced multi-agent collaboration and enhanced guardrails in Amazon Bedrock.
It also introduced SageMaker Unified Studio to consolidate data and model workflows, and announced updates to Amazon Q Developer and Q Business, as well as the general availability of its Trainium 2 chips.
The author travelled to Re:Invent in Las Vegas as a guest of AWS.