Audience
AI and LLM developers
About Ferret
An End-to-End MLLM that Accept Any-Form Referring and Ground Anything in Response.
Ferret Model - Hybrid Region Representation + Spatial-aware Visual Sampler enable fine-grained and open-vocabulary referring and grounding in MLLM.
GRIT Dataset (~1.1M) - A Large-scale, Hierarchical, Robust ground-and-refer instruction tuning dataset.
Ferret-Bench - A multimodal evaluation benchmark that jointly requires Referring/Grounding, Semantics, Knowledge, and Reasoning.
Other Popular Alternatives & Related Software
Amazon Nova 2 Lite
Nova 2 Lite is a lightweight, high-speed reasoning model designed to handle everyday AI workloads across text, images, and video. It can generate clear, context-aware responses and lets users fine-tune how much internal reasoning the model performs before producing an answer. This adjustable “thinking depth” gives teams the flexibility to choose faster replies or more detailed problem-solving depending on the task. It stands out for customer service bots, automated document handling, and general business workflow support. Nova 2 Lite delivers strong performance across standard evaluation tests. It performs on par with or better than comparable compact models in most benchmark categories, demonstrating reliable comprehension and response quality. Its strengths include interpreting complex documents, pulling accurate insights from video content, generating usable code, and delivering grounded answers based on provided information.
Learn more
Selene 1
Atla's Selene 1 API offers state-of-the-art AI evaluation models, enabling developers to define custom evaluation criteria and obtain precise judgments on their AI applications' performance. Selene outperforms frontier models on commonly used evaluation benchmarks, ensuring accurate and reliable assessments. Users can customize evaluations to their specific use cases through the Alignment Platform, allowing for fine-grained analysis and tailored scoring formats. The API provides actionable critiques alongside accurate evaluation scores, facilitating seamless integration into existing workflows. Pre-built metrics, such as relevance, correctness, helpfulness, faithfulness, logical coherence, and conciseness, are available to address common evaluation scenarios, including detecting hallucinations in retrieval-augmented generation applications or comparing outputs to ground truth data.
Learn more
Llama 4 Maverick
Llama 4 Maverick is one of the most advanced multimodal AI models from Meta, featuring 17 billion active parameters and 128 experts. It surpasses its competitors like GPT-4o and Gemini 2.0 Flash in a broad range of benchmarks, especially in tasks related to coding, reasoning, and multilingual capabilities. Llama 4 Maverick combines image and text understanding, enabling it to deliver industry-leading results in image-grounding tasks and precise, high-quality output. With its efficient performance at a reduced parameter size, Maverick offers exceptional value, especially in general assistant and chat applications.
Learn more
LLaVA
LLaVA (Large Language-and-Vision Assistant) is an innovative multimodal model that integrates a vision encoder with the Vicuna language model to facilitate comprehensive visual and language understanding. Through end-to-end training, LLaVA exhibits impressive chat capabilities, emulating the multimodal functionalities of models like GPT-4. Notably, LLaVA-1.5 has achieved state-of-the-art performance across 11 benchmarks, utilizing publicly available data and completing training in approximately one day on a single 8-A100 node, surpassing methods that rely on billion-scale datasets. The development of LLaVA involved the creation of a multimodal instruction-following dataset, generated using language-only GPT-4. This dataset comprises 158,000 unique language-image instruction-following samples, including conversations, detailed descriptions, and complex reasoning tasks. This data has been instrumental in training LLaVA to perform a wide array of visual and language tasks effectively.
Learn more
Pricing
Starting Price:
Free
Pricing Details:
Open source
Free Version:
Free Version available.
Integrations
No integrations listed.
Company Information
Apple
Founded: 1976
United States
github.com/apple/ml-ferret
Other Useful Business Software
Auth0 for AI Agents now in GA
Connect your AI agents to apps and data more securely, give users control over the actions AI agents can perform and the data they can access, and enable human confirmation for critical agent actions.
Product Details
Platforms Supported
Cloud
Training
Documentation