Bespoke Labs

Data Infrastructure and Analytics

Mountain View, California 1,017 followers

Bespoke Labs is a venture-funded startup creating AI tools for data curation and post-training LLMs. (We are hiring!)

About us

Data curation and Small Specialized Models using Generative AI.

Industry
Data Infrastructure and Analytics
Company size
2-10 employees
Headquarters
Mountain View, California
Type
Privately Held

Locations

  • Primary

    800 W El Camino Real

    Mountain View, California 94040, US

Updates

  • Bespoke Labs

    Bespoke Labs is excited to contribute to Evalchemy, an open-source platform for LLM evaluation.

    The problem: running popular evals for an LLM, like MMLU, MT-Bench, WildBench, RepoBench, IFEval, and AlpacaEval, requires you to download and install more than 10 repos, each with different dependencies and issues. This is, as you might expect, an actual nightmare. Many LM benchmarks are not optimized for performance and cost, and can take dozens of hours to compute.

    Evalchemy can run the full battery of benchmarks 3x faster than previous repos, thanks to parallelism optimizations in our implementation. It also offers easy installation and a consistent platform to run benchmarks and track results on a leaderboard. We also support adding your own custom benchmarks and leaderboards. https://lnkd.in/gZQG9ZTY

    We hope that the open-source community will help us develop this library into a convenient evaluation tool for AI engineers. Please tell us about your favorite benchmarks or features and we can add them!

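    The parallel-dispatch idea behind the speedup can be sketched in a few lines. Everything below is illustrative: `run_benchmark` is a hypothetical stand-in for a real per-benchmark runner, not Evalchemy's actual API.

```python
from concurrent.futures import ThreadPoolExecutor

BENCHMARKS = ["MMLU", "MTBench", "WildBench", "RepoBench", "IFEval", "AlpacaEval"]

def run_benchmark(name: str) -> dict:
    # Hypothetical stand-in: a real runner would load the task,
    # query the model, and score the outputs.
    return {"benchmark": name, "score": 0.0}

def run_all(benchmarks=BENCHMARKS, workers=4):
    # The benchmarks are independent, so they can be dispatched
    # concurrently instead of one after another -- one source of
    # the speedup described above.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_benchmark, benchmarks))

results = run_all()
```

    Sharing one harness and one dependency set across all benchmarks is what makes this kind of fan-out practical.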
  • Bespoke Labs reposted this

    Alex Dimakis

    Professor, University of Texas at Austin. Co-Director, Center for the Foundations of Machine Learning. BespokeLabsAI: data curation for post-training (we are hiring).

    Very happy about the news that our paper "Which questions should I answer? Salience Prediction of Inquisitive Questions" received an Outstanding Paper Award at EMNLP 2024. Congratulations to Yating, Ritika, and the whole team. #EMNLP2024 The paper is available online: https://lnkd.in/gNQBenZ6

  • Bespoke Labs reposted this

    Alex Dimakis

    AI monoliths vs. the Unix philosophy: the case for Small Specialized Models.

    The current thinking in AI is that AGI is coming, and that one gigantic model will be able to reason and solve business problems ranging from customer support to product development. Currently, agents are basically big system prompts on the same gigantic model. Through prompt engineering, AI builders are trying to plan and execute complex multi-step processes. This is not working very well.

    This monolithic view of AI is in sharp contrast to how we teach engineers to build systems. When multiple people have to build complex systems, they should build specialized modular components. This makes systems reliable and helps large teams coordinate with specs that are easy to explain, engineer, and evaluate. Monolithic gigantic AI systems are also extremely wasteful in terms of energy and cost: using GPT-4o as a summarizer, fact checker, or user-intent detector reminds me of the first days of the big data wave, when people were spinning up Hadoop clusters to process 1 GB of data.

    Instead, I would like to make the case for Small Specialized Models, following the Unix philosophy guidelines:

    1. Write programs that do one thing and do it well.
    2. Write programs to work together.
    3. Write programs to handle text streams, because that is a universal interface.

    Now replace "programs" with "AI models". I believe that the best way to engineer AI systems will be to use post-training to specialize small Llama models for narrow, focused jobs. "Programming" these small specialized models will be done by creating post-training datasets. These datasets will be created by transforming internal data by prompting big foundation models and then distilling them through post-training. This is similar to "Textbooks Are All You Need", but for narrow jobs like summarization and legal QA, as opposed to building general-purpose small models.

    Several papers have shown that it is possible to create post-training datasets by prompting big models and to create small specialized models that are faster and also outperform their big teachers on narrow tasks. Creating small specialized models is currently hard: evaluation, post-training data curation, and fine-tuning are tricky, and better tools are needed. Still, it's good to go back to the Unix philosophy to inform our future architectures.

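    The "programming by dataset" idea above can be sketched as follows. `teacher_summarize` is a hypothetical stub standing in for a prompt to a big foundation model; the record schema is one common instruction-tuning layout, not a prescribed format.

```python
def teacher_summarize(document: str) -> str:
    # Hypothetical stub: in practice this would prompt a large
    # foundation model and return its generated summary.
    return document.split(".")[0] + "."

def build_distillation_dataset(documents):
    # Pair each input with the teacher's output; fine-tuning a small
    # model on these pairs distills the narrow skill (here,
    # summarization) out of the big teacher.
    return [
        {"instruction": "Summarize the document.",
         "input": doc,
         "output": teacher_summarize(doc)}
        for doc in documents
    ]

dataset = build_distillation_dataset(
    ["Revenue grew 12% in Q3. Costs were flat. Margins improved."]
)
```

    The small model never sees the teacher at inference time; it only inherits the behavior captured in the dataset, which is what keeps it cheap to serve.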
  • Bespoke Labs

    Bespoke Labs is excited to support the Datacomp community and related open-source AI efforts with curation tools, datasets and compute.

    Alex Dimakis

    Wow, I just realized that our Datacomp datasets had 800k downloads last month on Hugging Face! Excited to see this project come so far. (If you don't know it already, Datacomp is the largest public multimodal dataset of images and captions.)

  • Bespoke Labs

    Quite the speaker list for the Metadata & AI Summit.

    Acryl Data

    Learn about the hottest trends, biggest challenges, and best solutions around metadata and AI from a STAR-STUDDED speaker lineup starting tomorrow at the 2024 Metadata & AI Summit! Register here: https://lnkd.in/enwxCq-p

    Apple - Deepak Chandramouli, Ravi Sharma, Satish Kotha
    Netflix - Alicia J., Ashwin Iyer, Kevin C.
    Meta - Raghotham Murthy
    Slack - Nedra Albrecht
    Pinterest - Deepak Agarwal
    LinkedIn - Raghavan Muthuregunathan
    Microsoft - Sadid Hasan
    RunLLM - Joseph Gonzalez
    Deutsche Telekom Digital Labs - Shashidhar Singhal
    Checkout - Matthew Coudert
    Accenture - Teresa Tung
    DeepLearning - Joe Reis 🤓
    Kraft Heinz - Jeffrey Tackes
    Generationship - Michelle Yi
    UC Berkeley - Joe Hellerstein
    Bespoke Labs - Alex Dimakis
    Grab - Harvey LI
    Merck - Dr. Harsha Gurulingappa
    Star Tree - Chinmay Soman
    Acryl Data & DataHub - Shirshanka Das, Maggie Hays

    We hope to see you there!

  • Bespoke Labs

    Stellar panel: our Chief Scientist Alex Dimakis will be speaking on the struggles of moving AI from research to production at a #MetadataAISummit2024 panel with Joe Hellerstein, Teresa Tung, and Deepak Agarwal, hosted by Shirshanka Das.

    Shirshanka Das

    Co-founder and CTO @ Acryl | Founder DataHub project | Ex-LinkedIn

    🤔 Why do enterprise AI initiatives often struggle to move from research to production? And more importantly, how can we bridge this gap effectively?

    I'm excited to moderate a stellar panel at #MetadataAISummit2024 featuring experts who are at the cutting edge and have successfully taken AI from research to production, repeatedly:

    ➡️ Teresa Tung (Senior Managing Director, Accenture - Leading AI transformation initiatives)
    ➡️ Joe Hellerstein (Jim Gray Professor of CS, UC Berkeley - Pioneer in distributed systems & databases)
    ➡️ Deepak Agarwal (Chief AI Officer & VP, Pinterest - At the forefront of Internet-scale AI for more than a decade)
    ➡️ Alex Dimakis (Co-founder & Chief Scientist, BespokeLabsAI - Leading researcher in ML systems)

    We'll mix theory, practical solutions, and opinions about the future. Stuff like:

    🚨 How to stop AI models from going off the rails
    💡 Proven governance frameworks
    🌟 Why metadata matters and how to collect it cheaply
    🤖 Balancing innovation with safety and reliability considerations
    🔥 Battle-tested scaling strategies

    Whether you're an optimist who can't wait to have AI take over our daily lives, or a pessimist worried about how we can safely use AI in production, you're invited to listen in!

    𝗪𝗵𝗲𝗻: 📅 Oct 29, 2024 | Tuesday 🕐 1:40 - 2:40pm EDT
    𝗥𝗲𝗴𝗶𝘀𝘁𝗲𝗿 👉 https://lnkd.in/g_JJ6Pmx

    #AIGovernance #MLOps #ResponsibleAI #EnterpriseAI #Metadata

  • Bespoke Labs

    This and the Nobel Prize for Hinton have made our day :)

    Shreya Rajpal

    CEO and Cofounder, Guardrails AI

    We benchmarked the OpenAI DevDay Eval product and Bespoke Labs' Minicheck for hallucination detection. Minicheck is the current best hallucination detector on Guardrails AI Hub.

    OpenAI:
    - Accuracy: 69.19%
    - F1: 0.7564
    - High recall, lower precision

    Minicheck:
    - Accuracy: 74.96%
    - F1: 0.7516
    - Better at detecting hallucinations

    Overall, OpenAI classifies more hallucinations as factual, but has high recall when detecting factual statements. However, the current most precise model is Minicheck.

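    Nearly identical F1 scores can hide very different precision/recall trade-offs, since F1 is the harmonic mean of the two. A quick check (the precision/recall pairs below are made up for illustration, not the benchmark's actual values):

```python
def f1(precision: float, recall: float) -> float:
    # F1 is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)

# Two hypothetical detectors: one recall-heavy, one precision-heavy.
recall_heavy = f1(precision=0.70, recall=0.82)     # ~0.755
precision_heavy = f1(precision=0.80, recall=0.71)  # ~0.752
```

    Both land near 0.75 despite opposite balances, which is why the post reports precision and recall behavior alongside accuracy and F1.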
  • Bespoke Labs

    Our small verifier model is now integrated with Ollama, so we can all check our hallucinations locally.
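    A local check might look like the sketch below, which talks to Ollama's HTTP API on its default port. The model tag `bespoke-minicheck` and the Document/Claim prompt format are assumptions; verify both against the model card on the Ollama library before relying on them.

```python
import json
import urllib.request

def minicheck_payload(document: str, claim: str) -> dict:
    # Assumed prompt format: the verifier answers whether the claim
    # is supported by the document.
    prompt = f"Document: {document}\nClaim: {claim}"
    return {"model": "bespoke-minicheck", "prompt": prompt, "stream": False}

def check_claim(document: str, claim: str,
                host: str = "http://localhost:11434") -> str:
    # Requires a running `ollama serve` with the model pulled locally.
    req = urllib.request.Request(
        f"{host}/api/generate",
        data=json.dumps(minicheck_payload(document, claim)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

payload = minicheck_payload("The sky is blue.", "The sky is green.")
```

    Because everything runs against localhost, no document or claim ever leaves the machine, which is the point of local verification.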

  • Bespoke Labs

    Bespoke-minicheck-7B is better than GPT-o1 at grounded fact-checking (and small enough to run on a MacBook).

    Alex Dimakis

    Is GPT-o1 crushing all the benchmarks? It is better than GPT-4o at grounded fact-checking (78.5 raised to 79.7 on WiCE), but it is more expensive and slower. Happily, our 7B model Bespoke-minicheck is even better and scores 83 on this benchmark. https://lnkd.in/ggYNxUxR

