Bespoke Labs

Data Infrastructure and Analytics

Mountain View, California 1,017 followers

Bespoke Labs is a venture-funded startup creating AI tools for data curation and post-training LLMs. (We are hiring!)

About us

Data curation and Small Specialized Models using Generative AI.

Industry
Data Infrastructure and Analytics
Company size
2-10 employees
Headquarters
Mountain View, California
Type
Privately Held

Locations

  • Primary

    800 W El Camino Real

    Mountain View, California 94040, US

Updates

  • Bespoke Labs

    Bespoke Labs is excited to contribute to Evalchemy, an open-source platform for LLM evaluation.

    The problem: running popular evals for an LLM, like MMLU, MT-Bench, WildBench, RepoBench, IFEval, and AlpacaEval, requires you to download and install more than 10 repos, each with different dependencies and issues. This is, as you might expect, an actual nightmare. Many LM benchmarks are not optimized for performance and cost, and can take dozens of hours to compute.

    Evalchemy can run the full battery of benchmarks 3x faster than previous repos, thanks to parallelism optimizations in our implementation. It also offers easy installation and a consistent platform to run benchmarks and track results on a leaderboard. We also support adding your own custom benchmarks and leaderboards. https://lnkd.in/gZQG9ZTY

    We hope that the open-source community will help us develop this library into a convenient evaluation tool for AI engineers. Please tell us about your favorite benchmarks or features and we can add them!

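    The parallel-dispatch idea behind the speedup can be sketched in a few lines. Everything below is illustrative: `run_benchmark` is a hypothetical stand-in for a real per-benchmark runner, not Evalchemy's actual API.

```python
from concurrent.futures import ThreadPoolExecutor

BENCHMARKS = ["MMLU", "MTBench", "WildBench", "RepoBench", "IFEval", "AlpacaEval"]

def run_benchmark(name: str) -> dict:
    # Hypothetical stand-in: a real runner would load the task,
    # query the model, and score the outputs.
    return {"benchmark": name, "score": 0.0}

def run_all(benchmarks=BENCHMARKS, workers=4):
    # The benchmarks are independent, so they can be dispatched
    # concurrently instead of one after another -- one source of
    # the speedup described above.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_benchmark, benchmarks))

results = run_all()
```

    Sharing one harness and one dependency set across all benchmarks is what makes this kind of fan-out practical.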
  • Bespoke Labs reposted this

    Alex Dimakis

    Professor, University of Texas at Austin. Co-Director, Center for the Foundations of Machine Learning. BespokeLabsAI: data curation for post-training (we are hiring).

    Very happy about the news that our paper "Which questions should I answer? Salience Prediction of Inquisitive Questions" received an Outstanding Paper Award at EMNLP 2024. Congratulations to Yating, Ritika, and the whole team. #EMNLP2024 The paper is available online: https://lnkd.in/gNQBenZ6

  • Bespoke Labs reposted this

    Alex Dimakis

    AI monoliths vs. the Unix philosophy: the case for Small Specialized Models.

    The current thinking in AI is that AGI is coming, and that one gigantic model will be able to reason and solve business problems ranging from customer support to product development. Currently, agents are basically big system prompts on the same gigantic model. Through prompt engineering, AI builders are trying to plan and execute complex multi-step processes. This is not working very well.

    This monolithic view of AI is in sharp contrast to how we teach engineers to build systems. When multiple people have to build complex systems, they should build specialized modular components. This makes systems reliable and helps large teams coordinate with specs that are easy to explain, engineer, and evaluate. Monolithic gigantic AI systems are also extremely wasteful in terms of energy and cost: using GPT-4o as a summarizer, fact checker, or user-intent detector reminds me of the first days of the big data wave, when people were spinning up Hadoop clusters to process 1 GB of data.

    Instead, I would like to make the case for Small Specialized Models, following the Unix philosophy guidelines:

    1. Write programs that do one thing and do it well.
    2. Write programs to work together.
    3. Write programs to handle text streams, because that is a universal interface.

    Now replace "programs" with "AI models". I believe that the best way to engineer AI systems will be to use post-training to specialize small Llama models for narrow, focused jobs. "Programming" these small specialized models will be done by creating post-training datasets. These datasets will be created by transforming internal data by prompting big foundation models and then distilling them through post-training. This is similar to "Textbooks Are All You Need", but for narrow jobs like summarization and legal QA, as opposed to building general-purpose small models.

    Several papers have shown that it is possible to create post-training datasets by prompting big models and to create small specialized models that are faster and also outperform their big teachers on narrow tasks. Creating small specialized models is currently hard: evaluation, post-training data curation, and fine-tuning are tricky, and better tools are needed. Still, it's good to go back to the Unix philosophy to inform our future architectures.

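    The "programming by dataset" idea above can be sketched as follows. `teacher_summarize` is a hypothetical stub standing in for a prompt to a big foundation model; the record schema is one common instruction-tuning layout, not a prescribed format.

```python
def teacher_summarize(document: str) -> str:
    # Hypothetical stub: in practice this would prompt a large
    # foundation model and return its generated summary.
    return document.split(".")[0] + "."

def build_distillation_dataset(documents):
    # Pair each input with the teacher's output; fine-tuning a small
    # model on these pairs distills the narrow skill (here,
    # summarization) out of the big teacher.
    return [
        {"instruction": "Summarize the document.",
         "input": doc,
         "output": teacher_summarize(doc)}
        for doc in documents
    ]

dataset = build_distillation_dataset(
    ["Revenue grew 12% in Q3. Costs were flat. Margins improved."]
)
```

    The small model never sees the teacher at inference time; it only inherits the behavior captured in the dataset, which is what keeps it cheap to serve.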
  • Bespoke Labs

    Bespoke Labs is excited to support the Datacomp community and related open-source AI efforts with curation tools, datasets and compute.

    Alex Dimakis

    Wow, I just realized that our Datacomp datasets had 800k downloads last month on Hugging Face! Excited to see this project come so far. (If you don't know it already, Datacomp is the largest public multimodal dataset of images and captions.)

  • Bespoke Labs

    Quite the speaker list for the Metadata & AI Summit.

    Acryl Data

    Learn about the hottest trends, biggest challenges, and best solutions around metadata and AI from a STAR-STUDDED speaker lineup starting tomorrow at the 2024 Metadata & AI Summit! Register here: https://lnkd.in/enwxCq-p

    Apple - Deepak Chandramouli, Ravi Sharma, Satish Kotha
    Netflix - Alicia J., Ashwin Iyer, Kevin C.
    Meta - Raghotham Murthy
    Slack - Nedra Albrecht
    Pinterest - Deepak Agarwal
    LinkedIn - Raghavan Muthuregunathan
    Microsoft - Sadid Hasan
    RunLLM - Joseph Gonzalez
    Deutsche Telekom Digital Labs - Shashidhar Singhal
    Checkout - Matthew Coudert
    Accenture - Teresa Tung
    DeepLearning - Joe Reis 🤓
    Kraft Heinz - Jeffrey Tackes
    Generationship - Michelle Yi
    UC Berkeley - Joe Hellerstein
    Bespoke Labs - Alex Dimakis
    Grab - Harvey LI
    Merck - Dr. Harsha Gurulingappa
    Star Tree - Chinmay Soman
    Acryl Data & DataHub - Shirshanka Das, Maggie Hays

    We hope to see you there!

  • Bespoke Labs

    Stellar panel: our Chief Scientist Alex Dimakis will be speaking on the struggles of moving AI from research to production at a #MetadataAISummit2024 panel with Joe Hellerstein, Teresa Tung, and Deepak Agarwal, hosted by Shirshanka Das.

    Shirshanka Das

    Co-founder and CTO @ Acryl | Founder DataHub project | Ex-LinkedIn

    🤔 Why do enterprise AI initiatives often struggle to move from research to production? And more importantly, how can we bridge this gap effectively?

    I'm excited to moderate a stellar panel at #MetadataAISummit2024 featuring experts who are at the cutting edge and have successfully taken AI from research to production, repeatedly:

    ➡️ Teresa Tung (Senior Managing Director, Accenture - Leading AI transformation initiatives)
    ➡️ Joe Hellerstein (Jim Gray Professor of CS, UC Berkeley - Pioneer in distributed systems & databases)
    ➡️ Deepak Agarwal (Chief AI Officer & VP, Pinterest - At the forefront of Internet-scale AI for more than a decade)
    ➡️ Alex Dimakis (Co-founder & Chief Scientist, BespokeLabsAI - Leading researcher in ML systems)

    We'll mix theory, practical solutions, and opinions about the future. Stuff like:

    🚨 How to stop AI models from going off the rails
    💡 Proven governance frameworks
    🌟 Why metadata matters and how to collect it cheaply
    🤖 Balancing innovation with safety and reliability considerations
    🔥 Battle-tested scaling strategies

    Whether you're an optimist who can't wait to have AI take over our daily lives, or a pessimist worried about how we can safely use AI in production, you're invited to listen in!

    𝗪𝗵𝗲𝗻: 📅 Oct 29, 2024 | Tuesday 🕐 1:40 - 2:40pm EDT
    𝗥𝗲𝗴𝗶𝘀𝘁𝗲𝗿 👉 https://lnkd.in/g_JJ6Pmx

    #AIGovernance #MLOps #ResponsibleAI #EnterpriseAI #Metadata

  • Bespoke Labs

    This and the Nobel Prize for Hinton have made our day :)

    Shreya Rajpal

    CEO and Cofounder, Guardrails AI

    We benchmarked the OpenAI DevDay Eval product and Bespoke Labs' Minicheck for hallucination detection. Minicheck is the current best hallucination detector on Guardrails AI Hub.

    OpenAI:
    - Accuracy: 69.19%
    - F1: 0.7564
    - High recall, lower precision

    Minicheck:
    - Accuracy: 74.96%
    - F1: 0.7516
    - Better at detecting hallucinations

    Overall, OpenAI classifies more hallucinations as factual, but has high recall when detecting factual statements. However, the current most precise model is Minicheck.

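    Nearly identical F1 scores can hide very different precision/recall trade-offs, since F1 is the harmonic mean of the two. A quick check (the precision/recall pairs below are made up for illustration, not the benchmark's actual values):

```python
def f1(precision: float, recall: float) -> float:
    # F1 is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)

# Two hypothetical detectors: one recall-heavy, one precision-heavy.
recall_heavy = f1(precision=0.70, recall=0.82)     # ~0.755
precision_heavy = f1(precision=0.80, recall=0.71)  # ~0.752
```

    Both land near 0.75 despite opposite balances, which is why the post reports precision and recall behavior alongside accuracy and F1.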
  • Bespoke Labs

    Our small verifier model is now integrated with Ollama, so we can all check our hallucinations locally.
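    A local check might look like the sketch below, which talks to Ollama's HTTP API on its default port. The model tag `bespoke-minicheck` and the Document/Claim prompt format are assumptions; verify both against the model card on the Ollama library before relying on them.

```python
import json
import urllib.request

def minicheck_payload(document: str, claim: str) -> dict:
    # Assumed prompt format: the verifier answers whether the claim
    # is supported by the document.
    prompt = f"Document: {document}\nClaim: {claim}"
    return {"model": "bespoke-minicheck", "prompt": prompt, "stream": False}

def check_claim(document: str, claim: str,
                host: str = "http://localhost:11434") -> str:
    # Requires a running `ollama serve` with the model pulled locally.
    req = urllib.request.Request(
        f"{host}/api/generate",
        data=json.dumps(minicheck_payload(document, claim)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

payload = minicheck_payload("The sky is blue.", "The sky is green.")
```

    Because everything runs against localhost, no document or claim ever leaves the machine, which is the point of local verification.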

  • Bespoke Labs

    Bespoke-minicheck-7B is better than GPT-o1 at grounded fact-checking (and small enough to run on a MacBook).

    Alex Dimakis

    Is GPT-o1 crushing all the benchmarks? It is better than GPT-4o at grounded fact-checking (78.5 raised to 79.7 on WiCE), but it is more expensive and slower. Happily, our 7B model Bespoke-minicheck is even better and scores 83 on this benchmark. https://lnkd.in/ggYNxUxR

