✨ Argilla ✨

Open-source data curation platform for LLMs

MLOps for NLP: from data labeling to model monitoring

📄 Documentation | 🚀 Quickstart | 🎼 Cheatsheet | 📏 Principles | 🫱🏾‍🫲🏼 Contribute

🚀 Quickstart

Argilla is an open-source data curation platform for LLMs. Using Argilla, everyone can build robust language models through faster data curation using both human and machine feedback. We provide support for each step in the MLOps cycle, from data labeling to model monitoring.

There are different options to get started:

Take a look at our quickstart page 🚀
Start contributing by looking at our contributor guidelines 🫱🏾‍🫲🏼
Skip some steps with our cheatsheet 🎼

🎼 Cheatsheet

Deploy Locally

docker run -d --name argilla -p 6900:6900 argilla/argilla-quickstart:latest

Deploy on Hugging Face Hub

LLM support

import argilla as rg

dataset = rg.FeedbackDataset(
    guidelines="Please, read the question carefully and try to answer it as accurately as possible.",
    fields=[
        rg.TextField(name="question"),
        rg.TextField(name="answer"),
    ],
    questions=[
        rg.RatingQuestion(
            name="answer_quality",
            description="How would you rate the quality of the answer?",
            values=[1, 2, 3, 4, 5],
        ),
        rg.TextQuestion(
            name="answer_correction",
            description="If you think the answer is not accurate, please, correct it.",
            required=False,
        ),
    ]
)

Create Records

import argilla as rg

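# Build a record with model predictions (label, score) and a human annotation.
record = rg.TextClassificationRecord(
    text="Sun Is Closer... a parachute.",
    prediction=[("Sci/Tech", 0.75), ("World", 0.25)],
    annotation="Sci/Tech",
)
# Log the record to the "news" dataset.
rg.log(records=record, name="news")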

Query datasets

import argilla as rg

rg.load(name="news", query="text:spor*")
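
The query above uses a trailing wildcard on the `text` field. As a rough illustration of what such a prefix pattern selects, the following standard-library sketch matches the pattern against lowercased word tokens. The helper `matches_wildcard` and the sample texts are hypothetical, and the real search backend's tokenization and query parsing will differ in detail.

```python
import re
from fnmatch import fnmatchcase


def matches_wildcard(text: str, pattern: str) -> bool:
    """Return True if any lowercased word token in `text` matches `pattern`.

    Hypothetical helper: a simplified stand-in for a wildcard term query,
    not the search backend's actual analyzer or parser.
    """
    tokens = re.findall(r"\w+", text.lower())
    return any(fnmatchcase(token, pattern.lower()) for token in tokens)


# Made-up sample texts, for illustration only.
texts = [
    "Sports news: the final was postponed",
    "A sporadic signal was detected",
    "Markets closed higher today",
]

# Keep only the texts that contain a token starting with "spor".
hits = [t for t in texts if matches_wildcard(t, "spor*")]
print(hits)
```

Matching is done per token, so a word that merely contains "spor" in the middle (for example "transport") is not selected by this sketch.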

Semantic search

import argilla as rg

record = rg.TextClassificationRecord(
    text="Hello world, I am a vector record!",
    vectors={"my_vector_name": [0, 42, 1984]},
)
rg.log(name="dataset", records=record)
rg.load(name="dataset", vector=("my_vector_name", [0, 43, 1985]))
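
To give an intuition for why the query vector `[0, 43, 1985]` retrieves the record logged with `[0, 42, 1984]`, here is a minimal standard-library sketch that ranks stored vectors by cosine similarity. Cosine similarity is an assumption made for illustration; the metric actually used depends on the search backend. The second stored vector (`record_b`) is made up for contrast.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


# record_a uses the vector from the example above; record_b is hypothetical.
stored: Dict[str, List[float]] = {
    "record_a": [0, 42, 1984],
    "record_b": [1984, 42, 0],
}
query = [0, 43, 1985]

# Rank record names from most to least similar to the query vector.
ranked = sorted(
    stored,
    key=lambda name: cosine_similarity(stored[name], query),
    reverse=True,
)
print(ranked)
```

The query points in almost the same direction as `record_a`, so that record ranks first even though the two vectors are not identical.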

Weak supervision

from argilla.labeling.text_classification import add_rules, Rule

rule = Rule(query="positive impact", label="optimism")
add_rules(dataset="go_emotion", rules=[rule])
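
Conceptually, a rule pairs a search query with a label, and every record matching the query receives that label as a weak (noisy) annotation. The sketch below mimics this with the standard library only. `KeywordRule` and `weak_labels` are hypothetical names, not part of Argilla, and the matching logic (every query term must appear as a word) is a simplifying assumption rather than the backend's query semantics.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class KeywordRule:
    """Hypothetical stand-in for a labeling rule: a query plus a label."""

    query: str
    label: str

    def apply(self, text: str) -> Optional[str]:
        # Simplifying assumption: the rule fires only if every query term
        # appears as a whole word in the text (case-insensitive).
        words = set(re.findall(r"\w+", text.lower()))
        terms = re.findall(r"\w+", self.query.lower())
        return self.label if all(term in words for term in terms) else None


def weak_labels(texts: List[str], rules: List[KeywordRule]) -> List[List[str]]:
    """For each text, collect the labels of all rules that fire on it."""
    results = []
    for text in texts:
        labels = []
        for rule in rules:
            label = rule.apply(text)
            if label is not None:
                labels.append(label)
        results.append(labels)
    return results


rules = [KeywordRule(query="positive impact", label="optimism")]
# Made-up sample texts, for illustration only.
texts = [
    "The new policy had a positive impact on the town.",
    "Nothing positive to report.",
]
labels = weak_labels(texts, rules)
print(labels)
```

Texts on which no rule fires get an empty list, which is the usual signal that a record still needs a human annotation or additional rules.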

Train models

from argilla.training import ArgillaTrainer

trainer = ArgillaTrainer(name="news", workspace="recognai", framework="setfit")
trainer.train()

📏 Principles

Open: Argilla is free, open-source, and 100% compatible with major NLP libraries (Hugging Face transformers, spaCy, Stanford Stanza, Flair, etc.). In fact, you can use and combine your preferred libraries without implementing any specific interface.
End-to-end: Most annotation tools treat data collection as a one-off activity at the beginning of each project. In real-world projects, data collection is a key activity of the iterative process of ML model development. Once a model goes into production, you want to monitor and analyze its predictions and collect more data to improve your model over time. Argilla is designed to close this gap, enabling you to iterate as much as you need.
User and Developer Experience: The key to sustainable NLP solutions is to make it easier for everyone to contribute to projects. Domain experts should feel comfortable interpreting and annotating data. Data scientists should feel free to experiment and iterate. Engineers should feel in control of data pipelines. Argilla optimizes the experience for these core users to make your teams more productive.
Beyond hand-labeling: Classical hand-labeling workflows are costly and inefficient, but having humans in the loop is essential. Easily combine hand-labeling with active learning, bulk-labeling, zero-shot models, and weak supervision in novel data annotation workflows.

🫱🏾‍🫲🏼 Contribute

We love contributors and have launched a collaboration with JustDiggit to hand out our very own bunds and help the re-greening of sub-Saharan Africa. To make contributing easier, we have created developer and contributor docs. You can also schedule a meeting with our Developer Advocacy team to get up to speed.
