Argilla is a collaboration tool for AI engineers and domain experts who need to build high-quality datasets for their projects.
This repository only contains developer info about the backend server. If you want to get started, we recommend taking a look at our main repository or our documentation.
Are you a contributor or do you want to understand what is going on under the hood, please keep reading the documentation below.
The following is a high-level overview of relevant folders and files in the argilla-server source code:
/argilla_server
/api # Including all the API endpoints and related code
/errors # Custom exceptions and error handlers
/v1
/handlers # Request FastAPI handlers
/v1
/policies # Authorization policies for resources
/v1
/schemas # Pydantic schemas for request and response bodies
/v1
/contexts # Domain contexts for the application including business logic
accounts.py
datasets.py
search.py
...
/models # SQLAlchemy ORM models for the database
database.py
Folders inside /api
are organized by API version, having right now only v1 implemented. This is in contrast to /contexts
and /models
folders that are not versioned. This is because the business logic and canonical database models are not exposed directly to the API, and therefore are not subject to versioning.
By default all commands executed with pdm run
will get environment variables from .env.dev
except the command pdm test
which will overwrite some of them using values coming from .env.test
file.
These environment variables can be overridden if necessary so feel free to define your own ones locally.
This single command prepares the development server to run locally. It does so by chaining commands to migrate the databse, create default users and launch the server on the right port.
pdm server-dev
A SQLite database located at ~/.argilla/argilla-test.db
will be automatically created to run tests. You can run the
entire test suite using the following custom PDM command:
pdm test
If you need to run the frontend server you can follow the instructions at the argilla-frontend project.
pdm cli
By default, the SQLite database located at ~/.argilla/argilla.db
will be used. You can create the database and run migrations with
the following custom PDM command:
pdm migrate
pdm server
pdm worker
This section list and describe the commands offered by the argilla_server
Python package. If you need more information about the available
commands in the CLI you can use the --help
option:
python -m argilla_server --help
If you need more information about a specific command you can use the --help
option too:
python -m argilla_server database --help
The argilla_server start
command will start the Argilla server blocking the current terminal. You can use the following command to start the server:
python -m argilla_server start
The following options can be provided:
--host
: The host where the Argilla server will be bound. Default value is0.0.0.0
.--port
: The port where the Argilla server will be bound. Default value is6900
.--access-log/--no-access-log
: Enable/disable the server access log. Default value isTrue
.
You can also launch the argilla server using uvicorn
:
uvicorn argilla_server:app --port 6900
Note
For more details about FastAPI and uvicorn, see here. You can also visit the uvicorn official documentation here.
The argilla_server database
group of commands offers functionality for managing the Argilla server database:
python -m argilla_server database migrate
: applies the database migrations.python -m argilla_server database revisions
: list the different revisions to which the database can be migrated.
Since Argilla 1.6.0, the information about users and workspaces, and the data of the Dataset
s is stored in an SQL database (SQLite or PostgreSQL). That being said,
every release of Argilla may require a database migration to update the database schema to the new version. This section explains how to perform the database migrations.
To apply the migrations, a connection to the database needs to be established. In the case that SQLite is used, then the only way to apply the migrations is by
executing the migration command from the same machine where the Argilla server is running. In the case that PostgreSQL is used, then the migration command can be executed
from any machine that has access to the PostgreSQL database setting the ARGILLA_DATABASE_URL
environment variable to the URL of the database.
To list the available database revisions/migrations, the argilla_server database revisions
command can be used. This command will list the different revisions to which
the database can be migrated. As several revisions could be generated for a single release, the command will also show the latest revision that was generated for each release.
python -m argilla_server database revisions
INFO [alembic.runtime.migration] Context impl SQLiteImpl.
INFO [alembic.runtime.migration] Will assume non-transactional DDL.
Tagged revisions
-----------------
• 1.7 (revision: '1769ee58fbb4')
• 1.8 (revision: 'ae5522b4c674')
• 1.11 (revision: '3ff6484f8b37')
• 1.13 (revision: '1e629a913727')
• 1.17 (revision: '84f6b9ff6076')
• 1.18 (revision: 'bda6fe24314e')
• 1.28 (revision: 'ca7293c38970')
• 2.0 (revision: '237f7c674d74')
Alembic revisions
-----------------
45a12f74448b -> 237f7c674d74 (head), add status column to records table
d00f819ccc67 -> 45a12f74448b, add distribution column to datasets table
ca7293c38970 -> d00f819ccc67, update responses user_id foreign key
bda6fe24314e -> ca7293c38970, change suggestions score column to json
7850ab5b42d9 -> bda6fe24314e, create vectors table
84f6b9ff6076 -> 7850ab5b42d9, create vectors settings table
b8458008b60e -> 84f6b9ff6076, add last_activity_at to datasets table
7cbcccf8b57a -> b8458008b60e, add allow_extra_metadata column to datasets table
1e629a913727 -> 7cbcccf8b57a, create metadata_properties table
3fc3c0839959 -> 1e629a913727, fix suggestions type enum values
8c574ada5e5f -> 3fc3c0839959, create suggestions table
3ff6484f8b37 -> 8c574ada5e5f, update_enum_columns
ae5522b4c674 -> 3ff6484f8b37, add metadata column to records table
e402e9d9245e -> ae5522b4c674, create fields table
8be56284dac0 -> e402e9d9245e, create responses table
3a8e2f9b5dea -> 8be56284dac0, create records table
b9099dc08489 -> 3a8e2f9b5dea, create questions table
1769ee58fbb4 -> b9099dc08489, create datasets table
82a5a88a3fa5 -> 1769ee58fbb4, create workspaces_users table
74694870197c -> 82a5a88a3fa5, create workspaces table
<base> -> 74694870197c, create users table
Current revision
----------------
Current revision(s) for sqlite:////Users/root/.argilla/argilla.db?check_same_thread=False:
Rev: 237f7c674d74 (head)
Parent: 45a12f74448b
Path: /Users/root/argilla/argilla-server/src/argilla_server/alembic/versions/237f7c674d74_add_status_column_to_records_table.py
add status column to records table
Revision ID: 237f7c674d74
Revises: 45a12f74448b
Create Date: 2024-06-18 17:59:36.992165
If the argilla_server database migrate
command is called without any argument, then all the unapplied migrations will be applied:
python -m argilla_server database migrate
The argilla_server database migrate
command can also be used to apply a specific migration. To do so, the --revision
option needs to be provided with the name of the revision or the Argilla
version to which the database will be migrated.
python -m argilla_server database migrate --revision 2.0
Warning
Applying a revision that is older than the current revision of the database will revert the database to the state of that revision, which means that the data could be lost.
The argilla_server database users
group of commands offers functionality for managing the users of the Argilla server.
python -m argilla_server database users create
: creates a new user in the Argilla server database.python -m argilla_server database users create_default
: creates the default users in the Argilla server database.python -m argilla_server database users migrate
: migrates the users from the oldYAML
file to the Argilla server database.python -m argilla_server database users update
: updates a user in the Argilla server database.
The argilla_server search-engine
group of commands offers functionality to work with the search engine used by Argilla.
python -m argilla_server search-engine reindex
: reindex all Argilla entities into search engine.
Argilla uses RQ as background job manager. RQ depends on Redis to store and retrieve information about the jobs to be processed.
Once that you have correctly installed Redis on your system, you can start the RQ worker by running the following CLI command:
python -m argilla_server worker
To help our community with the creation of contributions, we have created our community docs. Additionally, you can always schedule a meeting with our Developer Advocacy team so they can get you up to speed.
We continuously work on updating our plans and our roadmap and we love to discuss those with our community. Feel encouraged to participate.