feat: add background processing jobs #5432
Conversation
Codecov Report. Attention: Patch coverage is …

Additional details and impacted files:

```diff
@@            Coverage Diff             @@
##           develop    #5432      +/-   ##
===========================================
- Coverage    91.32%   91.00%    -0.33%
===========================================
  Files          141      144        +3
  Lines        5869     5915       +46
===========================================
+ Hits         5360     5383       +23
- Misses        509      532       +23
```

Flags with carried forward coverage won't be shown. ☔ View full report in Codecov by Sentry.
The URL of the deployed environment for this PR is https://argilla-quickstart-pr-5432-ki24f765kq-no.a.run.app
Redis is used by Argilla to store information about jobs to be processed in the background. The following environment variables are useful to configure how Argilla connects to Redis:

- `ARGILLA_REDIS_URL`: A URL string that contains the necessary information to connect to a Redis instance (Default: `redis://localhost:6379/0`).
Suggested change:

```diff
- - `ARGILLA_REDIS_URL`: A URL string that contains the necessary information to connect to a Redis instance (Default: `redis://localhost:6379/0`).
+ - `ARGILLA_REDIS_URL`: A URL string that contains the necessary information to connect to a Redis instance (Default: `redis://localhost:6379`).
```
The `0` is the Redis database. We can remove it and it will use db 0 by default, but I thought it was better to be explicit.
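For context, here is a minimal sketch, assuming the redis-py client, of how a URL like this default is typically consumed; it is illustrative only, not Argilla's actual connection code.

```python
# Illustrative only (not Argilla's code): connect to Redis from a URL such as
# the ARGILLA_REDIS_URL default discussed above.
import redis

# "redis://localhost:6379/0" and "redis://localhost:6379" point at the same place:
# the trailing "/0" just makes the database index explicit.
connection = redis.Redis.from_url("redis://localhost:6379/0")
connection.ping()  # raises redis.exceptions.ConnectionError if Redis is unreachable
```

Keeping the `/0` documents which logical database the jobs live in, which matters if other services share the same Redis instance on a different database index.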
# Description

After merging the changes included in #5432, the SDK tests are failing since the argilla test server requires Redis and workers to work. This PR uses the hf-spaces standalone image, which includes and manages all the server dependencies.

**Type of change**

- Bug fix (non-breaking change which fixes an issue)

**How Has This Been Tested**

**Checklist**

- I added relevant documentation
- I followed the style guidelines of this project
- I did a self-review of my code
- I made corresponding changes to the documentation
- I confirm my changes generate no new warnings
- I have added tests that prove my fix is effective or that my feature works
- I have added relevant notes to the CHANGELOG.md file (see https://keepachangelog.com/)
Description

This PR adds the following changes:

- `rq` to help us execute background jobs (a rough sketch of the flow is included after this list).
- `rq` workers inside the honcho Procfile.
- `ARGILLA_REDIS_URL` environment variable.
- `README.md` file updated, adding Redis as a dependency to install.
- `BACKGROUND_NUM_WORKERS` environment variable to specify the number of workers in the HF Space container.
- Modify `Dockerfile` template on HF to include the environment variable ([TASK] Modify HF Space template to include a comment about `BACKGROUND_NUM_WORKERS` environment variable #5443).
- `TODO` sections before merging.
- `Procfile` Redis process to the following:
- Allow tests to run job workers synchronously (with pytest): it's not working due to asyncio stuff (running an asynchronous loop inside another one, more info here: Task got Future attached to a different loop rq/rq#1986).

Closes #5431
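To make the moving parts above concrete, here is a minimal, hypothetical sketch of the rq pattern this PR relies on: the server enqueues a job into Redis (located via `ARGILLA_REDIS_URL`) and separate worker processes, started from the honcho Procfile and scaled with `BACKGROUND_NUM_WORKERS`, pick it up. Function, module, and queue names are illustrative, not Argilla's actual ones.

```python
# Hypothetical sketch of the rq-based background job flow; names are illustrative.
import os

from redis import Redis
from rq import Queue, Worker


def update_distribution(dataset_id: str) -> None:
    """Placeholder background task, e.g. recomputing a dataset's record distribution."""
    print(f"Updating distribution for dataset {dataset_id}")


def get_queue() -> Queue:
    # Resolve the connection the same way the ARGILLA_REDIS_URL variable suggests.
    connection = Redis.from_url(os.getenv("ARGILLA_REDIS_URL", "redis://localhost:6379/0"))
    return Queue("default", connection=connection)


def enqueue_update(dataset_id: str):
    # Producer side: the web server enqueues the job and returns immediately;
    # the job payload stays in Redis until a worker picks it up.
    return get_queue().enqueue(update_distribution, dataset_id)


def run_worker() -> None:
    # Consumer side: one of these runs per worker process declared in the
    # Procfile (the count being controlled by something like BACKGROUND_NUM_WORKERS).
    queue = get_queue()
    Worker([queue], connection=queue.connection).work()
```

In a honcho Procfile this would typically become one process entry per component (web server, Redis, and the worker entries), with the worker count driven by a variable such as `BACKGROUND_NUM_WORKERS`.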
Benchmarks
The following timings were obtained by updating the distribution strategy of a dataset with 100 and 10,000 records, using a basic and an upgraded CPU on HF Spaces, with and without persistent storage, and measuring how long the background job takes to complete (one way to measure such a duration is sketched after the hardware list below):
CPU basic: 2 vCPU, 16GB RAM
CPU upgrade: 8 vCPU, 32GB RAM
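As a rough illustration, a job's duration can be read from the rq job handle returned by `queue.enqueue(...)` as in the sketch above; this is not necessarily how the numbers here were produced.

```python
# Illustrative timing helper: wait for an rq job to finish and report its duration.
import time

from rq.job import Job


def wait_for_duration(job: Job, poll_seconds: float = 0.5) -> float:
    """Poll until the job finishes (or fails) and return its wall-clock duration in seconds."""
    while not (job.is_finished or job.is_failed):
        time.sleep(poll_seconds)
        job.refresh()  # reload status and timestamps from Redis
    return (job.ended_at - job.started_at).total_seconds()
```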
Type of change
How Has This Been Tested
Checklist