_collect_docker_size queries for all items in the registry #107

Open
tiagomeireles opened this issue Apr 10, 2023 · 4 comments

@tiagomeireles

Running an AQL query to get all items is very slow on large repositories. I also use object storage for the binary store, which likely contributes to slower queries.

Example rule combination that I'm trying to use:

    - name: Example
      rules:
        - rule: Repo
          name: "docker"
        - rule: IncludePath
          masks: "app/*"
        - rule: DeleteDockerImagesOlderThan
          days: 14

The query in _collect_docker_size fetches every item in the matched repos:

    args = ["items.find", {"$or": [{"repo": repo} for repo in docker_repos]}]

I tested replacing this line with

    args = ["items.find", {"$or": [{"path": {"$match": "app/*"}}]}]

and it is significantly faster while retaining the size info.

I'm happy to attempt to contribute a fix. I thought about two potential options: disabling the size collection entirely, or accepting a mask on DeleteDockerImagesOlderThan.
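As a rough sketch of the mask option, the size query could take an optional path mask, something like this (build_size_query and path_mask are hypothetical names; the actual change would go inside _collect_docker_size, whose internals may differ):

    def build_size_query(docker_repos, path_mask=None):
        # Base query: every item in every matched docker repo (the current behaviour).
        criteria = {"$or": [{"repo": repo} for repo in docker_repos]}
        if path_mask:
            # Narrow the search to the masked subtree instead of scanning whole repos.
            criteria = {"$and": [criteria, {"path": {"$match": path_mask}}]}
        return ["items.find", criteria]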

@allburov
Member

> it is significantly faster while retaining the size info.

What timings are you talking about? Could you give an example for your case?

Even if the cleanup script runs for an hour each night, that should be fine, IMO.

@allburov
Member

I think right now it's not possible to pass other rules' attributes to DeleteDockerImagesOlderThan; that's why we request it this way.

@tiagomeireles
Author

I stopped it after 3 hours.

I have a large backlog of things to clean up, and repo-wide searches are very slow. Right now I'm using the following patch, which filters to the common path of the artifacts already returned; this avoids any additional rule parameters.

            # path is os.path; narrow the query to the common parent of the already-matched artifacts
            common_path = path.commonpath([artifact["path"] for artifact in artifacts])
            args = ["items.find", {"$and": [{"$or": [{"repo": repo} for repo in docker_repos]},
                                            {"path": {"$match": f"{common_path}/*"}}]}]

Deletes are also slow in my case: each delete takes a couple of minutes. Right now they're performed serially; have parallel deletes been considered?

@allburov
Member

> I stopped it after 3 hours.

It sounds awful, agreed. With the common-path approach it's possible that the common path ends up being /, so the request would be the same as before...
But we can add it as a quick fix if it helps for some cases. Could you create a PR for that?

> have parallel deletes been considered?

There was no need before, but it's possible. We could use a thread pool for that as an easy fix.
If you want to add it too, please create a separate PR for it; don't mix it with the common path change.
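Roughly something like this (delete_all and delete_artifact are placeholder names for whatever the cleanup loop currently calls per artifact; not tested):

    from concurrent.futures import ThreadPoolExecutor

    def delete_all(artifacts, delete_artifact, workers=8):
        # Issue the per-artifact delete calls concurrently instead of one by one.
        with ThreadPoolExecutor(max_workers=workers) as pool:
            # list() drains the iterator so any worker exception propagates here.
            list(pool.map(delete_artifact, artifacts))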
