title | shortTitle | versions | topics | type | intro | redirect_from
---|---|---|---|---|---|---
Running CodeQL queries at scale with multi-repository variant analysis | Queries at scale | | | reference | You can run {% data variables.product.prodname_codeql %} queries on a large number of repositories on {% data variables.product.github %} from {% data variables.product.prodname_vscode %}. |
About running {% data variables.product.prodname_codeql %} queries at scale with multi-repository variant analysis
With multi-repository variant analysis (MRVA), you can run {% data variables.product.prodname_codeql %} queries on a list of up to 1,000 repositories on {% data variables.product.github %} from {% data variables.product.prodname_vscode %}.
When you run MRVA against a list of repositories, your query is run against each repository that has a {% data variables.product.prodname_codeql %} database available to analyze. {% data variables.product.github %} creates and stores the latest {% data variables.product.prodname_codeql %} database for the default branch of thousands of public repositories, including every repository that runs {% data variables.product.prodname_code_scanning %} using {% data variables.product.prodname_codeql %}.
You need to enable {% data variables.product.prodname_code_scanning %} using {% data variables.product.prodname_codeql %} on {% data variables.product.github %}, using either default setup or advanced setup, before adding your repository to a list for analysis. For information about enabling {% data variables.product.prodname_code_scanning %} using {% data variables.product.prodname_codeql %}, see "AUTOTITLE."
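Because a single MRVA list can hold at most 1,000 repositories, a larger set of repositories has to be split across several lists. The following is a minimal Python sketch of that split. The function name `batch_repositories` and the sample identifiers are hypothetical and are not part of the {% data variables.product.prodname_codeql %} extension.

```python
from typing import Iterator, List

MAX_REPOS_PER_LIST = 1000  # limit stated above for a single MRVA list


def batch_repositories(
    repos: List[str], size: int = MAX_REPOS_PER_LIST
) -> Iterator[List[str]]:
    """Yield consecutive slices of `repos`, each holding at most `size` entries."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(repos), size):
        yield repos[start:start + size]


if __name__ == "__main__":
    # Hypothetical identifiers, used only to show the resulting batch sizes.
    repos = [f"my-organization/repo-{i}" for i in range(2500)]
    print([len(batch) for batch in batch_repositories(repos)])
```

Each batch could then be saved as its own custom list and analyzed in a separate run.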
How MRVA runs queries against {% data variables.product.prodname_codeql %} databases on {% data variables.product.prodname_dotcom_the_website %}
When you run MRVA, the analysis is run entirely using {% data variables.product.prodname_actions %}. You don't need to create any workflows, but you must specify which repository the {% data variables.product.prodname_codeql %} for {% data variables.product.prodname_vscode %} extension should use as a controller repository. As the analysis of each repository completes, the results are sent to {% data variables.product.prodname_vscode_shortname %} for you to view.
The {% data variables.product.prodname_codeql %} extension builds a {% data variables.product.prodname_codeql %} pack with your library and any library dependencies. The {% data variables.product.prodname_codeql %} pack and your selected repository list are posted to an API endpoint on {% data variables.product.github %}, which triggers a {% data variables.product.prodname_actions %} dynamic workflow in your controller repository. The workflow spins up multiple parallel jobs to execute the {% data variables.product.prodname_codeql %} query against the repositories in the list, optimizing query execution. As each repository is analyzed, the results are processed and displayed in {% data variables.product.prodname_vscode_shortname %}.
-
You must define a controller repository before you can run your first multi-repository variant analysis.
-
Controller repositories can be empty, but they must have at least one commit.
{% ifversion ghec %}
-
The controller repository must be hosted on the same site as the repositories that you want to analyze using MRVA, that is: {% data variables.product.prodname_dotcom_the_website %} or {% data variables.enterprise.data_residency_domain %}. If you want to run MRVA on {% data variables.enterprise.data_residency %}, see "Changing the {% data variables.product.github %} URL used by the extension." {% endif %}
-
On {% data variables.product.prodname_dotcom_the_website %}, the controller repository visibility can be "public" if you plan to analyze only public repositories. The variant analysis will be free.
-
The controller repository visibility must be "private" if you need to analyze any private or internal repositories on {% data variables.product.prodname_dotcom_the_website %}. {% ifversion fpt or ghec %}
Any actions minutes that you use to run variant analysis on private or internal repositories, above the free limit, are charged to the repository owner. For more information about free minutes and billing, see "AUTOTITLE."{% endif %}
-
In the "Variant Analysis Repositories" view, click Set up controller repository to display a field for the controller repository.
-
Type the owner and name of the repository on {% data variables.product.github %} that you want to use as your controller repository and press the Enter key. {% ifversion ghec %}This repository must be on the same instance of {% data variables.product.github %} as the repositories that you want to analyze, see "Changing the {% data variables.product.github %} URL used by the extension."{% endif %}
-
If you are prompted to authenticate with {% data variables.product.github %}, follow the instructions and sign in to your account. When you have finished, a prompt from {% data variables.product.github %} Authentication may ask for permission to open in {% data variables.product.prodname_vscode %}. If it does, click Open.
The name of the controller repository is saved in your settings for the {% data variables.product.prodname_codeql %} extension. For information on how to edit the controller repository, see "AUTOTITLE."
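The controller repository is entered as an owner and a repository name. If you generate these identifiers in a script, a quick format check can catch mistakes before you paste them into the field. The sketch below assumes the usual `owner/name` form. The pattern is an approximation chosen for illustration, not the validation that {% data variables.product.github %} or the extension performs, and `is_repo_identifier` is a hypothetical helper.

```python
import re

# Approximate pattern for an "owner/name" identifier. This is an assumption
# made for illustration; GitHub's own naming rules may differ in detail.
_REPO_ID = re.compile(
    r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?/[A-Za-z0-9._-]+"
)


def is_repo_identifier(text: str) -> bool:
    """Return True if `text` looks like `owner/name` with exactly one slash."""
    return _REPO_ID.fullmatch(text) is not None


if __name__ == "__main__":
    for candidate in ["my-organization/my-repository", "my-repository", "a/b/c"]:
        print(candidate, is_repo_identifier(candidate))
```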
-
By default, the "Variant Analysis Repositories" view shows the default lists of the Top 10, Top 100, and Top 1000 public repositories on {% data variables.product.prodname_dotcom_the_website %} for the language that you are analyzing. If your controller repository is hosted on {% data variables.enterprise.data_residency_domain %}, these lists are not available.
-
Optionally, you can add a new repository, organization, or list.
-
In the "Variant Analysis Repositories" view, click + to add a new database.
-
From the dropdown menu, select From a {% data variables.product.github %} repository or All repositories of {% data variables.product.github %} org or owner.
-
Type the identifier of the repository or organization that you want to use into the field.
-
Select which {% data variables.product.github %} repository or repositories you want to run your query against.
-
Open the query you want to run, right-click in the query file, and select {% data variables.product.prodname_codeql %}: Run Variant Analysis to start variant analysis.
Note
To cancel a variant analysis run, click Stop query in the "Variant Analysis Results" view.
When you run MRVA, there are two key places where errors and warnings are displayed:
-
{% data variables.product.prodname_vscode %} errors: any problems with creating a {% data variables.product.prodname_codeql %} pack and sending the analysis to {% data variables.product.github %} are reported as {% data variables.product.prodname_vscode %} errors in the bottom right corner of the application. Information is also available in the "Problems" view.
-
"Variant Analysis Results": any problems with the variant analysis run are reported in this view.
As soon as a workflow to run your variant analysis on {% data variables.product.github %} is running, a "Variant Analysis Results" view opens to display the results as they are ready. You can use this view to monitor progress, see any errors, and access the workflow logs in your controller repository.
When your variant analysis run is scheduled, the "Results" view automatically opens. Initially, the view shows a list of every repository that was scheduled for analysis. As each repository is analyzed, the view is updated to show a summary of the number of results. To view the detailed results for a repository (including results paths), click the repository name.
For each repository, you can see:
-
Number of results found by the query
-
Visibility of the repository
-
Whether analysis is still running or has finished
-
Number of stars the repository has on {% data variables.product.github %}
-
Click the repository name to show a summary of each result.
-
Explore the information available for each result using links to the source files on {% data variables.product.github %}. For data flow queries, there'll be an additional "Show paths" link.
You can export your results for further analysis or to discuss them with collaborators. In the "Results" view, click Export results to export the results to a secret gist on {% data variables.product.github %} or to a Markdown file in your workspace.
Note
{% data variables.product.prodname_codeql %} analysis always requires a {% data variables.product.prodname_codeql %} database to run queries against. When you run variant analysis against a list of repositories, your query will only be executed against the repositories that currently have a {% data variables.product.prodname_codeql %} database available to download. The best way to make a repository available for variant analysis is to enable {% data variables.product.prodname_code_scanning %} with {% data variables.product.prodname_codeql %}. For information about enabling {% data variables.product.prodname_code_scanning %} using {% data variables.product.prodname_codeql %}, see "AUTOTITLE."
-
In the "Variant Analysis Repositories" view, click the "Add list" icon.
-
Type a name for the new list and press Enter.
-
Select your list in the view, then click + to add a repository to your list.
You can manage and edit your custom lists by right-clicking on either the list name, or a repository name within the list, and selecting an option from the context menu.
The custom lists are stored in your workspace in a databases.json file. If you want to edit this file directly in {% data variables.product.prodname_vscode %}, you can open it by clicking { } in the view header.
For example, if you want to continue analyzing a set of repositories that had results for your query, click Copy repository list in the "Variant Analysis Results" view to add a list of only the repositories that have results to the clipboard as JSON.
In the following example snippet, my-organization/my-repository had results for a query:
{
  "name": "new-repo-list",
  "repositories": [
    "my-organization/my-repository"
  ]
}
You can then insert the new-repo-list of repositories into databases.json for easy access in the "Variant Analysis Repositories" view.
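If you prefer to script this step rather than paste the list by hand, the merge can be sketched in Python. The text above does not describe the full layout of databases.json, so the nesting used here (`databases` > `variantAnalysis` > `repositoryLists`) is an assumption. Check it against your own file before relying on it. The helper name `add_repository_list` is hypothetical.

```python
import json
from typing import Any, Dict


def add_repository_list(
    config: Dict[str, Any], new_list: Dict[str, Any]
) -> Dict[str, Any]:
    """Add `new_list` to `config`, replacing any existing list with the same name.

    Assumes (not confirmed by this article) that custom lists are stored under
    config["databases"]["variantAnalysis"]["repositoryLists"].
    """
    lists = (
        config.setdefault("databases", {})
        .setdefault("variantAnalysis", {})
        .setdefault("repositoryLists", [])
    )
    # Drop an older list with the same name so the new one takes its place.
    lists[:] = [entry for entry in lists if entry.get("name") != new_list["name"]]
    lists.append(new_list)
    return config


if __name__ == "__main__":
    # The JSON copied to the clipboard in the example above.
    copied = json.loads(
        '{"name": "new-repo-list", "repositories": ["my-organization/my-repository"]}'
    )
    print(json.dumps(add_repository_list({}, copied), indent=2))
```

In practice you would load the existing databases.json with `json.load`, pass the result as `config`, and write the returned dictionary back to the file.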
Note
This feature uses the legacy code search via the {% data variables.product.github %} code search API. For more information on the syntax to use, see "AUTOTITLE."
You can use code search directly in the {% data variables.product.prodname_codeql %} extension to add a subset of repositories from {% data variables.product.github %} to a custom list.
For example, to add all repositories in the rails organization on {% data variables.product.github %}, search org:rails.
You can add a maximum of 1,000 repositories to a custom list per search.
-
In the "Variant Analysis Repositories" view, choose the list that you want to add repositories to. You can create a new list or choose an existing list that already contains repositories.
-
Right-click on the list you have chosen and then click Add repositories with {% data variables.product.prodname_dotcom %} code search.
-
In the pop-up that appears at the top of the application, under the search bar, select a language for your search from the choices in the dropdown.
-
In the search bar, type the search query that you want to use and press Enter.
You can view the progress of your search in the bottom right corner of the application in a box with the text "Searching for repositories...". If you click Cancel, no repositories will be added to your list. Once complete, you will see the resulting repositories appear in the dropdown under your custom list in the "Variant Analysis Repositories" view.
Some of the resulting repositories will not have {% data variables.product.prodname_codeql %} databases and some may not allow access by the {% data variables.product.prodname_codeql %} extension for {% data variables.product.prodname_vscode %}. When you run an analysis on the list, the "Variant Analysis Results" view will show you which repositories were analyzed, which denied access, and which had no {% data variables.product.prodname_codeql %} database.