GitHub Enterprise Server: Analyzing Git traffic using Governor #34220
So cool! I never knew this. Thank you for sharing.
When doing a 0 = fetch
These are going to be great for producing usage charts to help us teach our users how to re-architect their CI/CD processes. We have two teams who decided they want "build on dev check-in", which makes their impact larger, by over an order of magnitude, than that of all other projects combined. Now we can show them the actual impact. How much does this tool itself affect server operations? Can we run it regularly from our replica to minimize the impact on the primary server? Would the results be the same there? TIA! Steve
Governor is a small application that records every invocation of the core Git binary on GitHub Enterprise Server. Read on to learn how to use it and how to gather helpful statistics and insights about your developers' usage of Git.
Introduction
GitHub Enterprise Server has an internal monitor and concurrency controller for Git processes called Governor, which keeps count of Git operations. A command line utility to query Governor data (`ghe-governor`) was made available with GitHub Enterprise Server 2.11. Governor data files, located under `/data/user/gitmon/`, hold 1 hour of data per file and are retained for 2 weeks. The files contain timestamps in their names which you can use to confirm the time period they cover. Here is an example:
Usage
First, let's have a look at Governor's syntax. We will focus on common examples and queries later in this article.
Individual (Top) queries
Governor can find the top N records of Git queries for a given metric (column). The resulting table will be sorted by that column.
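A minimal sketch of a top query. It assumes the invocation takes the form `ghe-governor top <column>`, which is worth checking against the utility's own help output. `GOVERNOR` is a hypothetical dry-run switch introduced for this sketch: by default the commands are only printed, and setting `GOVERNOR=ghe-governor` on the appliance runs them.

```shell
# Dry run by default: prints each command instead of executing it.
# GOVERNOR is a hypothetical switch; set GOVERNOR=ghe-governor on the appliance to run for real.
GOVERNOR="${GOVERNOR:-echo ghe-governor}"

# Assumed form: ghe-governor top <column>
$GOVERNOR top rt          # individual Git invocations with the longest response time
$GOVERNOR top disk_write  # individual Git invocations that wrote the most to disk
```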
The column can be any of `rt`, `cpu`, `disk_read`, `disk_write`, `disk`, `uploaded`, `received`, `net`, `rss`, or `cpu_busy`.
Aggregate queries
Governor can find the top N groups of Git queries for a given grouping function and a given metric (column).
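A minimal sketch of an aggregate query, assuming the form `ghe-governor aggregate <grouping function> <column>` (grouping first, then metric, in the order the sentence above names them). The `GOVERNOR` dry-run variable is hypothetical and only prints the commands unless it is set to `ghe-governor`.

```shell
# Dry run by default: prints each command instead of executing it.
# GOVERNOR is a hypothetical switch; set GOVERNOR=ghe-governor on the appliance to run for real.
GOVERNOR="${GOVERNOR:-echo ghe-governor}"

# Assumed form: ghe-governor aggregate <grouping function> <column>
$GOVERNOR aggregate via count        # number of Git invocations, grouped by how they arrived
$GOVERNOR aggregate user_id max_rt   # longest-running invocation, grouped by user
```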
The grouping function can be any of `hostname`, `program`, `repo`, `git_dir`, `via`, `ip`, `user_id`, `result_code`, `cloning`, `die_message`, or `die_message_raw`.
The column can be any of `count`, `rt`, `max_rt`, `avg_rt`, `avg_parallelism`, `max_parallelism`, `cpu`, `avg_cpu`, `disk`, `disk_read_bytes`, `disk_read_kb`, `avg_read_bytes`, `disk_write_bytes`, `disk_write_kb`, `avg_write_bytes`, `uploaded_bytes`, `uploaded_kb`, `received_bytes`, `received_kb`, `net`, `avg_uploaded`, `cpu_busy`, or `users`.
Please see below for an explanation of some of the resulting table columns:
`RT` means response time, so `AVG RT` is the average time in seconds that Git invocations took, and `MAX RT` is the running time in seconds of the longest-running invocation, per host.
`PL` is parallelism, or how many Git invocations are outstanding at one time. So `MAXPL` and `AVGPL` are the maximum and average, respectively.
`CPU/SEC` is how many seconds of CPU time are used by Git alone per second of wall-clock time. This is the number of CPUs dedicated to Git, averaged over the entire duration of the query. Divide it by the actual number of CPU cores (and multiply by 100) to get the Git-specific CPU utilization as a percentage. Unlike Unix system load, this number cannot exceed the actual number of CPU cores.
`UPL` is data that GitHub Enterprise Server uploaded, i.e., client fetches and clones.
`RCV` is data that GitHub Enterprise Server received, i.e., client pushes.
`READ`, `WRITE`, `UPL`, and `RCV` columns are all in GB, but the rate is in MB/s.
Options for all queries
Every query type can be limited in scope in the following ways:
`-j` = set output format to JSON instead of an ASCII table
`-n<N>` = limit the output size to N (default: 20)
`-t <timespec>` = only consider Git invocations since a given start time (default: 48 hours ago). You may want to use a tool such as https://www.epochconverter.com/ to convert UTC to Unix Epoch for finely grained queries.
  `-t 1371614483` = Invocations since a given Unix timestamp (seconds since 1970)
  `-t 1371614483637` = Invocations since a given Java timestamp (milliseconds since 1970)
  `-t-1d` = Invocations in the last day
  `-t-2h` = Invocations in the last two hours
  `-t-20m` = Invocations in the last twenty minutes
`-u <timespec>` = consider Git invocations up to a given end time (default: now)
`-r <owner>/<repository>` = consider only queries that match a given owner (user or organization) and repository. Specify this multiple times to OR together several possible repositories.
`-o <owner>` = consider only queries that match a given owner (user or organization). Specify this multiple times to OR together several possible owners.
`-V <protocol>` = consider only queries arriving via a specific protocol (e.g. `shell`, `git`, `blob edit`, `gitrpc`, `ssh`, `initial commit`, `web branch create`, `pull request branch delete button`, or `pull request merge button`). Specify this multiple times to OR together several possible protocols.
`-P <program>` = consider only queries that ran a given Git subprogram (e.g. `rev-list`, `diff-tree`, `dgit-helper`, `show-ref`, `merge-base`, `log`, `diff`, `blame-tree`, `diff-pairs`, `upload-pack`, `shortlog`, `rev-parse`, `pack-objects`, `pack-refs`, `repack`, `cat-file`, `upload-file`, `ahead-behind`, `dgit-state`, or `for-each-ref`). Specify this multiple times to OR together several possible programs.
`-I <address>` = consider only queries from a specific IP address. `-I ""` means local operations and is equivalent to `-V shell`. Specify this multiple times to OR together several possible addresses.
The following are long options for aggregate queries:
`--count-only` = only show the `KEY` and `COUNT` columns
`--distinct-users` = also show the `#USERS` column
Example queries
Now that we know Governor's syntax, let's have a look at typical usage scenarios and example queries.
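Before the examples, a note on the `-t` and `-u` options described above: they accept Unix timestamps, and the conversion from UTC can also be done on the command line instead of a website. The sketch below assumes GNU `date` (its `-d` option is not portable to every system) and assumes that options may follow the positional arguments. `to_epoch` and the `GOVERNOR` dry-run variable are hypothetical helpers, not part of Governor.

```shell
# Dry run by default: prints the command instead of executing it.
# GOVERNOR is a hypothetical switch; set GOVERNOR=ghe-governor on the appliance to run for real.
GOVERNOR="${GOVERNOR:-echo ghe-governor}"

# Convert a UTC wall-clock time to Unix epoch seconds (relies on GNU date's -d option).
to_epoch() { date -u -d "$1" +%s; }

# A one-hour window around an incident, expressed in UTC.
start=$(to_epoch '2013-06-19 04:00:00')
end=$(to_epoch '2013-06-19 05:00:00')

# Assumed form: options follow the positional arguments.
$GOVERNOR aggregate repo count -t "$start" -u "$end"
```

For reference, `to_epoch '2013-06-19 04:01:23'` yields `1371614483`, the Unix timestamp used in the option list above.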
Analyzing Git traffic
The overall summary provides the total and average number of Git requests over a recorded period:
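The text does not name the command that produces this summary. The sketch below assumes it is a subcommand called `health`; treat that name as an assumption and confirm it against the utility's own help output before relying on it. `GOVERNOR` is a hypothetical dry-run variable that prints the command unless it is set to `ghe-governor`.

```shell
# Dry run by default: prints the command instead of executing it.
# GOVERNOR is a hypothetical switch; set GOVERNOR=ghe-governor on the appliance to run for real.
GOVERNOR="${GOVERNOR:-echo ghe-governor}"

# Assumption: the overall summary comes from a subcommand named "health".
$GOVERNOR health
```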
The following set of sample commands may help to identify Git traffic patterns or spikes in activity. They make use of the `count` metric, which is a good reference point to know what is being requested the most.
To dive a bit deeper, the following queries indicate the actual volume of Git traffic:
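A sketch of both kinds of query described above, first by `count` and then by volume. The groupings, columns and the `-P` filter come from the lists earlier in this article; the argument order (`aggregate <grouping function> <column> [options]`) and the hypothetical `GOVERNOR` dry-run variable are assumptions.

```shell
# Dry run by default: prints each command instead of executing it.
# GOVERNOR is a hypothetical switch; set GOVERNOR=ghe-governor on the appliance to run for real.
GOVERNOR="${GOVERNOR:-echo ghe-governor}"

# What is requested most often?
$GOVERNOR aggregate program count             # busiest Git subprograms
$GOVERNOR aggregate repo count                # busiest repositories
$GOVERNOR aggregate ip count -P upload-pack   # clients that fetch and clone the most

# How much data actually moves?
$GOVERNOR aggregate repo uploaded_bytes       # served to clients (fetches and clones)
$GOVERNOR aggregate repo received_bytes       # received from clients (pushes)
```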
Furthermore, you might be interested in bursts of concurrent clones. A thundering herd of clones can cause a spike in resource usage. You can check for concurrent clones by aggregating on `max_parallelism` (result table column `MAXPL`):
CPU Profiling
The above metrics are only so useful in performance profiling. But Governor also collects CPU timing data, which is helpful in diagnosing high CPU utilization caused by Git operations.
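When reading the CPU figures, recall from the column explanations that `CPU/SEC` divided by the number of CPU cores gives the share of the machine that Git is using. A small sketch of that arithmetic follows; `git_cpu_percent` is a hypothetical helper, and the input values are made up for illustration.

```shell
# git_cpu_percent <CPU/SEC value> <number of CPU cores>
# Prints the share of total CPU capacity used by Git, as a percentage.
git_cpu_percent() {
  awk -v cpu_sec="$1" -v cores="$2" 'BEGIN { printf "%.1f\n", 100 * cpu_sec / cores }'
}

# Hypothetical reading: CPU/SEC of 3 on an 8-core instance.
git_cpu_percent 3 8

# On the appliance itself, nproc can supply the core count:
# git_cpu_percent 3 "$(nproc)"
```

With the values shown, the helper prints `37.5`, meaning Git used a little over a third of the available CPU capacity during the queried period.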
Top repositories by CPU time:
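A sketch of this query, assuming the form `ghe-governor aggregate <grouping function> <column>`, with `repo` as the grouping and `cpu` as the column. `GOVERNOR` is a hypothetical dry-run variable that prints the command unless it is set to `ghe-governor`.

```shell
# Dry run by default: prints the command instead of executing it.
# GOVERNOR is a hypothetical switch; set GOVERNOR=ghe-governor on the appliance to run for real.
GOVERNOR="${GOVERNOR:-echo ghe-governor}"

$GOVERNOR aggregate repo cpu   # repositories ranked by total Git CPU time
```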
Top programs by CPU time for a single repository:
Using the repository `-r` flag, you can see the CPU breakdown for individual repositories as well. This time we're interested in the `program` that used the most CPU time:
Top IP addresses by CPU time for a single repository:
Grouping by IP address and CPU time can help to identify continuous integration systems or users that are causing a performance hit:
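A sketch of both single-repository breakdowns described above, by program and by IP address. The repository name `octo-org/octo-repo` is a placeholder, `GOVERNOR` is a hypothetical dry-run variable, and the argument order is an assumption.

```shell
# Dry run by default: prints each command instead of executing it.
# GOVERNOR is a hypothetical switch; set GOVERNOR=ghe-governor on the appliance to run for real.
GOVERNOR="${GOVERNOR:-echo ghe-governor}"

# Placeholder repository; replace with a real <owner>/<repository>.
REPO="${REPO:-octo-org/octo-repo}"

$GOVERNOR aggregate program cpu -r "$REPO"   # which Git subprograms burn CPU in this repository
$GOVERNOR aggregate ip cpu -r "$REPO"        # which clients (for example CI hosts) cause that load
```

An address that dominates the second table is a good candidate to look up in your CI inventory.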
General Governor records with the most CPU time (not grouped):
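For the ungrouped view, a top query on the `cpu` column is the natural fit. The sketch assumes the form `ghe-governor top <column>`; `GOVERNOR` is a hypothetical dry-run variable that prints the command unless it is set to `ghe-governor`.

```shell
# Dry run by default: prints the command instead of executing it.
# GOVERNOR is a hypothetical switch; set GOVERNOR=ghe-governor on the appliance to run for real.
GOVERNOR="${GOVERNOR:-echo ghe-governor}"

$GOVERNOR top cpu   # individual Git invocations that used the most CPU time
```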
Disk usage
Sometimes, you want to find out which repository or program caused a specific disk write peak that you've seen. The following commands may be of help here.
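A sketch of such queries, using the `disk_write_bytes` column and the `-t-2h` timespec from the option list to stay close to the peak. The argument order and the hypothetical `GOVERNOR` dry-run variable are assumptions.

```shell
# Dry run by default: prints each command instead of executing it.
# GOVERNOR is a hypothetical switch; set GOVERNOR=ghe-governor on the appliance to run for real.
GOVERNOR="${GOVERNOR:-echo ghe-governor}"

# Which repository wrote the most in the last two hours?
$GOVERNOR aggregate repo disk_write_bytes -t-2h
# Which Git subprogram was responsible for the writes?
$GOVERNOR aggregate program disk_write_bytes -t-2h
```

If the peak happened further back, the relative timespec can be replaced with `-t` and `-u` Unix timestamps that bracket it.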
TL;DR
Governor ships with GitHub Enterprise Server and provides insights into how your developers use Git and what implications their behavior may have for your GitHub Enterprise Server instance. In Enterprise Support, we regularly rely on Governor to help us answer all kinds of questions related to Git usage. Now you can do the same.
What are your experiences with Governor? Feel free to comment below!