Pushlog¶
Mozilla has taught Mercurial how to record who pushes what where and when. This is called the pushlog. It is essentially a log of pushes to repositories.
Technical Details¶
All pushes to hg.mozilla.org
occur via SSH. When clients talk to
the server, the authenticated username from SSH is stored in the
USER
environment variable. When a push occurs, our custom
pushlog
Mercurial extension will record the username, the current
time, and the list of changesets that were pushed in a SQLite database
in the repository.
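The recording step described above can be pictured with a small sketch. It is an illustration only: the table and column names (`pushes`, `push_changesets`) and the `record_push` helper are hypothetical and are not taken from the extension's actual schema or code.

```python
import os
import sqlite3
import time


def record_push(conn, user, nodes, now=None):
    """Store one push (user, time, changesets) and return its push ID."""
    when = int(time.time()) if now is None else int(now)
    with conn:  # commits on success, rolls back if an insert fails
        cur = conn.execute(
            "INSERT INTO pushes (user, date) VALUES (?, ?)", (user, when)
        )
        push_id = cur.lastrowid
        conn.executemany(
            "INSERT INTO push_changesets (pushid, node) VALUES (?, ?)",
            [(push_id, node) for node in nodes],
        )
    return push_id


# Hypothetical schema, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE pushes (
        id INTEGER PRIMARY KEY AUTOINCREMENT, user TEXT, date INTEGER);
    CREATE TABLE push_changesets (pushid INTEGER, node TEXT);
    """
)

# The SSH-authenticated username is read from the USER environment variable.
pusher = os.environ.get("USER", "unknown")
first_id = record_push(
    conn,
    pusher,
    [
        "91826025c77c6a8e5711735adaa9766dd4eac7fc",
        "25f2a69ac7ac2919ef35c0b937b862fbb9e7e1f7",
    ],
)
```

The auto-incrementing integer key in this sketch mirrors the idea of a push ID that grows with each push.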
Installing¶
The pushlog extension (source in hgext/pushlog
) contains the core
data recording and data replication code. When installed, a
pretxnchangegroup
hook inserts pushlog entries when changesets are
introduced. To install this extension, add the following line to your
hgrc:
[extensions]
pushlog = /path/to/version-control-tools/hgext/pushlog
No additional configuration is necessary.
The web components for pushlog are separate from the core extension and
require a bit more effort to configure. This code lives in
hgext/pushlog/feed.py
. It is our intention to eventually aggregate
this code into a single pushlog extension so there is a unified pushlog
experience.
The web component will require the following extension:
[extensions]
pushlog-feed = /path/to/version-control-tools/hgext/pushlog/feed.py
pushlog/feed.py
adds hgweb endpoints that expose pushlog
data.
Templates¶
It isn’t enough to activate the pushlog/feed.py
extension: you’ll also
need to configure some
Mercurial theming
to render pushlog data.
The Atom output will require the existence of an atom
style. You are
encouraged to copy the files in hgtemplates/atom
to your Mercurial
styles directory.
The pushloghtml
page will render the pushlog
template. This is
something you’ll need to define. Look for pushlog.tmpl
files in
hgtemplates/
in this repository for examples.
Pushlog templates typically make use of a named pushlogentry
entity. You may also need to define this. Searching for pushlog
in
hgtemplates
to find all references is probably a good idea.
Pushlog Wire Protocol Command¶
The pushlog
extension exposes a pushlog
command and capability
to the Mercurial wire protocol. This enables Mercurial clients to
retrieve pushlog data directly from the wire protocol.
For more details, read the source in hgext/pushlog/__init__.py
.
The Push ID¶
Entries in the pushlog have an incrementing integer key that uniquely
identifies them. It is guaranteed that push ID N + 1
occurs after
N
.
hgweb Commands¶
There are a couple of custom hgweb commands that expose pushlog information.
For reference, an hgweb command is essentially a per-repository
handler in hgweb (Mercurial’s HTTP interface). URLs have the form
https://hg.mozilla.org/<repository>/<command>/<args>
.
json-pushes Command¶
The json-pushes
command exposes a JSON representation of pushlog data.
pushlog Command¶
The pushlog
command exposes an Atom feed of pushes to the
repository.
It behaves similarly to json-pushes
in terms of what
parameters it can accept.
pushloghtml Command¶
The pushloghtml
command exposes an HTML view of pushlog data.
Query Parameters¶
Various hgweb pushlog commands accept query string parameters to control what data is returned.
The following parameters control selection of the lower bound of pushes. Only one takes effect at a time. The behavior when several are specified is undefined.
- startdate
- A string defining the start date to query pushes from. Only pushes after this date (non-inclusive) will be returned.
- fromchange
- Only return pushes that occurred after the push that introduced this changeset. The value can be any changeset identifier that Mercurial can resolve. This is typically a 40 character changeset SHA-1.
- startID
- Only return pushes whose ID is greater than the integer specified.
The following parameters control selection of the upper bound of pushes. Behavior is similar to the parameters that control the lower bound.
- enddate
- A string defining the end date for pushes. Only pushes before this date (non-inclusive) will be returned.
- tochange
- Only return pushes up to and including the push that introduced the specified changeset.
- endID
- Only return pushes up to and including the push with the specified push ID.
Other parameters that control behavior include:
- user
- Only show pushes performed by the specified user.
- changeset
- Only show pushes that introduced the specified changeset.
- tipsonly
- If the value is 1, only return info from the tip-most changeset in the push. The default is to return info for all changesets in a push.
- full
- If this parameter is present (the value is ignored), responses will contain more verbose info for each changeset.
- version
Format of the response. 1 and 2 are accepted. 1 is the default (for backwards compatibility).
This is only used by json-pushes.
Dates can be specified a number of ways. However, using seconds since UNIX epoch is highly preferred.
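As a rough sketch of how these parameters combine with the URL form shown earlier, the following builds a query URL and converts a date to seconds since the UNIX epoch. The helper names `pushlog_url` and `epoch_seconds` are hypothetical, and `mozilla-central` is used only as an example repository path.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode

BASE_URL = "https://hg.mozilla.org"


def pushlog_url(repo, command="json-pushes", **params):
    """Build an hgweb pushlog URL with the given query string parameters."""
    url = f"{BASE_URL}/{repo}/{command}"
    # Sort the parameters so the resulting URL is deterministic.
    query = urlencode(sorted(params.items()))
    return f"{url}?{query}" if query else url


def epoch_seconds(dt):
    """Convert a timezone-aware datetime to integer seconds since epoch."""
    if dt.tzinfo is None:
        raise ValueError("pass a timezone-aware datetime to avoid ambiguity")
    return int(dt.timestamp())


# Bound the query by push ID (the lower bound is exclusive).
by_id = pushlog_url("mozilla-central", startID=10, endID=20, version=2)

# Bound the query by date, expressed as seconds since UNIX epoch.
since = epoch_seconds(datetime(2015, 1, 1, tzinfo=timezone.utc))
by_date = pushlog_url("mozilla-central", startdate=since)
```

Only one lower-bound parameter is passed in each call, since the behavior of combining them is undefined.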
JSON Payload Formats¶
Version 1¶
Version 1 (the default) consists of a JSON object whose keys are push IDs and whose values contain metadata about just that push. For example:
{
"16": {
"changesets": [
"91826025c77c6a8e5711735adaa9766dd4eac7fc",
"25f2a69ac7ac2919ef35c0b937b862fbb9e7e1f7"
],
"date": 1227196396,
"user": "gszorc@mozilla.com"
}
}
An optional obsoletechangesets
key may also be present in each push.
Read below for more.
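A minimal sketch of reading a version 1 payload follows. Because JSON object keys are strings, the push IDs are converted to integers before sorting (otherwise "16" would sort before "9"). The `pushes_in_order` helper is hypothetical, and the push with ID 9 in the sample is made up for illustration.

```python
import json


def pushes_in_order(payload):
    """Return (push_id, push) pairs sorted by integer push ID."""
    return sorted(
        ((int(push_id), push) for push_id, push in payload.items()),
        key=lambda pair: pair[0],
    )


# Sample payload: push 16 is the example above; push 9 is invented.
raw = """
{
  "16": {
    "changesets": [
      "91826025c77c6a8e5711735adaa9766dd4eac7fc",
      "25f2a69ac7ac2919ef35c0b937b862fbb9e7e1f7"
    ],
    "date": 1227196396,
    "user": "gszorc@mozilla.com"
  },
  "9": {
    "changesets": [],
    "date": 1227190000,
    "user": "someone@example.com"
  }
}
"""

ordered = pushes_in_order(json.loads(raw))
for push_id, push in ordered:
    # obsoletechangesets is optional, so fall back to an empty list.
    obsolete = push.get("obsoletechangesets", [])
    print(push_id, push["user"], len(push["changesets"]), len(obsolete))
```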
Version 2¶
Version 2 introduces a container for pushes so that additional metadata can be communicated in the main object in the payload. Here is an example payload:
{
"lastpushid": 21,
"pushes": {
"16": {
"changesets": [
"91826025c77c6a8e5711735adaa9766dd4eac7fc",
"25f2a69ac7ac2919ef35c0b937b862fbb9e7e1f7"
],
"date": 1227196396,
"user": "gszorc@mozilla.com"
}
}
}
The top-level object contains the following properties:
- pushes
An object containing push information.
This is the same object that constitutes version 1’s response.
- lastpushid
The push ID of the most recent push known to the database.
This value can be used by clients to determine if more pushes are available. For example, clients may query for N pushes at a time by specifying endID. The value in this property can tell these clients when they have exhausted all known pushes.
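The batching idea described for lastpushid could be sketched as follows. This is an assumption-laden illustration: `iter_pushes` is a hypothetical helper, and `fetch` stands in for any callable that performs a version 2 query bounded by startID (exclusive) and endID (inclusive) and returns the decoded payload.

```python
def iter_pushes(fetch, start_id=0, batch_size=100):
    """Yield (push_id, push) pairs in batches until lastpushid is reached."""
    while True:
        end_id = start_id + batch_size
        payload = fetch(start_id, end_id)
        pushes = payload["pushes"]
        # Keys are strings, so sort them numerically.
        for push_id in sorted(pushes, key=int):
            yield int(push_id), pushes[push_id]
        # lastpushid tells us when all known pushes have been requested.
        if end_id >= payload["lastpushid"]:
            return
        # startID is exclusive, so the next batch starts at this endID.
        start_id = end_id


def make_fake_fetch(last_id):
    """Build an in-memory stand-in for the server, for demonstration."""
    def fetch(start_id, end_id):
        pushes = {
            str(i): {"changesets": [], "date": 0, "user": "example"}
            for i in range(1, last_id + 1)
            if start_id < i <= end_id
        }
        return {"lastpushid": last_id, "pushes": pushes}
    return fetch


seen = [push_id for push_id, _ in iter_pushes(make_fake_fetch(7), batch_size=3)]
```

Each request stays bounded by `batch_size`, so no single response grows with the total size of the pushlog.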
Push Objects¶
The value of each entry in the pushes object is an object describing the push and the changesets therein.
The following properties are always present:
- changesets
An array of changeset entries.
By default, entries are 40 character changeset SHA-1s included in the push. If full is specified, entries are objects containing changeset metadata (see below).
Changesets are in DAG/revlog order with the tip-most changeset last.
The array may be empty. This can occur if changesets from this push are now hidden/obsolete.
- obsoletechangesets
(optional) An array of 40 character changeset SHA-1s of now obsolete changesets included in the push.
The DAG order relationship between changesets and obsoletechangesets is, strictly speaking, undefined.
This key is only present if the repository has obsolescence data and the push has changesets that are now obsolete.
- date
Integer seconds since UNIX epoch that the push occurred.
For pushes that take a very long time (more than a single second), the date will be recorded towards the end of the push, just before the transaction is committed to Mercurial. This is an implementation detail, however.
There is no guarantee of strict ordering between dates, i.e. the date of push ID N + 1 could be less than the date of push ID N. Such is how clocks work.
- user
- The string username that performed the push.
If full
is specified, each entry in the changesets
and
obsoletechangesets
array will be an object instead of a string.
Each object will have the following properties:
- node
- The 40 byte hex SHA-1 of the changeset.
- parents
- An array of 1 or 2 elements containing the 40 byte hex SHA-1 of the parent changesets. Merges have 2 entries. Root changesets have the value 0000000000000000000000000000000000000000.
- author
- The author string from the changeset.
- desc
- The changeset’s commit message.
- branch
The branch the changeset belongs to. default is the default branch in Mercurial.
- tags
- An array of string tags belonging to this changeset.
- files
- An array of filenames that were changed by this changeset.
- precursors
(optional) An array of 40 character hex SHA-1 nodes identifying precursor nodes.
Precursor nodes are essentially previous versions of this changeset.
Precursor nodes come from obsolescence data. This key won’t exist if there are no precursor nodes for this changeset.
The precursor changesets are hidden and not available to normal Mercurial operations. However, querying the pushlog for their info may return results.
Here’s an example:
{
"author": "Eugen Sawin <esawin@mozilla.com>",
"branch": "default",
"desc": "Bug 1110212 - Strong randomness for Android DNS resolver. r=sworkman",
"files": [
"other-licenses/android/res_init.c"
],
"node": "ee4fe2ec168e719e822dabcdd797c0cff9ce2407",
"parents": [
"803bc910c45a875d9d76dc689c45dd91a1e02e23"
],
"precursors": [
"d313a202a85e114000f669c2fcb49ad42376ac04"
],
"tags": []
}
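The properties above could be interpreted along these lines. This is a sketch only: `describe_changeset` and `changeset_nodes` are hypothetical helpers, not part of the pushlog code.

```python
NULL_NODE = "0" * 40  # parent value reported for root changesets


def describe_changeset(entry):
    """Summarize a changeset object returned when full is specified."""
    real_parents = [p for p in entry["parents"] if p != NULL_NODE]
    return {
        "node": entry["node"],
        "is_root": not real_parents,
        "is_merge": len(real_parents) == 2,
        # precursors is optional, so default to an empty list.
        "precursors": entry.get("precursors", []),
    }


def changeset_nodes(push):
    """Return node hashes for a push, with or without full."""
    entries = push["changesets"] + push.get("obsoletechangesets", [])
    # Entries are plain strings by default and objects with full.
    return [e if isinstance(e, str) else e["node"] for e in entries]


example = {
    "author": "Eugen Sawin <esawin@mozilla.com>",
    "branch": "default",
    "desc": "Bug 1110212 - Strong randomness for Android DNS resolver. r=sworkman",
    "files": ["other-licenses/android/res_init.c"],
    "node": "ee4fe2ec168e719e822dabcdd797c0cff9ce2407",
    "parents": ["803bc910c45a875d9d76dc689c45dd91a1e02e23"],
    "precursors": ["d313a202a85e114000f669c2fcb49ad42376ac04"],
    "tags": [],
}

summary = describe_changeset(example)
```

Handling both the string and object forms in one place keeps client code independent of whether full was requested.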
Writing Agents that Consume Pushlog Data¶
It is common to want to write tools or services that consume pushlog data. For example, you may wish to perform processing of new commits as they arrive.
Before you consider using the pushlog for this, you should consider the change notification services on hg.mozilla.org instead. If those aren’t sufficient, you should request one that is.
If you must consume the pushlog for monitoring for new pushes, you will need to periodically poll each repository separately. The following best practices should be used:
- Query by push ID, not by changeset or date.
- Always specify a startID and endID.
- Try to avoid full if possible.
- Always use the latest format version.
- Don’t be afraid to ask for a new pushlog feature to make your life easier.
Querying by push ID is preferred because date ordering is not guaranteed (due to system clock skew) and because changesets can occur in multiple pushes in Headless Repositories. If a changeset occurs in multiple pushes, using the changeset as an identifier is ambiguous! Push IDs are the only guaranteed stable method for selecting pushes.
We recommend that startID
and endID
always be specified so
response sizes are bounded. If they are omitted, the server may generate a
very large payload. We’ve seen clients request all push data from
the server and the response JSON is over 100 MB!
Specifying full
will incur an additional lookup on the server.
Without full
, the response JSON is generated purely from the SQLite
database. With full
, data needs to be read from Mercurial. This adds
overhead to the lookup and to the transfer. If you don’t need the extra
data, please don’t request it.
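Putting these practices together, one polling step might look like the sketch below. It queries by push ID, always sends startID and endID, requests version 2, and omits full. The names `http_fetch` and `poll_once` are hypothetical, and the `fetch` callable is injected so the step can be exercised without a network connection.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def http_fetch(repo_url):
    """Build a fetch(start_id, end_id) callable for one repository URL."""
    def fetch(start_id, end_id):
        query = urlencode({"startID": start_id, "endID": end_id, "version": 2})
        with urlopen(f"{repo_url}/json-pushes?{query}") as response:
            return json.load(response)
    return fetch


def poll_once(fetch, last_seen, batch_size=50):
    """Return (new_pushes, new_last_seen) for one bounded query."""
    end_id = last_seen + batch_size
    payload = fetch(last_seen, end_id)
    new_pushes = sorted(
        ((int(pid), push) for pid, push in payload["pushes"].items()),
        key=lambda pair: pair[0],
    )
    # Everything up to min(end_id, lastpushid) has now been covered.
    covered = min(end_id, payload["lastpushid"])
    return new_pushes, max(last_seen, covered)


def fake_fetch(start_id, end_id):
    """In-memory stand-in for a repository with five pushes."""
    pushes = {
        str(i): {"changesets": [], "date": 0, "user": "example"}
        for i in range(1, 6)
        if start_id < i <= end_id
    }
    return {"lastpushid": 5, "pushes": pushes}


first_batch, last_seen = poll_once(fake_fetch, 0, batch_size=3)
```

An agent would store `last_seen` between runs and call `poll_once` periodically for each repository, for example with `http_fetch("https://hg.mozilla.org/<repository>")` in place of `fake_fetch`.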