The DAG and Mercurial¶
Distributed version control systems (DVCS) like Mercurial (and Git) utilize a directed acyclic graph (DAG) for representing commits. This DAG is how the history of a repository is stored and often represented.
To help understand how this works, we’ll be using Mercurial commands to show and manipulate the DAG.
We start by creating an empty Mercurial repository with an empty graph:
$ hg init repo
$ cd repo
The Mercurial command for inspecting the repository history (and the
DAG by extension) is hg log
. Let’s run it on our empty repository:
$ hg log -G
No output. This is confirmation that the graph is empty.
Note
We use -G
throughout this article because it is necessary to
print a visual representation of the DAG.
Adding Nodes¶
We’ll need to create nodes to make our DAG interesting. The way we do this is by committing. But first, we need something to commit. Let’s create a file:
$ echo 1 > file
Then we tell Mercurial that we’re interested in storing the history of this file:
$ hg add file
We then use hg commit
to create a new node (changeset in Mercurial
parlance) and add it to the DAG:
$ hg commit -m A
(-m A
says to use A
for the commit message).
Now let’s take a look a the DAG:
$ hg log -G -T '{desc}'
@ A
Note
-T '{desc}'
tells Mercurial to only print the description or
commit message from the changeset/node. Without it, output would be
much more verbose.
It looks like we have a single node - denoted by @
and A
.
This graph / repository state isn’t very interesting. So, we perform another commit:
$ echo 2 >> file
$ hg commit -m B
And then inspect the DAG:
$ hg log -G -T '{desc}'
@ B
|
o A
This is slightly more interesting!
The B
node has been introduced. That is pretty obvious.
A
has its node represented by a o
. B
has its node represented
by @
. Mercurial uses @
to represent the node currently attached
to the working directory. This will be discussed later.
The first column has grown a vertical pipe character (|
) between
the two entries. In graph terms, this is an edge. While the visualization
doesn’t indicate it, B
internally stores a pointer back to A
.
This constitutes the edge and since the edge is directional
(B
to A
), that makes the graph directed (the D from
DAG).
Directed graphs borrow terminology from biology to represent relationships between nodes:
- parent node
- A node that came before another (is referred to by another node).
- child node
- A node that derives directly from another.
- root node
- A node that has no parents.
- head node
- A node that has no children.
- descendants
- All the children, the children’s children, the children’s children’s children, and so on of a node.
- ancestors
- The parents, parents’ parents, parents’ parents’ parents, and so on of a node.
In our graph so far:
A
is the parent ofB
B
is a child ofA
A
is a rootB
is a head- The descendants of
A
are justB
. - The ancestors of
B
are justA
.
If you don’t understand this, let’s try committing a few more nodes to help your understanding.:
$ echo 3 >> file
$ hg commit -m C
$ echo 4 >> file
$ hg commit -m D
$ hg log -G -T '{desc}'
@ D
|
o C
|
o B
|
o A
A
is still the root node. Since B
has children, it is no
longer a head. Instead, D
is now our head node.
If all you do is hg commit
like we’ve been doing so far, your
repository’s DAG will be a linear chain of nodes, just like we
have constructed above. 1 head. Every node has 1 parent (except the
root).
Important
The important takeaway from this section is that the history
of Mercurial repositories is stored as a DAG. hg commit
creates a changeset and appends a node to a graph. A DAG node
and a Mercurial changeset are effectively the same thing.
Nodes are Hashes of Content¶
Up to this point, we’ve been using our single letter commit messages
(A
, B
, etc) to represent nodes in our DAG. This is good
for human understanding, but it hides an important detail of how
Mercurial actually works.
Mercurial uses a SHA-1 digest to identify each node/changeset in the repository/DAG. The SHA-1 digest is obtained by hashing the content of the changeset.
Essentially, Mercurial changesets consist of various pieces of data including but not limited to:
- The parent node(s)
- The set of files and their state
- The author
- The date
- The commit message
Mercurial assembles all these pieces of data in a well-defined manner, feeds the result into a SHA-1 hasher, and uses the digest of the result as the node/changeset ID.
SHA-1 digests are 20 bytes or 40 hex characters. They look
like 835dbd9444dbed0cdc2ca27e23839f05a58e1dc1
. For readability,
these are almost always abbreviated to 12 characters in user-facing
interfaces. e.g. 835dbd9444db
.
We can ask Mercurial to render these SHA-1 digests instead of the commit messages:
$ hg log -G -T '{node}'
@ 2bf9b23b2d0379540038866a72699a8ce5e92e84
|
o 0f165760af41ddde6470860088f421c1efcc5a5f
|
o 7175417717e87c88e4cf61ab2f76f2c54c76fa4b
|
o 8febb2b7339e5843832ab893ca2a002cd4394a03
Or we can ask for the short versions:
$ hg log -G -T '{node|short} {desc}'
@ 2bf9b23b2d03 D
|
o 0f165760af41 C
|
o 7175417717e8 B
|
o 8febb2b7339e A
Note
We start to use some more capabilities of Mercurial’s templates
feature. This allows output from Mercurial commands to be
customized. See hg help templates
for more.
Because SHA-1s (even their short versions) are difficult to remember, we’ll continue using commit messages and single letters throughout this article to aid comprehension.
Important Properties from Using Hashing¶
Since node IDs are derived by hashing content, this means that changing any of that content will result in the node ID changing.
Change a file: new node ID.
Change the commit message: new node ID.
Change the parent of a node: new node ID.
Changing the content of a changeset and thus its node ID is referred to as history rewriting because it changes the history of a repository/DAG. History rewriting is an important topic, but it won’t be discussed quite yet. The important thing to know is that if you change anything that’s part of the changeset, the node ID changes.
Moving Between Nodes¶
Looking at the state of our Mercurial repository on the filesystem, we see two entries:
$ ls -A
.hg/
file
The .hg
directory contains all the files managed by Mercurial. It should
be treated as a black box.
Everything else in this directory (currently just the file
file and
the current directory) is referred to as the working directory or
working copy (both terms used interchangeably).
The working directory is based on the state of the files in a repository at a specific changeset/node. We say based on because you can obviously change file contents. But initially, the working directory matches exactly what is stored in a specific changeset/node.
The hg update
(frequently hg up
) command is used to change which
node in the DAG the working directory corresponds to.
If you hg up 7175417717e8
, the working directory will assume the state
of the files from changeset/node 7175417717e8...
. If you
hg up 2bf9b23b2d03
, state will be changed to 2bf9b23b2d03...
.
The ability to move between nodes in the DAG introduces the possibility to…
Creating DAG Branches¶
Up until this point, we’ve examined perfectly linear DAGs. As a refresher:
$ hg log -G -T '{node|short} {desc}'
@ 2bf9b23b2d03 D
|
o 0f165760af41 C
|
o 7175417717e8 B
|
o 8febb2b7339e A
Every node (except the root, A
/8febb2b7339e
) has 1 parent node.
And the graph as a whole has a single head (D
/2bf9b23b2d03
).
Let’s do something a bit more advanced. We start by switching the working directory to a different changeset/node:
$ hg up 7175417717e8
1 files updated, 0 files merged, 0 files removed, 0 files unresolved
$ hg log -G -T '{node|short} {desc}'
o 2bf9b23b2d03 D
|
o 0f165760af41 C
|
@ 7175417717e8 B
|
o 8febb2b7339e A
(Note how @
- the representation of the active changeset/node
in the working directory - moved from D
to B
)
Now let’s commit a new changeset/node:
$ echo 5 >> file
$ hg commit -m E
created new head
That created new head message is a hint that our DAG has changed. Can you guess what happened?
Let’s take a look:
$ hg log -G -T '{node|short} {desc}'
@ 4a3687e9313a E
|
| o 2bf9b23b2d03 D
| |
| o 0f165760af41 C
|/
o 7175417717e8 B
|
o 8febb2b7339e A
B
now has multiple direct children nodes, C
and E
. In
graph terminology, we refer to this as a branch point.
E
has no children, so it is a head node (D
is still a
head node as well).
Because the visualization of the graph can resemble a tree (from nature, not your computer science textbooks), small protrusions from the main trunk are referred to as branches from the perspective of the DAG. (Mercurial has overloaded branch to convey additional semantics, so try not to confuse a DAG branch with a Mercurial branch.)
The created new head message was Mercurial telling us that we created not only a new DAG head but also a new DAG branch.
Because your commit is taking the repository in a different direction (very non-scientific word), this act of creating new DAG branches is sometimes referred to as divergence or diverging.
DAG branches turn out to be an excellent way to work on separate and isolated units of change. These are often referred to as feature branches because each DAG branch consists of a specific feature. For more, see Workflows.
It’s worth noting that hg commit
always produces a new head
node because the node being created never has any children. However,
it may not create a new DAG branch: a new DAG branch is only created
when the parent node of the commit isn’t a head node.
Before we go on let’s commit a new changeset on top of E
to make
the DAG branch more pronounced:
$ echo 6 >> file
$ hg commit -m F
$ hg log -G -T '{node|short} {desc}'
@ da36621d7a94 F
|
o 4a3687e9313a E
|
| o 2bf9b23b2d03 D
| |
| o 0f165760af41 C
|/
o 7175417717e8 B
|
o 8febb2b7339e A
Merging DAG Branches¶
Now that we have multiple DAG branches, it is sometimes desirable to
merge them back into one. The Mercurial command for performing this
action is hg merge
.
Let’s change our working directory to the changeset that we want to
merge into. We choose D
, since it was our original head.:
$ hg up 2bf9b23b2d03
1 files updated, 0 files merged, 0 files removed, 0 files unresolved
Now we tell Mercurial to bring the changes from F
’s head into
D
’s:
$ hg merge da36621d7a94
0 files updated, 1 files merged, 0 files removed, 0 files unresolved
(branch merge, don't forget to commit)
$ hg commit -m G
Visualizing the result:
$ hg log -G -T '{node|short} {desc}'
@ 19c6c94d7bb2 G
|\
| o da36621d7a94 F
| |
| o 4a3687e9313a E
| |
o | 2bf9b23b2d03 D
| |
o | 0f165760af41 C
|/
o 7175417717e8 B
|
o 8febb2b7339e A
G
/19c6c94d7bb2
is what is referred to as a merge commit. It
is the result of a commit operation that merged 2 nodes. From the
perspective of the DAG, it is a node with 2 parents, not 1.
Conclusion¶
These are the basics of how Mercurial uses a directed acyclic graph (DAG) to represent repository history.
If you would like to learn more about how distributed version control systems (like Mercurial) use DAGs, please read this article.
For more on workflows that build upon this knowledge, see Workflows.