Version Control
distributed client-server local
git (2005) hg (2005) bzr (2005) svn (2000) p4 (1995) cvs (1990) rcs (1982) sccs (1972)
put file under version control add add add add add add
annotate lines of source code with commit info blame annotate annotate/ann blame/ann annotate annotate
create a git-style branch branch bookmarks
create an svn-style branch branch(es) branch copy/cp branch(es) tag -b
update working directory to most recent version of a branch checkout update/up checkout/co update/up sync update co -l get -e
get local copy of repository from server or existing repository clone clone branch checkout/co sync checkout/co
create a new commit commit commit/ci commit/ci commit/ci submit commit/ci ci delta
show difference between file in working directory and most recent commit diff diff diff/di diff/di diff diff rcsdiff sccsdiff
show commits available to be pulled (git: pull changes into remote branches; don't modify local branches) fetch incoming/in missing
online documentation help help help help help
create a repository from init init init init ci admin
show commit information for current branch in reverse chronological order log log -b tip log log filelog log rlog
merge branches merge merge merge merge integrate
change the name of a file under version control mv rename/mv mv move/mv move
get commits from a remote repository pull pull pull
push commits to a remote repository push push push
move the commits on a branch to the end of another branch rebase rebase
show remote repositories remote -v show paths
make the working directory match the most recent commit reset revert revert revert
create a commit which undoes the result of a previous commit revert backout revert
mark a file with merge conflicts as resolved add resolve resolve resolve resolve
mark a file as not present in the next commit rm remove/rm remove/rm delete/rm delete remove
write contents of a file version to standard out show cat cat cat print checkout -p co get -p
store uncommitted working directory changes in a temporary location stash shelve shelve
show files in working directory which don't match the most recent commit status status/st status/st status/st changes status
give a name to a commit tag tag(s) tag(s) copy/cp tag tag

archiving and patching tools: diff | cpio | diff3 | ar | tar | patch | zip | jar | rsync

Distributed Version Control

git and hg compared: repositories and versions | files, directories, and commits | branches, tags, and merges | pulling and pushing | renamed files | identifiers | repository urls | config files | ignore files | hooks | metasyntactic variables

git usage git description hg usage hg description
add PATHSPEC
add -e FILE
add -i FILE
add -u PATHSPEC
Add file contents to the index. If PATH is a directory it is added with all its contents recursively. Error if no arguments are provided.

Add a portion of a change to the index by editing the diff.

Add file contents to the index interactively.

Only add file contents to the index which are already tracked. Newly created files will never be added to the index when the -u flag is used.
add [PATH] … Put files under version control. If no argument is provided all files in the working directory are put under version control; equivalent to

  git add .
 
Under hg a file must be added only once, before it is first committed. Under git a file must be added each time it is modified. hg add is used to notify Mercurial that a file is being tracked by the version control system. It is not possible to add part of a file change. git add, by contrast, adds the changes to a file, including partial changes, to a staging area called the index to be flushed out with the next commit.
none how to perform equivalent of Mercurial addremove with Git:
 
  git add .
  git ls-files --deleted | xargs git rm
addremove [PATH] … Add or remove files depending upon whether they are in the working directory; if no PATHs are provided, all new files are added and all missing files are removed.
archive --format=tar TREEISH > NAME.tar Create a tarball from TREEISH. archive -t tar ../NAME.tar archives root directory; git does not.
backout described below
bisect see manual Find by binary search the change that introduced a bug. bisect
blame PATH [COMMIT] Show the revision number, author and timestamp of the last commit which modified each line in FILE. COMMIT can be used to specify an older version of FILE. annotate -cudln [-r REV] [PATH] Mercurial by default only indicates the local revision number. The flags -c, -u, -d, -l, and -n add changeset, user, date, line number, and local revision number.
branch [-r|-a]
branch [--contains|--merged] COMMIT
branch NAME [COMMIT]
branch --track NAME [BRANCH]
branch -m BRANCH NAME
branch (-d|-D) BRANCH
List branches. If -r option is provided remote tracking branches are listed. If -a option is provided both local and remote tracking branches are listed.

List branches that are descendants of COMMIT if --contains option is used. List branches that are ancestors of COMMIT if --merged is used.

Create a branch named NAME using COMMIT as a starting point. If COMMIT is not specified then HEAD is the starting point.

Create a branch NAME which tracks BRANCH. Usually BRANCH is a remote tracking branch. This configures the repository so that when git pull is executed on NAME a merge equivalent to git merge BRANCH is automatically performed. If BRANCH is not specified the current branch is tracked.

Rename the branch BRANCH to NAME.

Delete branch BRANCH. Use -D to delete a branch with commits which have not been merged.
bookmarks
none
bookmarks [-r REV] NAME
none
bookmarks -m NAME1 NAME2
bookmarks -d NAME
List the bookmarks.

none

Create a bookmark for the parent changeset of the working directory or REV.

Mercurial does not have tracking bookmarks.

Rename bookmark NAME1 to NAME2.

Delete the bookmark NAME.
Git branches are the equivalent of Mercurial bookmarks. Git does not have an equivalent of Mercurial branches. branches
branch
branch BRANCH
List branches.

Show the current branch. A close git equivalent is

  git branch | grep '*'

Create a branch named BRANCH which will be created from the working directory with the next commit.

Mercurial does not provide a mechanism for renaming or deleting branches. The recommended way to get rid of unwanted branches is to rename the repository and then clone it to the original name with

  hg clone -r REV
bundle see manual Move objects and refs by archive. bundle
unbundle
cat described below
cat-file -p HASH
cat-file -t HASH
Display content of repository object HASH.

Get the type of repository object HASH. The type can be 'blob', 'tree', or 'commit'.
none Mercurial does not assign identifiers to files and directories, so no equivalent of git cat-file is possible. The following are equivalent, however:

  git cat-file commit HASH
  hg log -r REV
checkout [-f] BRANCH
checkout TREEISH PATHSPEC
checkout -p PATHSPEC
checkout -b NAME [COMMIT]
Checkout the branch named BRANCH. BRANCH becomes the current branch. Changes in the index are carried over but if there are changes to tracked files that are not in the index the checkout fails. If the -f option is specified changes in the index and to tracked files will be discarded.

Copy the files or directories PATH … from TREEISH to the working directory. The current branch is not changed.

Copy files or directories PATH … from the index. The current branch is not changed.

Create a branch named NAME using COMMIT as the starting point. If COMMIT is not specified the HEAD of the current branch is used. NAME becomes the current branch.
update [-c|-C] (BRANCH|-r REV)
revert [-a] [-C] [-r REV] PATH
none
branch BRANCH
Checkout BRANCH or REV. If there are changes in the working directory they are applied to the new working directory; the -C option discards changes in the working directory and the -c option prevents an update when there are changes.

Revert PATHs to how they are according to the parent of the working directory or REV if specified. If this makes the files different from how they are in the parent of the working directory then the file will have a modified status. Backup copies of the files will be saved with .orig suffixes unless the -C option is used. If no PATHs are provided and the -a option is used, the entire working tree will be reverted.

Mercurial has no index and thus no equivalent to git checkout -p.

Start a new BRANCH using the current working directory which will be created with the next commit.
cherry-pick COMMIT Apply the changes introduced by some COMMITs to current branch. Although it is possible to specify multiple commits, it is better to use git rebase --onto if the commits are a chain because rebasing provides mechanisms (continue, skip, abort) for dealing with conflicts. export
import
graft
clean -n
clean -f
Show what files would be removed if run with -f option.

Remove untracked files from the working tree.
purge -p
purge
Show what files would be removed if run without -p option.

Remove untracked files from the working tree.
clone [-b BRANCH] URL [DIR]
clone [-o NAME] URL [DIR]
clone [-c SECTION.KEY=VAL] URL [DIR]
clone (--bare|--mirror) URL [DIR]
clone --recursive URL
Clone a repository. If BRANCH is provided, then it will be the current branch in the new repository. If DIR is provided it will be the name of the directory containing the repository.

If NAME is provided it is used as the name of the origin instead of the default 'origin'.

If any KEY=VAL pairs are provided they are written in the .git/config file of the new repository.

If --bare is provided as an option a bare repository will be created. In a bare repository there is no working directory and the contents of the top directory are what would have been in the .git directory had the --bare flag not been used.

Clone a repository and any submodules.
clone [-r REV|-b BRANCH] … URL [DIR]
none
none
clone -U URL [DIR]
Clone the repository at URL. Only changesets in the history of REV or BRANCH are copied over to the new repository. If DIR is provided it will be the name of the directory containing the new repository.

The name default which is assigned to URL can be changed by editing .hg/hgrc.

Configuration settings are changed by editing ~/.hgrc.

Clone the repository at URL. The clone will have no working directory files, only a .hg subdirectory.
commit [-m STR]
commit -a [-m STR]
commit --amend
commit --amend --author=STR
Record changes to the repository. STR is the commit message.

Commit all changes to tracked files.

Merge index with head commit.

Change author of most recent commit.
commit [-m STR]
commit -A [-m STR]
commit --amend
commit --amend -u STR
commit --close-branch
With both git and hg the files to be committed can be specified on the command line.

If no files are specified hg commit will commit all modified files that are currently tracked in the working directory. Newly created files that have not been added with hg add will not be committed. git commit without arguments by contrast will only commit the changes that have been staged with git add. git commit -a behaves like hg commit, however.

Merge working directory with parent changeset.

Change author of parent changeset.

Close head. A closed head will not be displayed by the heads command.
config -l [--global]
config -e [--global]
config --get [--global] SECTION.KEY
config [--global] SECTION.KEY VAL
config --unset [--global] SECTION.KEY
config --remove-section SECTION
List configuration settings.

Open configuration settings file in an editor.

Lookup configuration setting KEY in section SECTION.

Add configuration setting KEY in section SECTION with value VAL.

Remove configuration setting KEY in section SECTION.

Remove all keys in SECTION.

Writes modify .git/config unless --global is specified, in which case ~/.gitconfig is edited. Reads look at both files unless --global is specified, in which case they only look at ~/.gitconfig.
showconfig

none
List configuration settings.

Configuration settings are changed by editing ~/.hgrc
It is not possible to copy a file and preserve the history. Note that the git mv command will preserve history, but this is because Git will assume that two paths are the same when one is removed and the other is added in the same commit and their contents are similar. copy (-f|-A) SRC_PATH DEST_PATH Copy SRC_PATH to DEST_PATH, where SRC_PATH is already under version control. DEST_PATH will inherit the revision history of SRC_PATH. Use the -A flag when DEST_PATH already exists on the file system, or the -f flag when DEST_PATH already has a different revision history.
debugdata FILE REV
describe Show the most recent tag that is reachable from a commit.
diff [PATHSPEC …]
diff --cached [COMMIT] [PATHSPEC …]
diff COMMIT1 COMMIT2 [PATHSPEC …]
diff COMMIT [PATHSPEC …]
Produce a diff between the working directory and the index. If PATHs are provided only diffs for those files are produced.

Produce a diff between the index and COMMIT. If COMMIT is not specified it defaults to HEAD.

Produce a diff between COMMIT1 and COMMIT2.

Produce a diff between the working directory and COMMIT.
diff [PATH …]
diff -r REV [PATH …]
diff -r REV1 -r REV2 [PATH …]
diff -c REV [PATH …]
Produce a diff between tracked files in the working directory and the last commit. If PATHs are provided only diffs for those files are produced.

Produce a diff between tracked files in the working directory and REV.

Produce a diff between REV1 and REV2.

Produce a diff between the previous revision to REV and REV.
fetch [-f] [-t] [-p] REPO [REFSPEC]
fetch [-f] [-t] [-p]
fetch [-f] [-t] [-p] --all
fetch [-f] [-t] [-p] --multiple REPO
Fetch objects and refs from REPO. If REFSPEC is not provided, then the value from the remotes section of .git/config for REPO is used; HEAD is fetched if REPO is a URL. FETCH_HEAD is set to point to the local copy of the remote HEAD. The -f option will force a fetch if the destination exists and the update isn't a fast forward. The -t option copies tags. The -p option removes local references that are no longer on the remote repository.

Fetch objects and refs from origin.

Fetch objects and refs from all remotes.

Fetch objects and refs from multiple REPOs.
none Mercurial does not have remote tracking branches; hence no equivalent to git fetch.
fsck verify
forget PATH Mark files to be removed in the next commit, but don't remove them from the working directory.
gc see manual Remove unnecessary files and optimize the local repository.
grep [-i] [-v] [-E|F|P] \
  [-h|H] [-l|L] [-n] -e STR
grep --untracked [--no-exclude-standard]
  STR
grep -e STR --and --or --not \( \)
grep -f PATH
grep --cached -e STR
grep -e STR TREEISH
grep -e STR -- PATHSPEC
Print lines from files in working directory which are tracked by git and which match the pattern STR. Flags have the same meaning as for command line grep.

Also print untracked files in the working directory. With --no-exclude-standard also print lines from files excluded by .gitignore.

Print lines matching a logical expression of patterns.

Print lines matching any of the patterns read from the file PATH. Patterns are separated by newlines.

Search the index.

Search the commit version or directory TREEISH.

Search the files matching PATHSPEC.
Use command line grep -r to search the working directory. To search a different revision of the working directory, it must be checked out.

The hg grep command searches the entire revision history and is equivalent to git log -S.
hash-object PATH
hash-object -w PATH
Compute the object ID for a file.

Add a blob to the object database.
none
Git branches cannot have multiple heads; a git branch is a ref which always points to the head. heads [-c] List all heads of a branch. A head is a changeset with no child changesets. With the -c flag closed heads will also be shown.
help
help CMD

help -a
help -g
List most common commands and shared options.

Show help for git command CMD.

List all subcommands.

List available concept guides. Use git help GUIDE to read a concept guide.

man pages might also be installed:

  man git
  man git-clone
help [-v]
help CMD
List commands and additional help topics. With the -v flag shared options are also listed.

Show help for hg command CMD. Use hg help TOPIC to read a help topic.
identify
none incoming Shows the changesets that are available to be pulled.
init [DIR]
init --bare [DIR]
Create an empty git repository or reinitialize an existing one. If DIR is not specified the current directory is used.

Create a bare empty git repository or reinitialize an existing one. In a bare repository there is no working tree and the files normally in .git are in the top directory. If DIR is not specified the current directory is used.
init [DIR]
none
locate
log [-N] [PATH …]
log [-N] --branches [PATH …]
log (-p|--pretty=oneline) -S STR
Show commit log for current branch. If N is provided limit output to last N commits. If PATHs are provided, limit output to commits that affected one or more of them.

Show commit log for all branches.

Search for commits which added or removed STR. With -p flag include a diff describing the commit. With --pretty=oneline describe the commit in a single line.
log [-l N] -b BRANCH [PATH …]
log [-l N] [PATH …]
grep
Show commit log for BRANCH. Use 'tip' for the current branch. If N is provided limit output to last N commits. If PATHs are provided limit output to commits that affected one or more of them.

Show commit log for all branches.
ls-files [PATHSPEC] …
ls-files --stage [PATHSPEC] …
ls-files --deleted [PATHSPEC] …
List files under version control. These are the files which have had "git add" run on them and have not subsequently had "git rm" run on them. If PATH is not specified, all files are listed. Otherwise only files in PATH are listed.

With the --stage option the command includes the mode bits, object ID, and stage number of the files.

List files under version control which aren't in the working directory.
manifest [-r REV]
none
status -d
ls-tree TREEISH
ls-tree -r[t] TREEISH
List the contents of a tree.

List the contents of a tree and all its subtrees recursively. Use the -t option to include subtrees and their object IDs in the output.
merge COMMIT
merge --abort
merge --squash
Merge one or more commits into the current branch.

Restore the working directory to the state it had before a merge was attempted. This might not be possible if there were uncommitted changes in the working directory.

Modify index and working directory with results of merge but don't commit.
merge [[-r] REV]
update --clean
mv OLDPATH NEWPATH
mv FILE … DIR
Move or rename a file, a directory, or a symlink.

Move one or more files into a directory.
rename OLD NEW
rename FILE … DIR
notes see manual Add or inspect object notes.
none outgoing Show the changesets that have not been pushed. Synonym: out
parents
phase
pull [-f] REPO [REFSPEC]
pull [-f]
Short for

  git fetch [-f] REPO [REFSPEC]
  git merge FETCH_HEAD

Short for

  git fetch [-f]
  git merge FETCH_HEAD
pull [-u] [SOURCE]
pull (-b BRANCH) … [SOURCE]
Pull changesets from SOURCE. If no SOURCE is specified, the value of default in the [paths] section of .hg/hgrc is used. Only changesets which affect branches already on local repository are pulled. If the -u flag is used and there were changesets affecting the current branch, make the working directory match the most recent changeset.

Pull changesets affecting BRANCH from SOURCE. The -b flag can be used multiple times.
push [-f] [--prune] [--tags]
push [-f] [-u] REPO [BRANCH] …
push [-f] --all REPO
push [-f] REPO REFSPEC
push --delete REPO BRANCH
If the current branch is a tracking branch for a remote branch, then push to the repository for the remote branch. Otherwise the command does nothing. If the -f option is used conflicts will be overwritten in favor of the local repository. The --prune option removes remote refs which are not local. The --tags option copies tags.

Push to REPO. If one or more BRANCHES are specified, all necessary objects are copied to the remote repository and the remote refs are updated. If no BRANCHES are specified, the branches that were set using 'remote set-branches' are used. With the -u flag, record each pushed remote branch as the upstream of the corresponding local branch.

Push all local branches to REPO. If any local branches do not have remote branches and remote tracking branches they are added.

Push local branches to REPO or origin according to REFSPEC.

Delete the specified remote BRANCHES and their remote tracking branches.
push [-f] [SOURCE]
push -b BRANCH [--new-branch] [SOURCE]
Push changesets to SOURCE. If no SOURCE is specified, the value of default in the [paths] section of .hg/hgrc is used. A push that creates a branch with multiple heads will fail unless the -f flag is used.

Push changesets affecting BRANCH to SOURCE. The -b flag can be used multiple times. This will fail if BRANCH is not on SOURCE unless the --new-branch flag is used.
rebase BRANCH
rebase --onto BRANCH COMMIT1 COMMIT2
rebase --abort
rebase --continue
rebase --skip
rebase -i COMMIT
Rebase the current branch onto BRANCH. All commits on the current branch going back to the latest common ancestor with BRANCH are reapplied on top of BRANCH; the head of BRANCH remains the same and the head of the current branch points to the last of the reapplied commits.

Apply all commits after but not including COMMIT1 and up to and including COMMIT2 to BRANCH. If successful the repository will have a detached HEAD, meaning that HEAD points at a commit and not a named branch. Use git branch NAME to assign a branch name to HEAD and then git checkout NAME to switch to the new branch.

Abort the results of a rebase that had conflicts.

Continue with a rebase that had conflicts which have been resolved by editing the files and running add on them.

Skip commit that caused conflicts and continue with rebase.

Perform an interactive rebase on current branch using all commits after but not including COMMIT. This can be used to squash multiple commits into a single commit.
rebase
recover
reflog see manual Show the history of changes to refs and HEAD. This will contain branch commits as well as the creation and switching of branches.
remote
remote add [-t BRANCH] … NAME URL
remote add [-m BRANCH] NAME URL
remote rm REMOTE
remote rename REMOTE NAME
remote show [-v]
remote show [-n] REMOTE
remote set-head REMOTE (-a|-d) BRANCH
remote set-url --add REMOTE URL
remote set-url --delete REMOTE URL
remote set-branches REMOTE [--add] \
  BRANCH
List the remotes.

Add a remote NAME at url URL. The -t option can be used repeatedly to track specific branches. Otherwise all branches are tracked.

Add a remote NAME at url URL. The -m option can be used to set the head. The head can also be set with the set-head subcommand.

Remove REMOTE.

Rename REMOTE to NAME.

Show the name of the remote repository. With -v flag show the url for the remote repository.

Get information about REMOTE including remote branches. This requires connecting to the remote machine unless the -n option is used.

Set the head for the remote to BRANCH. Having a remote head permits the remote name to be used in places a branch name would normally be used.

Add a URL to REMOTE. This can be used to push to multiple repositories simultaneously.

Delete a URL from REMOTE.

Set branches for REMOTE. If the --add option is used, the branches are added to the existing branches. Otherwise the new branches replace the existing branches. These are the branches that will get pushed or pulled when no branches are explicitly specified.
paths
none
List the paths.

Names are assigned to repository urls in the [paths] section of the .hg/hgrc file. When a repository is cloned the source url is given the name default.

Mercurial does not provide commands for adding and removing paths. Instead one edits the .hg/hgrc file.
reset [-p] [PATH]
reset --soft COMMIT
reset [--mixed] COMMIT
reset --hard COMMIT
Set index to HEAD. If PATHs are provided, only set those PATHs to HEAD. With the -p flag, interactively select the hunks to set to HEAD.

Move branch head to COMMIT. Neither the index nor the working directory are modified.

Move branch head to COMMIT and reset the index. The working directory is not modified.

Move branch head to COMMIT and reset index and working directory.

Use checkout to restore the working directory without changing the index.
none

none

none

none

Use revert to restore the working directory.
Mercurial does not have an equivalent to the Git index.
resolve FILE …
resolve -a
resolve -l
resolve -m FILE …
resolve -u FILE …
List all unresolved
revert [-n] COMMIT
revert [-n] COMMIT1..COMMIT2
Create one or more commits which reverse the effects of the COMMITs. If the -n option is used, the reversing changes are not committed but merely applied to the index and working directory.

Create one or more commits which reverse the effects of the commits after but not including COMMIT1 and up to and including COMMIT2.
backout -r REV
rev-list COMMIT
rev-list COMMIT1 ^COMMIT2
Show commits which are ancestors of COMMIT in reverse chronological order.

Show commits which are ancestors of COMMIT1 and not ancestors of COMMIT2 in reverse chronological order.
rm [-f] FILE
rm -r DIR
rm --cached FILE
Remove files from the working tree and from the index. The -f option can be used to remove the files even if they have changes staged in the index.

Remove directories from the working tree and from the index.

Remove files from the index only.
remove [-f] PATH
remove -A PATH
remove --include PATTERN
Remove files. The -f option can be used to remove the files even if they have been modified or added.

With the -A flag remove files which are no longer in the working directory.

With the --include flag remove files which match PATTERN.
root
serve
shortlog [COMMIT1..COMMIT2] Summarize the commit history in a one-line-per-commit format. The commits are grouped by author. If a commit range is provided, it will include commits after COMMIT1 and up to and including COMMIT2.
show COMMIT:FILE Show blob. cat -r REV FILE
showconfig described above
show-ref List all references. none
stash [save [STR]]
stash show [STASH]
stash pop [STASH]
stash list
stash drop [STASH]
stash clear
Stash the changes in a dirty working dir. If STR is provided it is used as an identifier.

Show specified or latest stash.

Recover specified or latest stash.

List stashes.

Delete specified or latest stash.

Delete all stashes.
shelve
status [PATH …] Show paths in the working tree that differ from the index, paths in the index which differ from HEAD, and paths in the working directory which are not in the index or HEAD. Reports on all files unless PATHs are provided. status
submodule see manual Initialize, update or inspect submodules. subrepo
summary
tag
tag NAME [COMMIT]
tag -d TAG
List tags.

Create a tag. If COMMIT is not specified, HEAD is used.

Delete a tag.
tags
tag [-r REV] NAME
tag --remove NAME
tip
unbundle described above
update described above
verify described above
version Show git version. version Show Mercurial version.
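The add and commit rows above say that git commits what has been staged, while git commit -a behaves like hg commit. A minimal sketch in a scratch repository follows; the file name notes.txt, the branch name main, and the user identity are placeholders chosen for illustration:

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main        # placeholder branch name
git config user.name "Example User"          # placeholder identity
git config user.email "user@example.com"

echo "one" > notes.txt
git add notes.txt
git commit -q -m "first version"

# Stage a change, then modify the file again without re-adding it.
echo "two" > notes.txt
git add notes.txt
echo "three" > notes.txt
git commit -q -m "second version"

committed=$(git show HEAD:notes.txt)          # content that was staged
pending=$(git status --porcelain notes.txt)   # unstaged change is still pending

# commit -a also picks up unstaged changes to tracked files.
git commit -q -a -m "third version"
final=$(git show HEAD:notes.txt)

echo "second commit holds: $committed"
echo "status before commit -a: $pending"
echo "third commit holds: $final"
```

The second commit records the staged content, not what was in the working directory when the commit ran.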

Git and Mercurial Compared

repositories and versions

A set of files and directories under version control is called a repository.

git

A file or directory under version control has one or more versions. One adds new versions to the repository by making a commit. The set of all files and directories in the repository can also be seen as having versions; these versions are called commits; they consist of at most one version of each file or directory in the repository.

hg

A file or directory under version control has one or more revisions. One adds new revisions to the repository by making a commit. The set of files and directories in the repository can also be seen as having revisions; these revisions are called changesets.

files, directories, and commits

git

Git keeps copies of all versions of files and directories that have been committed, as well as the commits themselves, in the directory .git/objects. All objects are identified by their 40 character SHA-1 checksum, called the hash. There are three types of git objects in this directory. A blob is the contents of a file. A tree corresponds to a file system directory and contains the file system names and hashes of its entries, which can be blobs (regular files) or trees (directories). Finally, a commit contains the top level tree for the commit and the parents of the commit. There will be zero parents for the initial commit and more than one parent for a commit which was created by a merge. Git stores a separate, albeit compressed, copy of each version of a file, tree, or commit in the .git/objects directory.
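Because the hash is computed from the content, files with identical contents share one blob. A minimal sketch using git hash-object; the file names are placeholders and a repository using the default SHA-1 object format is assumed:

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q

printf 'hello\n'  > a.txt
printf 'hello\n'  > b.txt
printf 'hello!\n' > c.txt

id_a=$(git hash-object -w a.txt)   # -w also stores the blob in .git/objects
id_b=$(git hash-object b.txt)      # same content as a.txt
id_c=$(git hash-object c.txt)      # different content

# Read the stored blob back by its hash.
content=$(git cat-file -p "$id_a")

echo "a.txt: $id_a"
echo "b.txt: $id_b"
echo "c.txt: $id_c"
```

The file name plays no part in the blob hash; names are recorded in the tree that refers to the blob.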

The git cat-file -p HASH command, though not needed for day-to-day use, provides a way to inspect a git object. It shows the additional information stored in trees and commits which we have not mentioned here.
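As a sketch of such an inspection, the following walks from a commit to its tree and down to a blob. The path src/main.c, the branch name main, and the user identity are placeholders chosen for illustration:

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main        # placeholder branch name
git config user.name "Example User"          # placeholder identity
git config user.email "user@example.com"

mkdir src
echo 'int main(void) { return 0; }' > src/main.c
git add src
git commit -q -m "initial commit"

commit_type=$(git cat-file -t HEAD)            # the commit object
tree_type=$(git cat-file -t 'HEAD^{tree}')     # its top level tree
subtree_type=$(git cat-file -t HEAD:src)       # a directory inside it
blob_type=$(git cat-file -t HEAD:src/main.c)   # a regular file

git cat-file -p HEAD            # tree hash, author, committer, message
git cat-file -p 'HEAD^{tree}'   # mode, type, hash, and name of each entry
```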

hg

Mercurial uses a storage format called a revlog to store the versions of a file. Most revlogs are kept in .hg/store/data. A revlog usually consists of two files: one with an .i suffix and another with a .d suffix. If the file is small and has little or no history, the revlog might consist of only a .i file. A revlog which tracks the history of a file is called a filelog. When the file is first committed, it is written to the filelog. Each time a commit is made which alters it, a delta describing the change is appended to the filelog. Thus, to fetch the current version of a file, all the deltas must be applied in order to the original version of the file. As a performance optimization, Mercurial will sometimes append the full version of the file to a filelog. Thus, when reconstructing the current version, one need only apply the deltas starting from the last time the full version was stored.

Mercurial stores a manifest for each revision of the repository. A revision of the repository is called a changeset. The manifest is a list of the pathnames, relative to the root, of all files in the changeset. Rather than store the manifests in separate files, all the manifests for the repository are stored in a revlog in .hg/store. Each time a new changeset is added to a repository by a push, pull, or commit command, it is assigned a local revision number which is the order in which it was appended to the local manifest revlog. If the changeset was pulled from a different repository, the local revision numbers might not match.

Information about changesets is also stored in the changelog, which is another type of revlog. The changelog has a pointer to manifest revision, pointers to parents of the changeset, and information about the committer.

branches, tags, and merges

git

Git has a low level feature called a ref which it uses to implement branches and tags. A ref is a file in .git/refs which contains the hash of a commit. Branches are in .git/refs/heads and tags are in .git/refs/tags. Whenever a commit is made, the value in .git/refs/heads/BRANCH is updated where BRANCH is the current branch. The values in .git/refs/tags/TAG do not change.

The name of the branch which is currently checked out is stored in .git/HEAD. It is stored as the relative path refs/heads/NAME.
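A minimal sketch of how a branch ref moves with each commit while a tag ref stays put. It reads the refs through git rev-parse and git symbolic-ref rather than opening the files directly; the names main, v1, and f.txt and the user identity are placeholders:

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main        # placeholder branch name
git config user.name "Example User"          # placeholder identity
git config user.email "user@example.com"

echo a > f.txt
git add f.txt
git commit -q -m "first"
first=$(git rev-parse refs/heads/main)

git tag v1                                   # creates refs/tags/v1

echo b >> f.txt
git commit -q -a -m "second"
second=$(git rev-parse refs/heads/main)      # branch ref has moved
tagged=$(git rev-parse refs/tags/v1)         # tag ref has not
head_ref=$(git symbolic-ref HEAD)            # what HEAD points at

echo "HEAD -> $head_ref"
echo "main: $first -> $second"
echo "v1:   $tagged"
```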

Git also stores remote branches and tags in .git/refs/remotes/REPO. The git branch -r command can be used to list remote branches. Remote branches have names of the form REPO/BRANCH, and each remote branch will usually have a tracking branch, which is a local branch named BRANCH. This will be the case for any branches which were copied when a repository is created via git clone. A tracking branch can also be created when a remote repository is added using git remote add -t BRANCH REPO URL. git fetch will only update remote branches. git pull will update remote branches and merge them with their tracking branches.
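The difference between git fetch and git pull can be sketched with two repositories on the local file system. The directory names upstream and copy, the branch name main, and the user identity are placeholders; --ff-only is passed to git pull only to keep the sketch independent of the local pull configuration:

```shell
set -e
work=$(mktemp -d) && cd "$work"

git init -q upstream
cd upstream
git symbolic-ref HEAD refs/heads/main        # placeholder branch name
git config user.name "Example User"          # placeholder identity
git config user.email "user@example.com"
echo a > f.txt
git add f.txt
git commit -q -m "first"
cd ..

git clone -q upstream copy

# Add a commit to the original repository after the clone was made.
cd upstream
echo b >> f.txt
git commit -q -a -m "second"
upstream_head=$(git rev-parse main)
cd ../copy

git branch -r                                # lists origin/main

git fetch -q origin
local_after_fetch=$(git rev-parse main)          # unchanged by fetch
remote_after_fetch=$(git rev-parse origin/main)  # updated by fetch

git pull -q --ff-only
local_after_pull=$(git rev-parse main)           # now merged
```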

The default branch is called master. It is created by git init, and is the branch that is copied by git clone if no branch is explicitly specified.

Commits have zero or more parent commits. git commit creates a commit with one parent, except in the case of the initial commit. git merge creates a commit with two or more parent commits. If the commit has three or more parents, the merge is called an octopus merge.

staging numbers:

To perform a merge Git gets the tree contained in the common ancestor and puts its items into the staging area with staging number 1. It puts the current branch tree items in the staging area with staging number 2. It puts the tree items of the other branches in the staging area with staging number 3 or higher.

fast-forward commits aren't actually commits:

Suppose that bar is a branch of foo. If commits have subsequently been made to foo but not to bar, then running the following when bar is the current branch will perform a fast-forward:

git merge foo

In a fast-forward no merge commit is created. Instead the head of bar is simply moved to point to the same commit as the head of foo.

hg

A Mercurial branch is a name which is stored in a changeset. When a commit is made, the new changeset inherits the branch name of the previous changeset, unless a different name was specified before the commit with hg branch. To switch to a new branch one must make a commit.

Mercurial branches differ from Git branches in that:

  • every commit belongs to a single branch
  • a branch can have multiple heads

Mercurial tags are names for changesets. They are stored in the .hgtags file at the repository root. Creating a tag requires making a commit.

Mercurial does not support octopus merges. Thus changesets have at most two parents. A changeset created by hg merge sets the branch of the new changeset to be the branch of the first argument.

Changesets can have no branch specified. This is also called the default branch.

bookmarks:

Mercurial bookmarks work like Git branches, with the exception that Mercurial does not have the equivalent of Git tracking branches.

pulling and pushing

git

The basic command for getting changes from a remote repository origin is:

$ git fetch

Which branches are fetched is controlled by the fetch key in the remote section of .git/config. If the local repository was created by git clone, here is a likely value:

[remote "origin"]
        fetch = +refs/heads/*:refs/remotes/origin/*

In this case, git fetch origin connects to the remote repository and copies all of the remote branches to refs/remotes/origin. Then it adds all remote objects referred to by the remote branches to the local objects database. It also puts the remote HEAD into FETCH_HEAD. The + indicates that the refs under refs/remotes/origin should be updated even if the updates are not fast-forwards.
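How a fetch entry maps a ref on the remote to a ref in the local repository can be sketched as follows. The function map_ref is a hypothetical helper, not part of Git, and it handles only a single asterisk in each side of the entry.

```python
def map_ref(refspec, src_ref):
    """Map a source ref through an entry such as
    "+refs/heads/*:refs/remotes/origin/*".

    Returns (dest_ref, force), or None if the entry does not match.
    """
    force = refspec.startswith("+")
    body = refspec[1:] if force else refspec
    src_pat, dest_pat = body.split(":", 1)
    if "*" not in src_pat:
        return (dest_pat, force) if src_ref == src_pat else None
    prefix, suffix = src_pat.split("*", 1)
    if (len(src_ref) < len(prefix) + len(suffix)
            or not src_ref.startswith(prefix)
            or not src_ref.endswith(suffix)):
        return None
    # The text matched by the asterisk is substituted on the other side.
    matched = src_ref[len(prefix):len(src_ref) - len(suffix)]
    return dest_pat.replace("*", matched, 1), force
```

With the entry shown above, refs/heads/master on the remote maps to refs/remotes/origin/master locally, and the returned flag records that a non-fast-forward update is allowed.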

The basic command for sending changes back to the remote repository origin is:

$ git push

Which branches are pushed is controlled by the push key in the remote section of .git/config. Here is an example entry which pushes commits on the master branch, and fails if the commits are not fast-forwards:

[remote "origin"]
        push = refs/heads/master:refs/heads/master

A git pull is a git fetch followed by a git merge FETCH_HEAD, which git fetch sets to whatever was in HEAD on the remote repository.

hg

hg pull pulls changesets for all the remote branches that are also local branches unless branches are listed explicitly with the -b flag. hg pull -u is equivalent to hg pull followed by hg update. Pulling can create local branches with multiple heads, in which case an hg update will fail. An hg merge is used to merge the two heads, or an hg commit --close-branch is used to mark one of them as closed.

hg push pushes changesets for all local branches that are also remote branches unless branches are listed explicitly with the -b flag. A push which would create a branch with multiple heads will fail unless the -f flag is used. The --new-branch flag must be used to create a new branch.

renamed files

It is desirable for a version control system to track file name changes. Otherwise commands like blame and log when used on a single path will not show activity before the name change. If the version control system is aware of a name change, it can correctly handle the case when merging where the name was changed on one branch and edited on the other.

git

Although Git provides a git mv subcommand, it does not actually track name changes. Instead, it will assume that a name change occurred during a commit when one file disappeared, another appeared, and they have similar contents. Hence, even if the user uses git rm, a Unix command mv, and git add, Git will preserve the history for the file.

hg

Mercurial keeps track of the name a file had in each revision of a filelog. The hg rename subcommand must be used to preserve history.

identifiers

git

Git has four types of objects: commits, trees, blobs, and annotated tags. Each is assigned a unique hash ID which is a 40 digit hex string. The identifier is called the hash, SHA1, object name, or object identifier with no difference in meaning. When the underlying object is a commit or tree it is also called a tree-ish.

Commit hashes are the hashes the user most commonly sees and needs to reference. Only as many digits as are necessary to uniquely identify an object in the object database need to be provided to a git command; usually the first 6 or 7 are sufficient.

HEAD is a special name which refers to the most recent commit of the current branch. It is stored in .git/HEAD. The previous commit is HEAD^ and the commit before that is HEAD^^. There is also a numerical notation: HEAD~4 is the commit 4 generations before HEAD. If HEAD is the result of a merge, then the parents can be referenced with HEAD^1 and HEAD^2.
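The caret and tilde notation can be illustrated with a toy resolver over a hand-written parent graph. The function resolve and the commit names are hypothetical, and only expressions that start with HEAD are handled.

```python
import re


def resolve(rev, parents, head):
    """Resolve expressions like HEAD^, HEAD^^, HEAD~4, HEAD^2.

    parents maps a commit name to its list of parents, first parent
    first. head is the commit that the name HEAD stands for.
    """
    m = re.fullmatch(r"HEAD((?:[\^~]\d*)*)", rev)
    if m is None:
        raise ValueError("unsupported revision: " + rev)
    commit = head
    for op, num in re.findall(r"([\^~])(\d*)", m.group(1)):
        n = int(num) if num else 1
        if op == "^":
            # ^N selects the Nth parent; ^0 is the commit itself.
            if n:
                commit = parents[commit][n - 1]
        else:
            # ~N follows the first parent N times.
            for _ in range(n):
                commit = parents[commit][0]
    return commit
```

For a merge commit m with parents c and x, resolve("HEAD^2", parents, "m") selects x, while resolve("HEAD~2", parents, "m") walks two first-parent steps back from m.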

hg

In Mercurial, every commit is assigned two identifiers: a local revision number and a universal changeset identifier. The local revision number is a small integer that is unique only to the local repository. The first local revision number issued is zero, and it increments with each local commit. The changeset identifier is a 40 digit hex number which is unique across all repositories; it is usually displayed abbreviated to its first twelve digits.

The -r option is used to pass a Mercurial commit identifier to a command. The argument can be a local revision number, a changeset identifier, or both separated by a colon.

repository urls

git

protocol format
ssh ssh://[user@]host.xz[:port]/path/to/repo.git/
[user@]host.xz:path/to/repo.git/
git git://host.xz[:port]/path/to/repo.git/
http http[s]://host.xz[:port]/path/to/repo.git/
ftp ftp[s]://host.xz[:port]/path/to/repo.git/
rsync rsync://host.xz/path/to/repo.git/
local /path/to/repo.git/
file:///path/to/repo.git/

hg

local/filesystem/path[#revision]
file://local/filesystem/path[#revision]
http://[user[:pass]@]host[:port]/[path][#revision]
https://[user[:pass]@]host[:port]/[path][#revision]
ssh://[user@]host[:port]/[path][#revision]

config files

git

  • .gitconfig

mercurial

  • .hgrc

ignore files

git

man gitignore

A list of file patterns, one per line. The patterns specify files that git status and git add should ignore. Shell glob syntax (e.g. the asterisk: *) can be used.

A .gitignore can be placed in any directory in the repository. The rules in a given .gitignore file will only apply to the current directory and the directories beneath it.

Lines starting with a pound sign: # are ignored.

A pattern starting with an exclamation point: ! will negate a pattern. This can be used to include files that were excluded by a pattern higher in the file matching a broader set of files.

hg

Unlike .gitignore, an .hgignore file must be in the root of the working directory.

The format is one Perl regular expression per line. All files which match the regular expression will be ignored.

Comments start with the pound sign: #

It is also possible to use glob syntax:

# regexp to ignore twiddle files:
~$

# glob to ignore compiled python files:
syntax: glob
*.pyc

# additional patterns will use regexp format:
syntax: regexp

hooks

git

hg

metasyntactic variables

In subcommand usage we use the following metasyntactic variables:

git

BRANCH the name of a branch.
CMD the name of a version control command: the first argument of the base command.
COMMIT the HASH for a commit. A commit can be referenced indirectly via a branch or tag name or via commit notation. The symbolic references HEAD or FETCH_HEAD can also be used to reference commits.
DIR a directory on the file system. In some cases it must exist; in others it will be created.
FILE a regular file on the file system. In some cases it must exist; in others it will be created.
HASH a 40 digit hex string used as an identifier for something in the object database.
HEAD the literal string HEAD.
NAME a name for an entity which will be created. Usually there are restrictions on the characters that can be used.
PATH a path on the file system. In some cases it must exist; in others it will be created.
PATHSPEC like a file glob pattern, except that ? and * can match the directory separator: /
REFSPEC [+]SRC_REF:DEST_REF where SRC_REF and DEST_REF are ref paths relative to the .git directory. SRC_REF is on the remote repository in a fetch or a pull and on the local repository in a push.

An asterisk can be used in place of a component of the relative path to match everything in the directory. If the SRC_REF has an asterisk, the DEST_REF must also have one.

A plus sign prefix + is used to indicate that the update should be made even when it is not a fast-forward.

If the SRC_REF is the empty string, then the DEST_REF is deleted.

Leading components of SRC_REF or DEST_REF can be omitted if no ambiguity results.
REMOTE the name of a remote.
REPO A REMOTE or a URL.
STASH stash identifier format: stash@{0}, stash@{1}, …
STR a string. There are no restrictions on the characters that can be used, but if they include whitespace or characters special to the shell they must be escaped or quoted.
TREEISH the HASH for a tree, a commit, or a tag. If the HASH is for a commit or a tag the tree in the commit is used.
URL a url for a repository.

hg

BRANCH the name of a branch.
CMD the name of a version control command: the first argument of the base command.
DIR a directory on the file system. In some cases it must exist; in others it will be created.
FILE a regular file on the file system. In some cases it must exist; in others it will be created.
NAME a name for an entity which will be created. Usually there are restrictions on the characters that can be used.
PATH a path on the file system. In some cases it must exist; in others it will be created.
PATTERN a file glob pattern. The metacharacters ?, *, and ** are supported.
REV the revision number for a changeset. It can be either the local revision number, which is a small decimal integer, or the 12 hex digit universal revision number.
SOURCE A URL or a name for a URL in the [paths] section of the .hg/hgrc file
STR a string. There are no restrictions on the characters that can be used, but if they include whitespace or characters special to the shell they must be escaped or quoted.
URL a url for a repository.

Version Control, Archiving, and Patching Tools

sccs | diff | cpio | diff3 | ar | tar | rcs | patch | zip | cvs | p4 | jar | rsync | svn | bzr

sccs (1972)

In his 1975 paper Rochkind describes SCCS as a "radical departure from conventional methods for controlling source code". SCCS was initially implemented in 1972 on the IBM 370. The implementation language was SNOBOL. Rochkind was an employee of Bell Laboratories and SCCS was soon ported to Unix where it became a cornerstone of the "Programmer's Workbench", a suite of software distributed with early Unix.

The radical departure of SCCS appears to be the decision to store every version of each file under source control. This is done in a space efficient manner by means of deltas: the original file is stored with a delta for each change. To get the most recent version of the file all of the deltas must be applied to the original file. Also stored with each delta is the name of the user who made the change, the date and time of the change, and a user supplied comment explaining the change.

SCCS introduces a file format so that the original file, the deltas, and the meta-information can all be stored in a single history file. If the original file was foo.c, a common early convention was for the history file to be named s.foo.c. In the original Unix implementation the SCCS commands were standalone Unix commands. Starting with the version of SCCS which Allman wrote for BSD Unix in 1980 the SCCS commands became arguments or subcommands to a sccs executable.

Here is a sample SCCS session. The file foo.txt is put under source control. It is then checked out, edited, and the change committed. Finally a non-editable copy of the most recent version is checked out.

$ echo "foo" > foo.txt
$ sccs admin -ifoo.txt s.foo.txt
$ rm foo.txt
$ sccs get -e s.foo.txt
$ vi foo.txt
$ sccs delta s.foo.txt
$ sccs get -p s.foo.txt > foo.txt

The SCCS history file format consists of fields separated by Ctrl-A (ASCII 1) characters. The fields are divided into headers, which contain the meta-information, and the body, which contains the original file and the deltas. The original file is given revision number 1, and the number is incremented with each change.

The body consists of the original file interspersed with nested insert blocks and delete blocks. The format for an insert block is

^AI REV
added line one
added line two
...
^AE REV

where REV is the revision number for which the lines were added. Similarly the format for a delete block is

^AD REV
deleted line one
deleted line two
...
^AE REV

When extracting a version of the file, the number of the desired version is compared with the number of each block. The lines in an insert block are skipped if the block has a higher number than the desired version, and the lines in a delete block are skipped if the block has a number lower than or equal to the desired version.
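The extraction rule can be sketched in Python. This is a simplified model, not the SCCS implementation: the function extract is hypothetical, revisions are plain integers, and the header section of the history file is omitted. A line is kept if every enclosing insert block is old enough to apply and no enclosing delete block applies.

```python
def extract(weave, version):
    """Return the lines of the given version from a simplified body.

    Control lines start with "\x01" (Ctrl-A) followed by I (insert),
    D (delete), or E (end) and a revision number.
    """
    active = []   # enclosing blocks as (kind, rev), in order of opening
    out = []
    for line in weave:
        if line.startswith("\x01"):
            kind, rev = line[1], int(line[2:])
            if kind == "E":
                # Close the most recently opened block with this revision.
                for i in range(len(active) - 1, -1, -1):
                    if active[i][1] == rev:
                        del active[i]
                        break
            else:
                active.append((kind, rev))
            continue
        inserted = all(r <= version for k, r in active if k == "I")
        deleted = any(r <= version for k, r in active if k == "D")
        if inserted and not deleted:
            out.append(line)
    return out
```

Because all versions share one body, any version can be produced in a single pass over the file.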

diff (1974)

To implement an efficient version control system one needs to find a minimal delta or difference between two similar text files. The problem led to the development of the Unix diff utility. Regarding a file as a sequence of lines, the problem can be treated as an example of the longest common subsequence problem. The standard solution to this problem has O(nm) performance in both time and space, where n and m are the lengths of the two files. To facilitate quick comparison of lines, each line is replaced with a hash code. When implementing diff McIlroy developed an algorithm that was more efficient than the standard solution in most cases.
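The standard O(nm) solution mentioned above can be sketched in Python. This is the textbook dynamic programming approach, not the algorithm McIlroy developed for diff, and the function name lcs_diff is hypothetical.

```python
def lcs_diff(a, b):
    """Return a list of (tag, line) pairs; tag is " ", "-", or "+"."""
    n, m = len(a), len(b)
    # table[i][j] = length of the longest common subsequence
    # of a[i:] and b[j:]
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    # Walk the table from the start, emitting common and changed lines.
    out, i, j = [], 0, 0
    while i < n and j < m:
        if a[i] == b[j]:
            out.append((" ", a[i]))
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            out.append(("-", a[i]))
            i += 1
        else:
            out.append(("+", b[j]))
            j += 1
    out.extend(("-", line) for line in a[i:])
    out.extend(("+", line) for line in b[j:])
    return out
```

The table has (n + 1)(m + 1) entries, which is where the O(nm) cost in both time and space comes from.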

The standard diff notation prefixes lines with < and > to indicate whether the line originated in the first or second file. It also uses the letters a, c, and d to indicate lines being added, changed, or deleted:

$ echo "foo" > foo.txt

$ echo "bar" > bar.txt

$ diff foo.txt bar.txt 
1c1
< foo
---
> bar

$ diff foo.txt /dev/null
1d0
< foo

$ diff /dev/null foo.txt 
0a1
> foo

These letters used in diff notation are also ed commands. In fact, diff -e will output an ed script which can be used to convert the first file into the second:

$ diff -e foo.txt bar.txt > diff.ed

$ ( cat diff.ed ; echo "w" ) | ed foo.txt

The version of diff released with BSD 2.8 in 1981 added the -c option to show three lines of context around each change. This is called the context format.

The BSD 2.8 diff also added an -r option to perform a recursive diff on directories.

In 1990 the -u option was added, which gives a diff in unified format. In the context format, if a line is changed, the context is repeated: once around the old version of the line and once around the new. The unified format puts both versions of the line in the same context, reducing the size of the diff file.

The -C NUM and -U NUM options are like the -c and -u options, except that they show NUM lines of context.

normal format:

$ sed s/^root:/ROOT:/ /etc/passwd > /tmp/passwd

$ diff /etc/passwd /tmp/passwd
12c12
< root:*:0:0:System Administrator:/var/root:/bin/sh
---
> ROOT:*:0:0:System Administrator:/var/root:/bin/sh

ed script format:

$ sed s/^root:/ROOT:/ /etc/passwd > /tmp/passwd

$ diff -e /etc/passwd /tmp/passwd
12c
ROOT:*:0:0:System Administrator:/var/root:/bin/sh
.

context format:

$ sed s/^root:/ROOT:/ /etc/passwd > /tmp/passwd

$ diff -c /etc/passwd /tmp/passwd
*** /etc/passwd    2013-10-24 17:38:39.000000000 -0700
--- /tmp/passwd    2014-04-26 12:57:57.000000000 -0700
***************
*** 9,15 ****
  # Open Directory.
  ##
  nobody:*:-2:-2:Unprivileged User:/var/empty:/usr/bin/false
! root:*:0:0:System Administrator:/var/root:/bin/sh
  daemon:*:1:1:System Services:/var/root:/usr/bin/false
  _uucp:*:4:4:Unix to Unix Copy Protocol:/var/spool/uucp:/usr/sbin/uucico
  _taskgated:*:13:13:Task Gate Daemon:/var/empty:/usr/bin/false
--- 9,15 ----
  # Open Directory.
  ##
  nobody:*:-2:-2:Unprivileged User:/var/empty:/usr/bin/false
! ROOT:*:0:0:System Administrator:/var/root:/bin/sh
  daemon:*:1:1:System Services:/var/root:/usr/bin/false
  _uucp:*:4:4:Unix to Unix Copy Protocol:/var/spool/uucp:/usr/sbin/uucico
  _taskgated:*:13:13:Task Gate Daemon:/var/empty:/usr/bin/false

unified format:

$ sed s/^root:/ROOT:/ /etc/passwd > /tmp/passwd

$ diff -u /etc/passwd /tmp/passwd
--- /etc/passwd    2013-10-24 17:38:39.000000000 -0700
+++ /tmp/passwd    2014-04-26 12:57:57.000000000 -0700
@@ -9,7 +9,7 @@
 # Open Directory.
 ##
 nobody:*:-2:-2:Unprivileged User:/var/empty:/usr/bin/false
-root:*:0:0:System Administrator:/var/root:/bin/sh
+ROOT:*:0:0:System Administrator:/var/root:/bin/sh
 daemon:*:1:1:System Services:/var/root:/usr/bin/false
 _uucp:*:4:4:Unix to Unix Copy Protocol:/var/spool/uucp:/usr/sbin/uucico
 _taskgated:*:13:13:Task Gate Daemon:/var/empty:/usr/bin/false

recursive format:

$ mkdir /tmp/a /tmp/b

$ cp /etc/passwd /tmp/a

$ sed s/^root:/ROOT:/ /etc/passwd > /tmp/b/passwd

$ diff -r /tmp/a /tmp/b
diff -r /tmp/a/passwd /tmp/b/passwd
12c12
< root:*:0:0:System Administrator:/var/root:/bin/sh
---
> ROOT:*:0:0:System Administrator:/var/root:/bin/sh

cpio (1977)

An ancient and to most people unfamiliar Unix archiving tool which is roughly equivalent to tar. The suffix .cpio is often used for cpio archive files.

The format is used by RPM packages, though RPM 5.0 and later also support the xar format. The Linux kernel since version 2.6 has a cpio archive called initramfs which it uses during the boot process. cpio is also used by the Mac OS X .pkg format.

The cpio file format is similar to the tar file format in that for each file which is added to an archive, a header and the file contents are appended to the archive file. In the case of cpio the header is smaller (76 bytes vs 512 bytes). This is in part because the header only contains the file name length; the actual file name is appended to the archive file between the header and the file contents. By contrast the tar format stores the name in fixed length fields, putting a limit on the possible path length. Another difference is that the cpio format lacks a checksum.

header format
offset length field description
0 6 c_magic The identifying value "070707"
6 6 c_dev
12 6 c_ino c_dev and c_ino together must be unique for each file in the archive
18 6 c_mode
24 6 c_uid
30 6 c_gid
36 6 c_nlink number of links to the file in the archive; can be incorrect if the -a flag was used to append files
42 6 c_rdev a place for implementations to store character or block special file information
48 11 c_mtime
59 6 c_namesize
65 11 c_filesize
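The header layout in the table can be decoded with a short Python sketch. It assumes that the numeric fields are octal numbers written as ASCII digits and that c_namesize counts the NUL byte which terminates the file name; the function name parse_entry is hypothetical.

```python
FIELDS = [  # (name, length) in header order, following the table
    ("c_magic", 6), ("c_dev", 6), ("c_ino", 6), ("c_mode", 6),
    ("c_uid", 6), ("c_gid", 6), ("c_nlink", 6), ("c_rdev", 6),
    ("c_mtime", 11), ("c_namesize", 6), ("c_filesize", 11),
]
HEADER_SIZE = sum(length for _, length in FIELDS)  # 76 bytes


def parse_entry(data, offset=0):
    """Parse one entry; return (fields, name, contents, next_offset)."""
    fields, pos = {}, offset
    for name, length in FIELDS:
        text = data[pos:pos + length].decode("ascii")
        fields[name] = text if name == "c_magic" else int(text, 8)
        pos += length
    if fields["c_magic"] != "070707":
        raise ValueError("not a cpio header")
    # The file name follows the header; drop its terminating NUL.
    name = data[pos:pos + fields["c_namesize"] - 1].decode("ascii")
    pos += fields["c_namesize"]
    contents = data[pos:pos + fields["c_filesize"]]
    return fields, name, contents, pos + fields["c_filesize"]
```

Calling parse_entry repeatedly with the returned next_offset would walk the entries of an archive held in memory.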

Another difference between tar and cpio is that whereas tar takes the files to be archived on the command line, recursively descending any arguments which are directories, cpio when used with the -o flag takes its list of files to be archived from standard input. cpio was designed to be used with the find command. Similarly when using the -i flag cpio reads the archive to be extracted from standard input.

diff3 (1979)

diff3 displays the differences between three versions of the same file.

The three way diff is the foundation of branch merging. A two way diff is insufficient for merging because deleting a line in one branch looks like adding a line in the other branch. Only by comparing both branches with the original can these two cases be distinguished.
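The decision made for each region of a three-way merge can be sketched as follows. This is only the per-region rule, under the assumption that the three versions have already been aligned; it is not how diff3 is implemented, and the function name merge_region is hypothetical.

```python
def merge_region(orig, edit1, edit2):
    """Three-way decision for one aligned region.

    Each argument is the region's text in that version, or None if
    the region is absent there. Returns (result, conflict).
    """
    if edit1 == edit2:
        return edit1, False        # both sides agree
    if edit1 == orig:
        return edit2, False        # only edit2 changed the region
    if edit2 == orig:
        return edit1, False        # only edit1 changed the region
    return None, True              # both changed it differently
```

A line present in edit1 but absent in edit2 is ambiguous on its own. If it is also absent in the original, merge_region(None, "f", None) keeps it as an addition; if it is present in the original, merge_region("f", "f", None) drops it as a deletion.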

diff3 has three basic invocations:

diff3 EDIT1 ORIG EDIT2
diff3 -e EDIT1 ORIG EDIT2
diff3 -m EDIT1 ORIG EDIT2

The first invocation writes a description of the three-way diff to standard out.

The second invocation writes an ed script to standard out which will merge the changes in EDIT2 into EDIT1.

The third invocation performs the merge. It writes a version of the file with changes from both EDIT1 and EDIT2 to standard out.

Here is an example of the output format used by the first invocation:

$ cat /tmp/orig.txt 
a
b
c
d
e

$ cat /tmp/edit1.txt 
a
b1
c
d
e
f

$ cat /tmp/edit2.txt 
a
b
c
d1
e

$ diff3 /tmp/edit1.txt /tmp/orig.txt /tmp/edit2.txt
====1
1:2c
  b1
2:2c
3:2c
  b
====3
1:4c
2:4c
  d
3:4c
  d1
====1
1:6c
  f
2:5a
3:5a

Each hunk of the diff3 output starts with four equals signs. All of the hunks in the example above are two-way hunks, meaning that two of the three files are the same. In this case the number of the differing file as it appears in the diff3 arguments is placed after the equals signs.

Here is an example of a three-way hunk, where all three files differ and no number is placed after the equals signs:

$ cat /tmp/orig.txt 
a

$ cat /tmp/edit1.txt                               
a1

$ cat /tmp/edit2.txt 
a2

$ diff3 /tmp/edit1.txt /tmp/orig.txt /tmp/edit2.txt
====
1:1c
  a1
2:1c
  a
3:1c
  a2

ar (1979)

A tool on Unix systems to create static libraries from compiled objects. In other words, to create a .a file from a set of .o files. The format is understood by the linker—which these days is usually built into the compiler—and the loader ld.

The command line interface is broadly similar to tar. Here is how to create an archive; remove files from an archive; list the archive contents; extract files from an archive:

ar -rc NAME.a FILE ...
ar -d ARCHIVE FILE ...
ar -t ARCHIVE
ar -x ARCHIVE FILE ...

The ar file format is not standardized and may differ between systems.

The file format used by GNU ar on Linux starts with the new line terminated string "!<arch>".

Each file starts with a 60 byte header, followed by the file contents. The header has the following fixed-width fields:

offset length name
0 16 file name in ASCII
16 12 file modification timestamp
28 6 uid
34 6 gid
40 8 file mode
48 10 file size in bytes
58 2 0x60 0x0A

The space allocated for the file name in the header is quite short. GNU ar actually stores a special file named "//" in the archive with a new line separated list of file names. A header can reference a name in this special file by storing "/" and the decimal offset in the "//" file of the file name. When file names are stored directly in the header, a "/" is used to mark the end of the file name and the rest of the field is space padded. This supports spaces in the file name.
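The header layout can be read with a short Python sketch. It assumes that the size field is a decimal number in ASCII and that each member is padded to an even length, and it does not look up long names in the "//" member. The function name list_ar is hypothetical.

```python
def list_ar(data):
    """Yield (name, size, contents) for each member of an ar archive."""
    magic = b"!<arch>\n"
    if not data.startswith(magic):
        raise ValueError("not an ar archive")
    pos = len(magic)
    while pos + 60 <= len(data):
        header = data[pos:pos + 60]
        if header[58:60] != b"\x60\x0a":
            raise ValueError("bad header terminator")
        name = header[0:16].decode("ascii").rstrip(" ")
        size = int(header[48:58].decode("ascii"))
        # A directly stored name ends with "/". The special members
        # "/" and "//" and "/OFFSET" references are left untouched.
        if name.endswith("/") and name.strip("/"):
            name = name[:-1]
        start = pos + 60
        yield name, size, data[start:start + size]
        # Assumption: members are padded to an even length.
        pos = start + size + (size % 2)
```

Applied to the bytes of a static library, the generator would yield one tuple per member, including the special members when they are present.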

GNU ar also stores a special file named "/" in the archive for a symbol table. The format is

  • a 32-bit integer containing the number of symbols
  • a list of 32-bit integers, one for each symbol, containing the offset of the header in the archive for the file containing the symbol
  • a list of null terminated strings, in the same order as the previous list, containing the symbol names

tar (1979)

The more portable twin of ar. Originally used for creating and using magnetic tape archives.

How to create a tar file; list the contents of a tar file; compare a tar file with the file system; and extract the contents of a tar file:

tar [-]cf NAME.tar DIR
tar [-]tf TARFILE
tar [-]df TARFILE [DIR]
tar [-]xf TARFILE

The -v option can be used with -c or -x to write the files being added or extracted to standard error.

Tar files store the files in sequential order. Each file is preceded by a 512 byte header. The file itself is null byte padded to a multiple of 512 bytes.

Tar can write to and read from stdout. The following two invocations behave identically:

tar cf - . | (cd DIR ; tar xf -)
tar cf - . | tar xf - -C DIR

Tar can append data to an existing tar file. These commands append the contents of a directory to a tar file; append the contents of the directory which are newer than what is already on a tarfile; append subsequent tar files to the first tar file:

tar [-]rf TARFILE DIR
tar [-]uf TARFILE DIR
tar [-]Af TARFILE1 TARFILE2 ...

How to create a compressed tar file:

tar [-]czf NAME.tar.gz DIR
tar [-]cjf NAME.tar.bz2 DIR
tar [-]cJf NAME.tar.xz DIR

In 1988 POSIX extended the format of the header block in a backwardly compatible way. Additional header type flags were added in 2001.

header format
offset length original format ustar
0 100 file name
100 8 file mode
108 8 owner user id
116 8 group id
124 12 file size in bytes
136 12 last modification time
148 8 header checksum
156 1 type flag
157 100 name of linked file
257 6 "ustar"
263 2 "00"
265 32 owner user name
297 32 group name
329 8 device major number
337 8 device minor number
345 155 filename prefix

header type flags
flag original meaning ustar 2001
'\0' normal file
'0' normal file
'1' hard link
'2' symlink
'3' character device
'4' block device
'5' directory
'6' FIFO
'7' contiguous file
'g' global extended header
'x' extended header for the next file
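The header fields in the table can be decoded with a short Python sketch. It assumes that numeric fields are octal numbers in ASCII and that the checksum is the sum of the header bytes with the checksum field itself counted as eight spaces. The function name parse_tar_header is hypothetical; the standard tarfile module is used only to produce a sample header.

```python
import io
import tarfile


def parse_tar_header(block):
    """Decode some fields of one 512 byte tar header block."""
    def text(off, length):
        # Text fields are NUL terminated when shorter than the field.
        return block[off:off + length].split(b"\0", 1)[0].decode("ascii")

    def octal(off, length):
        return int(text(off, length).strip() or "0", 8)

    # Sum the header with the checksum field read as eight spaces.
    summed = sum(block[:148]) + 8 * ord(" ") + sum(block[156:512])
    return {
        "name": text(0, 100),
        "mode": octal(100, 8),
        "size": octal(124, 12),
        "mtime": octal(136, 12),
        "typeflag": block[156:157].decode("ascii"),
        "magic": text(257, 6),
        "checksum_ok": octal(148, 8) == summed,
    }


# Build a one-file ustar archive in memory and decode its first block.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tf:
    info = tarfile.TarInfo("foo.txt")
    info.size = 4
    tf.addfile(info, io.BytesIO(b"foo\n"))
header = parse_tar_header(buf.getvalue()[:512])
print(header["name"], header["size"], header["typeflag"], header["checksum_ok"])
```

The file contents begin at offset 512 and, as described earlier, are padded with null bytes to a multiple of 512 bytes before the next header.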

rcs (1982)

RCS works in a similar manner to SCCS. There is a history file which is indicated with a ,v suffix. Thus, the history file for foo.txt would be foo.txt,v. The RCS commands take the original file as an argument instead of the history file like in SCCS. RCS supports multiline commit messages and it adds the rlog command for getting all the commit messages for a file. RCS has always been freely available software, a factor which has promoted its use over SCCS.

Here is a sample work session using RCS. It is equivalent to the SCCS work session in the previous section.

$ echo "foo" > foo.txt
$ ci foo.txt
$ co -l foo.txt
$ vi foo.txt
$ ci foo.txt
$ co foo.txt

Examining an RCS history reveals some improvements in the implementation over SCCS. First of all, at signs (@) are used instead of Ctrl-A to demarcate sections of the file. At signs in the data are escaped by doubling them. This makes the history files easier to inspect at the command line.

Another change is that the current version of the file is stored in its entirety. Older revisions are obtained by applying a chain of reverse diffs. The advantage of this design is that it is optimized for the common case of fetching the current version.

Here is an example of adding two lines after line 6:

@a6 2
added line one
added line two
@

Here is an example of deleting two lines starting at line 6:

@d6 2
@
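Applying such edit commands can be sketched in Python. The sketch assumes that aN M adds the M lines which follow the command after line N, that dN M deletes M lines starting at line N, and that line numbers refer to the file before any command is applied. The function name apply_delta is hypothetical, and the enclosing at signs are assumed to have been stripped already.

```python
def apply_delta(lines, delta):
    """Apply RCS-style edit commands to a list of lines.

    delta is a list of strings: each command line, with every "a"
    command followed by the lines it adds.
    """
    out, cursor, i = [], 0, 0   # cursor = input lines consumed so far
    while i < len(delta):
        cmd = delta[i]
        start, count = (int(x) for x in cmd[1:].split())
        i += 1
        if cmd[0] == "a":
            out.extend(lines[cursor:start])      # copy through line N
            cursor = start
            out.extend(delta[i:i + count])       # then the added lines
            i += count
        elif cmd[0] == "d":
            out.extend(lines[cursor:start - 1])  # copy up to line N
            cursor = start - 1 + count           # skip deleted lines
        else:
            raise ValueError("unknown command: " + cmd)
    out.extend(lines[cursor:])
    return out
```

Since RCS stores the current version in full, a chain of such deltas applied one after another would take the current version back to an older revision.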

patch (1985)

The patch command can apply the output of diff to the file that was the first argument of diff to recover the file that was the second argument of diff. patch reads the output of diff from standard input:

$ echo "foo" > foo.txt
$ echo "bar" > bar.txt 
$ diff foo.txt bar.txt > foo.patch
$ patch foo.txt < foo.patch 
patching file foo.txt
$ cat foo.txt 
bar

The above is only a slight improvement over what could have been achieved with diff -e and ed. The novelty of patch is its ability to apply a patch file to an entire directory:

$ mkdir old
$ echo "bar" > old/bar.txt
$ echo "baz" > old/baz.txt
$ cp -R old new
$ echo "qux" > new/bar.txt
$ diff -Naur old new > foo.patch
$ rm -rf new
$ patch -Np0 < foo.patch
patching file old/bar.txt
$ cat old/bar.txt 
qux

This is a good way to create a patch file:

diff -Naur OLD NEW

When creating the patch file with diff, the -u or -c flags seem to be necessary so that patch has the file names. The -N flag is necessary if files are added or removed. The -a flag prevents diff from skipping files which it thinks are binary.

If the diff was performed outside of the directories, then the patch should be performed outside of the directory to be patched with the -p0 flag. Optionally the patch can be performed inside the directory to be patched with the -p1 flag. The -N flag instructs patch to not make a change if the patch appears to be reversed or already applied.
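The effect of the -p flag on the file names in a patch can be sketched as follows. The function strip_components is a hypothetical helper which models only relative paths; it is not how patch itself is implemented.

```python
def strip_components(path, count):
    """Drop the first `count` slash-separated components of a path."""
    parts = path.split("/")
    if count >= len(parts):
        raise ValueError(
            "cannot strip %d components from %r" % (count, path))
    return "/".join(parts[count:])
```

With -p0 the name old/bar.txt is used as is, so the patch must be run from the directory which contains old. With -p1 the name becomes bar.txt, so the patch must be run from inside old.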

zip (1989)

zip combines file compression and archiving. It is a better choice for sharing files with Windows hosts than tar, which most Windows hosts don't have installed.

zip [-r] [-0] ARCHIVE FILE ...
zip -d ARCHIVE FILE ...
zip -u ARCHIVE [FILE ...]

unzip -l ARCHIVE
unzip ARCHIVE [FILE ...]

Compression is the DEFLATE algorithm, or no compression if the -0 flag is used.

zip stores the file name, file size, and last modification time of the file. The information is in a header which precedes the file itself and in the "central directory" at the end of the file.
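The stored metadata can be inspected with the zipfile module from the Python standard library, which reads the central directory when an archive is opened. The archive here is built in memory purely for illustration, and the file name and contents are made up.

```python
import io
import zipfile

# Build a small archive in memory using DEFLATE compression.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("foo.txt", "foo\n" * 100)

# Read back the name, sizes, and modification time of each entry.
with zipfile.ZipFile(buf) as zf:
    entries = [
        (info.filename, info.file_size, info.compress_size, info.date_time)
        for info in zf.infolist()
    ]

for name, size, packed, mtime in entries:
    print(name, size, packed, mtime)
```

Passing zipfile.ZIP_STORED instead of zipfile.ZIP_DEFLATED corresponds to the -0 flag: the entry is stored without compression.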

By default zip does not recursively descend directories, adding their contents to the archive. Use the -r flag to get this behavior.

cvs (1990)

CVS was the first popular revision control system with a client-server architecture. The client would have a local copy of a recent version of the source code and only the server would have the complete version history. This made CVS somewhat cleaner to work with than RCS or SCCS which keep history files on the filesystem for the client to see. It also permitted developers to collaborate without logging in to the same machine. The CVS client-server protocol communicated over rsh and later over ssh. The well known port number for a CVS server is 2401.

CVS also enabled a user to commit several files together. Multiple file commits are sometimes necessary to keep the source code consistent after each commit. The definition of consistency varies from project to project, but C developers want the source code to compile without errors, for example. Although CVS permits a user to submit changes to several files with a single command, the file system operation performed by the server is not actually atomic.

Setting up a CVS server is a bit of a bother and I'm not aware of any free CVS hosting services. As a result, it is difficult these days to experiment with CVS even though the client is still installed by default on Mac OS X. There are GNU projects which still use CVS. One can register at savannah.gnu.org and upload a public SSH key to participate in a project. One can perform an anonymous checkout of source like this:

cvs -z3 -d:pserver:anonymous@cvs.savannah.gnu.org:/sources/emacs co emacs

p4 (1995)

Perforce has a client-server model. It supports atomic commits. It provides the ability to create and, unlike CVS, merge branches.

Perforce has a reputation for being able to handle large projects. Licenses are several hundred dollars per user.

jar (1995)

jar supports some of the tar commands:

jar cf NAME.jar DIR
jar tf JARFILE
jar xf JARFILE
jar uf JARFILE DIR

jar can write to and read from stdout; the syntax is different from tar:

jar c . | (cd DIR ; jar x)
jar c . | jar x -C DIR

Use jar -e to make a jar file runnable by java. The argument to -e is a class with a main routine which will be used as the entry point.

$ mkdir foo

$ cat > foo/A.java
package foo;

public class A {
    public static void main(String[] args) {
        System.out.println("A");
    }
}

$ sed s/A/B/ foo/A.java > foo/B.java

$ javac foo/*.java

$ jar cef foo.A foo.jar foo

$ java -jar foo.jar        
A

A jar file is a zip file; unzip can also be used to extract the contents. jar stores extra information about the jar file in META-INF/MANIFEST.MF:

$ unzip foo.jar

$ cat META-INF/MANIFEST.MF 
Manifest-Version: 1.0
Created-By: 1.6.0_26 (Sun Microsystems Inc.)
Main-Class: foo.A

rsync (1996)

A tool for copying files and directories between hosts. Usually it uses ssh. It is faster than scp when some of the files are already on the destination or when copying files that have been modified.

Here is the usage for putting and getting:

  rsync -a PATH ... HOST:PATH
  rsync -a HOST:'PATH ...' PATH

The -a flag is equivalent to the flags -rptoglD which (1) recursively copy the contents of directories, (2) copy file permissions, (3) copy file times, (4) copy owner, (5) copy group, (6) copy symlinks, and (7) copy special devices.

Other useful flags are -v for verbose mode and --exclude which takes a file glob pattern to specify files to skip.

When the source is a directory without a trailing slash, rsync creates a directory with the same name as the source inside the target. Putting a trailing slash / on the end of the source suppresses this behavior: the contents of the source are copied directly into the target.

rsync can be used to backup a directory on a remote host. With the --backup flag, files which are already on the destination but have been modified on the source will be copied into a separate incremental directory with a tilde (~) suffix. The --backup-dir flag can be used to specify a different incremental directory.

svn (2000)

SVN has a client-server model. SVN replaced CVS as the most popular VCS sometime after 2004. As of 2013 it is still the most widely used VCS, being twice as likely to be used as Git and six times as likely to be used as Mercurial.

bzr (2005)

To get a list of common commands; to get help on a specific command:

bzr help
bzr help commit

To make a commit it is necessary to register a name and an email address:

bzr whoami "Joe Foo <joe@foo.com>"
content of this page licensed under creative commons attribution-sharealike 3.0