======================================================================
=                                Git                                 =
======================================================================

                            Introduction
======================================================================
Git is a distributed version control system that tracks versions of
files. It is often used to control source code by programmers
collaboratively developing software.

Git's goals include speed, data integrity, and support for
distributed, non-linear workflows (thousands of parallel branches
running on different computers). Git was originally authored by Linus
Torvalds in 2005 for the development of the Linux kernel, with other
kernel developers contributing to its initial development. It was
prompted by the revocation of the free license of BitKeeper, the
proprietary source-control management system used for Linux kernel
development since 2002. Since 2005, Junio Hamano has been the core
maintainer of Git. As with most other distributed version control
systems, and unlike most client-server systems, every Git directory on
every computer is a full-fledged repository with complete history and
full version-tracking abilities, independent of network access or a
central server. Git is free and open-source software shared under
the GPL-2.0-only license.

Git's design benefits from Torvalds' experience with Linux and
file-system performance, leading to features such as support for
non-linear development, efficient handling of large projects, and
cryptographic authentication of history. Its toolkit-based design
allows for pluggable merge strategies and flexibility in managing
version control tasks. Despite its comprehensive feature set, Git has
faced security challenges, leading to updates and patches that address
vulnerabilities. The trademark "Git" is registered by the Software
Freedom Conservancy.

Git's adoption has grown rapidly, becoming the most popular
distributed version control system, with nearly 95% of developers
reporting it as their primary version control system as of 2022. It is
the most widely used source-code management tool among professional
developers. There are offerings of Git repository services, including
GitHub, SourceForge, Bitbucket and GitLab.


                              History
======================================================================
Git development was started by Torvalds in April 2005 when the
proprietary source-control management (SCM) system used for Linux
kernel development since 2002, BitKeeper, revoked its free license for
Linux development. The copyright holder of BitKeeper, Larry McVoy,
claimed that Andrew Tridgell had created SourcePuller by reverse
engineering the BitKeeper protocols. The same incident also spurred
the creation of another version-control system, Mercurial.

Torvalds wanted a distributed system that he could use like BitKeeper,
but none of the available free systems met his needs. He cited an
example of a source-control management system needing 30 seconds to
apply a patch and update all associated metadata, and noted that this
would not scale to the needs of Linux kernel development, where
synchronizing with fellow maintainers could require 250 such actions
at once. For his design criterion, he specified that patching should
take no more than three seconds, and added three more goals:
* Take the Concurrent Versions System (CVS) as an example of what
'not' to do; if in doubt, make the exact opposite decision.
* Support a distributed, BitKeeper-like workflow.
* Include very strong safeguards against corruption, either accidental
or malicious.

These criteria eliminated every version-control system in use at the
time, so immediately after the 2.6.12-rc2 Linux kernel development
release, Torvalds set out to write his own.

The development of Git began on 3 April 2005. Torvalds announced the
project on 6 April, and Git became self-hosting the next day. The
first merge of multiple branches took place on 18 April. Torvalds
achieved
his performance goals; on 29 April, the nascent Git was benchmarked
recording patches to the Linux kernel tree at a rate of 6.7 patches
per second. On 16 June, Git managed the kernel 2.6.12 release.

Torvalds turned over maintenance on 26 July 2005 to Junio Hamano, a
major contributor to the project. Hamano was responsible for the 1.0
release on 21 December 2005.


Naming
========
Torvalds sarcastically quipped about the name 'git' (which means
"unpleasant person" in British English slang): "I'm an egotistical
bastard, and I name all my projects after myself. First 'Linux', now
'git'." The man page describes Git as "the stupid content tracker".

The read-me file of the source code elaborates further:



The source code for Git refers to the program as "the information
manager from hell".


Characteristics
=================
Git's design is a synthesis of Torvalds's experience with Linux in
maintaining a large distributed development project, along with his
intimate knowledge of file-system performance gained from the same
project and the urgent need to produce a working system in short
order. These influences led to the following implementation choices:
* Strong support for non-linear development: Git supports rapid
branching and merging, and includes specific tools for visualizing and
navigating a non-linear development history. In Git, a core assumption
is that a change will be merged more often than it is written, as it
is passed around to various reviewers. In Git, branches are very
lightweight: a branch is only a reference to one commit.
* Distributed development: Like Darcs, BitKeeper, Mercurial, Bazaar,
and Monotone, Git gives each developer a local copy of the full
development history, and changes are copied from one such repository
to another. These changes are imported as added development branches
and can be merged in the same way as a locally developed branch.
* Compatibility with existing systems and protocols: Repositories can
be published via Hypertext Transfer Protocol Secure (HTTPS), Hypertext
Transfer Protocol (HTTP), File Transfer Protocol (FTP), or a Git
protocol over either a plain socket or Secure Shell (ssh). Git also
has a CVS server emulation, which enables the use of existing CVS
clients and IDE plugins to access Git repositories. Subversion
repositories can be used directly with git-svn.
* Efficient handling of large projects: Torvalds has described Git as
being very fast and scalable, and performance tests done by Mozilla
showed that it was an order of magnitude faster at diffing large
repositories than Mercurial and GNU Bazaar; fetching version history
from a locally stored repository can be one hundred times faster than
fetching it from the remote server.
* Cryptographic authentication of history: The Git history is stored
in such a way that the ID of a particular version (a 'commit' in Git
terms) depends upon the complete development history leading up to
that commit. Once it is published, it is not possible to change the
old versions without it being noticed. The structure is similar to a
Merkle tree, but with added data at the nodes and leaves. (Mercurial
and Monotone also have this property.)
* Toolkit-based design: Git was designed as a set of programs written
in C and several shell scripts that provide wrappers around those
programs. Although most of those scripts have since been rewritten in
C for speed and portability, the design remains, and it is easy to
chain the components together.
* Pluggable merge strategies: As part of its toolkit design, Git has a
well-defined model of an incomplete merge, and it has multiple
algorithms for completing it, culminating in telling the user that it
is unable to complete the merge automatically and that manual editing
is needed.
* Garbage accumulates until collected: Aborting operations or backing
out changes will leave useless dangling objects in the database. These
are generally a small fraction of the continuously growing history of
wanted objects. Git will automatically perform garbage collection when
enough loose objects have been created in the repository. Garbage
collection can be called explicitly using git gc.
* Periodic explicit object packing: Git stores each newly created
object as a separate file. Although individually compressed, this
takes up a great deal of space and is inefficient. This is solved by
the use of 'packs' that store a large number of objects
delta-compressed among themselves in one file (or network byte stream)
called a 'packfile'. Packs are compressed using the heuristic that
files with the same name are probably similar, without depending on
this for correctness. A corresponding index file is created for each
packfile, telling the offset of each object in the packfile. Newly
created objects (with newly added history) are still stored as single
objects, and periodic repacking is needed to maintain space
efficiency. The process of packing the repository can be very
computationally costly. By allowing objects to exist in the repository
in a loose but quickly generated format, Git allows the costly pack
operation to be deferred until later, when time matters less, e.g.,
the end of a workday. Git does periodic repacking automatically, but
manual repacking is also possible with the git gc command. For data
integrity, both the packfile and its index have an SHA-1 checksum
inside, and the file name of the packfile also contains an SHA-1
checksum. To check the integrity of a repository, run the git fsck
command.
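
The chaining behind "cryptographic authentication of history" can be
sketched in a few lines of Python. This is a simplified model, not
Git's actual commit format: the names commit_id and history_ids and
the hashed fields are assumptions made for illustration. Each ID
hashes the snapshot, the message, and the parent's ID, so editing an
old version changes every later ID as well.

```python
import hashlib
from typing import List, Optional, Tuple


def commit_id(snapshot: str, message: str, parent: Optional[str]) -> str:
    """Hash a toy commit; the parent ID is part of the hashed data."""
    payload = f"parent {parent}\nsnapshot {snapshot}\n\n{message}".encode()
    return hashlib.sha1(payload).hexdigest()


def history_ids(commits: List[Tuple[str, str]]) -> List[str]:
    """Return the chain of IDs for (snapshot, message) pairs, oldest first."""
    ids: List[str] = []
    parent: Optional[str] = None
    for snapshot, message in commits:
        parent = commit_id(snapshot, message, parent)
        ids.append(parent)
    return ids


original = history_ids([("v1", "first"), ("v2", "second"), ("v3", "third")])
tampered = history_ids([("v1", "first"), ("v2-edited", "second"), ("v3", "third")])

# The untouched first commit keeps its ID; everything after the edit changes,
# even the last commit, whose own snapshot and message are identical.
print(original[0] == tampered[0])
print(original[2] == tampered[2])
```

Anyone holding the original final ID can therefore notice the edit by
comparing a single value.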

Another property of Git is that it snapshots directory trees of files.
The earliest systems for tracking versions of source code, Source Code
Control System (SCCS) and Revision Control System (RCS), worked on
individual files and emphasized the space savings to be gained from
storing the (mostly similar) versions as interleaved deltas (SCCS) or
with delta encoding (RCS). Later revision-control systems maintained
this notion of a file having an identity across multiple revisions of
a project.
However, Torvalds rejected this concept. Consequently, Git does not
explicitly record file revision relationships at any level below the
source-code tree.

These implicit revision relationships have some significant
consequences:
* It is slightly more costly to examine the change history of one file
than the whole project. To obtain a history of changes affecting a
given file, Git must walk the global history and then determine
whether each change modified that file. This method of examining
history does, however, let Git produce with equal efficiency a single
history showing the changes to an arbitrary set of files. For example,
a subdirectory of the source tree plus an associated global header
file is a very common case.
* Renames are handled implicitly rather than explicitly. A common
complaint with CVS is that it uses the name of a file to identify its
revision history, so moving or renaming a file is not possible without
either interrupting its history or renaming the history and thereby
making the history inaccurate. Most post-CVS revision-control systems
solve this by giving a file a unique long-lived name (analogous to an
inode number) that survives renaming. Git does not record such an
identifier, and this is claimed as an advantage. Source code files are
sometimes split or merged, or simply renamed, and recording this as a
simple rename would freeze an inaccurate description of what happened
in the (immutable) history. Git addresses the issue by detecting
renames while browsing the history of snapshots rather than recording
it when making the snapshot. (Briefly, given a file in revision 'N', a
file of the same name in revision 'N' − 1 is its default ancestor.
However, when there is no like-named file in revision 'N' − 1, Git
searches for a file that existed only in revision 'N' − 1 and is very
similar to the new file.) However, it does require more CPU-intensive
work every time the history is reviewed, and several options to adjust
the heuristics are available. This mechanism does not always work;
sometimes a file that is renamed with changes in the same commit is
read as a deletion of the old file and the creation of a new file.
Developers can work around this limitation by committing the rename
and the changes separately.
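
The rename heuristic described above can be sketched as follows. This
is a minimal illustration, not Git's implementation: the function
detect_renames, the use of Python's difflib for scoring, and the 0.5
threshold are all assumptions of the sketch. Snapshots are modeled as
dictionaries mapping a path to its content.

```python
from difflib import SequenceMatcher

SIMILARITY_THRESHOLD = 0.5  # assumed cut-off for this sketch


def detect_renames(old: dict[str, str], new: dict[str, str]) -> dict[str, str]:
    """Map each path found only in `new` to the most similar path found
    only in `old`, provided the similarity reaches the threshold."""
    removed = {p: c for p, c in old.items() if p not in new}
    added = {p: c for p, c in new.items() if p not in old}
    renames: dict[str, str] = {}
    for new_path, new_content in added.items():
        best_path, best_score = None, 0.0
        for old_path, old_content in removed.items():
            score = SequenceMatcher(None, old_content, new_content).ratio()
            if score > best_score:
                best_path, best_score = old_path, score
        if best_path is not None and best_score >= SIMILARITY_THRESHOLD:
            renames[new_path] = best_path
    return renames


revision_before = {
    "util.c": "int add(int a, int b) { return a + b; }\n",
    "main.c": "int main(void) { return 0; }\n",
}
revision_after = {
    "math.c": "int add(int a, int b) { return a + b; }\n/* moved */\n",
    "main.c": "int main(void) { return 0; }\n",
}
print(detect_renames(revision_before, revision_after))
```

Because the comparison runs whenever history is examined, nothing
about the rename has to be stored in the snapshots themselves, which
is the trade-off the list above describes.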

Git implements several merging strategies; a non-default strategy can
be selected at merge time:
* 'resolve': the traditional three-way merge algorithm.
* 'recursive': a variant of the three-way merge algorithm. It was long
the default when pulling or merging one branch; since Git 2.34 the
default is 'ort', a rewrite of 'recursive'.
* 'octopus': This is the default when merging more than two heads.
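
The three-way rule underlying these strategies can be sketched at the
granularity of whole files. This is a deliberate simplification: Git's
strategies also merge changes within a file, which the sketch does not
attempt, and the function name three_way_merge is hypothetical. Each
snapshot is a dictionary mapping a path to its content, and 'base' is
the common ancestor of the two sides.

```python
def three_way_merge(base: dict, ours: dict, theirs: dict):
    """Merge two snapshots against their common ancestor.

    Returns (merged, conflicts), where conflicts lists the paths that
    both sides changed in different ways.
    """
    merged = {}
    conflicts = []
    for path in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        if o == t:
            result = o  # both sides agree (including both deleting)
        elif o == b:
            result = t  # only their side changed
        elif t == b:
            result = o  # only our side changed
        else:
            conflicts.append(path)  # needs manual editing
            continue
        if result is not None:  # None means the file was deleted
            merged[path] = result
    return merged, conflicts


base = {"a": "1", "b": "1", "c": "1", "e": "1"}
ours = {"a": "2", "b": "1", "c": "x"}
theirs = {"a": "1", "b": "3", "c": "y", "d": "new", "e": "1"}
print(three_way_merge(base, ours, theirs))
```

The conflict list corresponds to the case described earlier, where Git
tells the user that it cannot complete the merge automatically.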


Data structures
=================
Git's primitives are not inherently a source-code management system.
Torvalds explains:


From this initial design approach, Git has developed the full set of
features expected of a traditional SCM, with features mostly being
created as needed, then refined and extended over time.


Git has two data structures: a mutable 'index' (also called 'stage' or
'cache') that caches information about the working directory and the
next revision to be committed; and an immutable, append-only 'object
database'.

The index serves as a connection point between the object database and
the working tree.

The object store contains five types of objects:
* A blob is the content of a file. Blobs have no proper file name,
time stamps, or other metadata (a blob's name internally is a hash of
its content). In Git, each blob is one version of a file and holds
that file's data.
* A tree object is the equivalent of a directory. It contains a list
of file names, each with some type bits and a reference to a blob or
tree object that is that file, symbolic link, or directory's contents.
These objects are a snapshot of the source tree. (Taken as a whole,
they form a Merkle tree: a single hash for the root tree is
sufficient, and is what commits actually use, to pinpoint the exact
state of a whole tree structure with any number of subdirectories and
files.)
* A commit object links tree objects together into history. It
contains the name of a tree object (of the top-level source
directory), a timestamp, a log message, and the names of zero or more
parent commit objects.
* A tag object is a container that contains a reference to another
object and can hold added metadata related to that object. Most
commonly, it is used to store a digital signature of a commit object
corresponding to a particular release of the data being tracked by
Git.
* A packfile object collects various other objects into a
zlib-compressed bundle for compactness and ease of transport over
network protocols.
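
The Merkle-tree property of tree objects can be sketched in Python.
The sketch assumes SHA-1 object names and covers only a flat directory
of regular files; subdirectories, symbolic links, and executable bits
are left out, and the helper names are hypothetical.

```python
import hashlib


def object_id(kind: bytes, body: bytes) -> bytes:
    """Raw 20-byte SHA-1 over a type-and-length header plus the body."""
    header = kind + b" " + str(len(body)).encode() + b"\0"
    return hashlib.sha1(header + body).digest()


def tree_id(files: dict[str, bytes]) -> bytes:
    """ID of a flat directory whose entries are all regular files."""
    body = b""
    for name in sorted(files, key=str.encode):
        blob = object_id(b"blob", files[name])
        # Each entry: type bits, file name, NUL, then the blob's raw ID.
        body += b"100644 " + name.encode() + b"\0" + blob
    return object_id(b"tree", body)


before = {"README": b"hello\n", "main.c": b"int main(void) { return 0; }\n"}
after = dict(before, README=b"hello, world\n")

print(tree_id(before).hex())
print(tree_id(before) == tree_id(dict(before)))  # same content, same ID
print(tree_id(before) == tree_id(after))         # one changed file, new root ID
```

Because the tree's ID is computed over the IDs of its entries, a
change to any file changes the root ID, which is why a commit needs to
record only that one hash.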

Each object is identified by a SHA-1 hash of its contents. Git
computes the hash and uses this value for the object's name. The
object is put into a directory matching the first two characters of
its hash. The rest of the hash is used as the file name for that
object.
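
As a minimal sketch of this naming scheme, the following Python
computes the name of a blob and the path under which it would be
stored as a loose object. The hashed bytes include a short
type-and-length header in front of the content. The function names
are hypothetical, and the .git/objects prefix assumes a conventional
repository layout.

```python
import hashlib


def blob_object_name(content: bytes) -> str:
    """SHA-1 name of a blob: header "blob <size>", a NUL byte, content."""
    header = b"blob " + str(len(content)).encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def loose_object_path(name: str) -> str:
    """Directory from the first two hex characters, file from the rest."""
    return f".git/objects/{name[:2]}/{name[2:]}"


name = blob_object_name(b"hello world\n")
print(name)                     # 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
print(loose_object_path(name))  # .git/objects/3b/18e512dba79e4c8300dd08aeb37f8e728b8dad
```

Two files with identical content therefore share one name and are
stored only once.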

Git stores each revision of a file as a unique blob. The relationships
between the blobs can be found through examining the tree and commit
objects. Newly added objects are stored in their entirety using zlib
compression. This can consume a large amount of disk space quickly, so
objects can be combined into 'packs', which use delta compression to
save space, storing blobs as their changes relative to other blobs.
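
The idea of storing a blob as its changes relative to another blob can
be sketched as below. This is a toy encoding built on Python's
difflib, not the packfile delta format; the instruction names "copy"
and "insert" and both function names are assumptions of the sketch.

```python
from difflib import SequenceMatcher


def make_delta(base: bytes, target: bytes) -> list:
    """Describe `target` as copies from `base` plus literal inserts."""
    delta = []
    matcher = SequenceMatcher(None, base, target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            delta.append(("copy", i1, i2 - i1))
        elif j2 > j1:  # "replace" or "insert": new bytes must be stored
            delta.append(("insert", target[j1:j2]))
        # "delete" needs no instruction: those base bytes are never copied
    return delta


def apply_delta(base: bytes, delta: list) -> bytes:
    """Rebuild the target from the base blob and the delta."""
    out = bytearray()
    for op in delta:
        if op[0] == "copy":
            _, offset, length = op
            out += base[offset:offset + length]
        else:
            out += op[1]
    return bytes(out)


base = b"".join(b"line %d\n" % i for i in range(50))
target = base.replace(b"line 25\n", b"line twenty-five\n") + b"line 50\n"

delta = make_delta(base, target)
literal = sum(len(op[1]) for op in delta if op[0] == "insert")
print(apply_delta(base, delta) == target)
print(literal, "literal bytes stored for a target of", len(target), "bytes")
```

Only the inserted bytes and a few copy instructions need to be kept
for the second version, which is where the space saving comes from.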

Additionally, Git stores labels called refs (short for references) to
indicate the locations of various commits. They are stored in the
reference database and include:
* Heads (branches): Named references that are advanced automatically
to the new commit when a commit is made on top of them.
* HEAD: A reserved head that will be compared against the working tree
to create a commit.
* Tags: Like branch references but fixed to a particular commit. Used
to label important points in history.


References
============
Every object in the Git database that is not referred to may be
cleaned up by using a garbage collection command or automatically. An
object may be referenced by another object or an explicit reference.
Git has several types of references. The commands to create, move,
and delete references vary. git show-ref lists all references. Some
types are:
* 'heads': refers to an object locally,
* 'remotes': refers to an object which exists in a remote repository,
* 'stash': refers to an object not yet committed,
* 'meta': e.g. a configuration in a bare repository, user rights; the
refs/meta/config namespace was introduced retrospectively and is used
by Gerrit,
* 'tags': see above.
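
The rule that unreferenced objects may be cleaned up amounts to a
reachability walk starting from the references. The sketch below
models the object database as a dictionary from an object ID to the
IDs it refers to; all names and IDs are hypothetical, and real garbage
collection has additional safeguards that are not modeled here.

```python
def reachable(objects: dict[str, list[str]], refs: dict[str, str]) -> set[str]:
    """Collect every object ID reachable from any reference."""
    seen: set[str] = set()
    stack = list(refs.values())
    while stack:
        oid = stack.pop()
        if oid in seen or oid not in objects:
            continue
        seen.add(oid)
        stack.extend(objects[oid])  # follow links to parents, trees, blobs
    return seen


def collect_garbage(objects: dict[str, list[str]], refs: dict[str, str]):
    """Return a copy of the database without unreachable objects."""
    keep = reachable(objects, refs)
    return {oid: links for oid, links in objects.items() if oid in keep}


objects = {
    "c2": ["c1", "t2"], "c1": ["t1"],
    "t1": ["b1"], "t2": ["b1", "b2"],
    "b1": [], "b2": [],
    "c3": ["t3"], "t3": ["b3"], "b3": [],  # left behind by an aborted operation
}
refs = {"refs/heads/main": "c2"}
print(sorted(collect_garbage(objects, refs)))
```

Here the commit "c3" and the tree and blob it alone refers to are
dropped, because no reference or reachable object points at them.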


                          Implementations
======================================================================
Git (the main implementation in C) is primarily developed on Linux,
although it also supports most major operating systems, including the
BSDs (DragonFly BSD, FreeBSD, NetBSD, and OpenBSD), Solaris, macOS,
and Windows.

The first Windows port of Git was primarily a Linux-emulation
framework that hosts the Linux version. Installing Git under Windows
creates a similarly named Program Files directory containing the
Mingw-w64 port of the GNU Compiler Collection, Perl 5, MSYS2 (itself a
fork of Cygwin, a Unix-like emulation environment for Windows) and
various other Windows ports or emulations of Linux utilities and
libraries. Currently, native Windows builds of Git are distributed as
32- and 64-bit installers. The official Git website currently
maintains a build of Git for Windows, still using the MSYS2
environment.

The JGit implementation of Git is a pure Java software library,
designed to be embedded in any Java application. JGit is used in the
Gerrit code-review tool, and in EGit, a Git client for the Eclipse
IDE.

Go-git is an open-source implementation of Git written in pure Go. It
is currently used as the backend for projects such as a SQL interface
for Git code repositories and a tool providing encryption for Git.

Dulwich is an implementation of Git written in pure Python with
support for CPython 3.6 and later and PyPy.

The libgit2 implementation of Git is an ANSI C software library with
no other dependencies, which can be built on multiple platforms,
including Windows, Linux, macOS, and BSD. It has bindings for many
programming languages, including Ruby, Python, and Haskell.

JS-Git is a JavaScript implementation of a subset of Git.

GameOfTrees is an open-source implementation of Git for the OpenBSD
project.


                             Git server
======================================================================
As Git is a distributed version control system, it can be used as a
server out of the box. It ships with a built-in command, git daemon,
which starts a simple TCP server running on the Git protocol; such
servers typically listen on TCP port 9418. Dedicated Git HTTP servers
help (amongst other features) by adding access control, displaying the
contents of a Git repository via web interfaces, and managing
multiple repositories. Existing Git repositories can be cloned and
shared to be used by others as a centralized repo. A repository can
also be accessed via remote shell just by having the Git software
installed and allowing a user to log in.


Open source
=============
* Hosting the Git server using the Git binary.
* Gerrit, a Git server configurable to support code reviews and
provide access via ssh, an integrated Apache MINA or OpenSSH, or an
integrated Jetty web server. Gerrit provides integration for LDAP,
Active Directory, OpenID, OAuth, Kerberos/GSSAPI, and X509 https
client certificates. As of Gerrit 3.0, all configuration is stored as
Git repositories, and no database is required to run. Gerrit has a
pull-request feature implemented in its core but lacks a GUI for it.
* Phabricator, a spin-off from Facebook. As Facebook primarily uses
Mercurial, Git support is not as prominent.
* RhodeCode Community Edition (CE), supporting Git, Mercurial and
Subversion with an AGPLv3 license.
* Kallithea, supporting both Git and Mercurial, developed in Python
under the GPL license.
* External projects like gitolite, which provide scripts on top of Git
software to provide fine-grained access control.
* There are several other FLOSS solutions for self-hosting, including
Gogs and Gitea, a fork of Gogs, both written in Go under the MIT
license.


Git server as a service
=========================
There are many offerings of Git repositories as a service. The most
popular are GitHub, SourceForge, Bitbucket and GitLab.


                              Adoption
======================================================================
The Eclipse Foundation reported in its annual community survey that as
of May 2014, Git was the most widely used source-code management
tool, with 42.9% of professional software developers reporting that
they used Git as their primary source-control system, compared with
36.3% in 2013 and 32% in 2012. Excluding use of GitHub, the Git
figures were 33.3% in 2014, 30.3% in 2013, 27.6% in 2012 and 12.8% in
2011. Open-source directory Black Duck Open Hub reported a similar
uptake among open-source projects.

Stack Overflow included version control in its annual developer
survey in 2015 (16,694 responses), 2017 (30,730 responses), 2018
(74,298 responses) and 2022 (71,379 responses). Git was the
overwhelming favorite of responding developers in these surveys, with
reported use as high as 93.9% in 2022.

Version control systems used by responding developers (a dash marks a
year with no reported figure):

Name                         2015     2017     2018     2022
Git                          69.3%    69.2%    87.2%    93.9%
Subversion                   36.9%    9.1%     16.1%    5.2%
TFVC                         12.2%    7.3%     10.9%    -
Mercurial                    7.9%     1.9%     3.6%     1.1%
CVS                          4.2%     -        -        -
Perforce                     3.3%     -        -        -
VSS                          -        0.6%     -        -
IBM DevOps Code ClearCase    -        0.4%     -        -
Zip file backups             -        2.0%     7.9%     -
Raw network sharing          -        1.7%     7.9%     -
Other                        5.8%     3.0%     -        -
None                         9.3%     4.8%     4.8%     4.3%

The UK IT jobs website itjobswatch.co.uk reported that as of late
September 2016, 29.27% of UK permanent software development job
openings cited Git, ahead of 12.17% for Microsoft Team Foundation
Server, 10.60% for Subversion, 1.30% for Mercurial, and 0.48% for
Visual SourceSafe.


Extensions
============
There are many 'Git extensions', like Git LFS
(https://github.com/git-lfs/git-lfs), which started as an extension to
Git in the GitHub community and is now widely used by other
repositories. Extensions are usually independently developed and
maintained by different people, but at some point in the future, a
widely used extension can be merged with Git.

Other open-source Git extensions include:

* git-annex, a distributed file synchronization system based on Git
* git-flow, a set of Git extensions to provide high-level repository
operations for Vincent Driessen's branching model
(https://nvie.com/posts/a-successful-git-branching-model/)
* git-machete (https://github.com/VirtusLab/git-machete), a repository
organizer and tool for automating rebase/merge/pull/push operations

Microsoft developed the Virtual File System for Git (VFS for Git;
formerly Git Virtual File System or GVFS) extension to handle the size
of the Windows source-code tree as part of their 2017 migration from
Perforce. VFS for Git allows cloned repositories to use placeholders
whose contents are downloaded only once a file is accessed.


                            Conventions
======================================================================
Git can be used in a variety of different ways, but some conventions
are commonly adopted.

* The command to create a local repo, 'git init', creates a branch
named 'master'. Often it is used as the integration branch for merging
changes into. Since the default upstream remote is named 'origin', the
default remote branch is 'origin/master'. Some tools such as GitHub
and GitLab create a default branch named 'main' instead. Also, users
can add and delete branches and choose any branch for integrating.
* Pushed commits generally are not overwritten, but are 'reverted' by
committing another change which reverses an earlier commit. This
prevents shared commits from being invalid because the commit on which
they are based does not exist in the remote. If the commits contain
sensitive information, they should be removed, which involves a more
complex procedure to rewrite history.
* The 'git-flow' workflow and naming conventions are often adopted to
distinguish feature-specific unstable histories (feature/*), unstable
shared histories (develop), production-ready histories (main), and
emergency patches to released products (hotfix).
* A 'pull request', a.k.a. 'merge request', is a request by a user to
merge a branch into another branch. Git does not itself provide for
pull requests, but they are a common feature of Git hosting services.
The underlying function of a pull request is no different from that of
an administrator of a repository pulling changes from another remote
(the repository that is the source of the pull request). However, the
pull request itself is a ticket managed by the hosting server, which
performs these actions; it is not a feature of Git itself.


                              Security
======================================================================
Git does not provide access-control mechanisms, but was designed for
operation with other tools that specialize in access control.

On 17 December 2014, an exploit was found affecting the Windows and
macOS versions of the Git client. An attacker could perform arbitrary
code execution on a target computer with Git installed by creating a
malicious Git tree (directory) named '.git' (a directory in Git
repositories that stores all the data of the repository) in a
different case (such as .GIT or .Git, needed because Git does not
allow the all-lowercase version of '.git' to be created manually) with
malicious files in the '.git/hooks' subdirectory (a folder with
executable files that Git runs) on a repository that the attacker made
or on a repository that the attacker can modify. If a Windows or Mac
user 'pulls' (downloads) a version of the repository with the
malicious directory, then switches to that directory, the .git
directory will be overwritten (due to the case-insensitive trait of
the Windows and Mac filesystems) and the malicious executable files in
'.git/hooks' may be run, which results in the attacker's commands
being executed. An attacker could also modify the '.git/config'
configuration file, which allows the attacker to create malicious Git
aliases (aliases for Git commands or external commands) or modify
extant aliases to execute malicious commands when run. The
vulnerability was patched in version 2.2.1 of Git, released on 17
December 2014, and announced the next day.
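
The core of the problem, a path component that matches '.git' only
when letter case is ignored, can be illustrated with a small check.
This is a sketch of the idea rather than the actual fix, which is not
reproduced here and may cover further filesystem-specific cases; the
function name is hypothetical.

```python
def is_unsafe_path(path: str) -> bool:
    """True if any path component equals ".git" when case is ignored."""
    return any(part.lower() == ".git" for part in path.split("/"))


for candidate in (
    ".GIT/hooks/post-checkout",
    ".Git/config",
    "src/.git/config",
    "src/git.c",
    ".gitignore",
):
    print(candidate, is_unsafe_path(candidate))
```

On a case-insensitive filesystem the first two paths would land inside
the repository's own '.git' directory, which is what made the
malicious hooks and configuration effective.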

Git version 2.6.1, released on 29 September 2015, contained a patch
for a security vulnerability that allowed arbitrary code execution.
The vulnerability was exploitable if an attacker could convince a
victim to clone a specific URL, as the arbitrary commands were
embedded in the URL itself. An attacker could use the exploit via a
man-in-the-middle attack if the connection was unencrypted, as they
could redirect the user to a URL of their choice. Recursive clones
were also vulnerable since they allowed the controller of a repository
to specify arbitrary URLs via the .gitmodules file.

Git uses SHA-1 hashes internally. Linus Torvalds has responded that
the hash was mostly to guard against accidental corruption, and that
the security a cryptographically secure hash gives was just an
accidental side effect, with the main security being signing
elsewhere. After the SHAttered attack was demonstrated against Git in
2017, Git was modified to use a SHA-1 variant resistant to this
attack. A plan for a hash function transition has been in progress
since February 2020.


                             Trademark
======================================================================
"Git" is a registered word trademark of Software Freedom Conservancy
under US500000085961336
(https://www.tmdn.org/tmview/#/tmview/detail/US500000085961336) since
2015-02-03.


                              See also
======================================================================
* Comparison of source-code-hosting facilities
* Comparison of version-control software
* List of version-control software


License
=========
All content on Gopherpedia comes from Wikipedia, and is licensed under CC-BY-SA
License URL: http://creativecommons.org/licenses/by-sa/3.0/
Original Article: http://en.wikipedia.org/wiki/Git