Optimizing Git Performance
As an open source developer for the zopen community, I rely heavily on Git for version control. But, as repositories grow in size and history, my git operations often slow down to the point where they can impact my daily workflow, slow down CI/CD pipelines, and ultimately affect my productivity as a developer.
I wanted to make sure I was doing everything I can to optimize my Git worfklow, So recently, I investigated various Git performance optimization techniques, and I decided to put them to the test.
My goal was to understand the practical impact of these strategies on z/OS by using git from IBM Open Enterprise Foundation for z/OS.
We’ll start by exploring the strategies that provided the most significant performance improvements first.
1. Data Reduction Strategies
The idea behind “data reduction” strategies is simple: if we can reduce the amount of data Git has to process, we can improve Git’s performance.
Shallow Clones: Reducing the History (--depth=1
)
Shallow clones are a great way to speed up clones when you don’t need the full commit history. By cloning with a limited depth, you eliminate the commit history and thereby reduce the amount of data transferred, making cloning much faster. This is extremely useful in CI/CD pipelines or for quickly getting the latest codebase where the commit history is not relevant.
Benefits:
- Reduced Cloning Time: Save time, especially for repositories with a long history.
- Lower Bandwidth: Less data to transfer
- Reduced Disk Footprint: Shallow clones take up significantly less disk space.
Experiment on Perl 5: As an experiment, I cloned the Perl 5 repository (https://github.com/Perl/perl5). This repository contains the source code for the latest Perl language implementation. First I cloned perl5 with a full clone and then with a shallow clone (–depth=1 which downloads the latest commit only). I ran these tests on a z/OS 3.1 z15 machine.
1
2
3
4
5
# Baseline: Full Clone
time git clone https://github.com/Perl/perl5 perl5-full-clone
# Shallow Clone with depth 1
time git clone --depth=1 https://github.com/Perl/perl5 perl5-shallow-clone
The results were as follows:
1
2
3
4
5
6
7
8
9
# Baseline: Full Clone
real 0m46.611s
user 1m16.479s
sys 0m25.493s
# Shallow Clone with depth 1
real 0m4.447s
user 0m2.328s
sys 0m0.776s
Shallow cloning the Perl 5 repository showed a dramatic improvement! The clone time was reduced from over 46 seconds to just over 4 seconds, a reduction of over 90%! The repository size was also significantly reduced as only the latest commit was downloaded, instead of the entire history. For scenarios where commit history isn’t immediately needed, shallow clones make a clear difference, especially in CI/CD pipelines.
Inspecting the Size of a Shallow Clone
To gain further insight into the volume of data being processed during your shallow clones, you can execute the following command within your Git repository: git count-objects -vH
.
After performing a clone, navigate into the newly created repository directory (e.g., cd perl5-shallow-clone
) and run the command:
1
git count-objects -vH
This command will provide a human-readable summary of the number and size of the various Git objects in your repository’s object database.
Now let’s compare the output from a full clone vs a shallow clone:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
# Full clone output:
count: 0
size: 0 bytes
in-pack: 530133
packs: 1
size-pack: 319.96 MiB
prune-packable: 0
garbage: 0
size-garbage: 0 bytes
# Shallow clone output:
count: 0
size: 0 bytes
in-pack: 7703
packs: 1
size-pack: 22.09 MiB
prune-packable: 0
garbage: 0
size-garbage: 0 bytes
As shown, a shallow clone drastically reduces the data downloaded by fetching only recent commits instead of the entire history. This is reflected in the in-pack count, which drops from 530,133 objects in a full clone to just 7,703 objects in a shallow clone. Since fewer commits, trees, and blobs are included, the total size-pack also shrinks significantly, from 319.96 MiB to 22.09 MiB.
This reduction happens because Git stores repository data in packfiles, which group and compress objects for efficiency. In a full clone, all historical commits and their associated objects are included, leading to a large size-pack. In contrast, a shallow clone fetches only recent objects, leaving out older history, resulting in fewer packed objects and a much smaller repository footprint.
Metric | Full Clone | Shallow Clone | Reduction |
---|---|---|---|
Clone Time | 46s | 4s | 91% |
Size | 319MB | 22MB | 93% |
Objects | 530,133 | 7,703 | 98.5% |
Avoiding Downloading Large Binaries (--filter=blob:none
)
Repositories can sometimes accumulate large binary files over time. If these binaries (e.g pax files, object files, executables) are not needed for every operation or during development, you can instruct Git to filter these out during the cloning process. This can help reduce bandwidth, disk space and clone time.
The --filter=blob:none
option prevents Git from downloading any large file blobs.
I cloned the Perl 5 repository again, this time using the --filter=blob:none
option to exclude binary blobs.
1
time git clone --filter=blob:none git@github.com:Perl/perl5.git
The results were as follows:
1
2
3
real 0m15.333s
user 0m9.057s
sys 0m3.019s
Using --filter=blob:none
reduced the clone time to approximately 15 seconds, significantly faster than the baseline full clone of 46 seconds. This is a great way to reduce clone time, while maintaining history for repositories where binaries are not essential for maintenance.
Cloning Only a Single Branch and Avoiding Tags (--no-tags
and --single-branch
)
By default, git clone
clones all branches and tags. If you are only interested in a single branch, and tags are not immediately necessary, cloning only a single branch and excluding tags can reduce the amount of data transferred and speed up the clone operation. The --no-tags
option skips downloading tags, and --single-branch
clones only the specified branch.
I cloned the Perl 5 repository again, this time using the --no-tags
and --single-branch
options.
1
time git clone --no-tags --single-branch git@github.com:Perl/perl5.git
The results were:
1
2
3
real 0m34.867s
user 1m2.789s
sys 0m20.930s
Cloning a single branch without tags took approximately 35 seconds, which is faster than the full clone baseline, but not as fast as --filter=blob:none
. However, when combined with --filter=blob:none
, the total clone time was reduced to 13s.
Sparse Checkouts: Clone What You Need (git sparse-checkout
)
Sparse checkouts allow you to download only a specific subset of directories from a repository into your working directory. This is great for monorepos or large projects where you only work within certain modules or components.
Benefits:
- Reduced Working Directory Size: Save disk space by only checking out necessary files.
- Faster
git status
and Checkout Operations: Git has less data to track and manage in your working directory, improving command speed.
Using a sparse checkout on the Perl 5 repository, I wanted to exclude the cpan/
directory, which contains CPAN module-related code and is approximately 26mb in size. I used the following script:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
# sparse-checkout.sh
# Initialize an empty repository
rm -rf perl5-sparse-empty-test
mkdir perl5-sparse-empty-test
cd perl5-sparse-empty-test
git init
# Add the Perl 5 repository as a remote
git remote add origin [https://github.com/Perl/perl5](https://github.com/Perl/perl5)
# Ignore the cpan directory
git config core.sparseCheckoutCone false
git sparse-checkout init --no-cone
git sparse-checkout set "/*"
git sparse-checkout add '!cpan' # ignore cpan dir
git fetch origin blead
git checkout blead
Running this script with sparse checkout commands yielded a total time of:
1
2
3
real 0m42.361s
user 1m7.524s
sys 0m22.508s
When the sparse-checkout commands are commented out, essentially performing a full clone, the time taken was:
1
2
3
real 0m43.325s
user 1m8.081s
sys 0m22.694s
While the time difference isn’t dramatically significant in this specific test, sparse checkout successfully saved approximately 26mb of data by excluding the cpan
directory.
Working-Tree-Encoding: Consider the Conversion Costs (.gitattributes
)
If your project uses working-tree-encoding or zos-working-tree-encoding, be aware of the potential performance overhead. Encoding conversions handled by iconv can add CPU processing time, especially when global wildcards are used.
To test this, I created a fork of the Perl 5 repository with a .gitattributes
file that would apply zos-working-tree-encoding=ibm-1047
to all text files:
https://github.com/IgorTodorovskiIBM/perl5
1
2
# .gitattributes
* text zos-working-tree-encoding=ibm-1047
Cloning this repository:
1
time git clone git@github.com:IgorTodorovskiIBM/perl5.git
The encoding conversion, when applied globally with the *
wildcard, added about 3 seconds to the total clone time compared to cloning the original repository without the encoding attribute. While 3 seconds might seem small, it’s another factor to consider, especially in scenarios involving large repositories or frequent Git operations. For projects using working-tree-encoding, consider the scope of encoding conversions to reduce potential performance impacts.
However, an important counterpoint to skipping conversion at the Git level means each compile step must handle encoding conversion instead. For languages like C, which support auto-conversion of source files, this can introduce overhead every time a file is compiled. This ongoing cost might outweigh the one-time Git conversion delay.
Profiling and Diagnostics (GIT_TRACE=1
and GIT_PERFORMANCE=1
)
Git’s diagnostic environment variables such as GIT_TRACE
and GIT_PERFORMANCE
are great ways for identifying performance bottlenecks within Git operations. By enabling these variables, Git generates detailed logs that can reveal which specific steps are consuming the most time and resources.
To enable tracing and performance logging, set the corresponding environment variables before running your Git commands.
1
2
export GIT_TRACE=1
export GIT_PERFORMANCE=1
After setting these variables, execute the Git command you want to analyze. Git will output detailed trace information to stderr.
For example, to trace a clone operation:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
export GIT_TRACE_PERFORMANCE=1 GIT_TRACE=1
12:33:53.398786 exec-cmd.c:267 trace: resolved executable dir: /home/itodoro/zopen/usr/local/zopen/git/git-heads.v2.48.1.20250131_080517.zos/bin
12:33:53.399026 git.c:476 trace: built-in: git clone https://github.com/Perl/perl5
Cloning into 'perl5'...
12:33:53.406725 run-command.c:667 trace: run_command: git remote-https origin https://github.com/Perl/perl5
12:33:53.406752 run-command.c:759 trace: start_command: /home/itodoro/zopen/usr/local/zopen/git/git-heads.v2.48.1.20250131_080517.zos/libexec/git-core/git remote-https origin https://github.com/Perl/perl5
12:33:53.433854 exec-cmd.c:267 trace: resolved executable dir: /home/itodoro/zopen/usr/local/zopen/git/git-heads.v2.48.1.20250131_080517.zos/libexec/git-core
12:33:53.434307 git.c:769 trace: exec: git-remote-https origin https://github.com/Perl/perl5
12:33:53.434315 run-command.c:667 trace: run_command: git-remote-https origin https://github.com/Perl/perl5
12:33:53.434338 run-command.c:759 trace: start_command: /home/itodoro/zopen/usr/local/zopen/git/git-heads.v2.48.1.20250131_080517.zos/libexec/git-core/git-remote-https origin https://github.com/Perl/perl5
12:33:53.468060 exec-cmd.c:267 trace: resolved executable dir: /home/itodoro/zopen/usr/local/zopen/git/git-heads.v2.48.1.20250131_080517.zos/libexec/git-core
12:33:53.938639 run-command.c:667 trace: run_command: git index-pack --stdin -v --fix-thin '--keep=fetch-pack 33562439 on ZOSCAN2B' --check-self-contained-and-connected
12:33:53.938676 run-command.c:759 trace: start_command: /home/itodoro/zopen/usr/local/zopen/git/git-heads.v2.48.1.20250131_080517.zos/libexec/git-core/git index-pack --stdin -v --fix-thin '--keep=fetch-pack 33562439 on ZOSCAN2B' --check-self-contained-and-connected
12:33:53.965929 exec-cmd.c:267 trace: resolved executable dir: /home/itodoro/zopen/usr/local/zopen/git/git-heads.v2.48.1.20250131_080517.zos/libexec/git-core
12:33:53.966507 git.c:476 trace: built-in: git index-pack --stdin -v --fix-thin '--keep=fetch-pack 33562439 on ZOSCAN2B' --check-self-contained-and-connected
remote: Enumerating objects: 613690, done.
remote: Counting objects: 100% (16/16), done.
remote: Compressing objects: 100% (16/16), done.
remote: Total 613690 (delta 5), reused 5 (delta 0), pack-reused 613674 (from 1)
12:34:02.103539 trace.c:417 performance: 8.669476706 s: git command: /home/itodoro/zopen/usr/local/zopen/git/git-heads.v2.48.1.20250131_080517.zos/libexec/git-core/git remote-https origin https://github.com/Perl/perl5
Receiving objects: 100% (613690/613690), 340.21 MiB | 42.31 MiB/s, done.
Resolving deltas: 100% (464433/464433), done.
12:34:31.897026 trace.c:417 performance: 37.930887480 s: git command: /home/itodoro/zopen/usr/local/zopen/git/git-heads.v2.48.1.20250131_080517.zos/libexec/git-core/git index-pack --stdin -v --fix-thin '--keep=fetch-pack 33562439 on ZOSCAN2B' --check-self-contained-and-connected
12:34:31.898222 run-command.c:667 trace: run_command: git rev-list --objects --stdin --not --all --quiet --alternate-refs '--progress=Checking connectivity'
12:34:31.898281 run-command.c:759 trace: start_command: /home/itodoro/zopen/usr/local/zopen/git/git-heads.v2.48.1.20250131_080517.zos/libexec/git-core/git rev-list --objects --stdin --not --all --quiet --alternate-refs '--progress=Checking connectivity'
12:34:31.932503 exec-cmd.c:267 trace: resolved executable dir: /home/itodoro/zopen/usr/local/zopen/git/git-heads.v2.48.1.20250131_080517.zos/libexec/git-core
12:34:31.933121 git.c:476 trace: built-in: git rev-list --objects --stdin --not --all --quiet --alternate-refs '--progress=Checking connectivity'
12:34:31.933623 trace.c:417 performance: 0.000879117 s: git command: /home/itodoro/zopen/usr/local/zopen/git/git-heads.v2.48.1.20250131_080517.zos/libexec/git-core/git rev-list --objects --stdin --not --all --quiet --alternate-refs '--progress=Checking connectivity'
12:34:31.999553 unpack-trees.c:2010 performance: 0.022947339 s: traverse_trees
Updating files: 100% (6895/6895), done.
12:34:33.150223 unpack-trees.c:512 performance: 1.150647878 s: check_updates
12:34:33.157202 cache-tree.c:492 performance: 0.006916968 s: cache_tree_update
12:34:33.157222 unpack-trees.c:2107 performance: 1.180711913 s: unpack_trees
12:34:33.160126 read-cache.c:3115 performance: 0.002893479 s: write index, changed mask = 2a
12:34:33.160601 trace.c:417 performance: 39.761582911 s: git command: git clone https://github.com/Perl/perl5
real 0m39.793s
user 1m12.404s
sys 0m24.135s
You can now examine the log output to identify performance bottlenecks and areas for potential optimization.
Conclusion
Whenever possible, use shallow clones with a depth of 1 as it provides a significant performance benefit.
For those working within z/OS, a more detailed guide outlining specific performance recommendations can be found in the Git on z/OS guide: https://github.com/zopencommunity/gitport?tab=readme-ov-file#git-performance-considerations.
Git also provides various performance-related configuration settings, such as adjusting the number of parallel threads and other optimizations: Git Configuration Options
If you have any other tips or strategies for improving Git performance, I’d love to hear them!
A special thanks to Mike Fulton for his contributions to this article.
Additional Resources
- IBM Open Enterprise Foundation for z/OS
- Worried about cloning your large Z code Git repo?
- Get up to speed with partial clone and shallow clone