Optimizing Git Performance on z/OS

Posted by Igor Todorovski on March 19, 2025

As an open source developer in the zopen community, I rely heavily on Git for version control. But as repositories grow in size and history, Git operations can slow to the point where they disrupt my daily workflow, delay CI/CD pipelines, and ultimately hurt my productivity.

I wanted to make sure I was doing everything I could to optimize my Git workflow, so I recently investigated various Git performance optimization techniques and put them to the test.

My goal was to understand the practical impact of these strategies on z/OS by using git from IBM Open Enterprise Foundation for z/OS.

We’ll start with the strategies that provided the most significant performance improvements.

1. Data Reduction Strategies

The idea behind “data reduction” strategies is simple: if we can reduce the amount of data Git has to process, we can improve Git’s performance.

Shallow Clones: Reducing the History (--depth=1)

Shallow clones are a great way to speed up clones when you don’t need the full commit history. By cloning with a limited depth, you eliminate the commit history and thereby reduce the amount of data transferred, making cloning much faster. This is extremely useful in CI/CD pipelines or for quickly getting the latest codebase where the commit history is not relevant.

Benefits:

  • Reduced Cloning Time: Save time, especially for repositories with a long history.
  • Lower Bandwidth: Less data to transfer.
  • Reduced Disk Footprint: Shallow clones take up significantly less disk space.

Experiment on Perl 5: As an experiment, I cloned the Perl 5 repository (https://github.com/Perl/perl5), which contains the source code for the latest Perl language implementation. I first performed a full clone and then a shallow clone (--depth=1, which downloads only the latest commit). I ran these tests on a z/OS 3.1 z15 machine.

# Baseline: Full Clone
time git clone https://github.com/Perl/perl5 perl5-full-clone

# Shallow Clone with depth 1
time git clone --depth=1 https://github.com/Perl/perl5 perl5-shallow-clone

The results were as follows:

# Baseline: Full Clone
real     0m46.611s
user     1m16.479s
sys      0m25.493s

# Shallow Clone with depth 1
real     0m4.447s
user     0m2.328s
sys      0m0.776s

Shallow cloning the Perl 5 repository showed a dramatic improvement! The clone time was reduced from over 46 seconds to just over 4 seconds, a reduction of over 90%! The repository size was also significantly reduced as only the latest commit was downloaded, instead of the entire history. For scenarios where commit history isn’t immediately needed, shallow clones make a clear difference, especially in CI/CD pipelines.
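If the history turns out to be needed later, a shallow clone can be deepened after the fact. The sketch below is only an illustration: it builds a small throwaway repository (the directory names, file name, and demo identity are assumptions, not part of the Perl 5 experiment), clones it with --depth=1, and then uses git fetch --unshallow to retrieve the remaining commits.

```shell
# Build a small throwaway repository with three commits (a stand-in for a real remote).
work=$(mktemp -d)
src="$work/src"
git init -q "$src"
for n in 1 2 3; do
    echo "change $n" > "$src/file.txt"
    git -C "$src" add file.txt
    git -C "$src" -c user.name=demo -c user.email=demo@example.com commit -q -m "commit $n"
done

# Shallow clone: only the latest commit is transferred.
# (file:// is used because --depth is ignored for plain local paths.)
git clone -q --depth=1 "file://$src" "$work/shallow"
shallow_commits=$(git -C "$work/shallow" rev-list --count HEAD)
is_shallow=$(git -C "$work/shallow" rev-parse --is-shallow-repository)

# Later, if the full history is needed, fetch the rest of it.
git -C "$work/shallow" fetch -q --unshallow
full_commits=$(git -C "$work/shallow" rev-list --count HEAD)

echo "commits while shallow: $shallow_commits (shallow repository: $is_shallow)"
echo "commits after --unshallow: $full_commits"
```

For a partial step, git fetch --deepen=<n> extends the history by n commits instead of fetching all of it.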

Inspecting the Size of a Shallow Clone

To gain further insight into the volume of data processed during a shallow clone, you can run git count-objects -vH inside the repository.

After performing a clone, navigate into the newly created repository directory (e.g., cd perl5-shallow-clone) and run the command:

git count-objects -vH 

This command will provide a human-readable summary of the number and size of the various Git objects in your repository’s object database.

Now let’s compare the output from a full clone vs a shallow clone:

# Full clone output:
count: 0
size: 0 bytes
in-pack: 530133
packs: 1
size-pack: 319.96 MiB
prune-packable: 0
garbage: 0
size-garbage: 0 bytes

# Shallow clone output:
count: 0
size: 0 bytes
in-pack: 7703
packs: 1
size-pack: 22.09 MiB
prune-packable: 0
garbage: 0
size-garbage: 0 bytes

As shown, a shallow clone drastically reduces the data downloaded by fetching only the latest commit instead of the entire history. This is reflected in the in-pack count, which drops from 530,133 objects in a full clone to just 7,703 objects in a shallow clone. Since fewer commits, trees, and blobs are included, the total size-pack also shrinks significantly, from 319.96 MiB to 22.09 MiB.

This reduction happens because Git stores repository data in packfiles, which group and compress objects for efficiency. In a full clone, all historical commits and their associated objects are included, leading to a large size-pack. In contrast, a shallow clone with --depth=1 fetches only the objects reachable from the latest commit, resulting in fewer packed objects and a much smaller repository footprint.

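| Metric     | Full Clone | Shallow Clone | Reduction |
|------------|------------|---------------|-----------|
| Clone time | 46.6 s     | 4.4 s         | 90.5%     |
| Pack size  | 319.96 MiB | 22.09 MiB     | 93.1%     |
| Objects    | 530,133    | 7,703         | 98.5%     |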
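The Reduction column can be re-derived from the unrounded measurements shown earlier (the real times, the size-pack values, and the in-pack counts). A minimal sketch, where pct_reduction is a hypothetical helper name introduced here for illustration:

```shell
# pct_reduction BEFORE AFTER -> percentage saved, rounded to one decimal place.
pct_reduction() {
    awk -v before="$1" -v after="$2" 'BEGIN { printf "%.1f\n", (1 - after / before) * 100 }'
}

time_saved=$(pct_reduction 46.611 4.447)     # "real" clone time, in seconds
size_saved=$(pct_reduction 319.96 22.09)     # size-pack, in MiB
objects_saved=$(pct_reduction 530133 7703)   # in-pack object count

echo "time: ${time_saved}%  size: ${size_saved}%  objects: ${objects_saved}%"
```

Working from the unrounded timings gives a time reduction slightly below the figure obtained from the rounded 46 s and 4 s values.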

Avoiding Downloading Large Binaries (--filter=blob:none)

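Repositories can accumulate large binary files over time. If these binaries (e.g., pax files, object files, executables) are not needed for every operation, you can instruct Git to skip file contents during the clone. This can reduce bandwidth, disk space, and clone time.

The --filter=blob:none option creates a partial ("blobless") clone: Git downloads all commits and trees but no file contents (blobs) up front. The blobs needed for the checked-out files are fetched during checkout, and any others are fetched on demand when an operation needs them. To skip only large files, --filter=blob:limit=<size> (for example, blob:limit=1m) omits just the blobs at or above that size.

I cloned the Perl 5 repository again, this time using the --filter=blob:none option to defer downloading historical blobs.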

time git clone --filter=blob:none git@github.com:Perl/perl5.git

The results were as follows:

real     0m15.333s
user     0m9.057s
sys      0m3.019s

Using --filter=blob:none reduced the clone time to approximately 15 seconds, significantly faster than the baseline full clone of 46 seconds. This is a great way to reduce clone time while keeping the full commit history; file contents from older commits are downloaded only when an operation actually needs them.
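To see what a blob:none clone actually leaves out, the following sketch inspects one locally. It is an illustration under stated assumptions: the throwaway repository, its file name, and the demo identity are made up, and uploadpack.allowFilter is enabled on the local source only so that it honors --filter the way a hosting service would.

```shell
# Throwaway source repository with two commits that each change one file.
work=$(mktemp -d)
src="$work/src"
git init -q "$src"
git -C "$src" config uploadpack.allowFilter true   # let this local "server" honor --filter
for n in 1 2; do
    echo "version $n" > "$src/data.txt"
    git -C "$src" add data.txt
    git -C "$src" -c user.name=demo -c user.email=demo@example.com commit -q -m "commit $n"
done

# Blobless clone. --no-checkout is used here so that no blobs are fetched
# for the working tree, which makes the omitted objects easy to see.
git clone -q --filter=blob:none --no-checkout "file://$src" "$work/blobless"

# The filter is recorded in the clone's configuration.
filter=$(git -C "$work/blobless" config remote.origin.partialclonefilter)
# The full commit history is present...
commits=$(git -C "$work/blobless" rev-list --count HEAD)
# ...but the file contents are not: missing objects are printed with a leading "?".
missing=$(git -C "$work/blobless" rev-list --objects --missing=print HEAD | grep -c '^?')

echo "filter=$filter commits=$commits missing_blobs=$missing"
```

In a normal clone (without --no-checkout), Git fetches the blobs for the checked-out commit during checkout, so only historical file contents remain missing.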

Cloning Only a Single Branch and Avoiding Tags (--no-tags and --single-branch)

By default, git clone clones all branches and tags. If you are only interested in a single branch, and tags are not immediately necessary, cloning only a single branch and excluding tags can reduce the amount of data transferred and speed up the clone operation. The --no-tags option skips downloading tags, and --single-branch clones only the specified branch.

I cloned the Perl 5 repository again, this time using the --no-tags and --single-branch options.

time git clone --no-tags --single-branch git@github.com:Perl/perl5.git

The results were:

real     0m34.867s
user     1m2.789s
sys      0m20.930s

Cloning a single branch without tags took approximately 35 seconds, which is faster than the full clone baseline but not as fast as --filter=blob:none. However, when combined with --filter=blob:none, the total clone time dropped to about 13 seconds.
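A quick way to confirm what --no-tags and --single-branch leave behind is to list the refs in the resulting clone. The sketch below uses a throwaway repository with two branches and one tag; the branch names, tag name, and demo identity are assumptions made for illustration.

```shell
# Throwaway source repository: branches "main" and "feature", plus one tag.
work=$(mktemp -d)
src="$work/src"
git init -q "$src"
git -C "$src" symbolic-ref HEAD refs/heads/main
commit() { git -C "$src" -c user.name=demo -c user.email=demo@example.com commit -q -m "$1"; }

echo one > "$src/a.txt"; git -C "$src" add a.txt; commit "first"
git -C "$src" tag v1.0
git -C "$src" checkout -q -b feature
echo two > "$src/b.txt"; git -C "$src" add b.txt; commit "feature work"
git -C "$src" checkout -q main

# Clone only "main", without tags.
git clone -q --no-tags --single-branch --branch main "file://$src" "$work/slim"

# Count remote-tracking branches (ignoring the symbolic origin/HEAD) and tags.
remote_branches=$(git -C "$work/slim" for-each-ref --format='%(refname)' refs/remotes/origin \
    | grep -v '/HEAD$' | wc -l | tr -d ' ')
tag_count=$(git -C "$work/slim" tag | wc -l | tr -d ' ')

echo "remote branches: $remote_branches, tags: $tag_count"

# Against a real remote, the combined form described above would presumably be:
#   git clone --filter=blob:none --no-tags --single-branch git@github.com:Perl/perl5.git
```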

Sparse Checkouts: Clone What You Need (git sparse-checkout)

Sparse checkouts allow you to check out only a specific subset of directories from a repository into your working directory. This is great for monorepos or large projects where you only work within certain modules or components.

Benefits:

  • Reduced Working Directory Size: Save disk space by only checking out necessary files.
  • Faster git status and Checkout Operations: Git has less data to track and manage in your working directory, improving command speed.

Using a sparse checkout on the Perl 5 repository, I wanted to exclude the cpan/ directory, which contains CPAN module-related code and is approximately 26 MB in size. I used the following script:

# sparse-checkout.sh

# Initialize an empty repository
rm -rf perl5-sparse-empty-test
mkdir perl5-sparse-empty-test
cd perl5-sparse-empty-test
git init

# Add the Perl 5 repository as a remote
git remote add origin https://github.com/Perl/perl5

# Ignore the cpan directory
git config core.sparseCheckoutCone false
git sparse-checkout init --no-cone
git sparse-checkout set "/*"
git sparse-checkout add '!cpan' # ignore cpan dir

git fetch origin blead

git checkout blead

Running this script with sparse checkout commands yielded a total time of:

real     0m42.361s
user     1m7.524s
sys      0m22.508s

When the sparse-checkout commands are commented out, so that the script fetches and checks out the entire blead branch, the time taken was:

real     0m43.325s
user     1m8.081s
sys      0m22.694s

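The time difference is negligible in this test because sparse checkout only limits what is written to the working directory; the fetch still transfers the full history. Even so, the sparse checkout kept approximately 26 MB of files out of the working directory by excluding the cpan directory.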
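To verify which paths a sparse checkout skipped, you can inspect the working directory and the index. The sketch below reproduces the idea on a tiny throwaway repository; the file names and demo identity are assumptions, and the exclusion is written as the anchored pattern '!/cpan/' so that it matches only the top-level directory.

```shell
# Throwaway source repository with a cpan/ directory, a lib/ directory, and a README.
work=$(mktemp -d)
src="$work/src"
git init -q "$src"
mkdir -p "$src/cpan" "$src/lib"
echo module > "$src/cpan/Module.pm"
echo core > "$src/lib/core.pm"
echo readme > "$src/README"
git -C "$src" add .
git -C "$src" -c user.name=demo -c user.email=demo@example.com commit -q -m "initial"

# Clone, then keep everything except the top-level cpan/ directory.
git clone -q "file://$src" "$work/sparse"
git -C "$work/sparse" sparse-checkout init --no-cone
git -C "$work/sparse" sparse-checkout set '/*' '!/cpan/'

# Show the active patterns and count the files marked skip-worktree ("S").
git -C "$work/sparse" sparse-checkout list
skipped=$(git -C "$work/sparse" ls-files -t | grep -c '^S')
echo "files skipped in the working directory: $skipped"
```

The skipped files are still present in the repository's object database; only the working directory is smaller.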

Working-Tree-Encoding: Consider the Conversion Costs (.gitattributes)

If your project uses working-tree-encoding or zos-working-tree-encoding, be aware of the potential performance overhead. Encoding conversions handled by iconv can add CPU processing time, especially when global wildcards are used.

To test this, I created a fork of the Perl 5 repository with a .gitattributes file that would apply zos-working-tree-encoding=ibm-1047 to all text files:

https://github.com/IgorTodorovskiIBM/perl5

# .gitattributes
* text zos-working-tree-encoding=ibm-1047

Cloning this repository:

time git clone git@github.com:IgorTodorovskiIBM/perl5.git

The encoding conversion, when applied globally with the * wildcard, added about 3 seconds to the total clone time compared to cloning the original repository without the encoding attribute. While 3 seconds might seem small, it’s another factor to consider, especially in scenarios involving large repositories or frequent Git operations. For projects using working-tree-encoding, consider the scope of encoding conversions to reduce potential performance impacts.

However, there is an important counterpoint: skipping conversion at the Git level means each compile step must handle encoding conversion instead. For languages like C, whose compilers support auto-conversion of source files, this can introduce overhead every time a file is compiled. This ongoing cost might outweigh the one-time Git conversion delay.
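One way to narrow the scope of conversion is to replace the global * wildcard with patterns for only the file types that need it. The sketch below is an assumption-laden illustration: the choice of *.c and *.h is hypothetical, and git check-attr is used only to show which paths would pick up the attribute.

```shell
# Throwaway repository with a narrowly scoped .gitattributes.
work=$(mktemp -d)
git init -q "$work/repo"
cat > "$work/repo/.gitattributes" <<'EOF'
# Convert only C sources and headers instead of every file ("*").
*.c text zos-working-tree-encoding=ibm-1047
*.h text zos-working-tree-encoding=ibm-1047
EOF

# Ask Git which encoding attribute applies to two example paths.
c_attr=$(git -C "$work/repo" check-attr zos-working-tree-encoding -- main.c)
doc_attr=$(git -C "$work/repo" check-attr zos-working-tree-encoding -- README.md)

echo "$c_attr"
echo "$doc_attr"
```

Paths reported as unspecified are left unconverted by this attribute, so they add no conversion cost at checkout.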

Profiling and Diagnostics (GIT_TRACE=1 and GIT_TRACE_PERFORMANCE=1)

Git’s diagnostic environment variables, such as GIT_TRACE and GIT_TRACE_PERFORMANCE, are useful for identifying performance bottlenecks within Git operations. When these variables are enabled, Git emits detailed logs that reveal which steps consume the most time.

To enable tracing and performance logging, set the corresponding environment variables before running your Git commands.

export GIT_TRACE=1
export GIT_TRACE_PERFORMANCE=1

After setting these variables, execute the Git command you want to analyze. Git will output detailed trace information to stderr.

For example, to trace a clone operation:

export GIT_TRACE_PERFORMANCE=1 GIT_TRACE=1
time git clone https://github.com/Perl/perl5
12:33:53.398786 exec-cmd.c:267          trace: resolved executable dir: /home/itodoro/zopen/usr/local/zopen/git/git-heads.v2.48.1.20250131_080517.zos/bin
12:33:53.399026 git.c:476               trace: built-in: git clone https://github.com/Perl/perl5
Cloning into 'perl5'...
12:33:53.406725 run-command.c:667       trace: run_command: git remote-https origin https://github.com/Perl/perl5
12:33:53.406752 run-command.c:759       trace: start_command: /home/itodoro/zopen/usr/local/zopen/git/git-heads.v2.48.1.20250131_080517.zos/libexec/git-core/git remote-https origin https://github.com/Perl/perl5
12:33:53.433854 exec-cmd.c:267          trace: resolved executable dir: /home/itodoro/zopen/usr/local/zopen/git/git-heads.v2.48.1.20250131_080517.zos/libexec/git-core
12:33:53.434307 git.c:769               trace: exec: git-remote-https origin https://github.com/Perl/perl5
12:33:53.434315 run-command.c:667       trace: run_command: git-remote-https origin https://github.com/Perl/perl5
12:33:53.434338 run-command.c:759       trace: start_command: /home/itodoro/zopen/usr/local/zopen/git/git-heads.v2.48.1.20250131_080517.zos/libexec/git-core/git-remote-https origin https://github.com/Perl/perl5
12:33:53.468060 exec-cmd.c:267          trace: resolved executable dir: /home/itodoro/zopen/usr/local/zopen/git/git-heads.v2.48.1.20250131_080517.zos/libexec/git-core
12:33:53.938639 run-command.c:667       trace: run_command: git index-pack --stdin -v --fix-thin '--keep=fetch-pack 33562439 on ZOSCAN2B' --check-self-contained-and-connected
12:33:53.938676 run-command.c:759       trace: start_command: /home/itodoro/zopen/usr/local/zopen/git/git-heads.v2.48.1.20250131_080517.zos/libexec/git-core/git index-pack --stdin -v --fix-thin '--keep=fetch-pack 33562439 on ZOSCAN2B' --check-self-contained-and-connected
12:33:53.965929 exec-cmd.c:267          trace: resolved executable dir: /home/itodoro/zopen/usr/local/zopen/git/git-heads.v2.48.1.20250131_080517.zos/libexec/git-core
12:33:53.966507 git.c:476               trace: built-in: git index-pack --stdin -v --fix-thin '--keep=fetch-pack 33562439 on ZOSCAN2B' --check-self-contained-and-connected
remote: Enumerating objects: 613690, done.
remote: Counting objects: 100% (16/16), done.
remote: Compressing objects: 100% (16/16), done.
remote: Total 613690 (delta 5), reused 5 (delta 0), pack-reused 613674 (from 1)
12:34:02.103539 trace.c:417             performance: 8.669476706 s: git command: /home/itodoro/zopen/usr/local/zopen/git/git-heads.v2.48.1.20250131_080517.zos/libexec/git-core/git remote-https origin https://github.com/Perl/perl5
Receiving objects: 100% (613690/613690), 340.21 MiB | 42.31 MiB/s, done.
Resolving deltas: 100% (464433/464433), done.
12:34:31.897026 trace.c:417             performance: 37.930887480 s: git command: /home/itodoro/zopen/usr/local/zopen/git/git-heads.v2.48.1.20250131_080517.zos/libexec/git-core/git index-pack --stdin -v --fix-thin '--keep=fetch-pack 33562439 on ZOSCAN2B' --check-self-contained-and-connected
12:34:31.898222 run-command.c:667       trace: run_command: git rev-list --objects --stdin --not --all --quiet --alternate-refs '--progress=Checking connectivity'
12:34:31.898281 run-command.c:759       trace: start_command: /home/itodoro/zopen/usr/local/zopen/git/git-heads.v2.48.1.20250131_080517.zos/libexec/git-core/git rev-list --objects --stdin --not --all --quiet --alternate-refs '--progress=Checking connectivity'
12:34:31.932503 exec-cmd.c:267          trace: resolved executable dir: /home/itodoro/zopen/usr/local/zopen/git/git-heads.v2.48.1.20250131_080517.zos/libexec/git-core
12:34:31.933121 git.c:476               trace: built-in: git rev-list --objects --stdin --not --all --quiet --alternate-refs '--progress=Checking connectivity'
12:34:31.933623 trace.c:417             performance: 0.000879117 s: git command: /home/itodoro/zopen/usr/local/zopen/git/git-heads.v2.48.1.20250131_080517.zos/libexec/git-core/git rev-list --objects --stdin --not --all --quiet --alternate-refs '--progress=Checking connectivity'
12:34:31.999553 unpack-trees.c:2010     performance: 0.022947339 s:   traverse_trees
Updating files: 100% (6895/6895), done.
12:34:33.150223 unpack-trees.c:512      performance: 1.150647878 s:   check_updates
12:34:33.157202 cache-tree.c:492        performance: 0.006916968 s:   cache_tree_update
12:34:33.157222 unpack-trees.c:2107     performance: 1.180711913 s:  unpack_trees
12:34:33.160126 read-cache.c:3115       performance: 0.002893479 s:  write index, changed mask = 2a
12:34:33.160601 trace.c:417             performance: 39.761582911 s: git command: git clone https://github.com/Perl/perl5

real    0m39.793s
user    1m12.404s
sys     0m24.135s

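You can now examine the log output to identify performance bottlenecks and areas for potential optimization. In this trace, for example, the git index-pack step (receiving the pack and resolving deltas) accounts for about 37.9 of the 39.8 seconds, while updating the working tree (unpack_trees) takes only about 1.2 seconds.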
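For longer traces, it can help to pull out just the performance entries and sort them by duration. The sketch below is one possible approach: top_perf is a hypothetical helper name, and the sample file contains a few performance lines copied from the trace above.

```shell
# top_perf FILE [N] -> the N slowest "performance:" entries (default 5), slowest first.
top_perf() {
    awk '{
        for (i = 1; i < NF; i++) {
            if ($i == "performance:") {
                # $(i+1) is the duration, $(i+2) is the unit "s:", the rest is the label.
                label = ""
                for (j = i + 3; j <= NF; j++) label = label (label == "" ? "" : " ") $j
                print $(i + 1), label
                break
            }
        }
    }' "$1" | sort -rn | head -n "${2:-5}"
}

# Sample input: performance lines taken from the clone trace shown above.
trace_file=$(mktemp)
cat > "$trace_file" <<'EOF'
12:34:31.999553 unpack-trees.c:2010     performance: 0.022947339 s:   traverse_trees
Updating files: 100% (6895/6895), done.
12:34:33.150223 unpack-trees.c:512      performance: 1.150647878 s:   check_updates
12:34:33.157222 unpack-trees.c:2107     performance: 1.180711913 s:  unpack_trees
12:34:33.160601 trace.c:417             performance: 39.761582911 s: git command: git clone https://github.com/Perl/perl5
EOF

top_perf "$trace_file" 3
```

To capture a trace in a file rather than on stderr, GIT_TRACE_PERFORMANCE can be set to an absolute path instead of 1, and that file can then be passed to the helper.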

Conclusion

Whenever possible, use shallow clones with a depth of 1 as it provides a significant performance benefit.

For those working within z/OS, a more detailed guide outlining specific performance recommendations can be found in the Git on z/OS guide: https://github.com/zopencommunity/gitport?tab=readme-ov-file#git-performance-considerations.

Git also provides various performance-related configuration settings, such as adjusting the number of parallel threads and other optimizations: Git Configuration Options

If you have any other tips or strategies for improving Git performance, I’d love to hear them!

A special thanks to Mike Fulton for his contributions to this article.

Additional Resources