Runs 100% on z/OS — no cloud, no data leaving the LPAR

Semantic Search for the Mainframe

A RAG-powered search engine that runs natively on z/OS. Index documents, search with natural language, and enrich operator console messages with AI-powered context — right where your data lives.

$ zopen install z-vector-search
  • 24,565 IBM z/OS messages indexed
  • C++17, pure native code
  • 84 MB embedding model
  • <0.5s per-message enrichment

Console messages shouldn't require tribal knowledge

  • Too many messages, not enough context. Operators sift through hundreds of console messages per hour — ABENDs, RACF violations, CICS abends — with no easy way to know which ones matter.
  • Knowledge is scattered. The answer lives across IBM manuals, runbooks, ticket histories, and people's heads. What if you could just ask?
  • Data can't leave the LPAR. For air-gapped, regulated workloads, shipping logs to a cloud LLM is a non-starter. Everything must run on z/OS.
z/OS Operator Console — SYS1
N SYS1 17:30:42 STC00010 $HASP373 PAYROLL STARTED
N SYS1 17:30:45 STC00123 IEF450I PAYROLL - ABEND=S0C7
N SYS1 17:30:45 STC00001 IEA404W REAL STORAGE SHORTAGE
N SYS1 17:30:46 STC00200 ICH408I USER(BATCH1) ACCESS REVOKED
N SYS1 17:30:48 STC00080 DFH1501I CICS TRANSACTION COMPLETED

847 messages in the last hour. Which ones matter?

The RAG Pipeline

Every query flows through a pipeline that turns raw text into semantic understanding. Everything runs locally on z/OS — no external API calls, no cloud dependencies.

📄 Ingest: console messages, documents, or IBM manuals
✂️ Chunk: 256 tokens per chunk, 64-token overlap
🧠 Embed: Nomic Embed Text v1.5 via llama.cpp
📐 Normalize: L2 normalization for cosine distance
💾 Store: SQLite + sqlite-vec in a single .db file
🔀 Classify & Search: keyword, semantic, or hybrid, auto-detected
🏆 Rank & Return: Reciprocal Rank Fusion → top-K results
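
The Chunk stage is the easiest one to picture in code. The following is a minimal sketch of sliding-window chunking with the 256/64 figures quoted above; the names ChunkSpan and chunk_spans are hypothetical and not taken from the z-vector-search sources, and the real tool may split on different boundaries.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Half-open token range [begin, end) covered by one chunk.
struct ChunkSpan {
    std::size_t begin;
    std::size_t end;
};

// Sliding-window chunking: each chunk starts (chunk_size - overlap) tokens
// after the previous one, so neighbouring chunks share `overlap` tokens.
// Hypothetical helper; the defaults mirror the 256/64 figures in the text.
std::vector<ChunkSpan> chunk_spans(std::size_t n_tokens,
                                   std::size_t chunk_size = 256,
                                   std::size_t overlap = 64) {
    assert(chunk_size > 0 && overlap < chunk_size);
    const std::size_t stride = chunk_size - overlap;  // 192 with the defaults
    std::vector<ChunkSpan> spans;
    for (std::size_t begin = 0; begin < n_tokens; begin += stride) {
        const std::size_t end = std::min(begin + chunk_size, n_tokens);
        spans.push_back({begin, end});
        if (end == n_tokens) break;  // the last chunk reached the end of the input
    }
    return spans;
}
```

With the defaults, a 600-token document yields three chunks: [0, 256), [192, 448), and [384, 600). The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk.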

Architecture

Pure C++17 with vendored dependencies. No Python runtime, no Java, no external services beyond what zopen provides.

🧬

Embedding Engine

Nomic Embed Text v1.5, an encoder-only model purpose-built for turning text into vectors. Quantized to Q4_K_M at just 84 MB. Uses MEAN pooling and document/query prefixes for optimal retrieval quality.

llama.cpp · Nomic v1.5 · Q4_K_M
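
To make the prefix and pooling steps concrete, here is a small sketch. The "search_document: " and "search_query: " strings are the task prefixes documented for Nomic Embed; the function names with_task_prefix and mean_pool are hypothetical. In practice llama.cpp performs the pooling internally, so mean_pool only illustrates the arithmetic, not the tool's code path.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Nomic Embed expects a task prefix on every input, and documents and
// queries get different ones. Hypothetical helper name.
std::string with_task_prefix(const std::string& text, bool is_query) {
    const std::string prefix = is_query ? "search_query: " : "search_document: ";
    return prefix + text;
}

// MEAN pooling: average the per-token vectors into one fixed-size vector.
// Illustration only; assumes every token vector has the same dimension.
std::vector<float> mean_pool(const std::vector<std::vector<float>>& token_vecs) {
    assert(!token_vecs.empty());
    const std::size_t dim = token_vecs.front().size();
    std::vector<float> pooled(dim, 0.0f);
    for (const auto& v : token_vecs) {
        assert(v.size() == dim);
        for (std::size_t i = 0; i < dim; ++i) pooled[i] += v[i];
    }
    for (float& x : pooled) x /= static_cast<float>(token_vecs.size());
    return pooled;
}
```

Indexing would call with_task_prefix(chunk, false) and querying with_task_prefix(question, true); mixing the two up tends to hurt retrieval quality because the model was trained with the prefixes in place.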
🗄️

Vector Store

SQLite extended with sqlite-vec for KNN similarity search. No database server, no network dependencies — just a single .db file with text chunks, embeddings, and structured metadata side by side.

SQLite · sqlite-vec · Cosine Distance
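
Why normalize before storing? Once every embedding has unit length, cosine distance reduces to one minus a dot product. The sketch below shows that relationship; l2_normalize and cosine_distance_unit are hypothetical names, and sqlite-vec computes its distances internally rather than through code like this.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Scale v in place to unit L2 length. An all-zero vector is left untouched.
void l2_normalize(std::vector<float>& v) {
    double sum_sq = 0.0;
    for (float x : v) sum_sq += static_cast<double>(x) * x;
    if (sum_sq == 0.0) return;
    const float inv_norm = static_cast<float>(1.0 / std::sqrt(sum_sq));
    for (float& x : v) x *= inv_norm;
}

// Cosine distance for vectors that are ALREADY unit length: 1 - dot(a, b).
// 0 means same direction, 1 means orthogonal, 2 means opposite.
float cosine_distance_unit(const std::vector<float>& a,
                           const std::vector<float>& b) {
    assert(a.size() == b.size());
    float dot = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) dot += a[i] * b[i];
    return 1.0f - dot;
}
```

Normalizing once at index time keeps the per-query work down to a dot product per candidate, which is also the operation the SIMD helpers accelerate.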
🔀

Hybrid Search

Auto-classifies queries as keyword, semantic, or hybrid. Exact message IDs use SQL LIKE; natural language uses vector similarity; mixed queries merge both with Reciprocal Rank Fusion (RRF).

RRF k=60 · Auto-classify · KNN
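
Reciprocal Rank Fusion is short enough to sketch in full. Each result earns 1 / (k + rank) from every list it appears in, with rank starting at 1 and k = 60 as quoted above, and the summed scores decide the final order. The function name rrf_fuse and the use of string IDs are assumptions for illustration; the tool's own data structures are not shown in this document.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Fuse several ranked ID lists (best first) with Reciprocal Rank Fusion.
// Returns the top_k (id, score) pairs, highest score first.
std::vector<std::pair<std::string, double>> rrf_fuse(
        const std::vector<std::vector<std::string>>& rankings,
        std::size_t top_k,
        double k = 60.0) {
    assert(k > 0.0);
    std::unordered_map<std::string, double> score;
    for (const auto& ranking : rankings) {
        for (std::size_t i = 0; i < ranking.size(); ++i) {
            // i is 0-based, RRF ranks are 1-based.
            score[ranking[i]] += 1.0 / (k + static_cast<double>(i + 1));
        }
    }
    std::vector<std::pair<std::string, double>> fused(score.begin(), score.end());
    std::sort(fused.begin(), fused.end(), [](const auto& a, const auto& b) {
        if (a.second != b.second) return a.second > b.second;
        return a.first < b.first;  // deterministic order for equal scores
    });
    if (fused.size() > top_k) fused.resize(top_k);
    return fused;
}
```

Given a keyword ranking A, B, C and a semantic ranking B, D, A, the fused order is B, A, D, C: items found by both searches rise above items found by only one, without having to compare SQL match quality against vector distances directly.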

SIMD Acceleration

Custom s390x VXE intrinsics for vector math and quantized matrix-vector multiplies. Each 128-bit vector register holds 4 floats, and the loops handle 8 floats per iteration by working on two registers at a time, turning scalar hot paths into vectorized operations on z15+.

VXE · s390x · vec_mule/vec_mulo

The Tools

A suite of command-line tools that work together. Index once, query repeatedly. All tools default to ~/.z-vector-search/ for zero-config usage.

📥
z-index

Index documents into the persistent vector store. Supports incremental indexing — only new or modified files are re-encoded.

🔍
z-query

Search the store with natural language or structured queries. Auto-detects keyword, semantic, or hybrid mode.
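
One plausible way to auto-detect the mode is to look for tokens shaped like message IDs. The sketch below is a guess at such a heuristic, not the rule z-query actually applies: the regular expression, QueryMode, looks_like_message_id, and classify_query are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <sstream>
#include <string>

enum class QueryMode { Keyword, Semantic, Hybrid };

// Assumed shape of a message ID: 3+ letters (or $, #, @), 2+ digits, and an
// optional trailing severity letter, e.g. ICH408I, IEA404W, $HASP373.
bool looks_like_message_id(const std::string& token) {
    static const std::regex pattern(R"([A-Z$#@]{3,}[0-9]{2,}[A-Z]?)");
    return std::regex_match(token, pattern);
}

// Only message IDs -> keyword; only ordinary words -> semantic; both -> hybrid.
QueryMode classify_query(const std::string& query) {
    std::istringstream in(query);
    std::string token;
    std::size_t ids = 0;
    std::size_t words = 0;
    while (in >> token) {
        if (looks_like_message_id(token)) ++ids; else ++words;
    }
    if (ids > 0 && words == 0) return QueryMode::Keyword;
    if (ids > 0) return QueryMode::Hybrid;
    return QueryMode::Semantic;
}
```

Under this heuristic "ICH408I" is a keyword query, "why was ICH408I issued for BATCH1" is hybrid, and "what does abend S0C4 mean" is semantic, since S0C4 is an abend code rather than a message ID.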

🖥️
z-console

Enrich z/OS console messages with IBM documentation and operational history. Reads live SYSLOG via pcon.

⚙️
z-setup

One-time setup: downloads the embedding model from Hugging Face and unpacks the IBM messages knowledge base.

z-console Demo

Feed z-console a RACF violation message and watch it instantly return the IBM documentation and your operational history.

z-console — enriched output
$ z-console "ICH408I USER(BATCH1) GROUP(PROD) LOGON/JOB INITIATION - ACCESS REVOKED"

Parsed 1 message, 1 interesting, 1 unique ID to look up.

━━━ ICH408I (severity: E) ━━━
ICH408I USER(BATCH1) GROUP(PROD) NAME(BATCH JOB)
LOGON/JOB INITIATION - ACCESS REVOKED

IBM Documentation (keyword match):
ICH408I — A RACF-defined user has been revoked. The user's access
authority has been removed, typically because consecutive incorrect
password attempts exceeded the SETROPTS PASSWORD limit.
System Action: The logon or job is rejected.
Operator Response: Contact the security administrator to reinstate
access via ALTUSER userid RESUME.
(distance: 0.12)

Operational History (semantic match):
[2026-04-03 14:22] ICH408I,ICH409I, BATCH1 revoked on SYS1,
resolved by security team reset. Related: RACF password policy
change ticket INC-4421.
(distance: 0.31)

Making llama.cpp Faster on z/OS

The s390x backend was running pure scalar code through every hot path. ~900 lines of new VXE intrinsics changed that.

On x86, llama.cpp vectorizes everything with AVX2/AVX-512. On ARM, it uses NEON. On z/OS? Nothing. The quantized matrix-vector multiplies and elementwise float helpers that dominate every forward pass were all scalar.

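IBM Z processors from z13 onwards include the Vector Facility for z/Architecture, a 128-bit SIMD instruction set; the Vector Enhancements Facility (VXE), which adds single-precision float support, arrived with z14. The new s390x implementations process 8 floats per loop iteration, two 4-float registers at a time, using vec_xl/vec_add/vec_xst.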

The core trick for the Q4_K quantized multiply: vec_mule/vec_mulo multiply the even and odd int8 lanes and widen the products to int16, then a second vec_mule/vec_mulo pair widens and horizontally reduces those into int32. The whole sequence retires in a handful of cycles on z15.
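
The even/odd widening pattern is easier to follow in a scalar model. The sketch below mimics the data flow in plain C++ and contains no VXE intrinsics; dot_i8x16 and the type aliases are hypothetical, and treating the second stage as a plain widening add of neighbouring lanes is an assumption (the real kernel may fold block scales into that step).

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using I8x16 = std::array<std::int8_t, 16>;
using I16x8 = std::array<std::int16_t, 8>;
using I32x4 = std::array<std::int32_t, 4>;

// Scalar model of a 16-lane int8 dot product built from even/odd widening
// multiplies. Not the llama.cpp kernel, only the same data flow.
std::int32_t dot_i8x16(const I8x16& a, const I8x16& b) {
    // Stage 1: multiply even lanes and odd lanes separately (what
    // vec_mule / vec_mulo do), widening to int16, then add the two results.
    I16x8 pair_sums{};
    for (std::size_t i = 0; i < 8; ++i) {
        const int even = a[2 * i] * b[2 * i];
        const int odd = a[2 * i + 1] * b[2 * i + 1];
        const int sum = even + odd;
        // Holds for 4-bit weights times int8 activations; full-range int8
        // inputs could overflow int16 here.
        assert(sum >= INT16_MIN && sum <= INT16_MAX);
        pair_sums[i] = static_cast<std::int16_t>(sum);
    }
    // Stage 2: widen int16 -> int32 and add neighbouring lanes.
    I32x4 quad_sums{};
    for (std::size_t i = 0; i < 4; ++i) {
        quad_sums[i] = static_cast<std::int32_t>(pair_sums[2 * i]) +
                       static_cast<std::int32_t>(pair_sums[2 * i + 1]);
    }
    // Final horizontal add of the four int32 lanes.
    return quad_sums[0] + quad_sums[1] + quad_sums[2] + quad_sums[3];
}
```

Splitting the multiply into even and odd lanes is what lets the hardware produce full-width products without losing bits: 16 int8 lanes cannot hold 16 int16 results in one register, but two registers of 8 can.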

Metrics output (JSON)
$ z-query --metrics "what does abend S0C4 mean"
{
  "mode": "semantic",
  "model_load_ms": 2341.5,
  "embed_ms": 287.3,
  "search_ms": 42.1,
  "total_ms": 2812.4,
  "results": 5,
  "store_chunks": 34102
}

Vector Helpers Vectorized

add, sub, mul, scale, mad — 8 floats per iteration via vec_xl/vec_xst
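
The "8 floats per iteration" loop shape looks roughly like this. The sketch is portable: it uses the GCC/Clang vector_size extension as a stand-in for vec_xl, vec_add, and vec_xst, so it is not the VXE code from the project. The names f32x4, load4, store4, and add_f32_unrolled are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Four packed floats in one 128-bit value (GCC/Clang vector extension).
typedef float f32x4 __attribute__((vector_size(16)));

// Unaligned-safe load and store, playing the role of vec_xl / vec_xst.
static inline f32x4 load4(const float* p) {
    f32x4 v;
    std::memcpy(&v, p, sizeof v);
    return v;
}

static inline void store4(float* p, f32x4 v) {
    std::memcpy(p, &v, sizeof v);
}

// z[i] = x[i] + y[i] for i in [0, n).
void add_f32_unrolled(std::size_t n, float* z, const float* x, const float* y) {
    assert(n == 0 || (z != nullptr && x != nullptr && y != nullptr));
    std::size_t i = 0;
    // Main loop: two 4-lane vectors per iteration, so 8 floats at a time.
    for (; i + 8 <= n; i += 8) {
        store4(z + i, load4(x + i) + load4(y + i));
        store4(z + i + 4, load4(x + i + 4) + load4(y + i + 4));
    }
    // Scalar tail for the last n % 8 elements.
    for (; i < n; ++i) z[i] = x[i] + y[i];
}
```

Handling two registers per iteration gives the processor two independent add chains to overlap, and the scalar tail keeps the helper correct for lengths that are not a multiple of 8.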

🧮

Q4_K × Q8_K GEMV

Brand new ggml_gemv_q4_K_8x4_q8_K with VXE intrinsics for the hot path

📊

Q8_K Row Quantization

__builtin_s390_vfisb rounds a vector of floats to integral values in a single instruction, ahead of the int8 conversion

🔧

CMake Integration

OS390 build path: -fzvector -m64 -march=z15 with optional MASSV linkage

🤖

AI-Assisted Development

IBM Bob helped navigate VXE intrinsics and cross-check quantized format details

Get Started in 5 Minutes

You'll need the zopen package manager set up on your z/OS system. If it isn't installed yet, the zopen QuickStart Guide takes about five minutes.

1

Install via zopen

The simplest way — pulls in llama.cpp and all tools in one shot.

$ zopen install z-vector-search
2

Run setup

Downloads the embedding model (~84 MB) and unpacks the IBM messages knowledge base (~160 MB).

$ z-setup
3

Query the knowledge base

Search 24,565 IBM z/OS messages with natural language — out of the box.

$ z-query "what does abend S0C4 mean"
4

Enrich console messages

Feed z-console a single message or read your live SYSLOG for instant context.

$ z-console --pcon -l
Build from source (alternative)
# 1. Install llama.cpp via zopen
zopen install llamacpp

# 2. Build z-vector-search
cmake -B build -DLLAMA_ROOT=$ZOPEN_PKGINSTALL/llamacpp
cmake --build build

# 3. Run setup and start searching
z-setup
z-query "what does abend S0C4 mean"