A RAG-powered search engine that runs natively on z/OS. Index documents, search with natural language, and enrich operator console messages with AI-powered context — right where your data lives.
Every query flows through a pipeline that turns raw text into semantic understanding. Everything runs locally on z/OS — no external API calls, no cloud dependencies.
Pure C++17 with vendored dependencies. No Python runtime, no Java, no external services beyond what zopen provides.
Nomic Embed Text v1.5, an encoder-only model purpose-built for turning text into vectors. Quantized to Q4_K_M at just 84 MB. Uses MEAN pooling and document/query prefixes for optimal retrieval quality.
llama.cpp Nomic v1.5 Q4_K_M
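The two retrieval-quality details above — MEAN pooling and task prefixes — can be sketched in portable C++. The helper names here are illustrative, not the project's API; `search_query: ` / `search_document: ` are the prefixes Nomic documents for this model family.

```cpp
#include <cassert>
#include <string>
#include <vector>

// MEAN pooling: average the per-token hidden states into one
// fixed-size sentence embedding. Hypothetical helper, for illustration.
std::vector<float> mean_pool(const std::vector<std::vector<float>>& tokens) {
    assert(!tokens.empty());
    std::vector<float> out(tokens[0].size(), 0.0f);
    for (const auto& t : tokens)
        for (size_t i = 0; i < out.size(); ++i) out[i] += t[i];
    for (float& v : out) v /= static_cast<float>(tokens.size());
    return out;
}

// Nomic-style task prefixes: queries and documents are embedded with
// different prefixes so both land in a compatible retrieval space.
std::string with_prefix(const std::string& text, bool is_query) {
    return (is_query ? "search_query: " : "search_document: ") + text;
}
```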
SQLite extended with sqlite-vec for KNN similarity search. No database server, no network dependencies — just a single .db file with text chunks, embeddings, and structured metadata side by side.
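A KNN lookup against sqlite-vec is just SQL: the extension exposes a virtual table whose `MATCH` operator performs the similarity search. The table and column names below are assumptions for illustration, not the project's actual schema.

```cpp
#include <string>

// Builds a sqlite-vec-style KNN query: MATCH against the embedding
// column, ordered by distance. Schema names (vec_chunks, chunk_id,
// embedding) are hypothetical.
std::string knn_sql(int top_k) {
    return "SELECT chunk_id, distance "
           "FROM vec_chunks "
           "WHERE embedding MATCH :query_vec "
           "ORDER BY distance LIMIT " + std::to_string(top_k) + ";";
}
```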
Auto-classifies queries as keyword, semantic, or hybrid. Exact message IDs use SQL LIKE; natural language uses vector similarity; mixed queries merge both with Reciprocal Rank Fusion (RRF).
RRF k=60 Auto-classify KNN
Custom s390x VXE intrinsics for vector math and quantized matrix-vector multiplies. 128-bit SIMD processes 8 floats per iteration, turning scalar hot paths into vectorized operations on z15+.
VXE s390x vec_mule/vec_mulo
A suite of command-line tools that work together. Index once, query repeatedly. All tools default to ~/.z-vector-search/ for zero-config usage.
Index documents into the persistent vector store. Supports incremental indexing — only new or modified files are re-encoded.
Search the store with natural language or structured queries. Auto-detects keyword, semantic, or hybrid mode.
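One plausible shape for that auto-detection, sketched as a hypothetical classifier (the real tool's rules may differ — e.g. it likely also recognizes abend codes like S0C4): z/OS message IDs follow a letters-digits-suffix pattern, so a query that is only an ID goes keyword, an ID embedded in prose goes hybrid, and everything else goes semantic.

```cpp
#include <regex>
#include <string>

enum class Mode { Keyword, Semantic, Hybrid };

// Hypothetical query classifier. z/OS message IDs look like IEF450I
// or ICH408I: a 2-4 letter prefix, 2-5 digits, optional action code.
Mode classify(const std::string& q) {
    static const std::regex msg_id(R"([A-Z]{2,4}[0-9]{2,5}[A-Z]?)");
    bool only_id = std::regex_match(q, msg_id); // whole query is an ID
    bool has_id  = std::regex_search(q, msg_id); // ID somewhere inside
    if (only_id) return Mode::Keyword;
    return has_id ? Mode::Hybrid : Mode::Semantic;
}
```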
Enrich z/OS console messages with IBM documentation and operational history. Reads live SYSLOG via pcon.
One-time setup: downloads the embedding model from Hugging Face and unpacks the IBM messages knowledge base.
Feed z-console a RACF violation message and watch it instantly return the IBM documentation and your operational history.
The s390x backend was running pure scalar code through every hot path. ~900 lines of new VXE intrinsics changed that.
On x86, llama.cpp vectorizes everything with AVX2/AVX-512. On ARM, it uses NEON. On z/OS? Nothing. The quantized matrix-vector multiplies and elementwise float helpers that dominate every forward pass were all scalar.
IBM Z processors have shipped 128-bit SIMD since z13's Vector Facility for z/Architecture, extended on z14 and later by the Vector Enhancements Facility (VXE). The new s390x implementations process 8 floats per loop iteration using vec_xl/vec_add/vec_xst.
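A portable scalar sketch of that loop structure — on s390x each 4-float group below collapses into a single 128-bit VXE operation (vec_xl to load, vec_add, vec_xst to store), but the shape of the kernel is the same:

```cpp
#include <cstddef>

// Elementwise float add, 8 elements per main-loop iteration, with a
// scalar tail for the remainder. Illustrative stand-in for the VXE
// version; names are not the project's API.
void vec_add_f32(std::size_t n, float* z, const float* x, const float* y) {
    std::size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        for (std::size_t j = 0; j < 4; ++j)          // first 128-bit group
            z[i + j] = x[i + j] + y[i + j];
        for (std::size_t j = 0; j < 4; ++j)          // second 128-bit group
            z[i + 4 + j] = x[i + 4 + j] + y[i + 4 + j];
    }
    for (; i < n; ++i) z[i] = x[i] + y[i];           // scalar tail
}
```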
The core trick for Q4_K quantized multiply: vec_mule/vec_mulo widen int8 → int16 cleanly, then a second pair horizontally reduces into int32 — the whole sequence retires in a handful of cycles on z15.
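A scalar model of that even/odd widening chain, for intuition only (the real kernel uses the VXE intrinsics, not loops): vec_mule multiplies the even-indexed int8 lanes and vec_mulo the odd ones, each producing int16 products that cannot overflow, which are then accumulated into an int32.

```cpp
#include <cstddef>
#include <cstdint>

// int8 dot product via the widen-then-reduce pattern. Assumes n is
// even; int8 * int8 fits in int16 (max magnitude 16384), and the
// final accumulation happens in int32.
int32_t dot_i8(const int8_t* a, const int8_t* b, std::size_t n) {
    int32_t acc = 0;
    for (std::size_t i = 0; i + 2 <= n; i += 2) {
        int16_t even = int16_t(a[i])     * int16_t(b[i]);     // vec_mule lane
        int16_t odd  = int16_t(a[i + 1]) * int16_t(b[i + 1]); // vec_mulo lane
        acc += even + odd;                                    // int32 reduce
    }
    return acc;
}
```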
```
# z-query --metrics "what does abend S0C4 mean"
{
  "mode": "semantic",
  "model_load_ms": 2341.5,
  "embed_ms": 287.3,
  "search_ms": 42.1,
  "total_ms": 2812.4,
  "results": 5,
  "store_chunks": 34102
}
```
add, sub, mul, scale, mad — 8 floats per iteration via vec_xl/vec_xst
New ggml_gemv_q4_K_8x4_q8_K kernel with VXE intrinsics for the hot path
__builtin_s390_vfisb for round-and-convert in a single instruction
OS390 build path: -fzvector -m64 -march=z15 with optional MASSV linkage
IBM Bob helped navigate VXE intrinsics and cross-check quantized format details
You'll need the zopen package manager set up on your z/OS system. The QuickStart Guide takes about five minutes.
The simplest way — pulls in llama.cpp and all tools in one shot.
Downloads the embedding model (~84 MB) and unpacks the IBM messages knowledge base (~160 MB).
Search 24,565 IBM z/OS messages with natural language — out of the box.
Feed z-console a single message or read your live SYSLOG for instant context.
```
# 1. Install llama.cpp via zopen
zopen install llamacpp

# 2. Build z-vector-search
cmake -B build -DLLAMA_ROOT=$ZOPEN_PKGINSTALL/llamacpp
cmake --build build

# 3. Run setup and start searching
z-setup
z-query "what does abend S0C4 mean"
```