AI ENGINEERING SERIES
Read on Dev.to
Advanced Computer Science Guide

Building a Vector Search Engine from Scratch: The Math and Mechanics of HNSW Indexing

Demystifying high-dimensional embeddings, Cosine similarity mathematics, and implementing an optimized Hierarchical Navigable Small World skip-graph search in TypeScript.

EA
Ebenezer AkinseindeSoftware Developer & AI Automations Engineer
Published May 202628 min readAI Engineering

01.The Geometry of Semantics: High-Dimensional Latent Spaces

In the era of large language models (LLMs), unstructured text is treated not as a string of raw characters, but as a **vector coordinate within a high-dimensional mathematical space**.

When you pass a text block into an embedding model (like OpenAI's `text-embedding-3-small` or Google's `text-embedding-004`), the engine maps the semantic meaning of that text into an array consisting of hundreds or thousands of floating-point floats:

EMBEDDING OUTPUT SAMPLE (3072 Dimensions):Vector: [ 0.01249, -0.04581, 0.08914, ..., -0.00312, 0.05733 ]

Within this **latent space**, conceptually similar texts are mathematically grouped close together, while semantically contrasting subjects are positioned far apart.

*The core engine of Semantic Search is entirely geometric:* searching for similar documents is equivalent to plotting a query vector coordinate inside this high-dimensional space and locating the **K-Nearest Neighbor (KNN)** nodes.

02.The Mathematical Distance Metrics

To trace proximity between two coordinate vectors, vector databases employ three primary distance metrics:

1. Cosine Similarity
$$\cos(\theta) = \frac{\mathbf{A} \cdot \mathbf{B}}{\|\mathbf{A}\| \|\mathbf{B}\|}$$

Measures the angular projection between vectors. Value spans from -1 to 1. Completely ignores vector scale magnitude, focusing purely on spatial orientation.

2. Dot Product
$$\mathbf{A} \cdot \mathbf{B} = \sum_{i=1}^n A_i B_i$$

Calculates cumulative product sum of matching indexes. If vectors are strictly normalized to unit magnitude ($L2 = 1$), Dot Product yields identical rankings to Cosine similarity with much faster computation overhead.

3. L2 Distance
$$d(\mathbf{A}, \mathbf{B}) = \sqrt{\sum_{i=1}^n (A_i - B_i)^2}$$

Calculates direct physical spatial distance between coordinate tips. Values scale from 0 to infinity. Highly sensitive to vector magnitude variations.

03.The Scaling Bottleneck: Why Brute Force KNN Fails

In a naive search pipeline (Brute Force KNN), you scan through every single document vector in your database, computing its Cosine Similarity against the query vector, and sorting the results.

This operation scales linearly:$$\mathcal{O}(D \cdot N)$$*Where $D$ represents dimensions and $N$ represents the number of database records.*

If your database has 10,000 documents, brute-force calculation finishes instantly. But if your database scales to **10 million vectors with 1536 dimensions**, a single query requires executing **15.3 billion floating-point operations (FLOPs)**!

Your database will experience massive CPU saturation, and latency will scale from milliseconds to multiple seconds, crashing real-time conversational agents. We need **Approximate Nearest Neighbor (ANN)** indexing.

04.The HNSW Graph Architecture: Spatial Multi-Layer Skip Lists

The state-of-the-art algorithm to solve spatial indexing is **HNSW (Hierarchical Navigable Small World)**.

HNSW translates the computer science concept of a **Skip List** into a multi-layered graph data structure.

FIGURE 1. MULTI-LAYER HNSW SKIP GRAPH HIERARCHYLAYER 2 (Sparse Highway Nodes)LAYER 1 (Intermediate Navigation)LAYER 0 (Dense Ground Layer - All Vectors)

1. **Sparse Upper Layers:** At the highest layers (e.g. Layer 2), the graph contains only a few "highway" nodes connected by extremely wide spatial leaps. Navigating starts here, rapidly bounding down the graph toward the query coordinate.

2. **Descending Layer Invariant:** Once the local nearest neighbor is reached in Layer 2, navigation hops down to the identical node position inside Layer 1, scanning its higher-resolution neighborhood.

3. **Dense Ground Layer (Layer 0):** Finally, the search enters Layer 0 (which contains all nodes in the database) right near the optimal target, running a tiny local BFS search to secure the absolute exact K-Nearest matches.

This skip-graph traversal reduces search time from linear to logarithmic:$$\mathcal{O}(\log N)$$*Enabling sub-5ms lookups on databases containing tens of millions of records.*

05.Live Semantic Vector Space Sandbox

This sandbox simulates a 2D slice of a high-dimensional skills/framework latent space. **Drag the glowing purple Query Vector** around the coordinate space to see the distance calculations update instantly.

VECTOR LAB

Geometric Spatial Search Analyzer

Y-Axis: Semantic Space Dimension 2 (Frontend Scale)Origin: Center (50,50)
React Frontend
Next.js Framework
Tailwind CSS
PostgreSQL Database
MongoDB Schema
Redis In-Memory
PyTorch Model
TensorFlow Graph
Gemini API Agent
X-Axis: Semantic Space Dimension 1 (AI / Data Scale)Query Coordinates: X:45 | Y:55
LEADERBOARD COMPARISON
Cosine Similarity Rankings
#1
React FrontendWeb DIMENSION
1.0000similarity
#2
Next.js FrameworkWeb DIMENSION
1.0000similarity
#3
Tailwind CSSWeb DIMENSION
1.0000similarity
#4
PostgreSQL DatabaseDatabase DIMENSION
0.0000similarity
#5
MongoDB SchemaDatabase DIMENSION
0.0000similarity
#6
Redis In-MemoryDatabase DIMENSION
0.0000similarity
#7
Gemini API AgentAI DIMENSION
-0.9959similarity
#8
TensorFlow GraphAI DIMENSION
-1.0000similarity
#9
PyTorch ModelAI DIMENSION
-1.0000similarity
*Tip: Click and drag anywhere inside the coordinate grid above to reposition the query vector.

06.HNSW skip-graph Implementation in TypeScript

Here is a complete, optimized TypeScript index implementation demonstrating skip-layer allocation, heuristic connection mappings, and Cosine similarity scoring.

hnsw-vector-index.ts
class VectorNode {
  public id: string;
  public vector: number[];
  public connections: Map<number, Set<VectorNode>>; // Layer level -> Connected nodes

  constructor(id: string, vector: number[]) {
    this.id = id;
    this.vector = vector;
    this.connections = new Map();
  }

  public addConnection(level: number, node: VectorNode) {
    if (!this.connections.has(level)) {
      this.connections.set(level, new Set());
    }
    this.connections.get(level)!.add(node);
  }
}

export class HNSWIndex {
  private maxLevel: number;
  private M: number; // Max connections per node
  private efConstruction: number; // Search depth during index build
  private enterNode: VectorNode | null = null;
  private levelMultiplier: number;

  constructor(M: number = 16, efConstruction: number = 200) {
    this.M = M;
    this.efConstruction = efConstruction;
    this.maxLevel = 0;
    this.levelMultiplier = 1 / Math.log(M);
  }

  // Calculate Cosine Similarity between two floating arrays
  private computeCosine(v1: number[], v2: number[]): number {
    let dot = 0;
    let mag1 = 0;
    let mag2 = 0;
    for (let i = 0; i < v1.length; i++) {
      dot += v1[i] * v2[i];
      mag1 += v1[i] * v1[i];
      mag2 += v2[i] * v2[i];
    }
    return dot / (Math.sqrt(mag1) * Math.sqrt(mag2));
  }

  // Insert a new Vector into the Multi-Layer graph skip list
  public insert(id: string, vector: number[]) {
    const newNode = new VectorNode(id, vector);
    
    // Determine random maximum layer depth using exponential decay distribution
    const nodeMaxLevel = Math.floor(-Math.log(Math.random()) * this.levelMultiplier);
    
    if (this.enterNode === null) {
      this.enterNode = newNode;
      this.maxLevel = nodeMaxLevel;
      return;
    }

    let currNode = this.enterNode;
    let queryDist = this.computeCosine(vector, currNode.vector);

    // 1. Search sparse highway layers down to nodeMaxLevel
    for (let level = this.maxLevel; level > nodeMaxLevel; level--) {
      let changed = true;
      while (changed) {
        changed = false;
        const neighbors = currNode.connections.get(level) || new Set();
        for (const neighbor of neighbors) {
          const dist = this.computeCosine(vector, neighbor.vector);
          if (dist > queryDist) {
            queryDist = dist;
            currNode = neighbor;
            changed = true;
          }
        }
      }
    }

    // 2. Insert node into layers from nodeMaxLevel down to Layer 0
    for (let level = Math.min(this.maxLevel, nodeMaxLevel); level >= 0; level--) {
      // Find candidate neighbors utilizing greedy BFS search depth (efConstruction)
      const candidates = this.searchLayer(vector, currNode, this.efConstruction, level);
      
      // Connect new node to candidates and enforce max connection limit (M)
      for (const candidate of candidates) {
        newNode.addConnection(level, candidate);
        candidate.addConnection(level, newNode);
        
        // Trim connections if they exceed limit M
        this.shrinkConnections(candidate, level);
      }
      
      currNode = candidates[0] || currNode;
    }

    // Update global entry node if new node has higher level allocation
    if (nodeMaxLevel > this.maxLevel) {
      this.maxLevel = nodeMaxLevel;
      this.enterNode = newNode;
    }
  }

  private searchLayer(
    query: number[], 
    enterPoint: VectorNode, 
    ef: number, 
    level: number
  ): VectorNode[] {
    const visited = new Set<string>([enterPoint.id]);
    const candidates: VectorNode[] = [enterPoint];
    const results: { node: VectorNode; dist: number }[] = [
      { node: enterPoint, dist: this.computeCosine(query, enterPoint.vector) }
    ];

    while (candidates.length > 0) {
      // Sort to evaluate closest candidate node
      candidates.sort((a, b) => this.computeCosine(query, b.vector) - this.computeCosine(query, a.vector));
      const curr = candidates.shift()!;
      
      const neighbors = curr.connections.get(level) || new Set();
      for (const neighbor of neighbors) {
        if (!visited.has(neighbor.id)) {
          visited.add(neighbor.id);
          const dist = this.computeCosine(query, neighbor.vector);
          
          if (dist > results[results.length - 1].dist || results.length < ef) {
            candidates.push(neighbor);
            results.push({ node: neighbor, dist });
            
            // Keep results sorted and bound to ef search depth
            results.sort((a, b) => b.dist - a.dist);
            if (results.length > ef) {
              results.pop();
            }
          }
        }
      }
    }

    return results.map(r => r.node);
  }

  private shrinkConnections(node: VectorNode, level: number) {
    const connections = node.connections.get(level);
    if (!connections || connections.size <= this.M) return;

    // Rank connections by proximity and retain only top M closest neighbors
    const sorted = Array.from(connections).map(conn => ({
      node: conn,
      dist: this.computeCosine(node.vector, conn.vector)
    })).sort((a, b) => b.dist - a.dist);

    const pruned = new Set<VectorNode>();
    for (let i = 0; i < this.M; i++) {
      pruned.add(sorted[i].node);
    }
    node.connections.set(level, pruned);
  }
}

07.Production Scaling & Memory Management

When building state-of-the-art vector engines in real-world environments, keep these two critical performance considerations active:

1. Memory-Mapped Files (mmap)

High-dimensional vectors consume enormous amounts of memory. If a database holds 50 million vectors, keeping them all inside raw system RAM will lead to Out-Of-Memory (OOM) process crashes.

Use **Memory-Mapped Files (`mmap`)** to bind binary index files directly to virtual address space, letting the Operating System automatically swap vector blocks in and out of disk caching space depending on active queries.

2. Product Quantization (PQ)

Product Quantization is a lossy compression technique. It divides massive vectors into smaller sub-vectors, maps them to a cluster centroids codebook, and represents huge 1536-dimension float arrays as a compact string of integer indexes. This **reduces RAM consumption by up to 95%** with minimal precision loss.

08.Engineering Takeaways

Vector Search is the fundamental search engine infrastructure supporting LLM agentic reasoning, Retrieval-Augmented Generation (RAG), and recommender systems.

  • Geometric searches are metric-dependent: Always normalize your database coordinates ($L2 = 1$) to utilize lighting-fast dot product indexing rather than raw Cosine calculations.
  • Logarithmic search requires graph heuristics: Build HNSW index graphs using Skip List multi-layered structures to protect fast query response loops.
  • Design defensively for RAM: Leverage product quantization (PQ) and memory-mapped virtual allocations to protect system memory buffers.

Brute force scanning is a legacy pattern. Implementing multi-layer HNSW graph indexing is the key that enables lightning-fast semantic queries at scale.

EA

Ebenezer Akinseinde

I design highly optimized full-stack Next.js applications, robust AI retrieval search pipelines, and scalable database integrations. Let's build state-of-the-art features together.