AI for Java Architects – Part 2

Tokens, Embeddings and Vector Databases for Java Developers

“Databases store data. Vector databases store meaning.”


Introduction

In the previous article, we learned:

  • What an LLM is.
  • How transformers work.
  • Tokens and context windows.
  • Hallucinations.
  • Training and inference.

At this point, many developers ask:

If LLMs are trained on old data, how can they answer questions about my company’s documents?

The answer lies in one of the most important concepts in modern AI:

Embeddings

Embeddings are the foundation of:

  • RAG
  • Semantic search
  • Document retrieval
  • AI assistants
  • Recommendation systems
  • Knowledge bases

If LLMs are the brain, embeddings are the memory system.


Why Traditional Search Fails

Suppose your document contains:

Spring Boot microservice deployment guide.

A user searches:

How do I deploy Java services?

Traditional SQL search:

SELECT *
FROM documents
WHERE content LIKE '%deploy%'

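Problems:

  • The pattern matches “deployment” only by substring accident; an exact-word search for “deploy” would miss it.
  • “Java services” never appears in the document, which says “Spring Boot microservice”.
  • Nothing tells the database that Spring Boot is a Java framework.

Keyword search depends on matching characters, not meaning.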

Humans search by meaning.
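To make the gap concrete, here is a minimal Java sketch of exact-word matching, the kind of lookup a basic keyword index performs. The class `KeywordSearchDemo` and its helper methods are illustrative names, not part of any library.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

class KeywordSearchDemo {

    // Lower-cases the text and splits it into a set of whole words.
    static Set<String> words(String text) {
        String[] parts = text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+");
        Set<String> result = new HashSet<>(Arrays.asList(parts));
        result.remove(""); // split can yield an empty leading token
        return result;
    }

    // True only if every search term appears in the document as an exact word.
    static boolean matchesAllTerms(String document, String... terms) {
        Set<String> docWords = words(document);
        for (String term : terms) {
            if (!docWords.contains(term.toLowerCase(Locale.ROOT))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String doc = "Spring Boot microservice deployment guide.";
        // The user's words are not in the document, so the search misses it.
        System.out.println(matchesAllTerms(doc, "deploy", "java", "services")); // false
        // Only the document's own vocabulary produces a hit.
        System.out.println(matchesAllTerms(doc, "deployment", "guide"));        // true
    }
}
```

The document is clearly relevant to the question, yet the match fails because none of the user's words appear in it verbatim.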


Semantic Search

Humans understand:

Car
Vehicle
Automobile

as related concepts.

Computers traditionally do not.

Embeddings solve this problem.


What is an Embedding?

An embedding converts text into numbers.

Example:

Java Developer

becomes:

[0.234, -0.612, 0.912, 0.341, ...]

The numbers themselves are meaningless to humans.

But mathematically:

Similar concepts become close together.


Example

Consider:

Java Developer
Spring Boot Engineer
Backend Architect

These vectors may be close.

Whereas:

Football Player
Cooking Recipe
Mountain Trekking

may be far away.


Java Analogy

Imagine:

class Employee {
    String skill;
}

Traditional search:

employee.skill.equals("Java")

Embedding search:

similarity(employee.skill, "Spring Boot")

The second approach understands meaning.


The Vector Space

Imagine a graph.

Words become coordinates.

Similar meanings cluster together.

Java
Spring
Hibernate
Microservices

appear close.

While:

Cricket
Pizza
Vacation

appear elsewhere.

This is called:

Vector Space


Dimensions

Embeddings often have:

  • 384 dimensions
  • 768 dimensions
  • 1024 dimensions
  • 1536 dimensions
  • 3072 dimensions

Example:

[0.12, 0.45, -0.89, ...]

A 1536-dimensional vector contains 1536 numbers.

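Higher dimensions can capture finer distinctions in meaning, but they cost more storage and slow down search. More dimensions do not automatically mean better retrieval.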
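Dimension count also drives storage. The following is a rough sketch, assuming each number is stored as a 32-bit float (4 bytes) and ignoring index and metadata overhead. The class name and the one-million-chunk corpus are illustrative assumptions.

```java
class EmbeddingStorageDemo {

    // Raw bytes needed for one vector stored as 32-bit floats.
    static long bytesPerVector(int dimensions) {
        return (long) dimensions * Float.BYTES;
    }

    // Raw bytes for a whole collection, ignoring index and metadata overhead.
    static long totalBytes(int dimensions, long vectorCount) {
        return bytesPerVector(dimensions) * vectorCount;
    }

    public static void main(String[] args) {
        int[] sizes = {384, 768, 1024, 1536, 3072};
        long chunks = 1_000_000L; // assumed corpus size, for illustration only
        for (int d : sizes) {
            double gb = totalBytes(d, chunks) / 1e9;
            System.out.printf("%4d dims: %5d bytes per vector, %.2f GB per million chunks%n",
                    d, bytesPerVector(d), gb);
        }
    }
}
```

Under these assumptions a 1536-dimensional vector takes 6,144 bytes, so a million chunks need roughly 6 GB before any index is built.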


Why Numbers?

Machines perform mathematics.

Suppose:

Java Developer
Spring Engineer

Their vectors:

A = [1, 2, 3]

B = [1.1, 2.1, 3.1]

These are close.

Meanwhile:

Pizza Recipe

C = [-2, 3, -1]

points in a very different direction, so it is far away.


Similarity Search

The fundamental question:

How close are two vectors?

This leads to:

Cosine Similarity


Cosine Similarity

Range:

1.0     Very Similar

0.0     Unrelated

-1.0    Opposite
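Cosine similarity is the dot product of two vectors divided by the product of their lengths: cos(a, b) = (a · b) / (|a| × |b|). A minimal sketch in plain Java follows; the class name and the sample vectors are illustrative, and real embeddings would have hundreds of dimensions.

```java
class CosineSimilarityDemo {

    // cos(a, b) = (a · b) / (|a| * |b|)
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same length");
        }
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            throw new IllegalArgumentException("Cosine similarity is undefined for a zero vector");
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        double[] a = {1, 2, 3};
        double[] nearby = {1.1, 2.1, 3.1};  // almost the same direction as a
        double[] doubled = {2, 4, 6};       // same direction, twice the length
        double[] orthogonal = {3, 0, -1};   // dot product with a is zero
        double[] opposite = {-1, -2, -3};   // a, flipped

        System.out.println("nearby:     " + cosine(a, nearby));     // just under 1
        System.out.println("doubled:    " + cosine(a, doubled));    // about 1: length is ignored
        System.out.println("orthogonal: " + cosine(a, orthogonal)); // 0
        System.out.println("opposite:   " + cosine(a, opposite));   // about -1
    }
}
```

Note that the score depends only on direction. Scaling a vector changes its length but not its cosine similarity to anything else.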

Example:

Candidate      Similarity
Spring Boot    0.95
Java API       0.91
Hibernate      0.88
Cricket        0.10

The retrieval step returns the candidates with the highest scores.
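Picking the best matches is just a sort on the score. A minimal sketch, assuming the similarity scores have already been computed; `TopKDemo` and the `Scored` record are illustrative names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class TopKDemo {

    // A candidate text paired with its similarity to the query.
    record Scored(String text, double score) {}

    // Returns the k highest-scoring candidates, best first.
    static List<Scored> topK(List<Scored> candidates, int k) {
        List<Scored> sorted = new ArrayList<>(candidates);
        sorted.sort(Comparator.comparingDouble(Scored::score).reversed());
        return sorted.subList(0, Math.min(k, sorted.size()));
    }

    public static void main(String[] args) {
        List<Scored> scored = List.of(
                new Scored("Cricket", 0.10),
                new Scored("Hibernate", 0.88),
                new Scored("Spring Boot", 0.95),
                new Scored("Java API", 0.91));
        // Prints Spring Boot, Java API, Hibernate in that order.
        topK(scored, 3).forEach(s -> System.out.println(s.text() + " " + s.score()));
    }
}
```

Sorting every candidate is fine for a few thousand vectors. Vector databases exist largely because this brute-force approach stops scaling for millions.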


Why This Matters

Suppose the user asks:

How do I scale Java services?

The document contains:

Spring Boot microservices scaling guide.

Keyword search may fail.

Embedding search is far more likely to succeed, because both texts land close together in vector space.


How Embeddings Are Created

Architecture:

Text
   ↓
Embedding Model
   ↓
Vector

Example:

Java Developer

becomes:

[0.234, -0.612, 0.912, 0.341, ...]

Popular Embedding Models

  • OpenAI embeddings
  • Amazon Titan embeddings
  • BGE
  • E5
  • Instructor
  • Sentence Transformers

These models specialize in meaning extraction.


The Document Pipeline

Enterprise AI usually follows:

PDF
Word
Wiki
Confluence
Database
       ↓
Text Extraction
       ↓
Chunking
       ↓
Embedding
       ↓
Vector Database

What is Chunking?

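A 500-page document rarely fits in an LLM’s context window, and a single embedding for the whole document would blur its many topics together.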

Therefore:

Documents are split.

Example:

Page 1 → Chunk 1
Page 2 → Chunk 2
Page 3 → Chunk 3

Each chunk gets an embedding.


Example

Document:

Spring Boot Deployment Guide

Chunk 1:

Docker configuration

Chunk 2:

Kubernetes deployment

Chunk 3:

AWS deployment

Each chunk becomes searchable.


Chunking Strategies

Fixed Size

Every chunk has the same size, for example 500 words.


Sliding Window

Overlap:

Chunk 1: words 1–500

Chunk 2: words 451–950

The 50-word overlap keeps text that straddles a chunk boundary together in at least one chunk.
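A sliding window is straightforward to write by hand. The sketch below splits on whitespace and counts words; the class name and parameters are illustrative, and production pipelines often count tokens rather than words.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SlidingWindowChunker {

    // Splits text into chunks of at most chunkSize words;
    // consecutive chunks share `overlap` words.
    static List<String> chunk(String text, int chunkSize, int overlap) {
        if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
            throw new IllegalArgumentException("Require chunkSize > 0 and 0 <= overlap < chunkSize");
        }
        List<String> chunks = new ArrayList<>();
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return chunks;
        }
        String[] words = trimmed.split("\\s+");
        int step = chunkSize - overlap; // how far the window advances each time
        for (int start = 0; start < words.length; start += step) {
            int end = Math.min(start + chunkSize, words.length);
            chunks.add(String.join(" ", Arrays.copyOfRange(words, start, end)));
            if (end == words.length) {
                break; // this window already reached the end of the text
            }
        }
        return chunks;
    }

    public static void main(String[] args) {
        String text = "one two three four five six seven eight nine ten";
        // Window of 4 words, 1 word shared between neighbours:
        //   one two three four
        //   four five six seven
        //   seven eight nine ten
        chunk(text, 4, 1).forEach(System.out::println);
    }
}
```

For the sizes discussed above, the call would be `chunk(text, 500, 50)`.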


Semantic Chunking

Split by:

  • Sections
  • Headings
  • Topics

Usually produces better results.
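One simple form of semantic chunking is to cut at headings. The sketch below assumes the extracted text marks headings with a leading `#`, as Markdown does; that marker, the class name and the sample text are assumptions for illustration, and other formats need their own heading detection.

```java
import java.util.ArrayList;
import java.util.List;

class HeadingChunker {

    // Starts a new chunk at every line that begins with '#',
    // keeping each heading together with the text below it.
    static List<String> chunkByHeading(String document) {
        List<String> chunks = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String line : document.split("\\R")) {
            if (line.startsWith("#")) {
                flush(current, chunks); // a heading closes the previous chunk
            }
            current.append(line).append('\n');
        }
        flush(current, chunks);
        return chunks;
    }

    // Adds the buffered text as a chunk (if it is not blank) and resets the buffer.
    private static void flush(StringBuilder current, List<String> chunks) {
        String text = current.toString().trim();
        if (!text.isEmpty()) {
            chunks.add(text);
        }
        current.setLength(0);
    }

    public static void main(String[] args) {
        String doc = "# Docker configuration\nBuild the image.\n"
                + "# Kubernetes deployment\nApply the manifests.\n"
                + "# AWS deployment\nPush the image.\n";
        chunkByHeading(doc).forEach(c -> System.out.println("---\n" + c));
    }
}
```

Each chunk now covers one topic, so its embedding describes that topic rather than an arbitrary 500-word slice.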


Metadata

Documents also contain metadata.

Example:

{
  "document": "Architecture.pdf",
  "author": "Rahul",
  "team": "Platform",
  "year": 2026
}

This allows filtering.

Example:

Search only documents owned by the Platform team.
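A metadata filter narrows the candidates before similarity is computed. Here is a minimal in-memory sketch; the `Chunk` record, the sample chunks and the `Budget.pdf` entry are made up for illustration, and a real vector database applies the filter inside the query itself.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class MetadataFilterDemo {

    // A stored chunk: its text plus the metadata copied from the source document.
    record Chunk(String text, Map<String, Object> metadata) {}

    // Keeps only chunks whose metadata has the given key set to the given value.
    static List<Chunk> filter(List<Chunk> chunks, String key, Object value) {
        return chunks.stream()
                .filter(c -> value.equals(c.metadata().get(key)))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Chunk> chunks = List.of(
                new Chunk("Service mesh rollout plan",
                        Map.of("document", "Architecture.pdf", "team", "Platform", "year", 2026)),
                new Chunk("Quarterly budget summary",
                        Map.of("document", "Budget.pdf", "team", "Finance", "year", 2026)));

        // Narrow the candidates first; similarity search then runs on what is left.
        filter(chunks, "team", "Platform").forEach(c -> System.out.println(c.text()));
    }
}
```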


What is a Vector Database?

Traditional databases store rows.

Vector databases store:

Vector + Metadata + Document

Example:

ID   Vector     Document
1    [0.22…]    Spring Boot
2    [0.56…]    Kafka
3    [0.91…]    AWS

Why SQL Databases Struggle

SQL excels at:

WHERE department = 'IT'

But struggles with:

Find documents similar to this sentence.

Vector databases are optimized for:

  • Nearest neighbors
  • Similarity search
  • Semantic retrieval

Popular Vector Databases

PGVector

PostgreSQL extension.

A good fit when your data already lives in PostgreSQL.


Chroma

Simple and lightweight.

Great for learning.


FAISS

A similarity-search library from Meta rather than a full database.

Extremely fast.


Pinecone

Managed cloud service.


Weaviate

Open-source vector database with a managed cloud option.


Why PGVector Excites Java Developers

You already know:

  • PostgreSQL
  • JDBC
  • JPA

Example:

SELECT *
FROM documents
ORDER BY embedding <=> query_vector  -- <=> is cosine distance; <-> would be Euclidean (L2) distance
LIMIT 5;

Semantic search using SQL.
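From Java, the same query can run over plain JDBC. The sketch below is one common approach, not the only one: it sends the query vector as text and casts it with `::vector`. It assumes a table `documents(content text, embedding vector(n))` and pgvector's `<=>` cosine-distance operator; the class and method names are illustrative.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

class PgVectorSearch {

    // Formats a vector in pgvector's text form, e.g. [0.5,-1.0,2.0].
    static String toVectorLiteral(float[] vector) {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < vector.length; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(vector[i]);
        }
        return sb.append(']').toString();
    }

    // Returns the content of the k rows nearest to the query vector by cosine distance.
    // Table and column names are assumptions; adjust them to your schema.
    static List<String> nearest(Connection conn, float[] queryVector, int k) throws SQLException {
        String sql = "SELECT content FROM documents ORDER BY embedding <=> ?::vector LIMIT ?";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setString(1, toVectorLiteral(queryVector));
            ps.setInt(2, k);
            List<String> results = new ArrayList<>();
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    results.add(rs.getString("content"));
                }
            }
            return results;
        }
    }
}
```

The query vector must come from the same embedding model that produced the stored vectors, otherwise the distances are meaningless.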


The Retrieval Process

Suppose the user asks:

How do I deploy Spring Boot on AWS?

Steps:

Question
      ↓
Embedding
      ↓
Vector Search
      ↓
Top Documents
      ↓
LLM
      ↓
Answer

This is the foundation of RAG.
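The steps above can be sketched as one method. Everything here is hypothetical scaffolding: the `VectorSearch` interface, the prompt wording and the stubbed components only show how the pieces connect, and a real system would plug in an embedding model, a vector database and an LLM client.

```java
import java.util.List;
import java.util.function.Function;

class RetrievalFlowDemo {

    // Hypothetical seam for the vector database.
    interface VectorSearch {
        List<String> topDocuments(float[] queryVector, int k);
    }

    static String answer(String question,
                         Function<String, float[]> embedder,
                         VectorSearch search,
                         Function<String, String> llm) {
        // Question -> Embedding
        float[] queryVector = embedder.apply(question);
        // Vector Search -> Top Documents
        List<String> docs = search.topDocuments(queryVector, 3);
        // Top Documents -> LLM: the retrieved text travels inside the prompt
        String prompt = "Answer using only this context:\n"
                + String.join("\n---\n", docs)
                + "\n\nQuestion: " + question;
        // LLM -> Answer
        return llm.apply(prompt);
    }

    public static void main(String[] args) {
        // Stubs stand in for the real components so the flow runs end to end.
        String result = answer(
                "How do I deploy Spring Boot on AWS?",
                q -> new float[] {0.1f, 0.2f, 0.3f},
                (vector, k) -> List.of("AWS deployment", "Docker configuration"),
                prompt -> "(stub) received a prompt of " + prompt.length() + " characters");
        System.out.println(result);
    }
}
```

The LLM never searches anything itself. It only sees whatever the retrieval step chose to put in the prompt.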


Java Example

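// Conceptual sketch: embeddingModel and vectorStore are placeholders, not a specific library's API.
String query = "Deploy Spring Boot on AWS";

// Turn the question into a vector (float[] avoids confusion with java.util.Vector).
float[] vector = embeddingModel.embed(query);

// Ask the store for the documents whose vectors are nearest to the query vector.
List<Document> docs =
        vectorStore.similaritySearch(vector);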

Simple.

Powerful.


Spring AI Example

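// Builder API from Spring AI 1.0; older milestone releases used SearchRequest.query(question).
List<Document> documents =
        vectorStore.similaritySearch(
                SearchRequest.builder()
                        .query(question)
                        .topK(5)
                        .build()
        );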

Spring AI abstracts the complexity.


Why Vector Search is Revolutionary

Traditional systems answer:

What exactly matches?

AI systems answer:

What means something similar?

That is a fundamental shift.


Enterprise Applications

Knowledge Assistant

Ask:

What is our leave policy?


Architecture Assistant

Ask:

Explain this design.


Production Support Bot

Ask:

Have we seen similar incidents in the past?


API Assistant

Ask:

Which endpoint creates users?


Challenges

Embeddings are not perfect.

Problems:

  • Poor chunking
  • Wrong embedding models
  • Missing metadata
  • Low-quality documents

Good RAG systems depend heavily on retrieval quality.


Interview Questions

1. What is an embedding?

A numerical representation of text.


2. Why are embeddings needed?

To perform semantic search.


3. What is cosine similarity?

The cosine of the angle between two vectors: a similarity score from -1 to 1 that ignores vector length.


4. Why do vector databases exist?

To search large collections of vectors efficiently for the nearest matches.


5. What is chunking?

Splitting documents into smaller sections.


Hands-On Exercise

Take 20 architecture documents.

Try:

  1. Extract text.
  2. Split into chunks.
  3. Generate embeddings.
  4. Store in PGVector.
  5. Search semantically.

This single exercise will teach more than hours of theory.


Key Takeaways

✔ LLMs process tokens.

✔ Embeddings represent meaning.

✔ Similar concepts produce similar vectors.

✔ Vector databases store semantic information.

✔ Chunking is critical.

✔ Similarity search powers modern AI.

✔ Embeddings are the foundation of RAG.


Coming Next

Part 3 — Retrieval Augmented Generation (RAG)

We will finally build:

  • Document search systems.
  • Company knowledge assistants.
  • PDF chatbots.
  • Architecture assistants.

Topics:

  • RAG architecture.
  • Retrieval pipelines.
  • Context assembly.
  • Re-ranking.
  • Citations.
  • Hallucination reduction.
  • Enterprise RAG patterns.

Because once we understand RAG, we stop asking:

What does AI know?

and begin asking:

What knowledge should AI use?


“The database revolution taught us how to store data. The vector revolution teaches us how to store meaning.”
