AI for Java Architects – Part 3

Retrieval Augmented Generation (RAG): Building AI Systems That Know Your Data

“The most valuable enterprise AI systems are not trained on the internet. They are built on your organization’s knowledge.”


Introduction

In the previous articles we learned:

Part 1

  • LLMs
  • Tokens
  • Context windows
  • Transformers
  • Hallucinations

Part 2

  • Embeddings
  • Vector databases
  • Similarity search
  • Chunking
  • Semantic retrieval

Now we combine everything.

This is the technology that transformed AI from:

“Chat with the internet”

into

“Chat with your company’s knowledge.”

Welcome to:

Retrieval Augmented Generation (RAG)


The Fundamental Problem

Suppose you ask ChatGPT:

What is Rahul’s architecture for the iEvent platform?

The model doesn’t know.

Why?

Because:

  • Your documents are private.
  • Your architecture diagrams are internal.
  • Your APIs are unpublished.
  • Your knowledge isn’t in the training data.

Traditional LLM:

Question
    ↓
LLM
    ↓
Answer

Result:

  • Hallucination
  • Wrong answers
  • Generic responses

What RAG Solves

RAG adds knowledge retrieval.

Question
      ↓
Retriever
      ↓
Relevant Documents
      ↓
LLM
      ↓
Answer

The model answers using:

  • Company documents
  • PDFs
  • Wikis
  • Architecture documents
  • Knowledge bases
  • APIs

Why Enterprises Love RAG

RAG provides:

✅ Current information

✅ Private knowledge

✅ Fewer hallucinations

✅ Citations

✅ No expensive model training

✅ Better accuracy


The Complete RAG Architecture

Documents
      ↓
Text Extraction
      ↓
Chunking
      ↓
Embeddings
      ↓
Vector Database
---------------------
User Question
      ↓
Question Embedding
      ↓
Similarity Search
      ↓
Top Chunks
      ↓
LLM Prompt
      ↓
Response

Example: Company HR Policy

Suppose the document contains:

Employees receive 24 annual leave days.

User asks:

How many vacation days do we get?

The steps:

  1. Question converted to vector.
  2. Similar chunks retrieved.
  3. Context added.
  4. LLM answers.

Response:

Employees receive 24 annual leave days.

The model did not “know” this.

It retrieved it.
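The four steps can be sketched in plain Java. This is a minimal illustration, not a production design: `MiniRagPipeline` is a hypothetical class, and the retriever and the LLM are passed in as ordinary functions so the flow can be read without any framework.

```java
import java.util.List;
import java.util.function.Function;

/** Wires a retriever and an LLM together; both are supplied as plain functions. */
final class MiniRagPipeline {
    private final Function<String, List<String>> retriever; // question -> relevant chunks
    private final Function<String, String> llm;             // prompt -> answer

    MiniRagPipeline(Function<String, List<String>> retriever,
                    Function<String, String> llm) {
        this.retriever = retriever;
        this.llm = llm;
    }

    String ask(String question) {
        // Steps 1 and 2: the retriever embeds the question and finds similar chunks.
        List<String> chunks = retriever.apply(question);
        // Step 3: the retrieved text is placed in the prompt as context.
        String prompt = "Answer using only the provided context.\n\n"
                + "Context:\n" + String.join("\n", chunks) + "\n\n"
                + "Question:\n" + question;
        // Step 4: the LLM answers from that context.
        return llm.apply(prompt);
    }
}
```

In a test, both functions can be stubs. In a real system the retriever would call a vector database and the second function would call a model API.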


Step 1: Document Ingestion

Documents may come from:

  • PDFs
  • Word documents
  • Confluence
  • SharePoint
  • Databases
  • Wikis
  • APIs
  • Emails

Example:

Architecture.pdf
Deployment.docx
API-Specification.pdf

Step 2: Text Extraction

Embedding models work on plain text, not on PDF or Word files.

So we extract the text first.

Example:

Spring Boot services communicate through Kafka.

Step 3: Chunking

Large documents are divided.

Example:

Chunk 1: Architecture

Chunk 2: Deployment

Chunk 3: Monitoring

Why Chunking Matters

Poor chunk:

Pages 1–50

Good chunk:

Spring Boot deployment on AWS.

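Smaller, focused chunks:

  • Improve retrieval precision, because each chunk covers one topic.
  • Reduce cost, because fewer tokens are sent to the LLM.
  • Lose surrounding context if they become too small.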

Chunk Size Recommendations

Content          Chunk Size
APIs             300 tokens
Documentation    500 tokens
Books            1000 tokens
Code             200 tokens
Policies         400 tokens

Overlapping Chunks

Without overlap:

Chunk 1:
Spring Boot deployment

Chunk 2:
using Kubernetes.

The meaning is lost.

With overlap:

Chunk 1:
Spring Boot deployment using

Chunk 2:
deployment using Kubernetes
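A sliding window is one simple way to produce overlapping chunks. The sketch below is an assumption-laden illustration: `OverlappingChunker` is a hypothetical class, and it counts whitespace-separated words instead of real model tokens.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Splits text into word-based chunks where consecutive chunks share some words. */
final class OverlappingChunker {

    /**
     * @param text      source text, split on whitespace (words stand in for tokens)
     * @param chunkSize maximum words per chunk, must be positive
     * @param overlap   words repeated from the previous chunk, 0 <= overlap < chunkSize
     */
    static List<String> chunk(String text, int chunkSize, int overlap) {
        if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
            throw new IllegalArgumentException("require 0 <= overlap < chunkSize");
        }
        List<String> chunks = new ArrayList<>();
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return chunks;
        }
        String[] words = trimmed.split("\\s+");
        int step = chunkSize - overlap; // how far the window advances each time
        for (int start = 0; start < words.length; start += step) {
            int end = Math.min(start + chunkSize, words.length);
            chunks.add(String.join(" ", Arrays.copyOfRange(words, start, end)));
            if (end == words.length) {
                break; // this window already reached the end of the text
            }
        }
        return chunks;
    }
}
```

Calling `OverlappingChunker.chunk("Spring Boot deployment using Kubernetes", 4, 2)` yields the two chunks shown above: "Spring Boot deployment using" and "deployment using Kubernetes".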

Step 4: Embedding Generation

Each chunk becomes a vector.

Chunk
    ↓
Embedding Model
    ↓
Vector
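To make the shape of this step concrete, here is a toy stand-in for an embedding model. It is not how real embedding models work: `ToyEmbedder` is hypothetical, it hashes words into buckets, and it captures no meaning. It only shows that text goes in and a fixed-length vector of numbers comes out.

```java
import java.util.Locale;

/** Toy stand-in for an embedding model: hashed bag-of-words, no semantics. */
final class ToyEmbedder {

    /** Maps text to a fixed-length, L2-normalised vector by hashing each word into a bucket. */
    static double[] embed(String text, int dimensions) {
        if (dimensions <= 0) {
            throw new IllegalArgumentException("dimensions must be positive");
        }
        double[] vector = new double[dimensions];
        // Only ASCII letters and digits count as word characters in this sketch.
        for (String word : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (word.isEmpty()) {
                continue;
            }
            vector[Math.floorMod(word.hashCode(), dimensions)] += 1.0;
        }
        double sumOfSquares = 0.0;
        for (double v : vector) {
            sumOfSquares += v * v;
        }
        double norm = Math.sqrt(sumOfSquares);
        if (norm > 0) {
            for (int i = 0; i < dimensions; i++) {
                vector[i] /= norm; // unit length, so vectors are comparable
            }
        }
        return vector;
    }
}
```

A real system would replace `embed` with a call to an embedding model, which returns vectors where similar meanings land close together.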

Step 5: Storage

Store:

  • Vector
  • Text
  • Metadata

Example:

Text         Team        Year
API Guide    Platform    2026
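As a rough sketch, a stored entry can be modelled as a record holding all three parts. `InMemoryChunkStore` and `StoredChunk` are hypothetical names, and a real vector database would persist and index this data rather than keep it in a list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Minimal in-memory stand-in for a vector database table. */
final class InMemoryChunkStore {

    /** One stored entry: the embedding, the original text, and filterable metadata. */
    record StoredChunk(double[] vector, String text, Map<String, String> metadata) {}

    private final List<StoredChunk> chunks = new ArrayList<>();

    void add(double[] vector, String text, Map<String, String> metadata) {
        // Defensive copies so later changes by the caller do not alter stored data.
        chunks.add(new StoredChunk(vector.clone(), text, Map.copyOf(metadata)));
    }

    List<StoredChunk> all() {
        return List.copyOf(chunks);
    }
}
```

Keeping the text next to the vector matters: the vector is used for search, but the text is what goes into the prompt.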

Step 6: User Question

User asks:

How are deadlines calculated?

The question is converted to an embedding, using the same embedding model that was used for the chunks.

Step 7: Similarity Search

The vector database returns:

  1. Holiday calculation logic.
  2. Working day rules.
  3. Deadline service documentation.
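Under the hood, the ranking is usually a vector comparison such as cosine similarity. The sketch below does a brute-force scan over all stored vectors. `CosineSearch` is a hypothetical class, and real vector databases typically use approximate indexes instead of comparing against every vector.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.IntStream;

/** Brute-force similarity search over in-memory vectors. */
final class CosineSearch {

    /** Cosine similarity: 1.0 for the same direction, 0.0 for orthogonal vectors. */
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0 || normB == 0) {
            return 0.0; // a zero vector is treated as similar to nothing
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Returns indexes of the k stored vectors most similar to the query, best first. */
    static List<Integer> topK(double[] query, List<double[]> stored, int k) {
        double[] scores = stored.stream().mapToDouble(v -> cosine(query, v)).toArray();
        return IntStream.range(0, scores.length)
                .boxed()
                .sorted(Comparator.comparingDouble((Integer i) -> scores[i]).reversed())
                .limit(k)
                .toList();
    }
}
```

The returned indexes point back to the stored chunks, whose text is then used to build the prompt.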

Step 8: Context Construction

The prompt becomes:

Answer using only the provided context.

Context:
------------
Deadline service calculates the nth
working day using holiday data.

Question:
How are deadlines calculated?

Step 9: LLM Generates Response

Output:

The system calculates deadlines using the nth working day logic and excludes holidays loaded from the holiday service.


RAG Prompt Template

You are a technical assistant.

Answer only from the context.

If the answer is unavailable, say:
"I don't know."

Context:
{documents}

Question:
{question}

Restricting the model to the supplied context, and giving it an explicit way to decline, reduces hallucinations.
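Filling the template is plain string work. A minimal sketch follows, assuming a hypothetical `RagPromptBuilder` class and a `---` separator between chunks, which is an arbitrary choice and not part of the template above.

```java
import java.util.List;

/** Fills the RAG prompt template with retrieved chunks and the user's question. */
final class RagPromptBuilder {

    private static final String TEMPLATE = """
            You are a technical assistant.

            Answer only from the context.

            If the answer is unavailable, say:
            "I don't know."

            Context:
            %s

            Question:
            %s
            """;

    static String build(List<String> chunks, String question) {
        // The separator between chunks is an assumption made for readability.
        String context = String.join("\n---\n", chunks);
        return TEMPLATE.formatted(context, question);
    }
}
```

Keeping the template in one place makes it easy to review and version, which matters because small wording changes can alter model behaviour.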


Naive RAG

Simple approach:

Question
     ↓
Vector Search
     ↓
Top 5 Chunks
     ↓
LLM

Works surprisingly well.


Advanced RAG

Enterprise systems often add:

  • Metadata filters
  • Hybrid search
  • Re-ranking
  • Context compression
  • Citation generation

Metadata Filtering

Example:

Search only finance documents.

department = Finance
year = 2026

Improves precision.
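A filter like this can be sketched as a simple predicate over chunk metadata. `MetadataFilter` and its `Chunk` record are hypothetical, and in practice the vector database applies the filter itself, usually before or during the similarity search.

```java
import java.util.List;
import java.util.Map;

/** Narrows the candidate chunks by exact-match metadata. */
final class MetadataFilter {

    record Chunk(String text, Map<String, String> metadata) {}

    /** Keeps only chunks whose metadata contains every required key/value pair. */
    static List<Chunk> filter(List<Chunk> chunks, Map<String, String> required) {
        return chunks.stream()
                .filter(c -> c.metadata().entrySet().containsAll(required.entrySet()))
                .toList();
    }
}
```

With `required` set to `department = Finance` and `year = 2026`, chunks from HR or from 2025 never reach the ranking step.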


Hybrid Search

Combines:

  • Keyword search
  • Vector search

Example:

AWS deployment

Keyword:

  • AWS

Semantic:

  • cloud deployment

Together:

Better results.
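One common way to merge the two result lists is reciprocal rank fusion (RRF). It is only one option among several, and the article does not prescribe it. In the sketch below, `ReciprocalRankFusion` is a hypothetical class, the document ids are made up, and `k` is a smoothing constant for which 60 is a commonly used value.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Merges ranked result lists using reciprocal rank fusion. */
final class ReciprocalRankFusion {

    /**
     * Each list contributes 1 / (k + rank) to a document's score, with rank
     * starting at 1. Documents ranked well in several lists rise to the top.
     */
    static List<String> fuse(List<List<String>> rankings, int k) {
        Map<String, Double> scores = new HashMap<>();
        for (List<String> ranking : rankings) {
            for (int i = 0; i < ranking.size(); i++) {
                scores.merge(ranking.get(i), 1.0 / (k + i + 1), Double::sum);
            }
        }
        return scores.entrySet().stream()
                .sorted(Map.Entry.<String, Double>comparingByValue().reversed()
                        // Ties are broken by id so the output is deterministic.
                        .thenComparing(Map.Entry.<String, Double>comparingByKey()))
                .map(Map.Entry::getKey)
                .toList();
    }
}
```

RRF uses only rank positions, so keyword scores and vector distances never have to be put on the same scale.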


Re-ranking

Vector search may return:

  1. AWS deployment.
  2. Kubernetes.
  3. Docker.

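A re-ranker scores each retrieved chunk against the question with a more precise model, then reorders the list so the most relevant chunks come first.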
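The two-stage shape can be sketched as follows. `ReRanker` is a hypothetical class, and its `termOverlap` scorer is a deliberately crude placeholder: production re-rankers are usually dedicated models that read the question and chunk together.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.function.ToDoubleBiFunction;

/** Second-stage ranking over the candidates returned by vector search. */
final class ReRanker {

    /** Re-orders candidates by a (question, chunk) score and keeps the best topN. */
    static List<String> rerank(String question, List<String> candidates,
                               ToDoubleBiFunction<String, String> scorer, int topN) {
        return candidates.stream()
                .sorted(Comparator.comparingDouble(
                        (String c) -> scorer.applyAsDouble(question, c)).reversed())
                .limit(topN)
                .toList();
    }

    /** Placeholder scorer: fraction of question words that also appear in the chunk. */
    static double termOverlap(String question, String chunk) {
        Set<String> questionWords = words(question);
        Set<String> chunkWords = words(chunk);
        if (questionWords.isEmpty()) {
            return 0.0;
        }
        long shared = questionWords.stream().filter(chunkWords::contains).count();
        return (double) shared / questionWords.size();
    }

    private static Set<String> words(String text) {
        Set<String> set = new HashSet<>(
                Arrays.asList(text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")));
        set.remove(""); // split can produce an empty leading token
        return set;
    }
}
```

Because the scorer is a parameter, the placeholder can be swapped for a call to a real re-ranking model without touching the rest of the pipeline.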


Context Compression

Suppose:

Retrieved:

10,000 tokens

LLM limit:

4,000 tokens

Compression removes irrelevant content.
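The simplest form of this is budget trimming: keep the best-ranked chunks and drop whatever does not fit. The sketch below assumes a hypothetical `ContextBudget` class and estimates tokens by counting words, which is only a rough approximation of a real tokenizer. More advanced approaches summarise or extract sentences instead of dropping whole chunks.

```java
import java.util.ArrayList;
import java.util.List;

/** Fits ranked chunks into a token budget by dropping chunks that do not fit. */
final class ContextBudget {

    /** Rough token estimate: whitespace-separated words. Real tokenizers count differently. */
    static int estimateTokens(String text) {
        String trimmed = text.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }

    /** Walks chunks in ranked order, skipping any chunk that would exceed the budget. */
    static List<String> fit(List<String> rankedChunks, int maxTokens) {
        List<String> kept = new ArrayList<>();
        int used = 0;
        for (String chunk : rankedChunks) {
            int cost = estimateTokens(chunk);
            if (used + cost <= maxTokens) {
                kept.add(chunk);
                used += cost;
            }
        }
        return kept;
    }
}
```

The budget should leave room for the instructions, the question, and the model's answer, not just the retrieved chunks.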


Citations

Modern RAG systems provide:

According to Architecture.pdf page 8…

This increases trust.
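Citations depend on the metadata stored with each chunk. One possible approach, sketched here with a hypothetical `CitationFormatter` class and an illustrative sample sentence, is to number the chunks in the prompt and show a matching reference list next to the answer.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

/** Builds numbered context and a matching reference list from retrieved sources. */
final class CitationFormatter {

    record Source(String fileName, int page, String text) {}

    /** Numbers each chunk so the model can cite it as [1], [2], and so on. */
    static String contextWithMarkers(List<Source> sources) {
        return IntStream.range(0, sources.size())
                .mapToObj(i -> "[" + (i + 1) + "] " + sources.get(i).text())
                .collect(Collectors.joining("\n"));
    }

    /** Builds the reference list shown to the user alongside the answer. */
    static String references(List<Source> sources) {
        return IntStream.range(0, sources.size())
                .mapToObj(i -> "[" + (i + 1) + "] " + sources.get(i).fileName()
                        + ", page " + sources.get(i).page())
                .collect(Collectors.joining("\n"));
    }
}
```

For a source taken from Architecture.pdf page 8, `references` produces `[1] Architecture.pdf, page 8`. This only works if the file name and page number were captured during ingestion.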


Spring AI RAG Example

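// vectorStore and chatClient are assumed to be injected Spring beans.

// Option 1: retrieve the matching chunks yourself.
List<Document> docs =
        vectorStore.similaritySearch(question);

// Option 2: let the advisor run the similarity search and
// add the retrieved chunks to the prompt before the model is called.
String answer = chatClient.prompt()
        .user(question)
        .advisors(new QuestionAnswerAdvisor(vectorStore))
        .call()
        .content();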

Very little code.

Powerful capabilities.


Architecture Example

Suppose you upload:

  • HLD
  • LLD
  • Sequence diagrams

Ask:

Explain the request flow.

The assistant can trace the flow through:

  • Gateway
  • Mediator
  • Choreo
  • Atomic

based entirely on your documents.


Enterprise RAG Use Cases

Knowledge Assistant

Company policies.


Architecture Assistant

Technical documentation.


Production Support Assistant

Incident history.


API Assistant

Swagger documentation.


Compliance Assistant

Regulations and rules.


HR Assistant

Employee handbooks.


Problems in RAG

RAG is not magic.

Problems:

  • Poor chunking.
  • Bad embeddings.
  • Wrong documents.
  • Missing context.
  • Duplicate chunks.
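Of these, duplicate chunks are the easiest to address in code. The sketch below removes exact duplicates after normalising case and whitespace. `ChunkDeduplicator` is a hypothetical class, and this approach will not catch near-duplicates that differ in wording.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Removes repeated chunks before they are embedded and stored. */
final class ChunkDeduplicator {

    /** Drops chunks whose text matches an earlier chunk after case and whitespace normalisation. */
    static List<String> deduplicate(List<String> chunks) {
        // LinkedHashMap keeps the first occurrence and preserves the original order.
        Map<String, String> firstSeen = new LinkedHashMap<>();
        for (String chunk : chunks) {
            String key = chunk.trim().replaceAll("\\s+", " ").toLowerCase(Locale.ROOT);
            firstSeen.putIfAbsent(key, chunk);
        }
        return List.copyOf(firstSeen.values());
    }
}
```

Running this at ingestion time avoids paying to embed the same text twice and stops identical chunks from crowding out other results.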

Garbage In, Garbage Out

Bad document:

Scanned image with OCR errors.

Bad results.

Good documents matter.


Hallucination Reduction

RAG reduces hallucinations because:

  • Answers come from documents.
  • Knowledge is grounded.
  • Context is explicit.

Why RAG Beats Fine-Tuning

Fine Tuning            RAG
Expensive              Cheaper
Slow updates           Instant updates
Retraining required    Just add documents
Difficult              Simpler

Most enterprises choose:

RAG first.


Real Architecture

S3
  ↓
Document Loader
  ↓
Chunking Service
  ↓
Embedding Service
  ↓
PGVector
  ↓
Spring Boot API
  ↓
LLM

AWS Architecture

S3
 ↓
Lambda
 ↓
Titan Embeddings
 ↓
OpenSearch
 ↓
Bedrock
 ↓
Spring AI Service

Interview Questions

What is RAG?

Retrieval Augmented Generation combines document retrieval and LLM generation.


Why use RAG?

To provide external knowledge.


What reduces hallucinations?

Grounding answers in retrieved context.


Why chunk documents?

LLMs have context limits, and smaller chunks make retrieval more precise.


Why embeddings?

To perform semantic search.


Hands-On Exercise

Build:

Employee Policy Bot

Documents:

  • Leave policy
  • Travel policy
  • WFH policy

Features:

  • Upload PDF.
  • Generate embeddings.
  • Store vectors.
  • Ask questions.

Project Ideas

Architecture Copilot

Upload:

  • HLD
  • LLD
  • API specs

Ask:

Explain this design.


Meeting Assistant

Upload transcripts.

Ask:

What decisions were made?


Production Support Bot

Upload:

  • RCA documents.
  • Incident reports.

Ask:

Similar incidents?


Key Takeaways

✔ RAG gives AI external knowledge.

✔ Embeddings enable retrieval.

✔ Chunking affects accuracy.

✔ Vector databases enable semantic search.

✔ Context reduces hallucinations.

✔ Most enterprise AI systems are RAG systems.


Coming Next

Part 4 — Building Your First RAG Application Using Spring AI

We will build:

  • Spring Boot application.
  • PGVector integration.
  • OpenAI embeddings.
  • Document ingestion.
  • Similarity search.
  • Question answering API.

For the first time in this series, we will move from concepts to code and build a complete AI application using technologies familiar to Java developers.


“Databases store facts. RAG turns those facts into conversations.”
