AI & LLM Integration
This is what makes you stand out. Integrating AI into your Java application turns a standard CRUD app into an intelligent product. These skills are among the most in-demand in the market.
Popular LLMs: GPT-4o (OpenAI), Claude (Anthropic), Gemini (Google), Llama (Meta). As developers, we call these models via APIs.
| Term | Meaning |
|---|---|
| LLM | Large Language Model – the AI model (GPT-4o, Claude, etc.) |
| Prompt | The text you send to the model |
| Completion | The model's response |
| Token | ~4 characters or ~3/4 of a word. Pricing is per token. |
| Context window | Max tokens the model can "see" at once (GPT-4o: 128k tokens) |
| Temperature | Randomness: 0=deterministic/factual, 1=creative/varied |
| Embedding | A numeric vector representation of text (used for semantic search) |
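Calling one of these models from Java is just an authenticated HTTP POST. Below is a minimal sketch, assuming the OpenAI Chat Completions endpoint (`/v1/chat/completions`) and an `OPENAI_API_KEY` environment variable; a production app would use an official SDK or a JSON library (e.g. Jackson) rather than hand-built strings.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ChatClient {

    // Builds a minimal Chat Completions request body by hand.
    // Only quotes are escaped here; a JSON library handles escaping properly.
    static String buildRequestBody(String model, String prompt) {
        return """
            {"model": "%s",
             "messages": [{"role": "user", "content": "%s"}],
             "temperature": 0}""".formatted(model, prompt.replace("\"", "\\\""));
    }

    public static void main(String[] args) throws Exception {
        String body = buildRequestBody("gpt-4o", "Say hello in one word");
        System.out.println(body);

        // Uncomment to actually call the API (requires OPENAI_API_KEY):
        // HttpRequest request = HttpRequest.newBuilder()
        //         .uri(URI.create("https://api.openai.com/v1/chat/completions"))
        //         .header("Authorization", "Bearer " + System.getenv("OPENAI_API_KEY"))
        //         .header("Content-Type", "application/json")
        //         .POST(HttpRequest.BodyPublishers.ofString(body))
        //         .build();
        // HttpResponse<String> response = HttpClient.newHttpClient()
        //         .send(request, HttpResponse.BodyHandlers.ofString());
        // System.out.println(response.body());
    }
}
```

The same pattern works for Claude or Gemini; only the endpoint, auth header, and body schema change.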
Prompt engineering is the skill of crafting inputs that get the best output from an LLM. The quality of your prompt directly determines the quality of the response.
Key Techniques
| Technique | Example |
|---|---|
| Be specific | "Explain Java generics" → "Explain Java generics to a student who knows basic OOP but has never seen <T>, with one code example" |
| Assign a role | "You are a senior Java developer reviewing code for a fresher" |
| Provide context | "Given this Spring Boot controller: [code], what is wrong with it?" |
| Chain of thought | "Think step by step before answering" |
| Few-shot | Provide 1-2 examples of input→output before asking your question |
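The role, few-shot, and specificity techniques above compose naturally into a prompt-builder. A hypothetical sketch (the `Example` record and `build` method are illustrative, not from any library):

```java
import java.util.List;

public class FewShotPrompt {

    // One demonstration pair: what the model is given and what it should say.
    record Example(String input, String output) {}

    // Assembles: role instruction, then the examples, then the real question.
    static String build(String role, List<Example> examples, String question) {
        StringBuilder sb = new StringBuilder(role).append("\n\n");
        for (Example e : examples) {
            sb.append("Input: ").append(e.input()).append("\n")
              .append("Output: ").append(e.output()).append("\n\n");
        }
        return sb.append("Input: ").append(question).append("\nOutput:").toString();
    }

    public static void main(String[] args) {
        String prompt = build(
                "You are a senior Java developer reviewing code for a fresher.",
                List.of(new Example("List<String> l = new ArrayList();",
                                    "Use the diamond operator: new ArrayList<>();")),
                "String s = null; s.length();");
        System.out.println(prompt);
    }
}
```

Ending the prompt with a bare "Output:" nudges the model to continue the established pattern rather than chat freely.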
RAG solution: Before asking the AI a question, search your own database for relevant documents and include them in the prompt. "Here is some context from our docs: [relevant text]. Now answer this question: [user question]"
An LLM (Large Language Model) is a deep learning model trained on massive text datasets. It generates responses by predicting the next token based on patterns learned during training.
Traditional programming: explicit rules and logic coded by the developer → deterministic output.
LLM: trained on examples → statistical pattern matching → probabilistic output. They're better at open-ended tasks (summarisation, Q&A, code generation) where explicit rules are impractical.
A token is the basic unit of text that an LLM processes. Approximately 1 token ≈ 4 characters ≈ 3/4 of a word in English.
"Hello world" β 2 tokens. A 500-word essay β 375 tokens.
Why it matters: API pricing is per token (input + output). Context window limits are in tokens. GPT-4o has a 128,000-token context window – roughly a 96,000-word book.
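The chars-per-token rule of thumb is enough for quick cost and context-window estimates in code. A sketch of that heuristic (real tokenizers such as OpenAI's tiktoken give the authoritative count, and the illustrative prices below are placeholders, not current rates):

```java
public class TokenEstimator {

    // Rough heuristic from the notes: 1 token ≈ 4 characters of English text.
    static int estimateTokens(String text) {
        return Math.max(1, Math.round(text.length() / 4.0f));
    }

    // Per-token pricing: providers quote separate input and output prices,
    // usually per million tokens.
    static double estimateCostUsd(int inputTokens, int outputTokens,
                                  double inPricePer1M, double outPricePer1M) {
        return inputTokens / 1_000_000.0 * inPricePer1M
             + outputTokens / 1_000_000.0 * outPricePer1M;
    }

    public static void main(String[] args) {
        // 11 chars / 4 rounds to 3; a real tokenizer counts "Hello world" as 2.
        System.out.println(estimateTokens("Hello world")); // → 3
    }
}
```

Use the heuristic to decide whether a document will fit in the context window before you pay for an API call; never use it for billing-critical logic.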
RAG is a technique that improves LLM responses by injecting relevant external knowledge into the prompt before querying the model.
Why: LLMs have a training cutoff and don't know your proprietary data. RAG lets you: ask questions about your own documents, get up-to-date information, reduce hallucinations (made-up answers), and keep sensitive data in your own database rather than training the model on it.
Flow: user question → embed question → search vector DB → retrieve relevant docs → augment prompt with docs → LLM gives grounded answer.
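The "augment prompt with docs" step of this flow can be sketched as a simple template. A minimal illustration (the instruction wording and `---` separator are choices, not a standard):

```java
import java.util.List;

public class RagPrompt {

    // Stuff the retrieved passages into the prompt before sending it to the
    // LLM, so the answer is grounded in our data instead of its training set.
    static String augment(List<String> retrievedDocs, String userQuestion) {
        String context = String.join("\n---\n", retrievedDocs);
        return """
            Answer using ONLY the context below. If the answer is not in the
            context, say "I don't know".

            Context:
            %s

            Question: %s""".formatted(context, userQuestion);
    }

    public static void main(String[] args) {
        String prompt = augment(
                List.of("Refunds are processed within 5 business days."),
                "How long do refunds take?");
        System.out.println(prompt);
    }
}
```

The explicit "say I don't know" instruction is a common guard against hallucination when the retrieved context doesn't contain the answer.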
An embedding is a numeric vector (list of floating-point numbers) that represents the semantic meaning of text. Texts with similar meanings produce similar vectors.
Example: "Hello" and "Hi" have similar embeddings. "Hello" and "Database" have very different embeddings.
Used in RAG: convert documents to embeddings and store in a vector database. When a user asks a question, convert the question to an embedding and find the most similar document embeddings (semantic search).
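"Most similar" is usually measured with cosine similarity between embedding vectors. A sketch with hand-made toy vectors (real embeddings have 1,000+ dimensions and come from an embedding model API, not from code like this):

```java
public class SemanticSearch {

    // Cosine similarity: near 1.0 = same direction (similar meaning),
    // near 0 = unrelated.
    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        // Toy 3-dimensional stand-ins for "Hello", "Hi", and "Database".
        double[] hello = {0.9, 0.1, 0.0};
        double[] hi    = {0.8, 0.2, 0.1};
        double[] db    = {0.0, 0.1, 0.9};

        System.out.printf("hello~hi: %.2f%n", cosine(hello, hi)); // high
        System.out.printf("hello~db: %.2f%n", cosine(hello, db)); // low
    }
}
```

A vector database (pgvector, Pinecone, etc.) does exactly this comparison at scale, with indexes that avoid scanning every stored vector.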
Prompt engineering is the practice of crafting effective inputs (prompts) to get desired outputs from LLMs. Key techniques:
- System message – set the AI's role and behaviour
- Few-shot prompting – provide examples of the input→output format
- Chain of thought – ask the model to "think step by step"
- Specificity – vague prompts get vague answers
- Output format – ask for JSON, markdown, bullet points
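The output-format technique matters most when another program consumes the response. A sketch of a prompt that demands strict JSON (the field names and severity levels are illustrative choices):

```java
public class StructuredOutput {

    // Tell the model exactly what shape to return, so the response can be
    // parsed programmatically instead of read by a human.
    static String reviewPrompt(String code) {
        return """
            You are a senior Java developer reviewing code for a fresher.
            Respond with ONLY valid JSON in this shape:
            {"severity": "low|medium|high", "issue": "...", "fix": "..."}

            Code to review:
            %s""".formatted(code);
    }

    public static void main(String[] args) {
        System.out.println(reviewPrompt("String s = null; s.length();"));
    }
}
```

Pair this with a JSON parser on the response side, and validate the result: models occasionally wrap JSON in markdown fences or add commentary despite the instruction.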