AI & LLM Integration
This is what makes you stand out. Integrating AI into your Java application turns a standard CRUD app into an intelligent product. These skills are among the most in-demand in the market.
Popular LLMs: GPT-4o (OpenAI), Claude (Anthropic), Gemini (Google), Llama (Meta). As developers, we call these models via APIs.
| Term | Meaning |
|---|---|
| LLM | Large Language Model – the AI model (GPT-4o, Claude, etc.) |
| Prompt | The text you send to the model |
| Completion | The model's response |
| Token | ~4 characters or ~3/4 of a word. Pricing is per token. |
| Context window | Max tokens the model can "see" at once (GPT-4o: 128k tokens) |
| Temperature | Randomness: 0=deterministic/factual, 1=creative/varied |
| Embedding | A numeric vector representation of text (used for semantic search) |
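Calling one of these models from Java is just an authenticated HTTP POST. Below is a minimal sketch, assuming the OpenAI Chat Completions endpoint (`/v1/chat/completions`) and an `OPENAI_API_KEY` environment variable; a production app would use an official SDK or a JSON library (e.g. Jackson) rather than hand-built strings.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ChatClient {

    // Builds a minimal Chat Completions request body by hand.
    // Only quotes are escaped here; a JSON library handles escaping properly.
    static String buildRequestBody(String model, String prompt) {
        return """
            {"model": "%s",
             "messages": [{"role": "user", "content": "%s"}],
             "temperature": 0}""".formatted(model, prompt.replace("\"", "\\\""));
    }

    public static void main(String[] args) throws Exception {
        String body = buildRequestBody("gpt-4o", "Say hello in one word");
        System.out.println(body);

        // Uncomment to actually call the API (requires OPENAI_API_KEY):
        // HttpRequest request = HttpRequest.newBuilder()
        //         .uri(URI.create("https://api.openai.com/v1/chat/completions"))
        //         .header("Authorization", "Bearer " + System.getenv("OPENAI_API_KEY"))
        //         .header("Content-Type", "application/json")
        //         .POST(HttpRequest.BodyPublishers.ofString(body))
        //         .build();
        // HttpResponse<String> response = HttpClient.newHttpClient()
        //         .send(request, HttpResponse.BodyHandlers.ofString());
        // System.out.println(response.body());
    }
}
```

The same pattern works for Claude or Gemini; only the endpoint, auth header, and body schema change.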
Prompt engineering is the skill of crafting inputs that get the best output from an LLM. The quality of your prompt directly determines the quality of the response.
Key Techniques
| Technique | Example |
|---|---|
| Be specific | "Explain Java generics" → "Explain Java generics to a student who knows basic OOP but has never seen <T>, with one code example" |
| Assign a role | "You are a senior Java developer reviewing code for a fresher" |
| Provide context | "Given this Spring Boot controller: [code], what is wrong with it?" |
| Chain of thought | "Think step by step before answering" |
| Few-shot | Provide 1-2 examples of input→output before asking your question |
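The role, few-shot, and specificity techniques above compose naturally into a prompt-builder. A hypothetical sketch (the `Example` record and `build` method are illustrative, not from any library):

```java
import java.util.List;

public class FewShotPrompt {

    // One demonstration pair: what the model is given and what it should say.
    record Example(String input, String output) {}

    // Assembles: role instruction, then the examples, then the real question.
    static String build(String role, List<Example> examples, String question) {
        StringBuilder sb = new StringBuilder(role).append("\n\n");
        for (Example e : examples) {
            sb.append("Input: ").append(e.input()).append("\n")
              .append("Output: ").append(e.output()).append("\n\n");
        }
        return sb.append("Input: ").append(question).append("\nOutput:").toString();
    }

    public static void main(String[] args) {
        String prompt = build(
                "You are a senior Java developer reviewing code for a fresher.",
                List.of(new Example("List<String> l = new ArrayList();",
                                    "Use the diamond operator: new ArrayList<>();")),
                "String s = null; s.length();");
        System.out.println(prompt);
    }
}
```

Ending the prompt with a bare "Output:" nudges the model to continue the established pattern rather than chat freely.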
RAG solution: Before asking the AI a question, search your own database for relevant documents and include them in the prompt. "Here is some context from our docs: [relevant text]. Now answer this question: [user question]"
An LLM (Large Language Model) is a deep learning model trained on massive text datasets. It generates responses by predicting the next token based on patterns learned during training.
Traditional programming: explicit rules and logic coded by the developer → deterministic output.
LLM: trained on examples → statistical pattern matching → probabilistic output. They're better at open-ended tasks (summarisation, Q&A, code generation) where explicit rules are impractical.
A token is the basic unit of text that an LLM processes. Approximately 1 token ≈ 4 characters ≈ 3/4 of a word in English.
"Hello world" β 2 tokens. A 500-word essay β 375 tokens.
Why it matters: API pricing is per token (input + output). Context window limits are in tokens. GPT-4o has a 128,000-token context window – roughly a 96,000-word book.
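The chars-per-token rule of thumb is enough for quick cost and context-window estimates in code. A sketch of that heuristic (real tokenizers such as OpenAI's tiktoken give the authoritative count, and the illustrative prices below are placeholders, not current rates):

```java
public class TokenEstimator {

    // Rough heuristic from the notes: 1 token ≈ 4 characters of English text.
    static int estimateTokens(String text) {
        return Math.max(1, Math.round(text.length() / 4.0f));
    }

    // Per-token pricing: providers quote separate input and output prices,
    // usually per million tokens.
    static double estimateCostUsd(int inputTokens, int outputTokens,
                                  double inPricePer1M, double outPricePer1M) {
        return inputTokens / 1_000_000.0 * inPricePer1M
             + outputTokens / 1_000_000.0 * outPricePer1M;
    }

    public static void main(String[] args) {
        // 11 chars / 4 rounds to 3; a real tokenizer counts "Hello world" as 2.
        System.out.println(estimateTokens("Hello world")); // → 3
    }
}
```

Use the heuristic to decide whether a document will fit in the context window before you pay for an API call; never use it for billing-critical logic.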
RAG is a technique that improves LLM responses by injecting relevant external knowledge into the prompt before querying the model.
Why: LLMs have a training cutoff and don't know your proprietary data. RAG lets you: ask questions about your own documents, get up-to-date information, reduce hallucinations (made-up answers), and keep sensitive data in your own database rather than training the model on it.
Flow: user question → embed question → search vector DB → retrieve relevant docs → augment prompt with docs → LLM gives grounded answer.
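The "augment prompt with docs" step of this flow can be sketched as a simple template. A minimal illustration (the instruction wording and `---` separator are choices, not a standard):

```java
import java.util.List;

public class RagPrompt {

    // Stuff the retrieved passages into the prompt before sending it to the
    // LLM, so the answer is grounded in our data instead of its training set.
    static String augment(List<String> retrievedDocs, String userQuestion) {
        String context = String.join("\n---\n", retrievedDocs);
        return """
            Answer using ONLY the context below. If the answer is not in the
            context, say "I don't know".

            Context:
            %s

            Question: %s""".formatted(context, userQuestion);
    }

    public static void main(String[] args) {
        String prompt = augment(
                List.of("Refunds are processed within 5 business days."),
                "How long do refunds take?");
        System.out.println(prompt);
    }
}
```

The explicit "say I don't know" instruction is a common guard against hallucination when the retrieved context doesn't contain the answer.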
An embedding is a numeric vector (list of floating-point numbers) that represents the semantic meaning of text. Texts with similar meanings produce similar vectors.
Example: "Hello" and "Hi" have similar embeddings. "Hello" and "Database" have very different embeddings.
Used in RAG: convert documents to embeddings and store in a vector database. When a user asks a question, convert the question to an embedding and find the most similar document embeddings (semantic search).
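"Most similar" is usually measured with cosine similarity between embedding vectors. A sketch with hand-made toy vectors (real embeddings have 1,000+ dimensions and come from an embedding model API, not from code like this):

```java
public class SemanticSearch {

    // Cosine similarity: near 1.0 = same direction (similar meaning),
    // near 0 = unrelated.
    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        // Toy 3-dimensional stand-ins for "Hello", "Hi", and "Database".
        double[] hello = {0.9, 0.1, 0.0};
        double[] hi    = {0.8, 0.2, 0.1};
        double[] db    = {0.0, 0.1, 0.9};

        System.out.printf("hello~hi: %.2f%n", cosine(hello, hi)); // high
        System.out.printf("hello~db: %.2f%n", cosine(hello, db)); // low
    }
}
```

A vector database (pgvector, Pinecone, etc.) does exactly this comparison at scale, with indexes that avoid scanning every stored vector.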
Prompt engineering is the practice of crafting effective inputs (prompts) to get desired outputs from LLMs. Key techniques:
- System message – set the AI's role and behaviour
- Few-shot prompting – provide examples of the input→output format
- Chain of thought – ask the model to "think step by step"
- Specificity – vague prompts get vague answers
- Output format – ask for JSON, markdown, bullet points
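The output-format technique matters most when another program consumes the response. A sketch of a prompt that demands strict JSON (the field names and severity levels are illustrative choices):

```java
public class StructuredOutput {

    // Tell the model exactly what shape to return, so the response can be
    // parsed programmatically instead of read by a human.
    static String reviewPrompt(String code) {
        return """
            You are a senior Java developer reviewing code for a fresher.
            Respond with ONLY valid JSON in this shape:
            {"severity": "low|medium|high", "issue": "...", "fix": "..."}

            Code to review:
            %s""".formatted(code);
    }

    public static void main(String[] args) {
        System.out.println(reviewPrompt("String s = null; s.length();"));
    }
}
```

Pair this with a JSON parser on the response side, and validate the result: models occasionally wrap JSON in markdown fences or add commentary despite the instruction.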