SQL & Database Design | CampusToAI Academy

🧠

Concept

What is SQL and why relational databases?

A database is like an Excel spreadsheet on steroids — multiple linked sheets (tables), with rules that prevent bad data, and the ability to query millions of rows in milliseconds. SQL is the language you use to ask questions and manipulate that data.

SQL Category	Commands	Purpose
DDL — Data Definition	CREATE, ALTER, DROP	Define and modify table structure
DML — Data Manipulation	INSERT, UPDATE, DELETE	Add and modify data
DQL — Data Query	SELECT	Read and retrieve data
DCL — Data Control	GRANT, REVOKE	Manage permissions
TCL — Transaction Control	COMMIT, ROLLBACK	Manage transactions

🏗️

DDL

Creating Tables

-- Create the students table
CREATE TABLE students (
    id         SERIAL PRIMARY KEY,    -- Auto-increment ID
    name       VARCHAR(100) NOT NULL, -- Required field
    email      VARCHAR(255) UNIQUE NOT NULL,
    city       VARCHAR(50),
    age        INTEGER CHECK(age >= 18 AND age <= 60),
    enrolled_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Create courses table
CREATE TABLE courses (
    id    SERIAL PRIMARY KEY,
    title VARCHAR(200) NOT NULL,
    fee   DECIMAL(10,2)
);

-- Enrollment table (junction table for many-to-many)
CREATE TABLE enrollments (
    id         SERIAL PRIMARY KEY,
    student_id INTEGER REFERENCES students(id) ON DELETE CASCADE,
    course_id  INTEGER REFERENCES courses(id) ON DELETE CASCADE,
    enrolled_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    UNIQUE(student_id, course_id)  -- Prevent duplicate enrollment
);

Constraint	Purpose
`PRIMARY KEY`	Unique + NOT NULL — main row identifier
`FOREIGN KEY` / `REFERENCES`	Link to another table's primary key
`NOT NULL`	Value is required
`UNIQUE`	No duplicate values allowed
`CHECK`	Value must satisfy a condition
`DEFAULT`	Use this value if none provided

✍️

DML + DQL

INSERT and SELECT

-- INSERT: add rows
INSERT INTO students (name, email, city, age)
VALUES ('Rahul Sharma', 'rahul@email.com', 'Pune', 22);

-- Insert multiple rows at once
INSERT INTO students (name, email, city, age) VALUES
    ('Priya Singh',  'priya@email.com',  'Mumbai', 21),
    ('Amit Patil',   'amit@email.com',   'Pune',   23);

-- SELECT: read rows
SELECT * FROM students;                     -- All columns, all rows
SELECT name, email FROM students;          -- Specific columns
SELECT name, age FROM students ORDER BY age DESC; -- Sort descending
SELECT * FROM students LIMIT 10 OFFSET 20;  -- Page 3 (20 records in)

-- UPDATE: modify rows
UPDATE students SET city = 'Bangalore' WHERE id = 1;

-- DELETE: remove rows
DELETE FROM students WHERE id = 1;

Always use WHERE with UPDATE and DELETE. Without it, you update or delete ALL rows. Always test your WHERE clause with a SELECT first.

🔍

Filtering

WHERE Clause and Operators

SELECT * FROM students WHERE city = 'Pune';
SELECT * FROM students WHERE age >= 21 AND city = 'Pune';
SELECT * FROM students WHERE city = 'Pune' OR city = 'Mumbai';
SELECT * FROM students WHERE city IN ('Pune', 'Mumbai', 'Delhi');
SELECT * FROM students WHERE age BETWEEN 20 AND 25;
SELECT * FROM students WHERE name LIKE 'R%';       -- Starts with R
SELECT * FROM students WHERE name LIKE '%Kumar%';   -- Contains Kumar
SELECT * FROM students WHERE phone IS NULL;          -- Missing phone
SELECT * FROM students WHERE phone IS NOT NULL;

🔗

Most Important

JOINs — Combining Tables

INNER JOIN = only rows that match in BOTH tables. Like finding people who attended both Party A AND Party B.
LEFT JOIN = all rows from the left table, plus matching rows from the right. Like listing all students and their course (if enrolled).

-- INNER JOIN: only enrolled students (those in enrollments table)
SELECT s.name, s.email, c.title AS course
FROM students s
INNER JOIN enrollments e ON s.id = e.student_id
INNER JOIN courses c    ON e.course_id = c.id;

-- LEFT JOIN: ALL students, even those not enrolled
-- (course column will be NULL for non-enrolled students)
SELECT s.name, c.title
FROM students s
LEFT JOIN enrollments e ON s.id = e.student_id
LEFT JOIN courses c    ON e.course_id = c.id;

-- Find students who are NOT enrolled
SELECT s.name
FROM students s
LEFT JOIN enrollments e ON s.id = e.student_id
WHERE e.student_id IS NULL;

JOIN Type	Returns
INNER JOIN	Rows matching in BOTH tables
LEFT JOIN	All left rows + matching right rows (NULL if no match)
RIGHT JOIN	All right rows + matching left rows
FULL OUTER JOIN	All rows from both tables

📊

Aggregates

GROUP BY and Aggregate Functions

-- Count students per city
SELECT city, COUNT(*) AS student_count
FROM students
GROUP BY city
ORDER BY student_count DESC;

-- Cities with more than 5 students (HAVING filters groups)
SELECT city, COUNT(*) AS student_count
FROM students
GROUP BY city
HAVING COUNT(*) > 5;

-- Other aggregate functions
SELECT
    COUNT(*) AS total_students,
    AVG(age)  AS avg_age,
    MIN(age)  AS youngest,
    MAX(age)  AS oldest,
    SUM(age)  AS age_sum
FROM students;

WHERE vs HAVING: WHERE filters individual rows BEFORE grouping. HAVING filters groups AFTER GROUP BY. You can't use aggregate functions (COUNT, AVG) in WHERE — use HAVING instead.

⚡

Performance

Indexes

An index in a database is like the index at the back of a textbook. Without it, to find "JPA" you'd read every single page (full table scan). With an index, you flip to "J" and go directly to page 234. Much faster.

-- Create index on frequently-searched column
CREATE INDEX idx_students_city ON students(city);
CREATE INDEX idx_students_email ON students(email);

-- Composite index — useful when you filter by multiple columns together
CREATE INDEX idx_students_city_age ON students(city, age);

-- Check if query uses index
EXPLAIN ANALYZE SELECT * FROM students WHERE city = 'Pune';

Don't over-index! Every index speeds up reads but slows down writes (INSERT, UPDATE, DELETE must update all indexes). Index only columns used frequently in WHERE, JOIN ON, or ORDER BY.

🔒

Reliability

Transactions & ACID Properties

A bank transfer is a classic example: debit ₹1000 from Account A, credit ₹1000 to Account B. Both must happen together or neither should happen. A transaction guarantees this "all or nothing" behaviour.

BEGIN;  -- Start transaction

UPDATE accounts SET balance = balance - 1000 WHERE id = 1;
UPDATE accounts SET balance = balance + 1000 WHERE id = 2;

-- If everything is OK
COMMIT;   -- Make changes permanent

-- If something goes wrong
ROLLBACK; -- Undo all changes since BEGIN

ACID Properties

Property	Meaning
Atomicity	All operations succeed or all fail — no partial state
Consistency	DB is in a valid state before and after the transaction
Isolation	Concurrent transactions don't interfere with each other
Durability	Committed transactions survive system failures

In Spring Boot, annotate your service methods with @Transactional — Spring automatically wraps the method in a database transaction and commits or rolls back based on success/exception.

📐

Design

Database Design & Normalization

Normalization is the process of organizing database tables to reduce data redundancy and improve data integrity.

Normal Form	Rule	Problem it solves
1NF	Each column has atomic values, no repeating groups	Storing multiple phone numbers in one cell
2NF	1NF + no partial dependency on composite key	Redundant data depending on only part of key
3NF	2NF + no transitive dependencies	Column depends on another non-key column

In practice: Design to 3NF. Sometimes you denormalize intentionally for performance. Most Spring Boot apps with JPA naturally end up well-normalized if you model entities correctly.

🎯

Interview Prep

Common Interview Questions

QWhat is the difference between INNER JOIN and LEFT JOIN?

INNER JOIN returns only rows where there is a match in both tables. If a student has no enrollment, they won't appear in the result.

LEFT JOIN returns all rows from the left table, plus matching rows from the right. If a student has no enrollment, they still appear with NULL for the course columns.

Use INNER JOIN when you only want records that have related data. Use LEFT JOIN when you want all records from the primary table regardless of related data.

QWhat is the difference between WHERE and HAVING?

WHERE filters individual rows before any grouping occurs. You cannot use aggregate functions (COUNT, SUM) in WHERE.

HAVING filters groups after GROUP BY is applied. You can use aggregate functions here.

Example: WHERE age > 18 filters students before grouping. HAVING COUNT(*) > 5 filters city groups that have more than 5 students.

QWhat is a primary key and a foreign key?

Primary Key — A column (or set of columns) that uniquely identifies each row in a table. Must be unique and NOT NULL. A table can have only one primary key.

Foreign Key — A column that references the primary key of another table. It enforces referential integrity — you can't insert an enrollment with a student_id that doesn't exist in the students table. ON DELETE CASCADE means child rows are deleted when the parent is deleted.

QWhat is an index and when should you use one?

An index is a data structure that improves the speed of data retrieval at the cost of additional storage and slower writes.

Use indexes on columns that are: frequently in WHERE clauses, used in JOIN conditions, used in ORDER BY/GROUP BY, foreign key columns.

Don't index columns with low cardinality (e.g., a boolean column), small tables, or columns that are rarely queried.

QWhat are ACID properties?

Atomicity — Transaction is all-or-nothing. Either all statements succeed, or none take effect. A failure in the middle rolls back all changes.

Consistency — A transaction brings the database from one valid state to another. All constraints, rules, and cascades are respected.

Isolation — Concurrent transactions are isolated from each other. The intermediate state of a transaction is not visible to other transactions.

Durability — Once committed, a transaction's changes persist even if the system crashes (written to disk/WAL logs).

QWhat is normalization? Why is it important?

Normalization is the process of organizing a database to reduce data redundancy (storing the same data in multiple places) and ensure data integrity.

Without normalization, updating a student's city in one place but not another causes inconsistency. Normalized design stores each piece of data in exactly one place, referenced everywhere it's needed.

3NF is the standard target for most applications.