โ† Back | SQL & Database Design
Week 3
Week 3 ยท Core Backend

SQL & Database Design

Every application stores data. SQL is the universal language for talking to relational databases. Understanding SQL deeply is what separates junior developers from strong hires.

๐Ÿ›ข๏ธ PostgreSQL / H2 JOINs & Aggregates Normalization
๐Ÿง 
Concept
What is SQL and why relational databases?
A database is like an Excel spreadsheet on steroids โ€” multiple linked sheets (tables), with rules that prevent bad data, and the ability to query millions of rows in milliseconds. SQL is the language you use to ask questions and manipulate that data.
SQL CategoryCommandsPurpose
DDL โ€” Data DefinitionCREATE, ALTER, DROPDefine and modify table structure
DML โ€” Data ManipulationINSERT, UPDATE, DELETEAdd and modify data
DQL โ€” Data QuerySELECTRead and retrieve data
DCL โ€” Data ControlGRANT, REVOKEManage permissions
TCL โ€” Transaction ControlCOMMIT, ROLLBACKManage transactions
๐Ÿ—๏ธ
DDL
Creating Tables
-- Create the students table CREATE TABLE students ( id SERIAL PRIMARY KEY, -- Auto-increment ID name VARCHAR(100) NOT NULL, -- Required field email VARCHAR(255) UNIQUE NOT NULL, city VARCHAR(50), age INTEGER CHECK(age >= 18 AND age <= 60), enrolled_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ); -- Create courses table CREATE TABLE courses ( id SERIAL PRIMARY KEY, title VARCHAR(200) NOT NULL, fee DECIMAL(10,2) ); -- Enrollment table (junction table for many-to-many) CREATE TABLE enrollments ( id SERIAL PRIMARY KEY, student_id INTEGER REFERENCES students(id) ON DELETE CASCADE, course_id INTEGER REFERENCES courses(id) ON DELETE CASCADE, enrolled_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, UNIQUE(student_id, course_id) -- Prevent duplicate enrollment );
ConstraintPurpose
PRIMARY KEYUnique + NOT NULL โ€” main row identifier
FOREIGN KEY / REFERENCESLink to another table's primary key
NOT NULLValue is required
UNIQUENo duplicate values allowed
CHECKValue must satisfy a condition
DEFAULTUse this value if none provided
โœ๏ธ
DML + DQL
INSERT and SELECT
-- INSERT: add rows INSERT INTO students (name, email, city, age) VALUES ('Rahul Sharma', 'rahul@email.com', 'Pune', 22); -- Insert multiple rows at once INSERT INTO students (name, email, city, age) VALUES ('Priya Singh', 'priya@email.com', 'Mumbai', 21), ('Amit Patil', 'amit@email.com', 'Pune', 23); -- SELECT: read rows SELECT * FROM students; -- All columns, all rows SELECT name, email FROM students; -- Specific columns SELECT name, age FROM students ORDER BY age DESC; -- Sort descending SELECT * FROM students LIMIT 10 OFFSET 20; -- Page 3 (20 records in) -- UPDATE: modify rows UPDATE students SET city = 'Bangalore' WHERE id = 1; -- DELETE: remove rows DELETE FROM students WHERE id = 1;
Always use WHERE with UPDATE and DELETE. Without it, you update or delete ALL rows. Always test your WHERE clause with a SELECT first.
๐Ÿ”
Filtering
WHERE Clause and Operators
SELECT * FROM students WHERE city = 'Pune'; SELECT * FROM students WHERE age >= 21 AND city = 'Pune'; SELECT * FROM students WHERE city = 'Pune' OR city = 'Mumbai'; SELECT * FROM students WHERE city IN ('Pune', 'Mumbai', 'Delhi'); SELECT * FROM students WHERE age BETWEEN 20 AND 25; SELECT * FROM students WHERE name LIKE 'R%'; -- Starts with R SELECT * FROM students WHERE name LIKE '%Kumar%'; -- Contains Kumar SELECT * FROM students WHERE phone IS NULL; -- Missing phone SELECT * FROM students WHERE phone IS NOT NULL;
๐Ÿ”—
Most Important
JOINs โ€” Combining Tables
INNER JOIN = only rows that match in BOTH tables. Like finding people who attended both Party A AND Party B.
LEFT JOIN = all rows from the left table, plus matching rows from the right. Like listing all students and their course (if enrolled).
-- INNER JOIN: only enrolled students (those in enrollments table) SELECT s.name, s.email, c.title AS course FROM students s INNER JOIN enrollments e ON s.id = e.student_id INNER JOIN courses c ON e.course_id = c.id; -- LEFT JOIN: ALL students, even those not enrolled -- (course column will be NULL for non-enrolled students) SELECT s.name, c.title FROM students s LEFT JOIN enrollments e ON s.id = e.student_id LEFT JOIN courses c ON e.course_id = c.id; -- Find students who are NOT enrolled SELECT s.name FROM students s LEFT JOIN enrollments e ON s.id = e.student_id WHERE e.student_id IS NULL;
JOIN TypeReturns
INNER JOINRows matching in BOTH tables
LEFT JOINAll left rows + matching right rows (NULL if no match)
RIGHT JOINAll right rows + matching left rows
FULL OUTER JOINAll rows from both tables
๐Ÿ“Š
Aggregates
GROUP BY and Aggregate Functions
-- Count students per city SELECT city, COUNT(*) AS student_count FROM students GROUP BY city ORDER BY student_count DESC; -- Cities with more than 5 students (HAVING filters groups) SELECT city, COUNT(*) AS student_count FROM students GROUP BY city HAVING COUNT(*) > 5; -- Other aggregate functions SELECT COUNT(*) AS total_students, AVG(age) AS avg_age, MIN(age) AS youngest, MAX(age) AS oldest, SUM(age) AS age_sum FROM students;
WHERE vs HAVING: WHERE filters individual rows BEFORE grouping. HAVING filters groups AFTER GROUP BY. You can't use aggregate functions (COUNT, AVG) in WHERE โ€” use HAVING instead.
โšก
Performance
Indexes
An index in a database is like the index at the back of a textbook. Without it, to find "JPA" you'd read every single page (full table scan). With an index, you flip to "J" and go directly to page 234. Much faster.
-- Create index on frequently-searched column CREATE INDEX idx_students_city ON students(city); CREATE INDEX idx_students_email ON students(email); -- Composite index โ€” useful when you filter by multiple columns together CREATE INDEX idx_students_city_age ON students(city, age); -- Check if query uses index EXPLAIN ANALYZE SELECT * FROM students WHERE city = 'Pune';
Don't over-index! Every index speeds up reads but slows down writes (INSERT, UPDATE, DELETE must update all indexes). Index only columns used frequently in WHERE, JOIN ON, or ORDER BY.
๐Ÿ”’
Reliability
Transactions & ACID Properties
A bank transfer is a classic example: debit โ‚น1000 from Account A, credit โ‚น1000 to Account B. Both must happen together or neither should happen. A transaction guarantees this "all or nothing" behaviour.
BEGIN; -- Start transaction UPDATE accounts SET balance = balance - 1000 WHERE id = 1; UPDATE accounts SET balance = balance + 1000 WHERE id = 2; -- If everything is OK COMMIT; -- Make changes permanent -- If something goes wrong ROLLBACK; -- Undo all changes since BEGIN

ACID Properties

PropertyMeaning
AtomicityAll operations succeed or all fail โ€” no partial state
ConsistencyDB is in a valid state before and after the transaction
IsolationConcurrent transactions don't interfere with each other
DurabilityCommitted transactions survive system failures
In Spring Boot, annotate your service methods with @Transactional โ€” Spring automatically wraps the method in a database transaction and commits or rolls back based on success/exception.
๐Ÿ“
Design
Database Design & Normalization

Normalization is the process of organizing database tables to reduce data redundancy and improve data integrity.

Normal FormRuleProblem it solves
1NFEach column has atomic values, no repeating groupsStoring multiple phone numbers in one cell
2NF1NF + no partial dependency on composite keyRedundant data depending on only part of key
3NF2NF + no transitive dependenciesColumn depends on another non-key column
In practice: Design to 3NF. Sometimes you denormalize intentionally for performance. Most Spring Boot apps with JPA naturally end up well-normalized if you model entities correctly.
๐ŸŽฏ
Interview Prep
Common Interview Questions
QWhat is the difference between INNER JOIN and LEFT JOIN?

INNER JOIN returns only rows where there is a match in both tables. If a student has no enrollment, they won't appear in the result.

LEFT JOIN returns all rows from the left table, plus matching rows from the right. If a student has no enrollment, they still appear with NULL for the course columns.

Use INNER JOIN when you only want records that have related data. Use LEFT JOIN when you want all records from the primary table regardless of related data.

QWhat is the difference between WHERE and HAVING?

WHERE filters individual rows before any grouping occurs. You cannot use aggregate functions (COUNT, SUM) in WHERE.

HAVING filters groups after GROUP BY is applied. You can use aggregate functions here.

Example: WHERE age > 18 filters students before grouping. HAVING COUNT(*) > 5 filters city groups that have more than 5 students.

QWhat is a primary key and a foreign key?

Primary Key โ€” A column (or set of columns) that uniquely identifies each row in a table. Must be unique and NOT NULL. A table can have only one primary key.

Foreign Key โ€” A column that references the primary key of another table. It enforces referential integrity โ€” you can't insert an enrollment with a student_id that doesn't exist in the students table. ON DELETE CASCADE means child rows are deleted when the parent is deleted.

QWhat is an index and when should you use one?

An index is a data structure that improves the speed of data retrieval at the cost of additional storage and slower writes.

Use indexes on columns that are: frequently in WHERE clauses, used in JOIN conditions, used in ORDER BY/GROUP BY, foreign key columns.

Don't index columns with low cardinality (e.g., a boolean column), small tables, or columns that are rarely queried.

QWhat are ACID properties?

Atomicity โ€” Transaction is all-or-nothing. Either all statements succeed, or none take effect. A failure in the middle rolls back all changes.

Consistency โ€” A transaction brings the database from one valid state to another. All constraints, rules, and cascades are respected.

Isolation โ€” Concurrent transactions are isolated from each other. The intermediate state of a transaction is not visible to other transactions.

Durability โ€” Once committed, a transaction's changes persist even if the system crashes (written to disk/WAL logs).

QWhat is normalization? Why is it important?

Normalization is the process of organizing a database to reduce data redundancy (storing the same data in multiple places) and ensure data integrity.

Without normalization, updating a student's city in one place but not another causes inconsistency. Normalized design stores each piece of data in exactly one place, referenced everywhere it's needed.

3NF is the standard target for most applications.