Introduction to DBMS
Basics of Databases
Database Concepts and Architecture
- Database: A collection of organized data that can be easily accessed, managed, and updated.
- DBMS (Database Management System): A software system that allows users to create, manage, and interact with databases.
- Architecture:
- Single-tier Architecture: The database is directly accessed by the user.
- Two-tier Architecture: The application and the database are separate (e.g., client-server model).
- Three-tier Architecture: Introduces a middleware layer between the user and the database for enhanced security and scalability.
File System vs. DBMS
Feature | File System | DBMS |
---|---|---|
Data Management | Manual file handling | Automated management |
Data Redundancy | High | Reduced |
Data Integrity | Difficult to maintain | Ensured via constraints |
Security | Low | High with authentication |
Querying | Manual search | SQL-based queries |
Advantages of DBMS
- Data Consistency: Avoids duplication and inconsistency.
- Data Security: Provides access control mechanisms.
- Efficient Data Retrieval: Uses indexing and queries for quick access.
- Concurrency Control: Multiple users can access the database simultaneously.
- Backup and Recovery: Ensures data safety in case of failures.
Database Users & Administrators
- End Users: Regular users who interact with the database through applications.
- Database Administrators (DBA): Manage user access, security, and database performance.
- Developers: Design and develop database applications.
- System Analysts: Define database requirements and structure.
Database Models
Hierarchical Model
- Data is structured in a tree-like format (parent-child relationship).
- Each parent can have multiple children, but each child has only one parent.
- Example: Organization hierarchy (CEO → Managers → Employees).
Network Model
- Data is structured as a graph where each record can have multiple parent and child records.
- More flexible than the hierarchical model.
- Example: University system where a student can enroll in multiple courses, and each course has multiple students.
Relational Model
- Data is stored in tables (relations) with rows (tuples) and columns (attributes).
- Uses Primary Keys and Foreign Keys to establish relationships.
- Example: Student table with columns (ID, Name, Course) and Course table with columns (CourseID, CourseName).
Questions for Practice
- What are the main differences between a file system and a DBMS?
- Explain the advantages of using a DBMS.
- What is the role of a database administrator?
- Describe the hierarchical model with an example.
- How does the relational model ensure data consistency?
1. Relational Data Model
The Relational Data Model is a way to structure data into tables (relations) consisting of rows (tuples) and columns (attributes).
1.1 Relational Schema, Tuples, Attributes
- Relational Schema: Defines the structure of a table, including column names and data types.
- Tuples: Rows in a table that store records.
- Attributes: Columns that define the properties of data stored.
Example:
Student(SID: INT, Name: VARCHAR, Age: INT, Dept: VARCHAR)
Here, SID, Name, Age, Dept are attributes, and a single row represents a tuple.
1.2 Keys
- Primary Key: A unique identifier for each tuple (e.g., SID in Student).
- Foreign Key: An attribute that references a primary key in another table.
- Candidate Key: A set of attributes that can uniquely identify a tuple.
- Super Key: A set of one or more attributes that uniquely identify a tuple (includes candidate keys and primary keys).
Example:
Student(SID is the Primary Key)
Course(CID is the Primary Key, SID is a Foreign Key referencing Student)
1.3 Integrity Constraints
- Entity Integrity: Primary key cannot be NULL.
- Referential Integrity: Foreign keys must reference an existing primary key.
- Domain Integrity: Attributes must have valid data types and constraints.
Example:
CREATE TABLE Student (
SID INT PRIMARY KEY,
Name VARCHAR(50) NOT NULL,
Age INT CHECK (Age > 18)
);
2. Structured Query Language (SQL)
SQL is used to manage relational databases.
2.1 DDL (Data Definition Language)
Used for defining structures.
- CREATE: Creates a table.
- ALTER: Modifies a table.
- DROP: Deletes a table.
Example:
CREATE TABLE Employee (
ID INT PRIMARY KEY,
Name VARCHAR(50),
Salary DECIMAL(10,2)
);
2.2 DML (Data Manipulation Language)
Used for modifying data.
- INSERT: Adds new records.
- UPDATE: Modifies records.
- DELETE: Removes records.
Example:
INSERT INTO Employee VALUES (1, 'John Doe', 50000);
UPDATE Employee SET Salary = 55000 WHERE ID = 1;
DELETE FROM Employee WHERE ID = 1;
2.3 DCL (Data Control Language)
Used for access control.
- GRANT: Gives permissions.
- REVOKE: Removes permissions.
Example:
GRANT SELECT ON Employee TO User1;
REVOKE SELECT ON Employee FROM User1;
2.4 TCL (Transaction Control Language)
Used for transaction management.
- COMMIT: Saves changes.
- ROLLBACK: Reverts changes.
Example:
BEGIN;
UPDATE Employee SET Salary = 60000 WHERE ID = 1;
ROLLBACK; -- Reverts the update
COMMIT; -- Confirms the change
2.5 SQL Queries
- SELECT: Retrieves data.
- INSERT: Adds data.
- DELETE: Removes data.
- UPDATE: Modifies data.
Example:
SELECT * FROM Employee WHERE Salary > 50000;
2.6 Joins
- INNER JOIN: Returns matching rows.
- LEFT JOIN: Returns all rows from the left table, matching rows from the right.
- RIGHT JOIN: Returns all rows from the right table, matching rows from the left.
- FULL JOIN: Returns all rows from both tables.
Example:
SELECT Employee.ID, Employee.Name, Department.DeptName
FROM Employee
INNER JOIN Department ON Employee.DeptID = Department.DeptID;
2.7 Subqueries & Views
- Subquery: A query inside another query.
- View: A virtual table derived from a query.
Example:
CREATE VIEW HighSalary AS
SELECT Name, Salary FROM Employee WHERE Salary > 60000;
ER Model & Normalization
ER Model (Entity-Relationship Model)
1. Entities, Attributes, Relationships
- Entity: A real-world object (e.g., Student, Course).
- Attribute: A property of an entity (e.g., Student has Name, Roll No).
- Relationship: Association between entities (e.g., Student enrolls in Course).
Example:
- Entity: Student
- Attributes: ID, Name, Age
- Relationship: Student enrolls in Course
2. Cardinality & Participation
- Cardinality defines the number of entities in a relationship (1:1, 1:M, M:N).
- Participation can be total (all entities participate) or partial (some entities participate).
Example:
- A student enrolls in multiple courses (1:M cardinality)
- A professor must be assigned to at least one department (total participation)
3. ER to Relational Mapping
- Convert entities to tables.
- Convert attributes to columns.
- Convert relationships using foreign keys.
- Handle weak entities using primary keys from the strong entity.
Example:
Student (ID, Name) | Course (Code, Title) | Enrollment (Student_ID, Course_Code) |
---|---|---|
Normalization
1. Functional Dependencies
- A dependency between attributes in a relation (e.g., _RollNo → Name means Roll_No determines Name).
2. Normal Forms
- 1NF: No repeating groups; all attributes have atomic values.
- 2NF: 1NF + No partial dependency.
- 3NF: 2NF + No transitive dependency.
- BCNF: 3NF + Every determinant is a candidate key.
- 4NF: BCNF + No multi-valued dependencies.
- 5NF: 4NF + No join dependencies.
Example (Normalization Steps):
Order_ID | Item | Supplier | Quantity |
---|---|---|---|
101 | A | X | 10 |
101 | B | Y | 5 |
- 1NF: Convert multi-valued attributes into separate rows.
- 2NF: Ensure all non-key attributes depend on the whole key.
- 3NF: Remove transitive dependencies.
Transactions & Recovery
Transaction Management
1. ACID Properties
- Atomicity: All operations succeed or none do.
- Consistency: Database remains valid.
- Isolation: Transactions don’t interfere.
- Durability: Committed changes persist.
2. States of a Transaction
- Active → Partially Committed → Committed (Success)
- Active → Failed → Aborted → Rolled Back
Concurrency Control
1. Problems in Concurrency
- Lost Update: Two transactions overwrite each other.
- Dirty Read: One transaction reads uncommitted data.
- Non-Repeatable Read: A value changes after being read.
2. Locking Protocols
- 2PL (Two-Phase Locking): Growing phase (acquires locks), shrinking phase (releases locks).
- Time-Stamp Ordering: Transactions execute based on assigned timestamps.
3. Deadlocks & Prevention
- Deadlock: Two transactions wait indefinitely for each other’s resources.
- Prevention Techniques: Timeout, Wait-Die, Wound-Wait.
Recovery Techniques
1. Log-based Recovery
- Write-Ahead Logging (WAL): Changes are recorded in logs before applying to the database.
- Undo (Rollback) and Redo (Reapply) operations ensure consistency.
2. Checkpointing
- A checkpoint saves the database state periodically to speed up recovery.
Example Question:
- Define ACID properties with examples.
- What is the difference between 3NF and BCNF?
- Explain the Two-Phase Locking (2PL) protocol.
- How does checkpointing help in database recovery?
Storage & Indexing
File Organization & Indexing
Heap File Organization
- Records are stored in any order without sorting.
- Easy insertion but slow search.
- Used when fast insertion is required.
- Example: Unordered customer records in a database.
Sequential File Organization
- Records are stored in sorted order.
- Fast for range queries but slow insertion.
- Used in applications that require frequent searching.
- Example: Employee records sorted by ID.
Hashing
- Uses a hash function to map keys to specific locations.
- Fast retrieval but inefficient for range queries.
- Used in key-value storage.
- Example: Caching in databases.
B+ Trees & B-Trees
- B-Trees: Balanced tree structure, used for indexing.
- B+ Trees: Enhanced version of B-Trees with all keys stored in leaf nodes, making range queries faster.
- Used in databases and file systems.
- Example: File directory indexing.
Indexing Techniques
Clustered Indexing
- Data is physically stored based on the index.
- Faster access for range queries.
- Example: Primary key-based indexing.
Non-Clustered Indexing
- Index stores pointers to actual data locations.
- Slower than clustered indexes but supports multiple indexes.
- Example: Secondary indexes on email addresses.
NoSQL & Advanced Topics
Introduction to NoSQL Databases
Key-Value Stores
- Uses key-value pairs for data storage.
- Fast lookups but limited query capability.
- Example: Redis, Amazon DynamoDB.
Document-Based Stores (MongoDB)
- Stores data as JSON/BSON documents.
- Flexible schema and suitable for semi-structured data.
- Example: Storing user profiles.
Column-Oriented Stores
- Stores data in columns rather than rows.
- Efficient for analytical queries.
- Example: Google Bigtable, Apache Cassandra.
Graph Databases
- Stores data in nodes and edges.
- Efficient for relationship-based queries.
- Example: Social media connections in Neo4j.
Big Data & Distributed Databases
Distributed DBMS Concepts
- Data is distributed across multiple nodes for scalability.
- Ensures fault tolerance and high availability.
- Example: Google Spanner.
CAP Theorem
- Consistency: All nodes see the same data.
- Availability: Every request gets a response.
- Partition Tolerance: System functions even if network fails.
- Trade-offs must be made; databases optimize two out of three.
- Example: MongoDB prioritizes availability and partition tolerance.
Example Questions
- Explain the differences between heap, sequential, and hashing file organizations.
- What are the advantages of using B+ Trees over B-Trees?
- Compare clustered and non-clustered indexing with examples.
- How does a key-value store differ from a document-based store?
- What are the trade-offs involved in CAP theorem?
- Why are column-oriented stores preferred for analytical queries?
- How do graph databases efficiently store and retrieve relationship-based data?
- What is the role of distributed DBMS in handling big data?