Introduction to DBMS

Basics of Databases

Database Concepts and Architecture

Database: A collection of organized data that can be easily accessed, managed, and updated.
DBMS (Database Management System): A software system that allows users to create, manage, and interact with databases.
Architecture:
- Single-tier Architecture: The database is directly accessed by the user.
- Two-tier Architecture: The application and the database are separate (e.g., client-server model).
- Three-tier Architecture: Introduces a middleware layer between the user and the database for enhanced security and scalability.

File System vs. DBMS

Feature	File System	DBMS
Data Management	Manual file handling	Automated management
Data Redundancy	High	Reduced
Data Integrity	Difficult to maintain	Ensured via constraints
Security	Low	High with authentication
Querying	Manual search	SQL-based queries

Advantages of DBMS

Data Consistency: Avoids duplication and inconsistency.
Data Security: Provides access control mechanisms.
Efficient Data Retrieval: Uses indexing and queries for quick access.
Concurrency Control: Multiple users can access the database simultaneously.
Backup and Recovery: Ensures data safety in case of failures.

Database Users & Administrators

End Users: Regular users who interact with the database through applications.
Database Administrators (DBA): Manage user access, security, and database performance.
Developers: Design and develop database applications.
System Analysts: Define database requirements and structure.

Database Models

Hierarchical Model

Data is structured in a tree-like format (parent-child relationship).
Each parent can have multiple children, but each child has only one parent.
Example: Organization hierarchy (CEO → Managers → Employees).

Network Model

Data is structured as a graph where each record can have multiple parent and child records.
More flexible than the hierarchical model.
Example: University system where a student can enroll in multiple courses, and each course has multiple students.

Relational Model

Data is stored in tables (relations) with rows (tuples) and columns (attributes).
Uses Primary Keys and Foreign Keys to establish relationships.
Example: Student table with columns (ID, Name, Course) and Course table with columns (CourseID, CourseName).

Questions for Practice

What are the main differences between a file system and a DBMS?
Explain the advantages of using a DBMS.
What is the role of a database administrator?
Describe the hierarchical model with an example.
How does the relational model ensure data consistency?
1. Relational Data Model

The Relational Data Model is a way to structure data into tables (relations) consisting of rows (tuples) and columns (attributes).

1.1 Relational Schema, Tuples, Attributes

Relational Schema: Defines the structure of a table, including column names and data types.
Tuples: Rows in a table that store records.
Attributes: Columns that define the properties of data stored.

Example:

Student(SID: INT, Name: VARCHAR, Age: INT, Dept: VARCHAR)

Here, SID, Name, Age, Dept are attributes, and a single row represents a tuple.

1.2 Keys

Primary Key: A unique identifier for each tuple (e.g., SID in Student).
Foreign Key: An attribute that references a primary key in another table.
Candidate Key: A set of attributes that can uniquely identify a tuple.
Super Key: A set of one or more attributes that uniquely identify a tuple (includes candidate keys and primary keys).

Example:

Student(SID is the Primary Key)
Course(CID is the Primary Key, SID is a Foreign Key referencing Student)

1.3 Integrity Constraints

Entity Integrity: Primary key cannot be NULL.
Referential Integrity: Foreign keys must reference an existing primary key.
Domain Integrity: Attributes must have valid data types and constraints.

Example:

CREATE TABLE Student (
    SID INT PRIMARY KEY,
    Name VARCHAR(50) NOT NULL,
    Age INT CHECK (Age > 18)
);

2. Structured Query Language (SQL)

SQL is used to manage relational databases.

2.1 DDL (Data Definition Language)

Used for defining structures.

CREATE: Creates a table.
ALTER: Modifies a table.
DROP: Deletes a table.

Example:

CREATE TABLE Employee (
    ID INT PRIMARY KEY,
    Name VARCHAR(50),
    Salary DECIMAL(10,2)
);

2.2 DML (Data Manipulation Language)

Used for modifying data.

INSERT: Adds new records.
UPDATE: Modifies records.
DELETE: Removes records.

Example:

INSERT INTO Employee VALUES (1, 'John Doe', 50000);
UPDATE Employee SET Salary = 55000 WHERE ID = 1;
DELETE FROM Employee WHERE ID = 1;

2.3 DCL (Data Control Language)

Used for access control.

GRANT: Gives permissions.
REVOKE: Removes permissions.

Example:

GRANT SELECT ON Employee TO User1;
REVOKE SELECT ON Employee FROM User1;

2.4 TCL (Transaction Control Language)

Used for transaction management.

COMMIT: Saves changes.
ROLLBACK: Reverts changes.

Example:

BEGIN;
UPDATE Employee SET Salary = 60000 WHERE ID = 1;
ROLLBACK;  -- Reverts the update
COMMIT;    -- Confirms the change

2.5 SQL Queries

SELECT: Retrieves data.
INSERT: Adds data.
DELETE: Removes data.
UPDATE: Modifies data.

Example:

SELECT * FROM Employee WHERE Salary > 50000;

2.6 Joins

INNER JOIN: Returns matching rows.
LEFT JOIN: Returns all rows from the left table, matching rows from the right.
RIGHT JOIN: Returns all rows from the right table, matching rows from the left.
FULL JOIN: Returns all rows from both tables.

Example:

SELECT Employee.ID, Employee.Name, Department.DeptName
FROM Employee
INNER JOIN Department ON Employee.DeptID = Department.DeptID;

2.7 Subqueries & Views

Subquery: A query inside another query.
View: A virtual table derived from a query.

Example:

CREATE VIEW HighSalary AS
SELECT Name, Salary FROM Employee WHERE Salary > 60000;

ER Model & Normalization

ER Model (Entity-Relationship Model)

1. Entities, Attributes, Relationships

Entity: A real-world object (e.g., Student, Course).
Attribute: A property of an entity (e.g., Student has Name, Roll No).
Relationship: Association between entities (e.g., Student enrolls in Course).

Example:

Entity: Student
Attributes: ID, Name, Age
Relationship: Student enrolls in Course

2. Cardinality & Participation

Cardinality defines the number of entities in a relationship (1:1, 1:M, M:N).
Participation can be total (all entities participate) or partial (some entities participate).

Example:

A student enrolls in multiple courses (1:M cardinality)
A professor must be assigned to at least one department (total participation)

3. ER to Relational Mapping

Convert entities to tables.
Convert attributes to columns.
Convert relationships using foreign keys.
Handle weak entities using primary keys from the strong entity.

Example:

Student (ID, Name)	Course (Code, Title)	Enrollment (Student_ID, Course_Code)

Normalization

1. Functional Dependencies

A dependency between attributes in a relation (e.g., _RollNo → Name means Roll_No determines Name).

2. Normal Forms

1NF: No repeating groups; all attributes have atomic values.
2NF: 1NF + No partial dependency.
3NF: 2NF + No transitive dependency.
BCNF: 3NF + Every determinant is a candidate key.
4NF: BCNF + No multi-valued dependencies.
5NF: 4NF + No join dependencies.

Example (Normalization Steps):

Order_ID	Item	Supplier	Quantity
101	A	X	10
101	B	Y	5

1NF: Convert multi-valued attributes into separate rows.
2NF: Ensure all non-key attributes depend on the whole key.
3NF: Remove transitive dependencies.

Transactions & Recovery

Transaction Management

1. ACID Properties

Atomicity: All operations succeed or none do.
Consistency: Database remains valid.
Isolation: Transactions don’t interfere.
Durability: Committed changes persist.

2. States of a Transaction

Active → Partially Committed → Committed (Success)
Active → Failed → Aborted → Rolled Back

Concurrency Control

1. Problems in Concurrency

Lost Update: Two transactions overwrite each other.
Dirty Read: One transaction reads uncommitted data.
Non-Repeatable Read: A value changes after being read.

2. Locking Protocols

2PL (Two-Phase Locking): Growing phase (acquires locks), shrinking phase (releases locks).
Time-Stamp Ordering: Transactions execute based on assigned timestamps.

3. Deadlocks & Prevention

Deadlock: Two transactions wait indefinitely for each other’s resources.
Prevention Techniques: Timeout, Wait-Die, Wound-Wait.

Recovery Techniques

1. Log-based Recovery

Write-Ahead Logging (WAL): Changes are recorded in logs before applying to the database.
Undo (Rollback) and Redo (Reapply) operations ensure consistency.

2. Checkpointing

A checkpoint saves the database state periodically to speed up recovery.

Example Question:

Define ACID properties with examples.
What is the difference between 3NF and BCNF?
Explain the Two-Phase Locking (2PL) protocol.
How does checkpointing help in database recovery?
Storage & Indexing

File Organization & Indexing

Heap File Organization

Records are stored in any order without sorting.
Easy insertion but slow search.
Used when fast insertion is required.
Example: Unordered customer records in a database.

Sequential File Organization

Records are stored in sorted order.
Fast for range queries but slow insertion.
Used in applications that require frequent searching.
Example: Employee records sorted by ID.

Hashing

Uses a hash function to map keys to specific locations.
Fast retrieval but inefficient for range queries.
Used in key-value storage.
Example: Caching in databases.

B+ Trees & B-Trees

B-Trees: Balanced tree structure, used for indexing.
B+ Trees: Enhanced version of B-Trees with all keys stored in leaf nodes, making range queries faster.
Used in databases and file systems.
Example: File directory indexing.

Indexing Techniques

Clustered Indexing

Data is physically stored based on the index.
Faster access for range queries.
Example: Primary key-based indexing.

Non-Clustered Indexing

Index stores pointers to actual data locations.
Slower than clustered indexes but supports multiple indexes.
Example: Secondary indexes on email addresses.

NoSQL & Advanced Topics

Introduction to NoSQL Databases

Key-Value Stores

Uses key-value pairs for data storage.
Fast lookups but limited query capability.
Example: Redis, Amazon DynamoDB.

Document-Based Stores (MongoDB)

Stores data as JSON/BSON documents.
Flexible schema and suitable for semi-structured data.
Example: Storing user profiles.

Column-Oriented Stores

Stores data in columns rather than rows.
Efficient for analytical queries.
Example: Google Bigtable, Apache Cassandra.

Graph Databases

Stores data in nodes and edges.
Efficient for relationship-based queries.
Example: Social media connections in Neo4j.

Big Data & Distributed Databases

Distributed DBMS Concepts

Data is distributed across multiple nodes for scalability.
Ensures fault tolerance and high availability.
Example: Google Spanner.

CAP Theorem

Consistency: All nodes see the same data.
Availability: Every request gets a response.
Partition Tolerance: System functions even if network fails.
Trade-offs must be made; databases optimize two out of three.
Example: MongoDB prioritizes availability and partition tolerance.

Example Questions

Explain the differences between heap, sequential, and hashing file organizations.
What are the advantages of using B+ Trees over B-Trees?
Compare clustered and non-clustered indexing with examples.
How does a key-value store differ from a document-based store?
What are the trade-offs involved in CAP theorem?
Why are column-oriented stores preferred for analytical queries?
How do graph databases efficiently store and retrieve relationship-based data?
What is the role of distributed DBMS in handling big data?

DBMS (Database Management System) Notes

Introduction to DBMS

Basics of Databases

Database Concepts and Architecture

File System vs. DBMS

Advantages of DBMS

Database Users & Administrators

Database Models

Hierarchical Model

Network Model

Relational Model

Questions for Practice

1. Relational Data Model

1.1 Relational Schema, Tuples, Attributes

1.2 Keys

1.3 Integrity Constraints

2. Structured Query Language (SQL)

2.1 DDL (Data Definition Language)

2.2 DML (Data Manipulation Language)

2.3 DCL (Data Control Language)

2.4 TCL (Transaction Control Language)

2.5 SQL Queries

2.6 Joins

2.7 Subqueries & Views

ER Model & Normalization

ER Model (Entity-Relationship Model)

1. Entities, Attributes, Relationships

2. Cardinality & Participation

3. ER to Relational Mapping

Normalization

1. Functional Dependencies

2. Normal Forms

Transactions & Recovery

Transaction Management

1. ACID Properties

2. States of a Transaction

Concurrency Control

1. Problems in Concurrency

2. Locking Protocols

3. Deadlocks & Prevention

Recovery Techniques

1. Log-based Recovery

2. Checkpointing

Storage & Indexing

File Organization & Indexing

Heap File Organization

Sequential File Organization

Hashing

B+ Trees & B-Trees

Indexing Techniques

Clustered Indexing

Non-Clustered Indexing

NoSQL & Advanced Topics

Introduction to NoSQL Databases

Key-Value Stores

Document-Based Stores (MongoDB)

Column-Oriented Stores

Graph Databases

Big Data & Distributed Databases

Distributed DBMS Concepts

CAP Theorem

Example Questions

You may like these posts

Post a Comment

Let's Make Magic Happen! ✨

Contact form