DBMS (Database Management System) Notes

0

Introduction to DBMS

Basics of Databases

Database Concepts and Architecture

  • Database: A collection of organized data that can be easily accessed, managed, and updated.
  • DBMS (Database Management System): A software system that allows users to create, manage, and interact with databases.
  • Architecture:
    • Single-tier Architecture: The database is directly accessed by the user.
    • Two-tier Architecture: The application and the database are separate (e.g., client-server model).
    • Three-tier Architecture: Introduces a middleware layer between the user and the database for enhanced security and scalability.

File System vs. DBMS

Feature File System DBMS
Data Management Manual file handling Automated management
Data Redundancy High Reduced
Data Integrity Difficult to maintain Ensured via constraints
Security Low High with authentication
Querying Manual search SQL-based queries

Advantages of DBMS

  1. Data Consistency: Avoids duplication and inconsistency.
  2. Data Security: Provides access control mechanisms.
  3. Efficient Data Retrieval: Uses indexing and queries for quick access.
  4. Concurrency Control: Multiple users can access the database simultaneously.
  5. Backup and Recovery: Ensures data safety in case of failures.

Database Users & Administrators

  • End Users: Regular users who interact with the database through applications.
  • Database Administrators (DBA): Manage user access, security, and database performance.
  • Developers: Design and develop database applications.
  • System Analysts: Define database requirements and structure.

Database Models

Hierarchical Model

  • Data is structured in a tree-like format (parent-child relationship).
  • Each parent can have multiple children, but each child has only one parent.
  • Example: Organization hierarchy (CEO → Managers → Employees).

Network Model

  • Data is structured as a graph where each record can have multiple parent and child records.
  • More flexible than the hierarchical model.
  • Example: University system where a student can enroll in multiple courses, and each course has multiple students.

Relational Model

  • Data is stored in tables (relations) with rows (tuples) and columns (attributes).
  • Uses Primary Keys and Foreign Keys to establish relationships.
  • Example: Student table with columns (ID, Name, Course) and Course table with columns (CourseID, CourseName).

Questions for Practice

  1. What are the main differences between a file system and a DBMS?
  2. Explain the advantages of using a DBMS.
  3. What is the role of a database administrator?
  4. Describe the hierarchical model with an example.
  5. How does the relational model ensure data consistency?

    1. Relational Data Model

The Relational Data Model is a way to structure data into tables (relations) consisting of rows (tuples) and columns (attributes).

1.1 Relational Schema, Tuples, Attributes

  • Relational Schema: Defines the structure of a table, including column names and data types.
  • Tuples: Rows in a table that store records.
  • Attributes: Columns that define the properties of data stored.

Example:

Student(SID: INT, Name: VARCHAR, Age: INT, Dept: VARCHAR)

Here, SID, Name, Age, Dept are attributes, and a single row represents a tuple.

1.2 Keys

  • Primary Key: A unique identifier for each tuple (e.g., SID in Student).
  • Foreign Key: An attribute that references a primary key in another table.
  • Candidate Key: A set of attributes that can uniquely identify a tuple.
  • Super Key: A set of one or more attributes that uniquely identify a tuple (includes candidate keys and primary keys).

Example:

Student(SID is the Primary Key)
Course(CID is the Primary Key, SID is a Foreign Key referencing Student)

1.3 Integrity Constraints

  • Entity Integrity: Primary key cannot be NULL.
  • Referential Integrity: Foreign keys must reference an existing primary key.
  • Domain Integrity: Attributes must have valid data types and constraints.

Example:

CREATE TABLE Student (
    SID INT PRIMARY KEY,
    Name VARCHAR(50) NOT NULL,
    Age INT CHECK (Age > 18)
);

2. Structured Query Language (SQL)

SQL is used to manage relational databases.

2.1 DDL (Data Definition Language)

Used for defining structures.

  • CREATE: Creates a table.
  • ALTER: Modifies a table.
  • DROP: Deletes a table.

Example:

CREATE TABLE Employee (
    ID INT PRIMARY KEY,
    Name VARCHAR(50),
    Salary DECIMAL(10,2)
);

2.2 DML (Data Manipulation Language)

Used for modifying data.

  • INSERT: Adds new records.
  • UPDATE: Modifies records.
  • DELETE: Removes records.

Example:

INSERT INTO Employee VALUES (1, 'John Doe', 50000);
UPDATE Employee SET Salary = 55000 WHERE ID = 1;
DELETE FROM Employee WHERE ID = 1;

2.3 DCL (Data Control Language)

Used for access control.

  • GRANT: Gives permissions.
  • REVOKE: Removes permissions.

Example:

GRANT SELECT ON Employee TO User1;
REVOKE SELECT ON Employee FROM User1;

2.4 TCL (Transaction Control Language)

Used for transaction management.

  • COMMIT: Saves changes.
  • ROLLBACK: Reverts changes.

Example:

BEGIN;
UPDATE Employee SET Salary = 60000 WHERE ID = 1;
ROLLBACK;  -- Reverts the update
COMMIT;    -- Confirms the change

2.5 SQL Queries

  • SELECT: Retrieves data.
  • INSERT: Adds data.
  • DELETE: Removes data.
  • UPDATE: Modifies data.

Example:

SELECT * FROM Employee WHERE Salary > 50000;

2.6 Joins

  • INNER JOIN: Returns matching rows.
  • LEFT JOIN: Returns all rows from the left table, matching rows from the right.
  • RIGHT JOIN: Returns all rows from the right table, matching rows from the left.
  • FULL JOIN: Returns all rows from both tables.

Example:

SELECT Employee.ID, Employee.Name, Department.DeptName
FROM Employee
INNER JOIN Department ON Employee.DeptID = Department.DeptID;

2.7 Subqueries & Views

  • Subquery: A query inside another query.
  • View: A virtual table derived from a query.

Example:

CREATE VIEW HighSalary AS
SELECT Name, Salary FROM Employee WHERE Salary > 60000;

ER Model & Normalization

ER Model (Entity-Relationship Model)

1. Entities, Attributes, Relationships

  • Entity: A real-world object (e.g., Student, Course).
  • Attribute: A property of an entity (e.g., Student has Name, Roll No).
  • Relationship: Association between entities (e.g., Student enrolls in Course).

Example:

  • Entity: Student
  • Attributes: ID, Name, Age
  • Relationship: Student enrolls in Course

2. Cardinality & Participation

  • Cardinality defines the number of entities in a relationship (1:1, 1:M, M:N).
  • Participation can be total (all entities participate) or partial (some entities participate).

Example:

  • A student enrolls in multiple courses (1:M cardinality)
  • A professor must be assigned to at least one department (total participation)

3. ER to Relational Mapping

  1. Convert entities to tables.
  2. Convert attributes to columns.
  3. Convert relationships using foreign keys.
  4. Handle weak entities using primary keys from the strong entity.

Example:

Student (ID, Name) Course (Code, Title) Enrollment (Student_ID, Course_Code)

Normalization

1. Functional Dependencies

  • A dependency between attributes in a relation (e.g., _RollNo → Name means Roll_No determines Name).

2. Normal Forms

  • 1NF: No repeating groups; all attributes have atomic values.
  • 2NF: 1NF + No partial dependency.
  • 3NF: 2NF + No transitive dependency.
  • BCNF: 3NF + Every determinant is a candidate key.
  • 4NF: BCNF + No multi-valued dependencies.
  • 5NF: 4NF + No join dependencies.

Example (Normalization Steps):

Order_ID Item Supplier Quantity
101 A X 10
101 B Y 5
  • 1NF: Convert multi-valued attributes into separate rows.
  • 2NF: Ensure all non-key attributes depend on the whole key.
  • 3NF: Remove transitive dependencies.

Transactions & Recovery

Transaction Management

1. ACID Properties

  • Atomicity: All operations succeed or none do.
  • Consistency: Database remains valid.
  • Isolation: Transactions don’t interfere.
  • Durability: Committed changes persist.

2. States of a Transaction

  • Active → Partially Committed → Committed (Success)
  • Active → Failed → Aborted → Rolled Back

Concurrency Control

1. Problems in Concurrency

  • Lost Update: Two transactions overwrite each other.
  • Dirty Read: One transaction reads uncommitted data.
  • Non-Repeatable Read: A value changes after being read.

2. Locking Protocols

  • 2PL (Two-Phase Locking): Growing phase (acquires locks), shrinking phase (releases locks).
  • Time-Stamp Ordering: Transactions execute based on assigned timestamps.

3. Deadlocks & Prevention

  • Deadlock: Two transactions wait indefinitely for each other’s resources.
  • Prevention Techniques: Timeout, Wait-Die, Wound-Wait.

Recovery Techniques

1. Log-based Recovery

  • Write-Ahead Logging (WAL): Changes are recorded in logs before applying to the database.
  • Undo (Rollback) and Redo (Reapply) operations ensure consistency.

2. Checkpointing

  • A checkpoint saves the database state periodically to speed up recovery.

Example Question:

  • Define ACID properties with examples.
  • What is the difference between 3NF and BCNF?
  • Explain the Two-Phase Locking (2PL) protocol.
  • How does checkpointing help in database recovery?

    Storage & Indexing

File Organization & Indexing

Heap File Organization

  • Records are stored in any order without sorting.
  • Easy insertion but slow search.
  • Used when fast insertion is required.
  • Example: Unordered customer records in a database.

Sequential File Organization

  • Records are stored in sorted order.
  • Fast for range queries but slow insertion.
  • Used in applications that require frequent searching.
  • Example: Employee records sorted by ID.

Hashing

  • Uses a hash function to map keys to specific locations.
  • Fast retrieval but inefficient for range queries.
  • Used in key-value storage.
  • Example: Caching in databases.

B+ Trees & B-Trees

  • B-Trees: Balanced tree structure, used for indexing.
  • B+ Trees: Enhanced version of B-Trees with all keys stored in leaf nodes, making range queries faster.
  • Used in databases and file systems.
  • Example: File directory indexing.

Indexing Techniques

Clustered Indexing

  • Data is physically stored based on the index.
  • Faster access for range queries.
  • Example: Primary key-based indexing.

Non-Clustered Indexing

  • Index stores pointers to actual data locations.
  • Slower than clustered indexes but supports multiple indexes.
  • Example: Secondary indexes on email addresses.

NoSQL & Advanced Topics

Introduction to NoSQL Databases

Key-Value Stores

  • Uses key-value pairs for data storage.
  • Fast lookups but limited query capability.
  • Example: Redis, Amazon DynamoDB.

Document-Based Stores (MongoDB)

  • Stores data as JSON/BSON documents.
  • Flexible schema and suitable for semi-structured data.
  • Example: Storing user profiles.

Column-Oriented Stores

  • Stores data in columns rather than rows.
  • Efficient for analytical queries.
  • Example: Google Bigtable, Apache Cassandra.

Graph Databases

  • Stores data in nodes and edges.
  • Efficient for relationship-based queries.
  • Example: Social media connections in Neo4j.

Big Data & Distributed Databases

Distributed DBMS Concepts

  • Data is distributed across multiple nodes for scalability.
  • Ensures fault tolerance and high availability.
  • Example: Google Spanner.

CAP Theorem

  • Consistency: All nodes see the same data.
  • Availability: Every request gets a response.
  • Partition Tolerance: System functions even if network fails.
  • Trade-offs must be made; databases optimize two out of three.
  • Example: MongoDB prioritizes availability and partition tolerance.

Example Questions

  1. Explain the differences between heap, sequential, and hashing file organizations.
  2. What are the advantages of using B+ Trees over B-Trees?
  3. Compare clustered and non-clustered indexing with examples.
  4. How does a key-value store differ from a document-based store?
  5. What are the trade-offs involved in CAP theorem?
  6. Why are column-oriented stores preferred for analytical queries?
  7. How do graph databases efficiently store and retrieve relationship-based data?
  8. What is the role of distributed DBMS in handling big data?

Tags

Post a Comment

0Comments
Post a Comment (0)
Let's Chat
Let's Chat

Let's Make Magic Happen! ✨

Drop us a message or join our whatsapp group