Computer Architecture & Organization – Notes

1: Introduction to Computer Architecture

Basic Organization of a Computer System 🏗️

A computer system consists of four main components:

  • CPU (Central Processing Unit): The "brain" that executes instructions
  • Memory: Stores data and programs (RAM, ROM, Cache)
  • Input/Output Devices: Allow communication with the external world
  • System Bus: Connects all components for data transfer

Example: When you press a key (input), the CPU processes this information, retrieves data from memory, and displays the result on your screen (output).

Evolution of Computers 📈

  1. First Generation (1940s-1950s): Vacuum tubes, bulky, high power consumption
  2. Second Generation (1950s-1960s): Transistors, more reliable, less power
  3. Third Generation (1960s-1970s): Integrated circuits
  4. Fourth Generation (1970s-present): Microprocessors, VLSI technology
  5. Fifth Generation (Present and future): AI, quantum computing, neural networks

Example: The ENIAC (1946) weighed 30 tons and used vacuum tubes, while today's smartphones are millions of times more powerful at a fraction of the size.

Von Neumann Architecture 🧠

Key principles:

  • Stored program concept (instructions and data in same memory)
  • Sequential instruction execution
  • Single data path between CPU and memory

Components:

  • CPU (with ALU and Control Unit)
  • Memory unit
  • Input/Output system
  • System bus

Example: Most modern computers follow this architecture, where programs and data are loaded from storage into RAM before execution.

Harvard vs. Von Neumann Architecture 🔄

Von Neumann:

  • Single memory for both instructions and data
  • One bus for memory access
  • Simpler design

Harvard:

  • Separate memories for instructions and data
  • Separate buses for each memory
  • Allows simultaneous access to instructions and data

Example: Most desktop computers use Von Neumann architecture, while many microcontrollers and DSPs (such as the AVR chips on Arduino boards) use Harvard architecture for better performance.

Performance Metrics 📊

  • Latency: Time to complete a single task
  • Throughput: Number of tasks completed per unit time
  • CPI (Cycles Per Instruction): Average clock cycles needed per instruction
  • MIPS (Million Instructions Per Second): Measure of CPU performance

Example: A CPU with a CPI of 2.0 and a clock speed of 4 GHz would have: MIPS = clock rate / (CPI × 10^6) = (4 × 10^9) / (2.0 × 10^6) = 2000 MIPS
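
The same calculation as a minimal C sketch, using the values from the example above:

#include <stdio.h>

int main(void) {
    double clock_hz = 4e9;   /* 4 GHz clock */
    double cpi      = 2.0;   /* average cycles per instruction */

    /* MIPS = clock rate / (CPI * 10^6) */
    double mips = clock_hz / (cpi * 1e6);
    printf("MIPS = %.0f\n", mips);   /* prints 2000 */
    return 0;
}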


2: Digital Logic and Data Representation

Number Systems 🔢

  • Binary (Base 2): Uses only 0 and 1 (e.g., 1010₂ = 10₁₀)
  • Octal (Base 8): Uses digits 0-7 (e.g., 12₈ = 10₁₀)
  • Hexadecimal (Base 16): Uses digits 0-9 and letters A-F (e.g., A₁₆ = 10₁₀)

Conversion Examples:

  • Decimal 25 to binary: 25₁₀ = 11001₂
  • Binary 1101 to decimal: 1101₂ = 1×2³ + 1×2² + 0×2¹ + 1×2⁰ = 8 + 4 + 0 + 1 = 13₁₀
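
In C, octal and hexadecimal conversions come for free with printf, and strtol parses binary strings — a quick sketch using the numbers above:

#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int n = 25;

    /* printf has octal (%o) and hex (%X) conversions built in */
    printf("%d decimal = %o octal = %X hex\n", n, n, n);   /* 25 = 31 = 19 */

    /* strtol with base 2 parses a binary string back to decimal */
    long v = strtol("1101", NULL, 2);
    printf("1101 (base 2) = %ld (base 10)\n", v);   /* prints 13 */
    return 0;
}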

Boolean Algebra and Logic Gates ⚙️

Basic Operations:

  • AND (·): Output is 1 only if all inputs are 1
  • OR (+): Output is 1 if at least one input is 1
  • NOT (¬): Inverts the input

Basic Logic Gates:

  • AND, OR, NOT, NAND, NOR, XOR, XNOR

Example: For the Boolean expression Y = A·B + C: if A=1, B=0, C=1, then Y = 1·0 + 1 = 0 + 1 = 1.
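
The same expression can be checked with C's bitwise operators (for single-bit values, & is AND and | is OR):

#include <stdio.h>

int main(void) {
    int A = 1, B = 0, C = 1;

    int Y = (A & B) | C;       /* Y = A·B + C */
    printf("Y = %d\n", Y);     /* (1 AND 0) OR 1 = 1 */
    return 0;
}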

Combinational Circuits 🔌

  • Multiplexer (MUX): Selects one of many input signals and forwards it to a single output
  • Demultiplexer (DEMUX): Takes a single input and directs it to one of many outputs
  • Encoder: Converts 2ⁿ input lines to n output lines
  • Decoder: Converts n input lines to 2ⁿ output lines

Example: A 4-to-1 multiplexer has 4 data inputs, 2 selection inputs, and 1 output. The selection inputs determine which data input is connected to the output.
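
A small C sketch of a 4-to-1 multiplexer — the two selection bits pick which data input reaches the output:

#include <stdio.h>

/* 'sel' (0-3) selects one of the four data inputs */
int mux4to1(int d0, int d1, int d2, int d3, int sel) {
    int inputs[4] = { d0, d1, d2, d3 };
    return inputs[sel & 0x3];   /* keep only the two selection bits */
}

int main(void) {
    printf("%d\n", mux4to1(7, 8, 9, 10, 2));   /* sel=2 picks d2: prints 9 */
    return 0;
}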

Sequential Circuits ⏱️

  • Flip-Flops: Basic memory element that stores one bit of data
    • Types: SR, JK, D, T
  • Registers: Group of flip-flops used to store multiple bits
  • Counters: Circuits that count pulses, sequentially going through states

Example: A 4-bit register using D flip-flops can store values from 0000 to 1111, representing numbers 0-15.
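
A minimal C model of such a register, assuming the whole value updates on a simulated clock edge the way a bank of D flip-flops does:

#include <stdio.h>

/* a 4-bit register: four D flip-flops sharing one clock */
typedef struct { unsigned q : 4; } Reg4;

/* on the rising clock edge, the stored value takes the input value */
void clock_edge(Reg4 *r, unsigned d) {
    r->q = d & 0xF;   /* only 4 bits (values 0-15) are kept */
}

int main(void) {
    Reg4 r = { 0 };
    clock_edge(&r, 13);              /* latch 1101 on the clock edge */
    printf("%u\n", (unsigned)r.q);   /* prints 13 */
    return 0;
}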

Data Representation 📊

Signed Numbers:

  • Sign-Magnitude: First bit indicates sign, rest represent magnitude
  • 1's Complement: Invert all bits to negate
  • 2's Complement: Invert all bits and add 1 to negate (most common)

Example: -5 in 8-bit 2's complement:

  1. +5 = 00000101
  2. Invert: 11111010
  3. Add 1: 11111011 (this is -5)
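
The same three steps in C — the final cast assumes the usual two's complement interpretation of the bit pattern:

#include <stdio.h>

int main(void) {
    unsigned char plus5 = 0x05;                          /* step 1: +5 = 00000101 */
    unsigned char neg5  = (unsigned char)(~plus5 + 1);   /* steps 2-3: invert, add 1 */

    printf("bits:  0x%02X\n", neg5);            /* 0xFB = 11111011 */
    printf("value: %d\n", (signed char)neg5);   /* prints -5 */
    return 0;
}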

Floating-Point Representation (IEEE 754):

  • Sign bit (1 bit)
  • Exponent (8 bits for single precision, 11 for double)
  • Mantissa/Fraction (23 bits for single precision, 52 for double)

Example: The decimal number 10.5 in IEEE 754 single precision:

  1. Convert to binary: 1010.1
  2. Normalize: 1.0101 × 2³
  3. Sign bit = 0 (positive)
  4. Exponent = 3 + 127 (bias) = 130 = 10000010
  5. Mantissa = 0101 (rest are zeros)
  6. Result: 0 10000010 01010000000000000000000
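
The resulting bit pattern can be verified in C by copying the float's bytes into an integer (memcpy is the portable way to reinterpret them):

#include <stdio.h>
#include <string.h>
#include <stdint.h>

int main(void) {
    float f = 10.5f;
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);   /* reinterpret the 4 bytes */

    printf("sign     = %u\n",      bits >> 31);           /* 0 */
    printf("exponent = %u\n",     (bits >> 23) & 0xFF);   /* 130 = 10000010 */
    printf("mantissa = 0x%06X\n",  bits & 0x7FFFFF);      /* 0x280000 = 0101 then zeros */
    return 0;
}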

Practice Questions ✍️

  1. What are the key differences between Harvard and Von Neumann architectures?
  2. Convert the decimal number 43 to binary, octal, and hexadecimal.
  3. Implement a full adder circuit using basic logic gates.
  4. How is the number -25 represented in 8-bit 2's complement?
  5. Calculate the CPI of a processor that executes 30% arithmetic, 40% data transfer, and 30% control instructions with CPIs of 1, 2, and 3 respectively.

3: Processor Organization and Instruction Set Architecture

CPU Organization 🧩

Main Components:

  • ALU (Arithmetic Logic Unit): Performs arithmetic and logical operations
  • Control Unit: Fetches, decodes, and executes instructions
  • Registers: Small, high-speed storage locations within the CPU
  • Buses: Communication pathways between components
    • Data Bus: Transfers actual data
    • Address Bus: Transfers memory addresses
    • Control Bus: Carries control signals

Common CPU Registers:

  • Program Counter (PC): Points to the next instruction
  • Instruction Register (IR): Holds the current instruction
  • Memory Address Register (MAR): Holds memory addresses
  • Memory Data Register (MDR): Holds data being transferred to/from memory
  • Accumulator (ACC): Stores results of operations
  • General Purpose Registers: Used for various operations

Example: During instruction execution, the PC points to the instruction in memory, the Control Unit fetches it into the IR, decodes it, and directs the ALU to perform the required operation using data from registers.
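
A toy fetch-decode-execute loop in C makes these register roles concrete; the two-instruction "program" and opcode values here are invented for illustration:

#include <stdio.h>

enum { OP_HALT = 0, OP_LOAD = 1, OP_ADD = 2 };   /* hypothetical opcodes */

int main(void) {
    /* program in memory: LOAD 7, ADD 5, HALT */
    int memory[] = { OP_LOAD, 7, OP_ADD, 5, OP_HALT, 0 };
    int pc = 0, acc = 0;

    for (;;) {
        int ir      = memory[pc++];   /* fetch opcode into the IR, advance PC */
        int operand = memory[pc++];   /* fetch the operand */
        if      (ir == OP_LOAD) acc = operand;    /* decode and execute */
        else if (ir == OP_ADD)  acc += operand;
        else break;                               /* HALT */
    }
    printf("ACC = %d\n", acc);   /* prints 12 */
    return 0;
}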

Instruction Format and Types 📝

Common Instruction Formats:

  • Single-address: Operation code + one address
  • Two-address: Operation code + two addresses
  • Three-address: Operation code + three addresses
  • Zero-address: Just the operation code (stack-based)

Instruction Types:

  • Data Transfer: Move data between locations (e.g., MOV, LOAD, STORE)
  • Arithmetic/Logical: Perform calculations (e.g., ADD, SUB, AND, OR)
  • Control Flow: Change program execution (e.g., JMP, CALL, RET)
  • I/O Instructions: Interact with external devices

Example:

ADD R1, R2, R3  # Three-address format: R1 = R2 + R3
ADD R1, R2      # Two-address format: R1 = R1 + R2

Addressing Modes 🎯

Ways to specify the location of an operand:

  • Immediate: Operand is in the instruction (e.g., ADD R1, #5)
  • Register: Operand is in a register (e.g., ADD R1, R2)
  • Direct: Instruction contains the memory address (e.g., ADD R1, [100])
  • Indirect: Instruction contains address where the actual address is stored (e.g., ADD R1, [R2])
  • Indexed: Address = Base address + Index (e.g., ADD R1, [100+R2])
  • Base-relative: Address = Base register + Offset (e.g., ADD R1, [R2+10])

Example: If R2 contains value 500, in indexed addressing mode, the instruction LOAD R1, [100+R2] would load data from memory address 600 into register R1.
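
The same effective-address arithmetic, sketched in C with an array standing in for memory:

#include <stdio.h>

int main(void) {
    int mem[1024] = { 0 };   /* pretend this is main memory */
    int R2 = 500;            /* index register */

    mem[600] = 42;           /* place a value at address 600 */

    /* LOAD R1, [100+R2]: effective address = 100 + 500 = 600 */
    int R1 = mem[100 + R2];
    printf("R1 = %d\n", R1); /* prints 42 */
    return 0;
}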

RISC vs. CISC Architectures 🔄

RISC (Reduced Instruction Set Computer):

  • Simple instructions of uniform length
  • Fixed-format instructions
  • More registers, fewer memory accesses
  • Hardwired control unit
  • Emphasis on optimization by compiler
  • Examples: ARM, MIPS, RISC-V

CISC (Complex Instruction Set Computer):

  • Complex instructions of variable length
  • Many addressing modes
  • Fewer registers, more memory accesses
  • Microprogrammed control unit
  • Hardware handles complexity
  • Examples: x86, x86-64

Example: A CISC processor might have a single instruction MULT MEM1, MEM2 that multiplies values from two memory locations, while a RISC processor would require multiple simpler instructions to load values into registers, multiply them, and store the result.

Instruction Pipelining and Hazards ⚠️

Pipelining: Overlapping execution of multiple instructions, typically in stages:

  1. Fetch (IF): Get instruction from memory
  2. Decode (ID): Interpret the instruction
  3. Execute (EX): Perform the operation
  4. Memory Access (MEM): Read/write memory if needed
  5. Write Back (WB): Store results in registers

Pipeline Hazards:

  • Data Hazards: Instruction depends on result of previous instruction not yet completed
    • RAW (Read After Write): Most common, waiting for a result to be produced
    • WAR (Write After Read): Register being written to before previous read completes
    • WAW (Write After Write): Register written to twice in sequence
  • Structural Hazards: Hardware resource conflict (e.g., two stages need the same resource)
  • Control Hazards: Pipeline disruption due to branch instructions

Solutions:

  • Forwarding/Bypassing: Pass results directly between pipeline stages
  • Pipeline Stalls/Bubbles: Delay instruction execution
  • Branch Prediction: Guess outcome of branch instructions

Example:

ADD R1, R2, R3  # R1 = R2 + R3
SUB R4, R1, R5  # R4 = R1 - R5

The SUB instruction has a RAW hazard since it needs the result of ADD in R1. Forwarding can pass the result directly to the SUB instruction without waiting for the full write-back stage.

4: Memory Organization

Memory Hierarchy 📊

Arranged from fastest/smallest/most expensive to slowest/largest/least expensive:

  • Registers: Inside CPU, fastest access (< 1 ns)
  • Cache: Small, fast SRAM (L1: 1-2 ns, L2: 3-10 ns, L3: 10-20 ns)
  • Main Memory (RAM): Moderate speed and size (50-100 ns)
  • Secondary Storage: Hard drives, SSDs (tens of μs for SSDs, ms for hard drives)
  • Tertiary Storage: Tapes, optical media (seconds)

Principle: Keep frequently accessed data in faster memory.

Example: When a program runs, frequently used variables are kept in registers, commonly accessed code segments in cache, and the full program in main memory, with less frequently used data in secondary storage.

Cache Memory 🚀

Purpose: Bridge the speed gap between CPU and main memory.

Cache Mapping Techniques:

  • Direct Mapping: Each main memory block maps to exactly one cache location
    • Simple but prone to conflicts
  • Fully Associative: Any memory block can go in any cache location
    • Flexible but complex search
  • Set Associative: Compromise between direct and fully associative
    • Memory blocks map to specific sets, but can go anywhere within that set

Cache Replacement Policies (when cache is full):

  • Least Recently Used (LRU): Remove the least recently accessed block
  • First-In, First-Out (FIFO): Remove the oldest block
  • Least Frequently Used (LFU): Remove the least frequently accessed block
  • Random: Remove a random block

Cache Performance Metrics:

  • Hit Ratio: Percentage of memory accesses found in cache
  • Miss Ratio: Percentage of memory accesses not found in cache

Example: In a 2-way set associative cache with 4 sets (8 blocks total), each memory block maps to the set selected by the low-order bits of its block address, and can occupy either of the two blocks (ways) within that set.
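
A small C sketch of this index calculation, assuming 16-byte blocks and the 4-set cache above (with these parameters, address 0x12345678 lands in set 3):

#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint32_t addr       = 0x12345678;
    uint32_t block_size = 16;    /* bytes per block */
    uint32_t num_sets   = 4;     /* 2-way, 8 blocks total */

    uint32_t block_addr = addr / block_size;    /* strip the offset bits */
    uint32_t set_index  = block_addr % num_sets;
    uint32_t tag        = block_addr / num_sets;

    printf("set %u, tag 0x%X\n", set_index, tag);   /* set 3, tag 0x48D159 */
    return 0;
}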

Virtual Memory 🧠

Purpose: Allow programs to use more memory than physically available by using disk space as an extension of RAM.

Key Concepts:

  • Pages: Fixed-size blocks of virtual memory
  • Frames: Corresponding blocks in physical memory
  • Page Table: Maps virtual pages to physical frames
  • Page Fault: Occurs when accessed page is not in physical memory

Virtual Memory Techniques:

  • Paging: Memory divided into fixed-size pages
    • Simple implementation
    • May have internal fragmentation
  • Segmentation: Memory divided based on logical segments (code, data, stack)
    • More natural program organization
    • May have external fragmentation
  • Paged Segmentation: Combines both approaches

Example: A program using 8 GB of virtual memory might only have 2 GB of its pages in physical RAM at any time, with the rest stored on disk and swapped in as needed.
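
A sketch of the page-number/offset split in C, assuming 4 KB pages and an invented page-table entry that maps the page to frame 42:

#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint32_t vaddr     = 0x00403ABC;
    uint32_t page_size = 4096;                /* assume 4 KB pages */

    uint32_t page_num  = vaddr / page_size;   /* index into the page table */
    uint32_t offset    = vaddr % page_size;   /* unchanged by translation */

    uint32_t frame = 42;                      /* hypothetical page table lookup */
    uint32_t paddr = frame * page_size + offset;

    printf("page %u, offset 0x%X -> physical 0x%X\n", page_num, offset, paddr);
    return 0;
}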

RAM & ROM Types 💾

RAM (Random Access Memory) - Volatile:

  • SRAM (Static RAM):
    • Faster, uses flip-flops
    • More expensive, larger size
    • Used for cache memory
  • DRAM (Dynamic RAM):
    • Slower, uses capacitors that need refreshing
    • Less expensive, smaller size
    • Used for main memory

ROM (Read-Only Memory) - Non-volatile:

  • Mask ROM: Programmed during manufacturing
  • PROM (Programmable ROM): Can be programmed once
  • EPROM (Erasable PROM): Erasable with UV light
  • EEPROM (Electrically Erasable PROM): Can be erased and reprogrammed electrically
  • Flash Memory: EEPROM variant, allows block erasure
    • Used in SSDs, USB drives, memory cards

Example: A computer might use SRAM for L1/L2 cache, DRAM for main memory, and Flash memory for storage (SSD).

Practice Questions ✍️

  1. Compare RISC and CISC architectures in terms of instruction complexity, register usage, and memory access patterns.
  2. Explain how a data hazard occurs in instruction pipelining and describe two methods to resolve it.
  3. For a direct-mapped cache with 64 blocks and a block size of 16 bytes, where would the byte at memory address 1156 be mapped?
  4. Describe the process of translating a virtual address to a physical address in a paging system.
  5. Why is SRAM used for cache memory while DRAM is used for main memory?

5: Input-Output Organization

I/O Interfaces 🔌

Key Components:

  • I/O Ports: Connection points for external devices
  • I/O Buses: Data pathways between CPU and peripherals
    • PCI, PCIe, USB, SATA, etc.
  • I/O Controllers: Specialized circuits that manage specific devices
    • Handle timing, data formatting, buffering, error detection

Communication Methods:

  • Programmed I/O: CPU controls transfer, checking device status repeatedly
  • Interrupt-Driven I/O: Device signals CPU when ready
  • Direct Memory Access (DMA): Device transfers data without CPU involvement

Example: When you connect a USB flash drive, the USB controller detects it, signals the CPU via an interrupt, and then the operating system loads the appropriate driver to communicate with the device.

Memory-Mapped vs. I/O-Mapped I/O 🗺️

Memory-Mapped I/O:

  • I/O devices share address space with memory
  • Same instructions for memory and I/O operations
  • Flexible addressing capabilities
  • Examples: ARM, MIPS

I/O-Mapped I/O (Isolated I/O):

  • Separate address spaces for memory and I/O
  • Special instructions for I/O operations (e.g., IN, OUT)
  • Limited addressing range for I/O
  • Examples: x86 architecture

Example:

// Memory-mapped I/O
MOV R1, [0xFFFF0004]  // Read from I/O device at address 0xFFFF0004

// I/O-mapped I/O
IN R1, 0x60           // Read from I/O port 0x60 (keyboard controller in x86)
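
In C, memory-mapped I/O is usually written through a volatile pointer. A minimal sketch, assuming the same hypothetical device register at 0xFFFF0004 (volatile stops the compiler from caching or reordering the device accesses):

#include <stdint.h>

/* every access through this macro really touches the device */
#define DEVICE_REG (*(volatile uint32_t *)0xFFFF0004)

uint32_t read_device(void) {
    return DEVICE_REG;    /* a load from the device, not from RAM */
}

void write_device(uint32_t v) {
    DEVICE_REG = v;       /* a store that reaches the device */
}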

Interrupts and DMA ⚡

Interrupts:

  • Signals that cause CPU to pause current execution and handle an event
  • Types:
    • Hardware Interrupts: From external devices (keyboard, timer, etc.)
    • Software Interrupts: Generated by programs (syscalls, exceptions)
    • Maskable: Can be disabled
    • Non-maskable: Critical, cannot be disabled

Interrupt Handling Process:

  1. Device asserts interrupt signal
  2. CPU finishes current instruction
  3. CPU saves context (PC, flags, registers)
  4. CPU jumps to interrupt service routine (ISR)
  5. ISR handles the interrupt
  6. CPU restores context and resumes normal execution
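
Hardware interrupt handling can't be shown portably, but POSIX signals are a close software analog — here the OS saves and restores context around the handler much as a CPU does around an ISR (a sketch using the standard signal/alarm/pause calls):

#include <stdio.h>
#include <signal.h>
#include <unistd.h>

volatile sig_atomic_t got_interrupt = 0;

/* the "ISR": keep it short, set a flag, return */
void handler(int sig) {
    got_interrupt = 1;
}

int main(void) {
    signal(SIGALRM, handler);   /* register the service routine */
    alarm(1);                   /* the "device" fires in 1 second */

    while (!got_interrupt)
        pause();                /* suspend until an interrupt arrives */

    printf("interrupt handled, execution resumed\n");
    return 0;
}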

Direct Memory Access (DMA):

  • Allows devices to transfer data directly to/from memory without CPU involvement
  • CPU initiates transfer, then continues other tasks
  • DMA controller manages the transfer
  • Significantly reduces CPU overhead for large data transfers

Example: When printing a large document, instead of the CPU transferring each byte to the printer, it configures the DMA controller to transfer the entire document while the CPU handles other tasks.

Peripheral Devices 🖨️

Keyboards:

  • Matrix of switches scanned by controller
  • Generates scan codes for key press/release
  • Controller sends data serially to computer

Display Units:

  • CRT (Cathode Ray Tube): Electron beam scans phosphor screen
  • LCD (Liquid Crystal Display): Liquid crystals alter polarization of light
  • LED (Light Emitting Diode): Arrays of light-emitting diodes
  • OLED (Organic LED): Self-illuminating pixels, no backlight needed

Printers:

  • Impact: Physical contact with paper (dot matrix, daisy wheel)
  • Non-impact: No physical contact (laser, inkjet, thermal)
  • Connected via USB, network, or wireless interfaces

Example: A laser printer receives data via USB, stores it in buffer memory, then uses a laser to create a pattern on a photosensitive drum, which attracts toner that is transferred to paper and fused with heat.

6: Parallel and Advanced Architectures

Flynn's Classification 🧩

Based on instruction and data streams:

  • SISD (Single Instruction, Single Data):
    • Traditional uniprocessor
    • One instruction stream, one data stream
    • Example: Classic single-core CPU
  • SIMD (Single Instruction, Multiple Data):
    • Same operation performed on multiple data points simultaneously
    • Vector processors, GPU operations
    • Examples: SSE/AVX instructions, NVIDIA CUDA
  • MISD (Multiple Instruction, Single Data):
    • Different operations on the same data
    • Rare in practice
    • Example: Some fault-tolerant systems
  • MIMD (Multiple Instruction, Multiple Data):
    • Multiple processors executing different instructions on different data
    • Most common parallel architecture
    • Examples: Multicore CPUs, computer clusters

Example: In SIMD processing, a single instruction like "ADD" might be applied to 8 pairs of numbers simultaneously, making operations like video processing much faster.
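
A minimal C sketch of exactly this idea using the SSE intrinsics mentioned above (SSE is baseline on x86-64, so this compiles on any such machine; four float additions happen in one instruction):

#include <stdio.h>
#include <immintrin.h>

int main(void) {
    float a[4] = { 1, 2, 3, 4 };
    float b[4] = { 10, 20, 30, 40 };
    float c[4];

    /* one SSE ADD instruction operates on four float pairs at once */
    __m128 va = _mm_loadu_ps(a);
    __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(c, _mm_add_ps(va, vb));

    printf("%g %g %g %g\n", c[0], c[1], c[2], c[3]);   /* 11 22 33 44 */
    return 0;
}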

Multicore Processors 🔄

Architecture:

  • Multiple CPU cores on a single chip
  • Shared or separate caches
  • Interconnection network between cores

Key Features:

  • Shared Memory: All cores access the same physical memory
  • Cache Coherency: Mechanisms to keep multiple caches consistent
  • Thread-Level Parallelism: Multiple threads execute simultaneously

Challenges:

  • Synchronization between cores
  • Load balancing
  • Memory contention
  • Power and heat management

Example: A quad-core processor can run four threads simultaneously. If an image processing application is designed to use multiple threads, it can process different parts of an image on different cores, completing the task much faster than a single-core system.
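
A minimal POSIX-threads sketch of this pattern in C — two threads each process half of an array in parallel (the doubling operation stands in for real image processing; compile with -pthread):

#include <stdio.h>
#include <pthread.h>

#define N 8

int data[N] = { 1, 2, 3, 4, 5, 6, 7, 8 };

/* each thread doubles one half of the array */
void *process_half(void *arg) {
    int start = *(int *)arg;
    for (int i = start; i < start + N / 2; i++)
        data[i] *= 2;
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    int lo = 0, hi = N / 2;

    pthread_create(&t1, NULL, process_half, &lo);
    pthread_create(&t2, NULL, process_half, &hi);
    pthread_join(t1, NULL);   /* wait for both halves to finish */
    pthread_join(t2, NULL);

    for (int i = 0; i < N; i++) printf("%d ", data[i]);
    printf("\n");
    return 0;
}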

GPU Architecture Basics 🎮

Graphics Processing Units (GPUs):

  • Specialized processors optimized for parallel operations
  • Thousands of simple cores vs. few complex cores in CPUs
  • Optimized for floating-point calculations and matrix operations

Architecture Components:

  • Streaming Multiprocessors (SMs): Groups of cores
  • CUDA Cores/Stream Processors: Individual processing units
  • Memory Hierarchy: Global memory, shared memory, registers
  • Scheduler: Manages work distribution

Applications:

  • Graphics rendering
  • Scientific computing
  • Machine learning/AI
  • Cryptocurrency mining

Example: Training a neural network on a GPU can be 10-100x faster than on a CPU because matrix multiplications can be highly parallelized across thousands of GPU cores.

Introduction to Quantum Computing ⚛️

Key Concepts:

  • Qubits: Quantum bits, can exist in superposition of states (both 0 and 1 simultaneously)
  • Superposition: Ability to represent multiple states at once
  • Entanglement: Qubits linked regardless of distance
  • Quantum Gates: Operations on qubits

Potential Advantages:

  • Exponentially faster for certain problems
  • Efficient factoring of large numbers (Shor's algorithm)
  • Quantum search (Grover's algorithm)
  • Quantum simulation

Challenges:

  • Decoherence and error correction
  • Scaling up number of qubits
  • Programming quantum algorithms

Example: Brute-forcing a 256-bit encryption key classically takes on the order of 2^256 operations; Grover's algorithm would cut that search to roughly 2^128 quantum operations, while Shor's algorithm could factor the large numbers underlying RSA in polynomial time, threatening current public-key cryptosystems.

7: Advanced Topics in Computer Organization

Pipelining and Superscalar Architectures 🚀

Pipelining (Review):

  • Overlapping instruction execution stages
  • Increases throughput but not individual instruction latency

Superscalar Architecture:

  • Multiple pipelines operating in parallel
  • Can issue and execute multiple instructions per cycle
  • Dynamic scheduling of instructions
  • Examples: Modern x86 and ARM processors

Challenges:

  • Instruction dependencies
  • Limited instruction-level parallelism in code
  • Complex scheduling logic

Example: A 4-way superscalar processor can potentially execute 4 instructions per clock cycle if they are independent, achieving up to 4 times the performance of a scalar processor at the same clock speed.

Vector Processors and Array Processing 📊

Vector Processors:

  • Optimized for operations on data arrays (vectors)
  • Single instruction operates on multiple data elements
  • Deep pipelines for vector operations
  • Examples: NEC SX series, historical Cray supercomputers

Array Processors:

  • Multiple processing elements arranged in an array
  • Each processor executes the same instruction on different data
  • Good for problems with regular data patterns
  • Examples: Systolic arrays, some AI accelerators

Applications:

  • Scientific simulations
  • Weather forecasting
  • Computational fluid dynamics
  • Signal processing

Example: A vector processor can add two 64-element vectors using a single vector addition instruction, whereas a scalar processor would need 64 separate addition instructions.

Performance Enhancement Techniques 📈

Branch Prediction:

  • Predicts direction of conditional branches
  • Static: Fixed rule or compiler hint (e.g., backward branches predicted taken)
  • Dynamic: Based on runtime behavior
  • Branch Target Buffer (BTB): Caches branch destinations
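
A toy simulation of the classic dynamic scheme, a 2-bit saturating counter (states 0-1 predict not taken, states 2-3 predict taken):

#include <stdio.h>

static int counter = 2;   /* start in "weakly taken" */

int predict(void) { return counter >= 2; }   /* 1 = predict taken */

void update(int taken) {
    if (taken  && counter < 3) counter++;    /* saturate at 3 */
    if (!taken && counter > 0) counter--;    /* saturate at 0 */
}

int main(void) {
    int actual[] = { 1, 1, 0, 1, 1, 1 };     /* observed branch outcomes */
    for (int i = 0; i < 6; i++) {
        printf("predicted %-10s actual %s\n",
               predict() ? "taken," : "not taken,",
               actual[i] ? "taken" : "not taken");
        update(actual[i]);   /* train the predictor on the real outcome */
    }
    return 0;
}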

Out-of-Order Execution:

  • Executes instructions based on data availability rather than program order
  • Uses techniques like:
    • Register Renaming: Eliminates false dependencies
    • Reservation Stations: Hold instructions waiting for operands
    • Reorder Buffer: Ensures results are committed in program order

Speculative Execution:

  • Executes instructions before knowing if they are needed
  • Results are committed only if speculation was correct
  • Discarded if incorrect (e.g., branch misprediction)

Example: When encountering a branch instruction, a processor might predict it will be taken based on past behavior, then speculatively execute instructions from the predicted path. If the prediction is correct, execution continues without delay; if wrong, the speculative work is discarded and execution restarts from the correct path.

Cloud Computing and Server Architectures ☁️

Server Architecture Components:

  • High-performance CPUs: Multiple sockets, many cores
  • Large memory capacity: Hundreds of GB to TB of RAM
  • High-bandwidth networking: 10/40/100 Gbps connections
  • Redundant power and cooling: For reliability

Cloud Computing Models:

  • Infrastructure as a Service (IaaS): Virtual machines, storage
  • Platform as a Service (PaaS): Development and deployment environments
  • Software as a Service (SaaS): Applications delivered over the internet

Virtualization:

  • Hypervisors: Software that creates and manages virtual machines
  • Containers: Lightweight virtualization without full OS
  • Resource management: CPU, memory, I/O allocation

Data Center Architecture:

  • Racks of servers connected by high-speed networks
  • Storage area networks (SANs)
  • Load balancers and firewalls
  • Power distribution and cooling systems

Example: A cloud provider like AWS might use thousands of multi-socket servers, each with 64+ CPU cores and 512GB+ RAM, arranged in racks and connected by high-speed networks. These physical resources are then partitioned using virtualization to provide virtual machines of various sizes to customers.

Practice Questions ✍️

  1. Compare and contrast memory-mapped I/O and I/O-mapped I/O approaches, including their advantages and use cases.
  2. Explain how DMA improves system performance compared to programmed I/O, and describe a scenario where it would be particularly beneficial.
  3. Based on Flynn's taxonomy, classify the following: (a) a standard laptop CPU, (b) a GPU running a shader program across multiple pixels, (c) a dual-processor server.
  4. How does branch prediction contribute to CPU performance, and what happens during a branch misprediction in a modern pipelined processor?
  5. Describe three key differences between traditional server architecture and cloud computing infrastructure.
