Unit 5
In operating systems, input/output (I/O) buffers are critical for managing the flow of data between
the CPU and I/O devices like keyboards, printers, or storage.
An I/O buffer is a temporary storage area that holds data while it is being transferred between the
CPU and peripheral devices.
It helps to match the speed differences between fast CPUs and slower I/O devices by allowing devices
to work independently of the CPU.
Types:
Input Buffer: Holds data from input devices before processing.
Output Buffer: Holds data to be sent to output devices.
The OS collects data from devices into the buffer and processes it in batches to minimize delays and
reduce CPU idle time.
Double Buffering: Uses two buffers to handle data transfer more efficiently, allowing one buffer to
be filled while the other is processed.
Circular Buffers: A common method to implement buffering where a fixed-size buffer operates in a
circular fashion, useful for streaming data.
Hardware vs. Software Buffers: Buffers can be implemented in hardware (in devices like network
cards) or software (managed by the OS).
Blocking vs. Non-Blocking I/O: In blocking I/O, processes wait for the buffer to fill or empty, while
in non-blocking I/O, processes continue execution without waiting for the buffer.
Spooling: In some cases (e.g., printing), the buffer can store data on disk, a process called spooling,
to manage I/O more efficiently.
Importance for Performance: Proper buffering helps improve system performance by reducing the
frequency of context switches, increasing data throughput, and optimizing resource utilization.
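The circular-buffer idea above can be sketched as a fixed-size ring in which a producer (the device) and a consumer (the OS) chase each other around the array. This minimal Python sketch is illustrative only; a real driver would block or signal rather than raise on a full buffer:

```python
class RingBuffer:
    """Fixed-size circular buffer: writes wrap around; reads follow behind."""
    def __init__(self, capacity):
        self.buf = [None] * capacity
        self.capacity = capacity
        self.head = 0   # next write position
        self.tail = 0   # next read position
        self.count = 0  # items currently stored

    def put(self, item):
        if self.count == self.capacity:
            raise BufferError("buffer full")  # a real driver would block or drop
        self.buf[self.head] = item
        self.head = (self.head + 1) % self.capacity
        self.count += 1

    def get(self):
        if self.count == 0:
            raise BufferError("buffer empty")
        item = self.buf[self.tail]
        self.tail = (self.tail + 1) % self.capacity
        self.count -= 1
        return item

rb = RingBuffer(4)
for byte in b"spool":
    if rb.count < rb.capacity:
        rb.put(byte)                # producer (device) fills the buffer
drained = [rb.get() for _ in range(rb.count)]  # consumer (OS) drains it
```

Because head and tail wrap with the modulo, the same fixed array can carry an unbounded stream, which is why this layout is common for streaming I/O.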
Disk Scheduling refers to the algorithms used by operating systems to manage the order in
which disk I/O requests are serviced. Efficient disk scheduling is crucial for optimizing system
performance, reducing seek time, and improving the overall speed of accessing data from the disk
1. Seek Time: The time it takes for the disk’s read/write head to move to the correct
track on the disk where data is stored.
2. Rotational Latency: The delay waiting for the disk platter to rotate to the correct
position under the read/write head.
3. Throughput: The number of I/O requests that are completed in a given period of
time.
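With illustrative figures (a 7200 RPM drive and a 9 ms average seek, both assumed for the example), the access-time components combine as:

```python
# Average access time ≈ average seek time + average rotational latency.
# The figures below are illustrative assumptions, not from any specific drive.
rpm = 7200
avg_seek_ms = 9.0

full_rotation_ms = 60_000 / rpm                    # one revolution: 8.33 ms
avg_rotational_latency_ms = full_rotation_ms / 2   # on average, half a turn

print(round(avg_rotational_latency_ms, 2))               # 4.17
print(round(avg_seek_ms + avg_rotational_latency_ms, 2)) # 13.17
```

Halving the full rotation time reflects that, on average, the desired sector is half a revolution away when the head arrives.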
Common Disk Scheduling Algorithms:
1. First Come, First Served (FCFS):
• Description: The disk processes requests in the order they arrive, without
considering the location of the requests.
• Advantages: Simple and fair.
• Disadvantages: Inefficient, as it can lead to long seek times (the disk arm might
need to move across the entire disk multiple times).
2. Shortest Seek Time First (SSTF):
• Description: The disk services the request closest to the current position of
the read/write head.
• Advantages: Minimizes seek time and improves performance.
• Disadvantages: Can cause starvation of requests far from the current head
position if closer requests keep arriving.
3. SCAN (Elevator Algorithm):
• Description: The disk head moves in one direction servicing requests until it
reaches the end, then reverses direction (like an elevator).
• Advantages: Reduces seek time by covering all pending requests in one
direction before reversing.
• Disadvantages: Can be biased toward requests at the middle tracks, as it
changes direction at the ends.
4. C-SCAN (Circular SCAN):
• Description: Similar to SCAN, but the disk head always moves in one
direction. Once it reaches the end of the disk, it returns to the beginning
without servicing any requests on the return trip.
• Advantages: Provides more uniform wait times, preventing starvation at the
extreme ends of the disk.
• Disadvantages: Involves a longer seek on the return trip to the start of the
disk.
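The head-movement cost of these algorithms can be compared with a small simulation. The request queue below is a commonly used textbook example (head at cylinder 53); the `scan` sketch runs to the last cylinder before reversing, which is one of several variants:

```python
def fcfs(requests, head):
    """Total head movement servicing requests in arrival order."""
    total = 0
    for r in requests:
        total += abs(r - head)
        head = r
    return total

def sstf(requests, head):
    """Always service the pending request nearest the current head position."""
    pending, total = list(requests), 0
    while pending:
        nearest = min(pending, key=lambda r: abs(r - head))
        total += abs(nearest - head)
        head = nearest
        pending.remove(nearest)
    return total

def scan(requests, head, max_cylinder):
    """Elevator: sweep toward the high end, then reverse for the rest."""
    up = sorted(r for r in requests if r >= head)
    down = sorted((r for r in requests if r < head), reverse=True)
    total, pos = 0, head
    for r in up:
        total += abs(r - pos)
        pos = r
    if down:
        total += abs(max_cylinder - pos)  # run to the end before reversing
        pos = max_cylinder
        for r in down:
            total += abs(r - pos)
            pos = r
    return total

queue = [98, 183, 37, 122, 14, 124, 65, 67]
print(fcfs(queue, 53))       # 640 cylinders of head movement
print(sstf(queue, 53))       # 236
print(scan(queue, 53, 199))  # 331
```

The gap between FCFS and SSTF on the same queue shows why locality-aware scheduling matters, while SCAN trades a little extra movement for freedom from starvation.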
Disk Cache is a type of cache memory used to speed up the process of reading from and writing
to a disk. It temporarily stores frequently accessed or recently accessed data from the disk to allow
faster access by the system without needing to interact with the slower disk directly.
Purpose: The disk cache aims to reduce the amount of time the CPU spends waiting for data to be
read from or written to the disk, which is slower compared to memory (RAM). It improves system
performance by minimizing disk access time.
Location:
• Hardware Disk Cache: Built into hard drives or SSDs (Solid-State Drives), using a
small amount of high-speed memory (typically DRAM).
• Software Disk Cache: Managed by the operating system and stored in the system’s
main memory (RAM). It works in conjunction with the hardware cache.
Operation:
• When data is requested, the OS first checks the disk cache.
• If the data is present in the cache (a cache hit), it is retrieved quickly.
• If the data is not present (a cache miss), it is fetched from the disk and stored in the
cache for future access.
Write-Back Cache:
• In a write-back cache, when data is written, it is first updated in the cache, and the
actual write to the disk happens later. This method is faster but can risk data loss if the
system crashes before the data is written to the disk.
Write-Through Cache:
• In a write-through cache, data is written to both the cache and the disk
simultaneously. This method is slower but ensures data consistency and minimizes
the risk of data loss.
Read-Ahead:
• Some disk caches use a technique called read-ahead where data blocks that are likely
to be accessed next (sequential blocks) are pre-loaded into the cache even before they
are requested. This anticipates future requests and improves performance in
sequential read operations.
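The hit/miss behaviour of a software disk cache can be sketched with a small LRU map; `read_block` below is a hypothetical stand-in for the slow disk read, and the class is illustrative rather than any OS's actual implementation:

```python
from collections import OrderedDict

class DiskCache:
    """Tiny read cache: LRU eviction, counts hits and misses."""
    def __init__(self, capacity, read_block):
        self.capacity = capacity
        self.read_block = read_block     # slow path: fetch from "disk"
        self.cache = OrderedDict()
        self.hits = self.misses = 0

    def read(self, block_no):
        if block_no in self.cache:
            self.hits += 1
            self.cache.move_to_end(block_no)  # mark as most recently used
            return self.cache[block_no]
        self.misses += 1
        data = self.read_block(block_no)      # cache miss: go to the disk
        self.cache[block_no] = data
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)    # evict least recently used
        return data

disk = {n: f"data-{n}" for n in range(100)}   # fake disk contents
cache = DiskCache(2, disk.__getitem__)
cache.read(1); cache.read(2); cache.read(1)   # third read is a cache hit
cache.read(3)                                 # evicts block 2 (LRU)
print(cache.hits, cache.misses)               # 1 3
```

Write-back and write-through behaviour would differ only in when `read_block`'s counterpart, a disk write, is invoked: immediately (write-through) or deferred (write-back).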
RAID is a data storage technology that combines multiple physical disk drives into one or more
logical units for improved performance, redundancy, or both. RAID is widely used in data centers,
servers, and high-performance computing systems to provide fault tolerance, improve data access
speeds, and protect against data loss in case of a drive failure.
RAID Levels
RAID can be configured in different levels, each offering a unique balance of performance, fault
tolerance, and storage capacity. The most common RAID levels are:
RAID 0, also known as striping, is a disk array configuration that enhances storage performance by
distributing data across multiple drives. Here’s a comprehensive look at RAID 0, including its
configuration, advantages, disadvantages, and ideal use cases.
1. How RAID 0 Works
• Data Distribution: In RAID 0, data is split into blocks (or stripes) and evenly
distributed across two or more drives. Each drive stores a portion of the data, which
allows for simultaneous read and write operations.
• Example: If you have two drives in a RAID 0 array and you save a file, part of the
file is written to Drive 1 and the other part to Drive 2. This parallel operation can
significantly boost performance.
2. Advantages of RAID 0
• High Performance:
o Increased data transfer rates due to simultaneous access to multiple drives.
o Ideal for applications requiring high speed, such as video editing, gaming,
and graphic design.
• Full Utilization of Storage:
o All the capacity of the combined drives is available for data storage, as there
is no overhead for redundancy.
3. Disadvantages of RAID 0
• No Redundancy:
o If one drive fails, all data in the array is lost. There is no fault tolerance in
RAID 0, making it a risky option for critical data storage.
• Higher Risk of Data Loss:
o The more drives you add to a RAID 0 array, the higher the probability that at
least one drive fails, and any single failure destroys the entire array.
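Striping itself is simple to sketch: deal fixed-size blocks round-robin across the drives. The helper names below are illustrative, not a real RAID controller's interface:

```python
def stripe(data, num_drives, block_size):
    """Split data into fixed-size blocks and deal them round-robin across drives."""
    drives = [bytearray() for _ in range(num_drives)]
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    for i, block in enumerate(blocks):
        drives[i % num_drives] += block
    return drives

def read_back(drives, block_size, total_len):
    """Reassemble the original data by interleaving blocks from each drive."""
    out = bytearray()
    offsets = [0] * len(drives)
    drive = 0
    while len(out) < total_len:
        out += drives[drive][offsets[drive]:offsets[drive] + block_size]
        offsets[drive] += block_size
        drive = (drive + 1) % len(drives)
    return bytes(out)

data = b"ABCDEFGH"
drives = stripe(data, 2, 2)   # 2-byte blocks over 2 drives
print(drives)                 # drive 0 holds b'ABEF', drive 1 holds b'CDGH'
print(read_back(drives, 2, len(data)) == data)  # True
```

Note that losing either bytearray makes the original unrecoverable, which is the "no redundancy" drawback in concrete form.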
RAID 1, also known as mirroring, is a disk array configuration that provides data redundancy by
duplicating (or mirroring) the same data across two or more drives.
1. How RAID 1 Works
• Data Duplication: In RAID 1, data is written identically to two or more drives. Each
drive contains an exact copy of the data, so if one drive fails, the system can continue
to operate using the other drive(s) without data loss.
• Example: If you save a file, the same file is stored on both Drive 1 and Drive 2
simultaneously.
2. Advantages of RAID 1
• Data Redundancy:
o If one drive fails, the data remains intact on the other drive(s). This provides a
high level of data protection and reliability.
• Simple Recovery:
o In the event of a drive failure, recovery is straightforward. The failed drive
can be replaced, and data can be rebuilt from the surviving drives.
• Improved Read Performance:
o Read operations can be faster since data can be read from multiple drives
simultaneously. This can enhance performance for applications that require
high data throughput.
3. Disadvantages of RAID 1
• Storage Capacity:
o Only 50% of the total drive capacity is usable. For example, if you have two
1TB drives in a RAID 1 configuration, you will have only 1TB of usable
space.
• Cost:
o Requires twice the number of drives for the same amount of usable storage,
which can be more expensive compared to non-redundant configurations.
RAID 2 is a less common RAID configuration that uses bit-level striping across multiple drives along
with Hamming code for error correction. While RAID 2 is conceptually interesting, it is not widely
implemented in modern storage systems due to its complexity and inefficiency.
1. How RAID 2 Works
• Bit-Level Striping: Data is split into individual bits and striped across multiple
drives. Each drive stores one bit of the data.
• Error Correction with Hamming Code: RAID 2 uses Hamming code to provide
error detection and correction. This involves the use of additional drives to store
parity bits, which are calculated based on the bits stored on the data drives.
For example, storing an 8-bit byte under RAID 2 would require 12 drives: 8 for
the data bits and 4 for the Hamming code check bits (r check bits are needed
where 2^r ≥ data bits + r + 1).
2. Advantages of RAID 2
• Error Correction:
o The use of Hamming code allows for error detection and correction, which
can help maintain data integrity.
• Data Recovery:
o In the event of a single drive failure, the lost data can be reconstructed using
the remaining bits and parity information.
3. Disadvantages of RAID 2
• High Complexity:
o The architecture is complex due to bit-level striping and the need for
Hamming code, making it challenging to implement and manage.
• Inefficiency:
o RAID 2 requires a large number of drives for small amounts of data, leading
to inefficient use of storage space. For example, for every byte of data,
multiple drives are used.
• Obsolete:
o Due to its complexity and inefficiency, RAID 2 is rarely used today. Other
RAID levels, like RAID 5 and RAID 6, provide similar error correction
capabilities with better performance and efficiency.
RAID 3 is a disk array configuration that utilizes byte-level striping across multiple drives combined
with a dedicated parity drive for error detection and correction. While RAID 3 was once a popular
choice for certain applications, it has become less common with the advent of more efficient RAID
levels.
1. How RAID 3 Works
• Byte-Level Striping: In RAID 3, data is divided into bytes and striped across
multiple drives. Each drive holds a byte of the data, allowing for parallel access to the
data blocks.
• Dedicated Parity Drive: A single drive is used to store parity information, which is
calculated based on the data stored on the other drives. This parity information
enables data recovery in the event of a drive failure.
For example, if you have three drives, the bytes of one stripe might be distributed as follows:
o Drive 1: Byte 1
o Drive 2: Byte 2
o Drive 3: Parity for Bytes 1 and 2
2. Advantages of RAID 3
• Good Read Performance:
o Reading data can be fast because multiple drives can be accessed
simultaneously, especially for sequential data access.
• Data Recovery:
o If one drive fails, the data can be reconstructed using the parity information
stored on the dedicated parity drive.
3. Disadvantages of RAID 3
• Single Parity Bottleneck:
o The dedicated parity drive can become a performance bottleneck during write
operations because every write operation requires updating the parity
information on that single drive.
• Not Fault-Tolerant to Multiple Drive Failures:
o RAID 3 can only tolerate the failure of one drive. If the parity drive fails at
the same time as another drive, data will be lost.
• Complexity:
o The implementation and management of RAID 3 can be more complex than
simpler RAID configurations.
RAID 4 is a disk array configuration that utilizes block-level striping across multiple drives combined
with a dedicated parity drive for error correction. While it offers certain advantages, RAID 4 has
limitations that have led to its decreased usage in modern systems.
1. How RAID 4 Works
• Block-Level Striping: In RAID 4, data is divided into blocks and striped across
multiple drives. Each block of data is written to different drives, allowing for parallel
data access.
• Dedicated Parity Drive: A single drive is used to store parity information. The parity
data is calculated based on the blocks of data stored on the other drives. This parity
enables data recovery in the event of a drive failure.
For example, if you have four drives, data blocks might be distributed as follows:
o Drive 1: Block 1
o Drive 2: Block 2
o Drive 3: Block 3
o Drive 4: Parity for Blocks 1, 2, and 3
2. Advantages of RAID 4
• Data Redundancy:
o Provides redundancy; if one drive fails, data can be reconstructed using the
parity information stored on the dedicated parity drive.
• Good Read Performance:
o Like RAID 3, RAID 4 can achieve good read performance since data can be
read from multiple drives simultaneously.
3. Disadvantages of RAID 4
• Single Parity Bottleneck:
o The dedicated parity drive can become a performance bottleneck during write
operations, as every write operation requires updating the parity information
on that single drive.
• Not Fault-Tolerant to Multiple Drive Failures:
o RAID 4 can only tolerate the failure of one drive. If the parity drive fails at
the same time as another drive, data will be lost.
• Lower Write Performance:
o Write operations can be slower than RAID 0 or RAID 1 due to the overhead
of updating the parity drive after every write.
RAID 5 is a widely used disk array configuration that combines block-level striping with distributed
parity. It provides a good balance between performance, redundancy, and storage efficiency, making it
one of the most popular RAID levels for both enterprise and personal use.
1. How RAID 5 Works
• Block-Level Striping: In RAID 5, data is divided into blocks and striped across
multiple drives. Each block is written to a different drive, allowing for parallel data
access.
• Distributed Parity: Unlike RAID 4, which uses a dedicated parity drive, RAID 5
distributes parity information across all drives in the array. This means that each drive
contains both data and parity information, providing fault tolerance while maximizing
performance.
For example, with three drives (D1, D2, D3), the parity block rotates from stripe
to stripe:
o Stripe 1: Block 1 on D1, Block 2 on D2, parity for Blocks 1–2 on D3
o Stripe 2: Block 3 on D1, parity for Blocks 3–4 on D2, Block 4 on D3
o Stripe 3: parity for Blocks 5–6 on D1, Block 5 on D2, Block 6 on D3
Rotating the parity in this way avoids a single parity drive becoming a
bottleneck and allows for better performance during write operations.
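The parity used by RAID 4 and RAID 5 is a byte-wise XOR, which is what makes single-drive reconstruction possible: XOR-ing the parity with the surviving blocks yields the missing one. A minimal sketch (function names are illustrative):

```python
def xor_parity(blocks):
    """Parity block: byte-wise XOR of all data blocks in a stripe."""
    parity = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            parity[i] ^= b
    return bytes(parity)

def rebuild(surviving_blocks, parity):
    """XOR of the parity with the surviving blocks recovers the lost block."""
    return xor_parity(list(surviving_blocks) + [parity])

d1, d2 = b"\x0f\x33", b"\xf0\x55"
p = xor_parity([d1, d2])        # 0x0f^0xf0 = 0xff, 0x33^0x55 = 0x66
assert rebuild([d2], p) == d1   # "drive 1 failed": reconstruct its block
assert rebuild([d1], p) == d2   # works for any single missing block
```

This also makes the write overhead concrete: every data write must read or recompute the stripe's XOR and update the parity block as well.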
2. Advantages of RAID 5
• Data Redundancy:
o Provides fault tolerance, allowing for the failure of one drive without data
loss. The lost data can be reconstructed using the parity information stored on
the other drives.
• Improved Write Performance:
o The distributed parity reduces the bottleneck that a dedicated parity drive
would create, making write operations more efficient compared to RAID 4.
• Efficient Use of Storage:
o Only one drive’s worth of storage is used for parity, which means you get
better usable storage capacity compared to RAID 1 and RAID 4.
3. Disadvantages of RAID 5
• Write Performance Overhead:
o While write performance is improved over RAID 4, there is still some
overhead due to the need to calculate and write parity information.
• Rebuild Time:
o In the event of a drive failure, rebuilding the array can take a considerable
amount of time, during which the performance may be degraded and the
array may be vulnerable to a second drive failure.
• Not Fault-Tolerant to Multiple Drive Failures:
o RAID 5 can tolerate the loss of only one drive. If a second drive fails during a
rebuild, data will be lost.
RAID 6 is an advanced disk array configuration that builds on the principles of RAID 5 by adding an
extra layer of data protection through double distributed parity. This makes RAID 6 a popular choice
for environments that require high availability and data redundancy.
1. How RAID 6 Works
• Block-Level Striping: Like RAID 5, RAID 6 divides data into blocks and stripes it
across multiple drives. Each block is stored on a different drive, allowing for parallel
access.
• Double Distributed Parity: RAID 6 stores two sets of parity information across all
the drives. This means that it can tolerate the failure of up to two drives without data
loss, making it more fault-tolerant than RAID 5.
For example, with six drives, each stripe holds four data blocks and two
independent parity blocks (P and Q), and the parity positions rotate across
stripes:
o Stripe 1: Block 1 on Drive 1, Block 2 on Drive 2, Block 3 on Drive 3,
Block 4 on Drive 4, P on Drive 5, Q on Drive 6
o Stripe 2: the parity blocks shift to different drives, and so on
Because P and Q are computed differently (typically XOR parity plus a
Reed–Solomon code), any two missing blocks in a stripe can be reconstructed.
2. Advantages of RAID 6
• Enhanced Fault Tolerance:
o RAID 6 can withstand the failure of up to two drives, providing greater data
protection compared to RAID 5. This is particularly important in larger arrays
where the likelihood of multiple drive failures is higher.
• Data Integrity:
o The double parity provides an additional layer of error detection and
correction, ensuring data integrity over time.
3. Disadvantages of RAID 6
• Write Performance Overhead:
o The need to calculate and write two sets of parity data means that write
performance is slower than RAID 5, as there is more overhead involved.
• Storage Efficiency:
o RAID 6 uses the equivalent of two drives’ worth of storage for parity (one
more than RAID 5), reducing the overall usable capacity. The parity
overhead shrinks as a fraction of the array, so efficiency improves as more
drives are added.
• Rebuild Time:
o Rebuilding a RAID 6 array after a drive failure can be time-consuming, and
during this period, the array remains vulnerable to additional drive failures.
Seven objectives of file management systems:
Data storage and operations:
The system should support efficient storage and access to data, whether it's files, databases, or
structured data. This includes basic operations such as reading, writing, deleting, and modifying data,
while ensuring scalability and efficiency.
Data validation:
Ensuring that data is accurate and valid is critical. Techniques like checksums, data integrity checks,
file locking, and validation algorithms are necessary to prevent corruption and inconsistencies,
especially when handling multiple sources or users.
Performance optimization:
The system should maximize throughput (how much data the system can handle over time) and
optimize response time for users. This could be achieved by using caching, buffering, disk scheduling
algorithms, and efficient I/O operations to ensure the system handles high traffic smoothly.
Support for diverse storage devices:
The system must be able to interface with different types of storage, such as SSDs, HDDs, networked
storage (NAS), and cloud-based storage. Drivers and APIs should be in place to abstract the details of
specific devices, allowing for uniform access to data regardless of the hardware.
Minimizing data loss:
To prevent data loss, the system could implement redundancy mechanisms (like RAID
configurations), regular backups, failover systems, and robust error handling to manage both hardware
and software failures.
Standardized I/O interface routines:
The system should offer a standardized set of APIs or system calls to user applications for performing
I/O operations. This abstraction allows developers to interact with the storage system without
worrying about the underlying hardware complexities.
Support for multiple users:
In multi-user environments, the system should manage concurrent access to shared resources
efficiently. File locking, user permissions, and isolation mechanisms like virtual machines or
containers help ensure that multiple users can work without conflict or resource contention.
Depending on the system's purpose, you may need to add extra layers of defense:
Antivirus software: Protects against malware and viruses.
Host-based firewalls: Filters incoming and outgoing traffic on a per-host basis, such as iptables
(Linux) or Windows Defender Firewall (Windows).
Intrusion Detection Systems (IDS): Monitors the system for suspicious activities or security policy
violations. Host-based IDS like OSSEC or AIDE can detect and alert administrators to potential
threats.
Authentication
The process of verifying an identity claimed by or for a system entity. An
authentication process consists of two steps:
1. Identification step: Presenting an identifier to the security system (Identifiers
should be assigned carefully, because authenticated identities are the basis for
other security services, such as access control service.)
2. Verification step: Presenting or generating authentication information that corroborates the binding
between the entity and the identifier
There are four general means of authenticating a user’s identity, which can be
used alone or in combination:
1. Something the individual knows: Examples include a password, a personal identification number
(PIN), or answers to a prearranged set of questions.
2. Something the individual possesses: Examples include electronic keycards,
smart cards, and physical keys. This type of authenticator is referred to as a
token.
3. Something the individual is (static biometrics): Examples include recognition
by fingerprint, retina, and face.
4. Something the individual does (dynamic biometrics): Examples include recognition by voice
pattern, handwriting characteristics, and typing rhythm.
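The "something the individual knows" factor is typically checked against a salted hash rather than the stored password itself. A minimal sketch using Python's standard library (the iteration count and salt length are illustrative choices, not a recommendation):

```python
import hashlib
import hmac
import os

def hash_password(password, salt=None):
    """Store a salted PBKDF2 digest, never the password itself."""
    salt = salt or os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return salt, digest

def verify_password(password, salt, stored_digest):
    """Constant-time comparison avoids leaking information via timing."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, stored_digest)

salt, stored = hash_password("correct horse")
print(verify_password("correct horse", salt, stored))  # True
print(verify_password("wrong guess", salt, stored))    # False
```

The salt ensures that identical passwords produce different digests, and the slow key-derivation function raises the cost of offline guessing.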
Access Control
Access control implements a security policy that specifies who or
what (e.g., in the case of a process) may have access to each specific system resource
and the type of access that is permitted in each instance.
An access control mechanism mediates between a user (or a process executing
on behalf of a user) and system resources, such as applications, operating systems,
firewalls, routers, files, and databases. The system must first authenticate a user seeking access.
Typically, the authentication function determines whether the user is permitted to access the system at
all.
Firewalls
Firewalls can be an effective means of protecting a local system or network of systems from network-
based security threats while affording access to the outside world via wide area networks and the
Internet. Traditionally, a firewall is a dedicated computer that interfaces with computers outside a
network and has special security precautions built into it in order to protect sensitive files on
computers within the network.