OS Interview Questions Compilation
A candidate who can recite what a semaphore is may pass a screening, but a candidate who can articulate why it
is a more general synchronization primitive than a mutex and how to implement one
from scratch demonstrates the depth required for a top-tier role.
This guide is structured to build that deeper, holistic understanding. It moves beyond
a simple list of questions and answers to provide a structured curriculum for
mastering OS concepts. The questions are thematically organized, progressing from
foundational architecture to applied concurrency and system-specific challenges. The
objective is to help you construct a coherent mental model of an operating system—a
model where you can see the causal links between different components. For
instance, understanding how the choice of a CPU scheduling algorithm can directly
influence the probability of thrashing in a virtual memory system is the kind of
connective knowledge that distinguishes exceptional candidates. This report should
be approached not as a list to be memorized, but as a strategic guide to building the
intuition and expertise necessary to excel in the most demanding technical interviews.
Table 1: Core Operating Systems Interview Questions and Concepts Matrix
The following table provides a high-level roadmap for your preparation. It maps each
key interview question to the primary and secondary concepts it tests, allowing for a
structured approach to identifying and strengthening areas of weakness.
28. Explain the relationship between the Sockets API and the kernel. Primary concept: Network Subsystem. Secondary concepts: System Calls, File Descriptors. Covered in Part VIII.
1. What is an Operating System? Describe its two main roles.
As a resource manager, the operating system is the software layer responsible for
managing all the hardware and software resources of a computer.4 It acts as a master
controller, ensuring that the system's finite resources—such as CPU time, memory
space, file storage, and I/O devices—are allocated efficiently and fairly among the
various applications and users competing for them. Without this arbitration, competing
programs would interfere with one another's use of the hardware, leading to inefficiency and instability.6
As an extended machine or abstraction layer, the OS hides the complex and messy
details of the hardware from the application programmer and the end-user.6 For
example, a programmer does not need to know the specific commands for a
particular hard disk model to write data; they simply use the OS's file system API (e.g.,
write()). The OS provides a cleaner, simpler, and more portable set of services for
programs to use.4
2. What is the difference between User Mode and Kernel Mode? Why is this
separation necessary?
This question probes a candidate's understanding of the fundamental protection
mechanisms that ensure system stability and security. The "why" is more critical than
the "what," as it reveals an understanding of system design principles.
User Mode and Kernel Mode are the two distinct operational modes of a CPU.7
● User Mode: This is the standard, unprivileged mode where most applications run.
In this mode, the CPU has restricted access to hardware and memory. A program
running in user mode cannot directly access hardware devices or critical regions
of memory. If it attempts to execute a privileged instruction, the hardware will
generate a trap to the operating system.7
● Kernel Mode (also known as Supervisor, System, or Privileged Mode): This is
the privileged mode in which the operating system kernel executes. In this mode,
the CPU has unrestricted access to all hardware and memory in the system. The
kernel runs in this mode to perform its core functions, such as managing memory,
handling interrupts, and scheduling processes.7
The separation is necessary for protection and stability. If user applications could
run in kernel mode, they could inadvertently or maliciously compromise the entire
system. For example, a buggy program could overwrite the kernel's memory, crashing
the system. A malicious program could disable interrupts, monopolize the CPU, or
access the private data of other processes. The dual-mode architecture creates a
protective barrier. User applications are confined to the "sandbox" of user mode, and
the only way they can perform privileged operations is by making a controlled and
validated request to the kernel via a system call.8 This ensures that the kernel remains
in control, maintaining overall system integrity and preventing user programs from
interfering with one another or the OS itself.
3. What is a System Call? Walk through the lifecycle of a simple system call like
read().
This question tests the practical understanding of the user-kernel interface. While
many candidates can define a system call, fewer can accurately trace its execution
path, which separates rote memorization from true comprehension.
The lifecycle of a simple system call, such as read(fd, buffer, count), involves a
carefully orchestrated transition between user mode and kernel mode:
1. User-Level Invocation: The application program calls the read() function. This is
typically a wrapper function provided by a standard library (like libc in C).
2. Library Wrapper Prepares for Trap: The library function's code prepares for the
transition to kernel mode. It places the system call number for read into a specific
CPU register and the arguments (file descriptor fd, buffer address, count) into
other designated registers.
3. Trap to Kernel Mode: The library function then executes a special TRAP or
SYSCALL instruction. This instruction causes a software interrupt, which forces
the CPU to switch from user mode to kernel mode.
4. Kernel's System Call Handler: The hardware transfers control to a specific
location in the kernel's memory, which is defined in an interrupt or trap vector
table. This location contains the kernel's system call dispatcher. The dispatcher
reads the system call number from the register to identify which service is being
requested (in this case, read).
5. Execution of Kernel Service: The dispatcher invokes the appropriate kernel
function (the implementation of the read system call). This kernel code validates
the parameters (e.g., checks if the file descriptor is valid and if the buffer address
is in the user's address space) and then performs the actual I/O operation by
interacting with the relevant device driver.
6. Return from Kernel: Once the kernel function completes its task, it places the
return value (e.g., number of bytes read, or an error code) in a designated
register.
7. Switch Back to User Mode: The kernel executes a special return-from-trap
instruction (like RTI or IRET). This instruction causes the CPU to switch back from
kernel mode to user mode.
8. Resumption of User Process: Control is returned to the user-level library
function, which then returns the value provided by the kernel to the original
application code. The application continues its execution from the point
immediately after the read() call.
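The user-level side of this path can be made concrete with a short C sketch (Linux assumed): the libc read() wrapper and an explicit syscall() invocation with SYS_read reach the same kernel service through the same trap mechanism. The file path is illustrative.
C
#include <stdio.h>
#include <unistd.h>      /* read(), lseek(), close() wrappers */
#include <fcntl.h>       /* open() */
#include <sys/syscall.h> /* SYS_read */

int main(void) {
    char buf[64];
    int fd = open("/etc/hostname", O_RDONLY);   /* illustrative file */
    if (fd < 0) return 1;

    /* Ordinary path: the libc wrapper loads SYS_read and the arguments
     * into registers, executes the trap instruction, and returns the
     * value supplied by the kernel. */
    ssize_t n = read(fd, buf, sizeof(buf) - 1);

    /* Equivalent raw invocation: we supply the system call number
     * ourselves; the kernel-side handling is identical. */
    lseek(fd, 0, SEEK_SET);
    ssize_t m = syscall(SYS_read, fd, buf, sizeof(buf) - 1);

    printf("wrapper read %zd bytes, raw syscall read %zd bytes\n", n, m);
    close(fd);
    return 0;
}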
4. What is the difference between a monolithic kernel and a microkernel? What are the trade-offs?
Monolithic Kernel:
In a monolithic architecture, the entire operating system—including core services like the
scheduler, memory manager, file systems, network stack, and device drivers—runs as a single
large program in a single address space (kernel space).4 Linux and traditional Unix systems
are primary examples.
● Pros:
○ High Performance: Communication between different OS components is as
fast as a simple function call within the same address space. There is no
overhead from context switching or inter-process communication (IPC) for
internal OS operations.8
● Cons:
○ Low Reliability and Stability: Because all components share the same
address space, a bug in one component (e.g., a faulty device driver) can
corrupt data in another component or bring down the entire system.7
○ Difficult to Maintain and Develop: The codebase is large and tightly
coupled, making it difficult to modify or extend one part of the system without
affecting others.
Microkernel:
In a microkernel architecture, only the most essential services—such as basic memory
management, scheduling, and inter-process communication (IPC)—reside in the kernel. All
other OS services (like file systems, device drivers, and network stacks) run as separate
user-space processes called servers.5 QNX and MINIX are well-known examples.
● Pros:
○ High Reliability and Security: Services are isolated in separate address
spaces. A failure in one server (e.g., a file system crash) does not crash the
entire OS; it can often be restarted independently.5 The smaller kernel has a
smaller trusted computing base, making it easier to secure.
○ Modularity and Extensibility: Services can be developed, tested, and
updated independently, making the system easier to maintain and extend.
● Cons:
○ Performance Overhead: Communication between services requires IPC,
which involves context switches between user mode and kernel mode. This
frequent communication can be significantly slower than the simple function
calls in a monolithic kernel.5
The theoretical debate between these two architectures has led to a practical
convergence. Most modern, mainstream operating systems like Windows and macOS
are not purely one or the other but are better described as hybrid kernels. They keep
performance-critical components like the network stack and file system in kernel
space (like a monolithic kernel) but are designed with a modular, layered structure
that allows for dynamically loading components like drivers (borrowing from the
philosophy of microkernels). This approach attempts to achieve a pragmatic balance,
gaining much of the performance of a monolithic design while incorporating some of
the modularity and reliability benefits of a microkernel. Acknowledging this real-world
evolution demonstrates a level of understanding beyond simple textbook definitions.
5. What is the difference between a process and a thread?
This is one of the most frequently asked OS interview questions, serving as a litmus
test for understanding concurrency fundamentals.5 A comprehensive answer must
detail the differences across several dimensions: resource ownership, execution
context, creation cost, and communication.
A process is an instance of a program in execution and is the unit of resource
ownership. Each process has its own virtual address space, open files, and other
resources, which the OS tracks through its Process Control Block.
A thread, on the other hand, is the unit of execution or scheduling. A thread exists
within the context of a process and is often called a "lightweight process." Multiple
threads can exist within a single process, and they share the process's resources,
including its address space (code and data sections) and open files.3 However, each
thread has its own independent execution context: a program counter, a set of
registers, and a stack. This allows threads within the same process to execute
different parts of the program concurrently.
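As a brief illustration of this sharing, the following C sketch (POSIX threads; counts and names are illustrative) has two threads updating one global variable while each keeps its own stack-allocated local. Compile with -pthread.
C
#include <stdio.h>
#include <pthread.h>

int shared_counter = 0;              /* lives in the shared data segment */

void *worker(void *arg) {
    int local = 0;                   /* lives on this thread's private stack */
    for (int i = 0; i < 1000; ++i) {
        local++;
        shared_counter++;            /* both threads see the same variable */
    }
    printf("thread %ld: local=%d (stack address %p)\n",
           (long)(size_t)arg, local, (void *)&local);
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, (void *)(size_t)1);
    pthread_create(&t2, NULL, worker, (void *)(size_t)2);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    /* Without synchronization, shared_counter may be less than 2000 --
     * the race condition discussed later in this guide. */
    printf("shared_counter=%d\n", shared_counter);
    return 0;
}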
6. Describe the states of a process and the transitions between them. What
information is stored in a Process Control Block (PCB)?
This question assesses a candidate's understanding of the process lifecycle and the
core data structure the OS uses to manage it. A good answer includes a clear
description or diagram of the state model and a thorough list of the contents of a
PCB.
A process transitions through several states during its lifetime. The most common
model includes the following states 3:
● New: The process is being created. The OS has not yet admitted it to the pool of
executable processes.
● Ready: The process is loaded into main memory and is waiting to be assigned to
a CPU for execution. It has all the resources it needs except the CPU itself. Ready
processes are typically kept in a queue.
● Running: The process's instructions are being executed by the CPU.
● Waiting (or Blocked): The process is waiting for some event to occur, such as
the completion of an I/O operation, the availability of a resource, or a signal from
another process. It cannot proceed even if the CPU is free.
● Terminated: The process has finished execution. Its resources are being
deallocated by the OS.
The Process Control Block (PCB), also known as a Task Control Block, is a data
structure within the kernel that stores all the information the OS needs to manage a
specific process.6 When the OS performs a context switch, the context of the
outgoing process is saved in its PCB, and the context of the incoming process is
loaded from its PCB. The PCB contains 6:
● Process State: The current state of the process (e.g., New, Ready, Running).
● Process ID (PID): A unique identifier for the process.
● Program Counter (PC): The address of the next instruction to be executed for
this process.
● CPU Registers: The contents of the processor's registers (e.g., accumulators,
index registers, stack pointers).
● CPU Scheduling Information: Process priority, pointers to scheduling queues,
and other scheduling parameters.
● Memory Management Information: Information such as page tables or segment
tables that define the process's virtual address space.
● Accounting Information: CPU time used, time limits, account numbers, etc.
● I/O Status Information: A list of I/O devices allocated to the process, a list of
open files, etc.
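A simplified, hypothetical C struct can make this concrete. Real kernels (Linux's task_struct, for instance) carry far more state, but the shape mirrors the list above.
C
#include <stdint.h>

typedef enum { NEW, READY, RUNNING, WAITING, TERMINATED } proc_state_t;

/* Illustrative sketch of a PCB; field names and sizes are hypothetical. */
struct pcb {
    proc_state_t state;           /* current process state               */
    int          pid;             /* unique process identifier           */
    uint64_t     program_counter; /* next instruction to execute         */
    uint64_t     registers[32];   /* saved general-purpose registers     */
    int          priority;        /* CPU scheduling information          */
    struct pcb  *next_ready;      /* link in a scheduling queue          */
    void        *page_table;      /* memory-management information       */
    uint64_t     cpu_time_used;   /* accounting information              */
    int          open_files[16];  /* I/O status: open file descriptors   */
};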
7. What is a context switch, and why is it considered overhead?
This question targets a core mechanism of multitasking operating systems and its
performance implications. An interviewer wants to confirm that the candidate
understands not just the mechanism but also its cost.
A context switch is the process of storing the current state (or context) of a process
or thread and restoring the state of another so that execution can be switched from
one to the other.5 This mechanism is what allows a single CPU to be shared among
multiple concurrently running processes, creating the illusion of parallelism.3 The
"context" is the complete set of information needed to restart the process, which is
stored in its Process Control Block (PCB). This includes the program counter, CPU
registers, and memory management information.6
A race condition occurs when two or more threads access shared data concurrently
and the final outcome depends on the unpredictable order in which their operations
interleave. A classic example is the count++ operation on a shared integer variable,
which is not atomic. It typically decomposes into three machine instructions:
1. Load the value of count from memory into a register.
2. Increment the value in the register.
3. Store the new value from the register back into memory.
If two threads execute this sequence concurrently, they might both load the same
initial value, both increment it, and both store back the same result, causing one
of the increments to be lost.
Race conditions are prevented by enforcing mutual exclusion on the critical section
of code—the part of the program that accesses the shared resource.8 By ensuring
that only one thread can execute the critical section at any given time, the operation
becomes effectively atomic. The primary mechanisms for preventing race conditions
are synchronization primitives 5:
● Mutexes (Mutual Exclusion Locks): The most common solution. A thread must
acquire the mutex before entering the critical section and release it upon exiting.
● Semaphores: Can be used to control access to a resource, effectively acting as a
lock.
● Monitors (or Synchronized Blocks/Methods in Java): Higher-level language
constructs that bundle a mutex with the data it protects, simplifying
synchronization.
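A minimal C sketch of the count++ race and its mutex-based fix, using POSIX threads (loop counts are arbitrary; compile with -pthread):
C
#include <stdio.h>
#include <pthread.h>

long count = 0;
pthread_mutex_t count_lock = PTHREAD_MUTEX_INITIALIZER;

void *increment(void *arg) {
    (void)arg;
    for (int i = 0; i < 100000; ++i) {
        pthread_mutex_lock(&count_lock);   /* enter critical section       */
        count++;                           /* load, increment, store       */
        pthread_mutex_unlock(&count_lock); /* leave critical section       */
    }
    return NULL;
}

int main(void) {
    pthread_t a, b;
    pthread_create(&a, NULL, increment, NULL);
    pthread_create(&b, NULL, increment, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    /* With the mutex the result is always 200000; without it, some
     * increments would be lost to the race described above. */
    printf("count = %ld\n", count);
    return 0;
}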
10. What is the difference between a mutex and a binary semaphore?
This is a subtle but critical question that separates candidates who have a deep
understanding of synchronization from those with only a superficial one. While a
binary semaphore can be used to achieve mutual exclusion like a mutex, their design
intent and properties are different.
A mutex is a locking mechanism with the notion of ownership: the thread that locks it
must be the one to unlock it. A semaphore, by contrast, is a signaling mechanism built
around a counter with no owner. Any thread can perform a
wait (or P, down) operation, which attempts to decrement the semaphore's value (and
blocks if it's 0), and any thread can perform a signal (or V, up) operation, which
increments the value. This means one thread can signal a semaphore to wake up
another thread that is waiting on it. While a binary semaphore initialized to 1 can be
used to provide mutual exclusion, its more general purpose is for synchronization
between threads, such as signaling the completion of an event or handing off control.
To summarize:
● Purpose: A mutex is for locking (mutual exclusion); a semaphore is for signaling
(general synchronization).
● Ownership: A mutex is owned by the thread that locks it; a semaphore has no
owner.
● Usage: Only the owner of a mutex can unlock it. Any thread can signal a
semaphore.
This distinction is not just academic. Using a semaphore when a mutex is the
appropriate tool can lead to subtle bugs, as the lock can be inadvertently released by
a thread that never acquired it. The interview question is designed to see if the
candidate understands this difference in design intent.
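The signaling role described above can be sketched in C with POSIX semaphores: one thread announces the completion of an event and another waits for it, with no notion of ownership involved.
C
#include <stdio.h>
#include <pthread.h>
#include <semaphore.h>

sem_t done;                 /* initialized to 0: "event has not happened yet" */

void *worker(void *arg) {
    (void)arg;
    printf("worker: finishing some task\n");
    sem_post(&done);        /* V/signal: any thread may do this */
    return NULL;
}

int main(void) {
    sem_init(&done, 0, 0);  /* thread-shared semaphore, initial value 0 */
    pthread_t t;
    pthread_create(&t, NULL, worker, NULL);
    sem_wait(&done);        /* P/wait: blocks until the worker signals */
    printf("main: worker signaled completion\n");
    pthread_join(t, NULL);
    sem_destroy(&done);
    return 0;
}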
11. What is a deadlock? What are the four necessary conditions for a deadlock to
occur?
A deadlock is a situation in which a set of processes is blocked permanently because
each process holds at least one resource while waiting for a resource held by another
process in the set. For a deadlock to occur, four conditions, often called the Coffman
conditions, must hold simultaneously in the system 6:
1. Mutual Exclusion: At least one resource must be held in a non-sharable mode.
That is, only one process at a time can use the resource. If another process
requests that resource, the requesting process must be delayed until the
resource has been released.
2. Hold and Wait: A process must be holding at least one resource and waiting to
acquire additional resources that are currently being held by other processes.
3. No Preemption: Resources cannot be preempted; that is, a resource can be
released only voluntarily by the process holding it, after that process has
completed its task. The OS cannot forcibly take a resource away from a process.
4. Circular Wait: There must exist a set of waiting processes {P0, P1, ..., Pn} such that
P0 is waiting for a resource held by P1, P1 is waiting for a resource held by P2, ...,
Pn−1 is waiting for a resource held by Pn, and Pn is waiting for a resource held by
P0. This creates the circular chain of dependencies.
All four of these conditions must be met for a deadlock to be possible. If any one of
them is prevented, deadlock cannot occur.
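The hold-and-wait and circular-wait conditions can be reproduced deliberately in a short C sketch: two threads acquire the same two mutexes in opposite orders, and the program typically hangs. Acquiring both locks in one global order in every thread removes the cycle.
C
#include <stdio.h>
#include <unistd.h>
#include <pthread.h>

pthread_mutex_t lock1 = PTHREAD_MUTEX_INITIALIZER;
pthread_mutex_t lock2 = PTHREAD_MUTEX_INITIALIZER;

void *thread_a(void *arg) {
    (void)arg;
    pthread_mutex_lock(&lock1);          /* holds lock1 ...            */
    sleep(1);                            /* widen the race window      */
    pthread_mutex_lock(&lock2);          /* ... and waits for lock2    */
    printf("A acquired both locks\n");
    pthread_mutex_unlock(&lock2);
    pthread_mutex_unlock(&lock1);
    return NULL;
}

void *thread_b(void *arg) {
    (void)arg;
    pthread_mutex_lock(&lock2);          /* holds lock2 ...            */
    sleep(1);
    pthread_mutex_lock(&lock1);          /* ... and waits for lock1    */
    printf("B acquired both locks\n");
    pthread_mutex_unlock(&lock1);
    pthread_mutex_unlock(&lock2);
    return NULL;
}

int main(void) {
    pthread_t a, b;
    pthread_create(&a, NULL, thread_a, NULL);
    pthread_create(&b, NULL, thread_b, NULL);
    pthread_join(a, NULL);   /* with the opposite lock orders above,   */
    pthread_join(b, NULL);   /* this program typically never finishes  */
    return 0;
}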
12. How can you handle deadlocks? (Prevention, Avoidance, Detection &
Recovery)
This question follows naturally from the previous one and transitions from theory to
practical strategy. It assesses whether a candidate can think about system-level
approaches to solving a complex problem.
There are three primary strategies for dealing with deadlocks, plus the common
pragmatic approach of ignoring them.
1. Deadlock Prevention: This strategy involves designing the system to ensure that
at least one of the four necessary Coffman conditions can never hold, thus
making deadlocks structurally impossible.3
○ Break Mutual Exclusion: Make resources sharable (often not possible, e.g.,
for a printer).
○ Break Hold and Wait: Require a process to request all its required resources
at once (all-or-nothing). This can lead to low resource utilization and potential
starvation.
○ Break No Preemption: Allow the OS to preempt resources from a process if
another, higher-priority process needs them. This is complex to implement.
○ Break Circular Wait: Impose a total ordering on all resource types and require
that each process requests resources in an increasing order of enumeration.
This is a common and effective technique (e.g., lock ordering).
Prevention is often too restrictive and can lead to poor system performance.
2. Deadlock Avoidance: This strategy allows the system to enter states that satisfy
the four conditions but uses an algorithm to dynamically check every resource
request. A request is only granted if it leads to a "safe state"—a state from which
there is at least one sequence of execution that allows all processes to run to
completion.3 The classic example is the
Banker's Algorithm.6 Avoidance requires
a priori information about the maximum number of resources each process might
request, which is often not available in general-purpose operating systems,
making it impractical for them. It is more suited to specialized systems.
3. Deadlock Detection and Recovery: This strategy allows the system to enter a
deadlocked state, periodically runs an algorithm to detect if a deadlock has
occurred (e.g., by searching for cycles in a resource-allocation graph), and then
applies a recovery scheme.7 Recovery options include 6:
○ Process Termination: Abort one or more of the deadlocked processes. This
is a blunt but effective approach.
○ Resource Preemption: Forcibly take a resource from one process and give it
to another. This often involves rolling back the preempted process to a safe
state, which can be very complex.
In practice, most general-purpose operating systems like Linux and Windows do not
implement complex prevention or avoidance schemes. They essentially ignore the
problem, assuming that deadlocks are rare and are the result of programmer error.
They provide synchronization primitives like mutexes and semaphores and leave the
responsibility of using them correctly (e.g., by enforcing a strict lock ordering to
prevent circular waits) to the application developer. This pragmatic approach avoids
the performance overhead and restrictions of the more formal methods.
Non-Preemptive Scheduling:
In a non-preemptive scheduling system, once the CPU has been allocated to a process, that
process keeps the CPU until it voluntarily releases it. A process releases the CPU in one of two
ways: either by terminating or by switching to the waiting state (e.g., to perform an I/O
operation).5 The scheduler has no power to force a process off the CPU. This model is simple
to implement and has low overhead, as there are no forced context switches. However, it is
not suitable for time-sharing or real-time systems because a long-running process can
monopolize the CPU, making the system unresponsive to other processes. First-Come,
First-Served (FCFS) is a classic example of a non-preemptive algorithm.
Preemptive Scheduling:
In a preemptive scheduling system, the operating system can forcibly remove a process from
the CPU and reallocate it to another process. This preemption can occur for several reasons
5:
● A running process's time slice (or quantum) expires.
● A higher-priority process transitions from the waiting state to the ready state.
Preemptive scheduling is essential for modern multitasking operating systems. It
ensures that no single process can dominate the CPU, leading to better system
responsiveness and fairness. However, it introduces more overhead due to the
increased frequency of context switching. It can also lead to complexities in
managing shared data, as a process might be preempted in the middle of
updating a shared data structure. Round-Robin (RR) and Shortest Remaining Time
First (SRTF) are examples of preemptive algorithms.
14. Describe the following scheduling algorithms and their pros and cons: FCFS,
SJF, Priority, and Round-Robin.
This is a core knowledge question designed to test a candidate's familiarity with the
standard CPU scheduling algorithms and their performance characteristics. A strong
answer will not only describe how each algorithm works but also analyze its trade-offs
using standard metrics like average waiting time, turnaround time, and response time.
Starvation occurs when a process is ready to run but is perpetually denied the CPU
because the scheduler keeps choosing higher-priority processes ahead of it. Aging is a
common technique used to prevent starvation.6 The core idea of aging is to
gradually increase the priority of processes that have been waiting in the system for a
long time. For example, the OS could periodically scan the ready queue and increment
the priority of every process that has been waiting for a certain duration. Eventually, a
process that has been waiting for a very long time will have its priority raised high
enough that it will be selected by the scheduler, guaranteeing that it will eventually
run. This technique ensures fairness and prevents any process from being stuck in the
ready queue forever.
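A minimal sketch of the idea in C, with illustrative data structures rather than a real scheduler's, might look like this (smaller numbers are assumed to mean higher priority):
C
#include <stdio.h>

#define AGING_THRESHOLD_TICKS 100
#define HIGHEST_PRIORITY 0

struct ready_entry {
    int pid;
    int priority;        /* current (dynamic) priority    */
    int ticks_waiting;   /* time spent in the ready queue */
};

/* Called periodically, e.g., on every timer tick. */
void age_ready_queue(struct ready_entry q[], int n) {
    for (int i = 0; i < n; ++i) {
        q[i].ticks_waiting++;
        if (q[i].ticks_waiting >= AGING_THRESHOLD_TICKS &&
            q[i].priority > HIGHEST_PRIORITY) {
            q[i].priority--;          /* boost, so the waiter eventually runs */
            q[i].ticks_waiting = 0;
        }
    }
}

int main(void) {
    struct ready_entry q[] = { {1, 10, 0}, {2, 3, 0} };
    for (int tick = 0; tick < 1000; ++tick)
        age_ready_queue(q, 2);
    /* After enough ticks, even the lowest-priority process has been boosted. */
    printf("pid 1 priority is now %d, pid 2 priority is now %d\n",
           q[0].priority, q[1].priority);
    return 0;
}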
This question compares the two primary techniques for implementing virtual memory.
A strong answer will focus on the fundamental difference—fixed-size versus
variable-size units—and explain the consequences of this design choice, particularly
regarding fragmentation.
Paging:
Paging is a memory management scheme that divides a process's virtual address space into
fixed-size blocks called pages. Physical memory is similarly divided into fixed-size blocks
called frames, where the page size and frame size are identical.5 The OS maintains a
page table for each process, which maps each virtual page to a physical frame.
● Key Characteristic: Uses fixed-size units.
● Pros: It completely eliminates the problem of external fragmentation because
any free frame can be allocated to any page. Swapping pages is simple because
all units are the same size.
● Cons: It can suffer from internal fragmentation. If a process does not need an
amount of memory that is an exact multiple of the page size, the last page
allocated will have some unused space within it, which is wasted.6 The mapping is
purely physical and does not reflect the logical structure of the program.
Segmentation:
Segmentation is a memory management scheme that divides a process's virtual address
space into a collection of logical, variable-sized units called segments.5 These segments
typically correspond to the logical parts of a program, such as a code segment, a data
segment, and a stack segment. The OS maintains a
segment table for each process, which stores the base address and length of each
segment in physical memory.
● Key Characteristic: Uses variable-sized units based on the program's logical
structure.
● Pros: The mapping is logical and can be used to enforce protection (e.g., marking
the code segment as read-only). It allows for the sharing of entire segments (e.g.,
a shared library's code segment) between processes.
● Cons: It suffers from external fragmentation. As segments of various sizes are
loaded and unloaded from memory, the free memory space can be broken into
many small, non-contiguous holes. A new segment may not fit into any of the
available holes, even if the total free space is sufficient.6 This requires compaction
to solve, which is a costly operation.
Most modern systems use a hybrid approach, segmentation with paging, where the
address space is first divided into segments, and then each segment is further divided
into pages.
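The fixed-size arithmetic that makes paging simple can be shown with a small worked example in C, assuming 4 KiB pages and an illustrative page table: a virtual address splits into a page number and an offset, and the page table supplies the frame.
C
#include <stdio.h>
#include <stdint.h>

#define PAGE_SIZE 4096u            /* 4 KiB pages, so 12 offset bits */

int main(void) {
    /* Hypothetical page table: virtual page i maps to frame page_table[i]. */
    uint32_t page_table[] = { 5, 9, 2, 7 };

    uint32_t vaddr  = 0x00003ABC;                 /* example virtual address  */
    uint32_t vpage  = vaddr / PAGE_SIZE;          /* = vaddr >> 12  -> page 3 */
    uint32_t offset = vaddr % PAGE_SIZE;          /* = vaddr & 0xFFF          */
    uint32_t frame  = page_table[vpage];
    uint32_t paddr  = frame * PAGE_SIZE + offset; /* resulting physical addr  */

    printf("virtual 0x%08X -> page %u, offset 0x%03X -> frame %u -> physical 0x%08X\n",
           vaddr, vpage, offset, frame, paddr);
    return 0;
}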
18. What is a page fault? Describe the steps the OS takes to handle it.
This is a critical "how it works" question that tests a candidate's detailed knowledge of
the mechanics of virtual memory. A precise, step-by-step answer is expected.
A page fault is a type of trap or exception generated by the hardware's Memory
Management Unit (MMU) when a running program attempts to access a piece of data
or code that is in its virtual address space but is not currently located in the system's
physical memory (RAM).4 This is not necessarily an error; it is a normal event in a
demand-paged virtual memory system.
The operating system handles a page fault through the following sequence of steps:
1. Hardware Trap: The MMU detects that the virtual address cannot be translated
because the corresponding page table entry is marked as invalid or not present.
The MMU generates a trap, which switches the CPU from user mode to kernel
mode and transfers control to the OS's page fault handler.
2. Save Process Context: The OS saves the current state of the process (program
counter, registers) so it can be resumed later.
3. Validate the Access: The OS checks an internal table (often part of the PCB) to
determine if the access was valid. It verifies that the virtual address is within the
process's legal address space and that the access type (read/write) is permitted.
If the access is illegal, the process is terminated (resulting in a "Segmentation
Fault" or "Access Violation" error).
4. Find a Free Frame: If the access was valid, the OS knows the page is on the
backing store (disk). It must now find a free frame in physical memory to load the
page into.
5. Page Replacement (if necessary): If there are no free frames, the OS must
select a victim frame to be replaced using a page replacement algorithm (such
as Least Recently Used (LRU) or a clock algorithm). If the victim page has been
modified (is "dirty"), it must be written back to the disk before the frame can be
reused.
6. Schedule Disk I/O: The OS schedules a disk read operation to load the required
page from the backing store into the now-available physical frame.
7. Block the Process: While the disk I/O is in progress, the OS will typically switch
context to another ready process, as disk access is very slow. The faulting
process is moved to the waiting state.
8. Update Page Table: Once the disk read is complete, the OS updates the
process's page table to map the virtual page to the correct physical frame and
sets the valid/present bit to indicate that the page is now in memory.
9. Resume the Process: The OS moves the faulting process from the waiting state
back to the ready queue. Eventually, the scheduler will select it to run again.
10. Restart the Instruction: The OS restores the process's saved context and
resumes its execution. The instruction that caused the fault is re-executed, and
this time, the MMU finds a valid translation and the memory access succeeds.
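Although the handler itself runs in the kernel, its effect is observable from user space. The following C sketch (Linux assumed) maps an anonymous region and touches each page once; getrusage() then shows roughly one minor page fault per first touch, underscoring that page faults are routine events rather than errors.
C
#include <stdio.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/resource.h>

int main(void) {
    const size_t pages = 256;
    const size_t page_size = (size_t)sysconf(_SC_PAGESIZE);

    /* Anonymous demand-paged region: no physical frames are assigned yet. */
    char *region = mmap(NULL, pages * page_size, PROT_READ | PROT_WRITE,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (region == MAP_FAILED) return 1;

    struct rusage before, after;
    getrusage(RUSAGE_SELF, &before);

    /* First touch of each page triggers a (minor) page fault that the
     * kernel services transparently, as in the steps above. */
    for (size_t i = 0; i < pages; ++i)
        region[i * page_size] = 1;

    getrusage(RUSAGE_SELF, &after);
    printf("minor page faults while touching %zu pages: %ld\n",
           pages, after.ru_minflt - before.ru_minflt);

    munmap(region, pages * page_size);
    return 0;
}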
19. What is thrashing? What is its root cause?
Thrashing occurs when a system spends more time servicing page faults (swapping pages
in and out of memory) than doing useful work, causing CPU utilization and throughput to
collapse. The root cause of thrashing is that processes do not have enough frames to hold their
working set—the set of pages that a process is actively using at a given point in time.
If a process cannot keep its working set in memory, it will fault continuously.
20. What is the difference between a hard link and a soft link (symbolic link)?
This is a classic file system question that tests a candidate's understanding of the
distinction between file metadata (specifically, inodes) and the directory entries that
point to them.
Hard Link:
A hard link is a directory entry that associates a name with a file's inode.5 When you create a
hard link, you are creating another name that points directly to the
same inode.
● Mechanism: All hard links to a file are equally valid names for it; there is no
"original" file and "linked" file. The inode itself contains a reference count that
tracks how many hard links point to it.
● Behavior: The file's data is only deleted from the disk when the reference count
in the inode drops to zero (i.e., when the last hard link to it is removed).
● Limitations: A hard link cannot be created for a directory, and it cannot cross file
system (or partition) boundaries, because inode numbers are only unique within a
single file system.
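A short C sketch (POSIX assumed, with illustrative file names) shows the mechanics: link() adds a second name for the same inode, symlink() creates a separate file that merely stores a path, and stat() exposes the inode's hard-link count.
C
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/stat.h>

int main(void) {
    int fd = open("original.txt", O_CREAT | O_WRONLY, 0644);
    if (fd < 0) return 1;
    (void)write(fd, "hello\n", 6);
    close(fd);

    link("original.txt", "hard.txt");      /* new name, same inode        */
    symlink("original.txt", "soft.txt");   /* new inode that stores a path */

    struct stat st;
    stat("original.txt", &st);             /* inspect the shared inode    */
    printf("inode %llu now has %lu hard links\n",
           (unsigned long long)st.st_ino, (unsigned long)st.st_nlink);

    unlink("original.txt");  /* data survives: hard.txt still references it */
    unlink("hard.txt");      /* link count reaches 0: data is freed         */
    unlink("soft.txt");      /* the symlink had already become dangling     */
    return 0;
}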
21. What is RAID? What is the difference between RAID 0, RAID 1, and RAID 5?
22. Write a C program to create a new process using fork(). Explain the output.
This question tests a candidate's practical ability to use one of the most fundamental
process management system calls in Unix-like systems. The key to a correct
explanation lies in understanding that fork() returns twice—once in the parent and
once in the child—with different return values.
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>
int main() {
    pid_t pid;

    // Fork a child process
    pid = fork();

    if (pid < 0) {
        // Error occurred
        fprintf(stderr, "Fork Failed\n");
        return 1;
    } else if (pid == 0) {
        // This is the child process
        printf("I am the child process, my PID is %d\n", getpid());
        printf("My parent's PID is %d\n", getppid());
        // Child can do its own work here, e.g., using execlp
        // execlp("/bin/ls", "ls", NULL);
    } else {
        // This is the parent process
        printf("I am the parent process, my PID is %d\n", getpid());
        printf("My child's PID is %d\n", pid);
        // Parent waits for the child to complete
        wait(NULL);
        printf("Child Complete\n");
    }
    return 0;
}
The if-else if-else structure is the standard idiom for handling fork()'s return value,
which is negative on failure, zero in the child, and the child's PID in the parent:
1. The pid == 0 block is executed only by the child process. Here, it prints its own
PID (obtained via getpid()) and its parent's PID (obtained via getppid()).
2. The else block (where pid > 0) is executed only by the parent process. It prints its
own PID and the PID of the child it just created (which is the value stored in the
pid variable).
3. The wait(NULL) call in the parent process causes it to pause until the child
process terminates. This is important for ensuring the parent doesn't exit before
the child, which would "orphan" the child, and for synchronizing their execution.
The exact order of the parent's and child's printf statements can vary depending on
the OS scheduler, but the parent's Child Complete message will always appear after
the child's messages because of the wait() call.
23. Implement the Producer-Consumer problem. First, explain the solution using
semaphores, then write the code in Python/Java using locks and condition
variables.
import threading
import time
import random
class ProducerConsumer:
    def __init__(self, size):
        self.buffer = []
        self.size = size
        self.lock = threading.Lock()
        self.not_full = threading.Condition(self.lock)
        self.not_empty = threading.Condition(self.lock)

    def producer(self):
        while True:
            with self.lock:
                while len(self.buffer) == self.size:
                    print("Buffer is full, producer is waiting.")
                    self.not_full.wait()
                item = random.randint(1, 100)
                self.buffer.append(item)
                print(f"Producer produced {item}")
                # Signal to consumer that buffer is no longer empty
                self.not_empty.notify()
            time.sleep(random.random())

    def consumer(self):
        while True:
            with self.lock:
                while len(self.buffer) == 0:
                    print("Buffer is empty, consumer is waiting.")
                    self.not_empty.wait()
                item = self.buffer.pop(0)
                print(f"Consumer consumed {item}")
                # Signal to producer that buffer is no longer full
                self.not_full.notify()
            time.sleep(random.random())


if __name__ == "__main__":
    pc = ProducerConsumer(5)
    producer_thread = threading.Thread(target=pc.producer)
    consumer_thread = threading.Thread(target=pc.consumer)
    producer_thread.start()
    consumer_thread.start()
    producer_thread.join()
    consumer_thread.join()
Explanation:
● A single Lock is used to provide mutual exclusion for accessing the buffer.
● Two Condition variables, not_full and not_empty, are associated with that lock.
● The producer acquires the lock. It uses a while loop (to guard against spurious
wakeups) to check if the buffer is full. If it is, it calls not_full.wait(), which
atomically releases the lock and puts the thread to sleep.
● When the consumer removes an item, it calls not_full.notify(), which wakes up a
waiting producer.
● The consumer's logic is symmetrical, waiting on the not_empty condition if the
buffer is empty.
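For completeness, the semaphore-based formulation mentioned in the question heading can be sketched in C with POSIX semaphores: empty_slots counts free buffer slots, full_slots counts filled ones, and a mutex protects the buffer indices (names and loop counts are illustrative; compile with -pthread).
C
#include <stdio.h>
#include <pthread.h>
#include <semaphore.h>

#define BUF_SIZE 5

int buffer[BUF_SIZE];
int in = 0, out = 0;

sem_t empty_slots;   /* initialized to BUF_SIZE */
sem_t full_slots;    /* initialized to 0        */
pthread_mutex_t buf_lock = PTHREAD_MUTEX_INITIALIZER;

void *producer(void *arg) {
    (void)arg;
    for (int item = 0; item < 20; ++item) {
        sem_wait(&empty_slots);              /* wait for a free slot   */
        pthread_mutex_lock(&buf_lock);
        buffer[in] = item;
        in = (in + 1) % BUF_SIZE;
        pthread_mutex_unlock(&buf_lock);
        sem_post(&full_slots);               /* announce a filled slot */
    }
    return NULL;
}

void *consumer(void *arg) {
    (void)arg;
    for (int i = 0; i < 20; ++i) {
        sem_wait(&full_slots);               /* wait for a filled slot */
        pthread_mutex_lock(&buf_lock);
        int item = buffer[out];
        out = (out + 1) % BUF_SIZE;
        pthread_mutex_unlock(&buf_lock);
        sem_post(&empty_slots);              /* announce a free slot   */
        printf("consumed %d\n", item);
    }
    return NULL;
}

int main(void) {
    sem_init(&empty_slots, 0, BUF_SIZE);
    sem_init(&full_slots, 0, 0);
    pthread_t p, c;
    pthread_create(&p, NULL, producer, NULL);
    pthread_create(&c, NULL, consumer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return 0;
}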
24. How would you implement a thread-safe queue? Write the code.
This is a very practical and common interview question that directly tests a
candidate's ability to write correct concurrent code. The implementation requires
combining a data structure (a queue) with synchronization primitives to protect it from
race conditions.
A thread-safe queue must ensure that its enqueue (or push) and dequeue (or pop)
operations are atomic and that threads behave correctly when the queue is empty (on
dequeue) or full (on enqueue, for a bounded queue).
C++
#include <iostream>
#include <queue>
#include <thread>
#include <mutex>
#include <condition_variable>
template <typename T>
class ThreadSafeQueue {
public:
    ThreadSafeQueue(size_t capacity) : capacity_(capacity) {}

    void enqueue(T item) {
        std::unique_lock<std::mutex> lock(mutex_);
        // Wait until the queue is not full
        cond_not_full_.wait(lock, [this] { return queue_.size() < capacity_; });
        queue_.push(std::move(item));
        // Notify one waiting consumer that the queue is no longer empty
        cond_not_empty_.notify_one();
    }

    T dequeue() {
        std::unique_lock<std::mutex> lock(mutex_);
        // Wait until the queue is not empty
        cond_not_empty_.wait(lock, [this] { return !queue_.empty(); });
        T item = std::move(queue_.front());
        queue_.pop();
        // Notify one waiting producer that the queue is no longer full
        cond_not_full_.notify_one();
        return item;
    }

private:
    std::queue<T> queue_;
    std::mutex mutex_;
    std::condition_variable cond_not_empty_;
    std::condition_variable cond_not_full_;
    size_t capacity_;
};

// Example Usage
void producer_task(ThreadSafeQueue<int>& q) {
    for (int i = 0; i < 10; ++i) {
        std::cout << "Producing " << i << std::endl;
        q.enqueue(i);
        std::this_thread::sleep_for(std::chrono::milliseconds(100));
    }
}

void consumer_task(ThreadSafeQueue<int>& q) {
    for (int i = 0; i < 10; ++i) {
        int item = q.dequeue();
        std::cout << "Consumed " << item << std::endl;
        std::this_thread::sleep_for(std::chrono::milliseconds(150));
    }
}

int main() {
    ThreadSafeQueue<int> q(5);
    std::thread producer(producer_task, std::ref(q));
    std::thread consumer(consumer_task, std::ref(q));
    producer.join();
    consumer.join();
    return 0;
}
Explanation:
● std::mutex mutex_: This mutex protects all access to the internal std::queue.
std::unique_lock is used for RAII-style locking and unlocking and is required for
use with condition variables.
● std::condition_variable cond_not_empty_: Consumers wait on this condition
variable when the queue is empty. Producers notify it after enqueuing an item.
● std::condition_variable cond_not_full_: Producers wait on this condition
variable when the queue is full. Consumers notify it after dequeuing an item.
● wait() with a Predicate: The wait() calls use a lambda function ([this] { return...;
}) as a predicate. This is the modern, robust way to use condition variables. It
protects against spurious wakeups by re-checking the condition after waking up.
The thread only proceeds if the condition is actually true.
25. Implement a binary semaphore from scratch using only a mutex and a
condition variable.
A binary semaphore is a counter that can only hold the values 0 or 1. The wait
operation (P) waits until the value is 1 and then atomically decrements it to 0. The
signal operation (V) sets the value to 1 (if it was 0) and wakes up a waiting thread.
C++
#include <mutex>
#include <condition_variable>
class BinarySemaphore {
public:
    BinarySemaphore(int initial_count = 0) : count_(initial_count) {}

    // P or wait() operation
    void wait() {
        std::unique_lock<std::mutex> lock(mutex_);
        // Wait while the semaphore count is 0
        cond_.wait(lock, [this] { return count_ > 0; });
        // Decrement the count, as we have acquired the semaphore
        count_--;
    }

    // V or signal() operation
    void signal() {
        std::unique_lock<std::mutex> lock(mutex_);
        // Increment the count. For a binary semaphore, we cap it at 1.
        if (count_ == 0) {
            count_++;
            // Notify one waiting thread that the semaphore is now available
            cond_.notify_one();
        }
    }

private:
    std::mutex mutex_;
    std::condition_variable cond_;
    int count_;  // Can be 0 or 1
};
Explanation:
● State Variables: The class holds a std::mutex for mutual exclusion, a
std::condition_variable for blocking and waking threads, and an integer count_ to
represent the semaphore's state (0 or 1).
● wait() (P operation):
1. It acquires the mutex to protect access to count_.
2. It then calls cond_.wait() with a predicate [this] { return count_ > 0; }. This is
the critical step. The thread will block until count_ is greater than 0. The wait
call atomically releases the mutex while the thread is asleep and re-acquires it
before waking up. The loop structure handles spurious wakeups.
3. Once the thread is woken up and the predicate is true, it decrements count_
to 0, signifying that it has acquired the semaphore.
4. The lock is released when unique_lock goes out of scope.
● signal() (V operation):
1. It acquires the mutex.
2. It increments count_ back to 1 (if it was 0). This signifies releasing the
semaphore.
3. It then calls cond_.notify_one() to wake up exactly one thread that might be
waiting in the wait() method.
4. The lock is released.
26. What are the key characteristics of a Network Operating System (NOS) or a
Real-Time Operating System (RTOS) used in a high-performance router?
27. How does a router's OS use interrupts and Direct Memory Access (DMA) to
process packets efficiently?
29. What does context switching look like in a router, and why is minimizing it critical?
Context switching in a router can occur between different control plane processes
or, more critically, between a control plane task and a data plane task.9 For example, if
a packet requires special handling that cannot be done in hardware (e.g., it's destined
for the router itself, or it requires fragmentation), it might be "punted" to the CPU,
causing a context switch to a packet processing thread.
Minimizing context switching is absolutely critical in a router for one primary reason:
performance. Every CPU cycle spent on a context switch is a cycle not spent
forwarding packets.3 In a device designed to handle millions of packets per second,
even a small amount of overhead per packet can have a massive impact on the overall
throughput. Excessive context switching can lead to:
● Increased Latency: The time it takes to forward a packet increases.
● Reduced Throughput: The total number of packets forwarded per second
decreases.
● Packet Drops: If the processing queues back up due to the CPU being busy with
context switches, buffers will overflow, and incoming packets will be dropped.
For this reason, router operating systems are heavily optimized to minimize context
switches, often by running data plane tasks at a very high priority, using polling
instead of interrupts in some cases (to avoid the overhead of interrupt handling), and
processing as many packets as possible per scheduling cycle.
30. What is a "zombie process" and why is it important to handle them correctly in
a long-running system like a router?
This is a specific but important question about process lifecycle management that is
particularly relevant to high-reliability systems. It tests a candidate's understanding of
resource leaks and their long-term impact.
A zombie process is a process that has completed its execution (it has terminated)
but still has an entry in the operating system's process table.6 This occurs because the
process's parent has not yet read its exit status by calling one of the
wait() family of system calls. The kernel keeps the process table entry around so the
parent can retrieve this information. The zombie process itself is "dead"—it consumes
no CPU resources—but its entry continues to occupy a slot in the finite-sized process
table.
The importance lies in preventing a slow resource leak. If a parent process is poorly
written and repeatedly creates child processes without ever calling wait() to "reap"
them after they finish, the number of zombie processes will grow over time.6
Eventually, the process table will become completely filled with zombie entries. When
this happens, the OS will be unable to create any new processes, as there are no free
slots left in the table. This can lead to a catastrophic failure of the system, as critical
functions that require new processes to be spawned will fail. Therefore, in a
high-reliability environment, it is essential that all parent processes correctly reap
their children, or that a master "init-like" process is in place to adopt and reap any
orphaned processes, ensuring that zombies do not accumulate over time.
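One common pattern for such a long-running parent is to reap children asynchronously from a SIGCHLD handler; a minimal C sketch (POSIX assumed) follows.
C
#include <stdio.h>
#include <string.h>
#include <signal.h>
#include <unistd.h>
#include <sys/wait.h>

static void reap_children(int sig) {
    (void)sig;
    /* Reap every child that has exited; WNOHANG keeps this non-blocking. */
    while (waitpid(-1, NULL, WNOHANG) > 0)
        ;
}

int main(void) {
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = reap_children;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART | SA_NOCLDSTOP;
    sigaction(SIGCHLD, &sa, NULL);

    for (int i = 0; i < 3; ++i) {
        if (fork() == 0) {       /* child: do some work and exit */
            _exit(0);
        }
    }

    /* The long-running parent keeps working; exited children are reaped
     * by the handler instead of accumulating as zombies. */
    sleep(2);
    printf("parent still running, no zombies accumulated\n");
    return 0;
}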
This guide has traversed the critical landscape of operating systems, from
foundational architecture to the nuances of concurrency and the specialized
demands of network devices. The 30 questions detailed here are not merely a
checklist but a framework for building a deep, interconnected understanding of how
modern computer systems function. Success in a top-tier technical interview hinges
not on the rote memorization of these answers, but on the ability to synthesize this
knowledge and articulate the underlying principles.
The real test of that synthesis comes in the follow-up probing: can a candidate explain
why a microkernel's modularity comes at the cost of performance? Can they articulate
the chain of events from a memory access to a page fault to a potential thrashing
condition? Can they design a solution to a concurrency problem and defend their
choice of synchronization primitives?
Therefore, the most effective way to use this guide is to practice articulating the
"why" behind every "what." When discussing a topic, frame the answer in terms of
design choices and their consequences. Use a whiteboard to draw state diagrams,
architectural layouts, and data flows. When faced with a question, ask clarifying
questions to demonstrate a methodical approach to problem-solving. The goal is to
showcase not just what you know, but how you think. By internalizing the principles
behind these questions and understanding their practical implications, a candidate
will be well-equipped to demonstrate the expert-level competence required to secure
a role at the forefront of the technology industry.