MCES Notes

MICROCONTROLLER AND EMBEDDED SYSTEMS
2023
Mr. Chetan R
Reference Textbooks
1. Andrew N. Sloss, Dominic Symes, and Chris Wright, ARM System Developer's Guide, Elsevier/Morgan Kaufmann Publishers, 2008.
2. Shibu K. V., Introduction to Embedded Systems, Tata McGraw Hill Education Private Limited, 2nd Edition.
03.09.2022
IV Semester
These are sample strategies that teachers can use to accelerate the attainment of the various course
outcomes.
1. The lecturer method (L) does not mean only the traditional lecture method, but different types of
teaching methods may be adopted to develop the outcomes.
2. Show video/animation films to explain the functioning of various concepts.
3. Encourage collaborative (group learning) learning in the class.
4. Ask at least three HOT (Higher Order Thinking) questions in the class, which promote critical
thinking.
5. Adopt Problem Based Learning (PBL), which fosters students' analytical skills and develops thinking
skills such as the ability to evaluate, generalize, and analyze information rather than simply recall
it.
6. Topics will be introduced in multiple representations.
7. Show the different ways to solve the same problem and encourage the students to come up with
their own creative ways to solve them.
8. Discuss how every concept can be applied to the real world; wherever possible, this helps
improve the students' understanding.
Module-1
Microprocessors versus Microcontrollers, ARM Embedded Systems: The RISC design philosophy, The
ARM Design Philosophy, Embedded System Hardware, Embedded System Software.
ARM Processor Fundamentals: Registers, Current Program Status Register, Pipeline, Exceptions,
Interrupts, and the Vector Table, Core Extensions
C Compilers and Optimization: Basic C Data Types, C Looping Structures, Register Allocation, Function Calls, Pointer Aliasing
ARM programming using Assembly language: Writing Assembly code, Profiling and cycle counting,
instruction scheduling, Register Allocation, Conditional Execution, Looping Constructs
Textbook 1: Chapter-5,6
Laboratory Component:
1. Write a program to arrange a series of 32-bit numbers in ascending/descending order.
2. Write a program to count the number of ones and zeros in two consecutive memory
locations.
3. Display “Hello World” message using Internal UART.
issues – Racing and Deadlock, Concept of Binary and counting semaphores (Mutex example without any
program), How to choose an RTOS, Integration and testing of Embedded hardware and firmware,
Embedded system Development Environment – Block diagram (excluding Keil),
Disassembler/decompiler, simulator, emulator and debugging techniques, target hardware debugging,
boundary scan.
Textbook 2: Chapter-10 (Sections 10.1, 10.2, 10.3, 10.4 , 10.7, 10.8.1.1, 10.8.1.2, 10.8.2.2, 10.10
only), Chapter 12, Chapter-13 ( block diagram before 13.1, 13.3, 13.4, 13.5, 13.6 only)
Laboratory Component:
1. Demonstration of IoT applications by using Arduino and Raspberry Pi
Teaching-Learning Process:
1. Chalk and board for numericals and discussion
2. Significance of real-time operating systems (RTOS) using Raspberry Pi
Course outcome (Course Skill Set)
At the end of the course, the student will be able to:
CO 1. Explain C compilers and optimization.
CO 2. Describe the ARM microcontroller's architectural features and program module.
CO 3. Apply the knowledge gained from programming on ARM to different applications.
CO 4. Program the basic hardware components and their application selection method.
CO 5. Demonstrate the need for a real-time operating system for embedded system applications.
Assessment Details (both CIE and SEE)
The weightage of Continuous Internal Evaluation (CIE) is 50% and for Semester End Exam (SEE) is 50%.
The minimum passing mark for the CIE is 40% of the maximum marks (20 marks). A student shall be
deemed to have satisfied the academic requirements and earned the credits allotted to each subject/
course if the student secures not less than 35% (18 Marks out of 50) in the semester-end examination
(SEE), and a minimum of 40% (40 marks out of 100) in the sum total of the CIE (Continuous Internal
Evaluation) and SEE (Semester End Examination) taken together.
Continuous Internal Evaluation:
Three Unit Tests each of 20 Marks (duration 01 hour)
1. First test at the end of 5th week of the semester
2. Second test at the end of the 10th week of the semester
3. Third test at the end of the 15th week of the semester
Two assignments each of 10 Marks
4. First assignment at the end of 4th week of the semester
5. Second assignment at the end of 9th week of the semester
Practical Sessions need to be assessed by appropriate rubrics and viva-voce method. This will contribute
to 20 marks.
Rubrics for each experiment, averaged over all lab components – 15 marks.
Viva-voce – 5 marks (with more emphasis on demonstration topics).
The sum of three tests, two assignments, and practical sessions will be out of 100 marks and will be
scaled down to 50 marks
(to have a less stressed CIE, the portion of the syllabus should not be common /repeated for any of the
methods of the CIE. Each method of CIE should have a different syllabus portion of the course).
CIE methods /question paper has to be designed to attain the different levels of Bloom’s taxonomy
as per the outcome defined for the course.
1. The question paper will have ten questions. Each question is set for 20 marks.
2. There will be 2 questions from each module. Each of the two questions under a module (with a
maximum of 3 sub-questions), should have a mix of topics under that module.
3. The students have to answer 5 full questions, selecting one full question from each module.
Marks scored shall be proportionally reduced to 50 marks
MODULE 3
1. Explain how the layout of a frequently used structure can have a significant impact on its performance and code
density.
2. Write a note on bit-fields.
3. What is the best way to deal with endian and alignment problems?
4. Explain how the division operation is done on the ARM.
5. Write a note on portability issues.
MODULE 1
Microprocessors versus Microcontrollers, ARM Embedded Systems: The RISC design philosophy, The ARM
Design Philosophy, Embedded System Hardware, Embedded System Software.
ARM Processor Fundamentals: Registers, Current Program Status Register, Pipeline, Exceptions, Interrupts,
and the Vector Table, Core Extensions
Textbook 1: Chapter 1 - 1.1 to 1.4, Chapter 2 - 2.1 to 2.5
Textbook: Andrew N. Sloss, Dominic Symes, and Chris Wright, ARM System Developer's Guide, Elsevier/Morgan Kaufmann Publishers, 2008.
SMVITM,UDUPI Page 1
RISC VS CISC
■ The ARM processor controls the embedded device. Different versions of the ARM processor are available to
suit the desired operating characteristics. An ARM processor comprises a core plus the surrounding
components that interface it with a bus. These components can include memory management and caches.
■ Controllers coordinate important functional blocks of the system. Two commonly found controllers are
interrupt and memory controllers.
• Memory controllers connect different types of memory to the processor bus. On power-up a memory
controller is configured in hardware to allow certain memory devices to be active. These memory
devices allow the initialization code to be executed.
• An interrupt controller provides a programmable governing policy that allows software to determine
which peripheral or device can interrupt the processor at any specific time by setting the appropriate
bits in the interrupt controller registers.
■ The peripherals provide all the input-output capability external to the chip and are responsible for the
uniqueness of the embedded device. ARM peripherals are memory mapped. Peripherals range from a simple
serial communication device to a more complex 802.11 wireless device.
■ A bus is used to communicate between different parts of the device.
Memory
An embedded system has to have some form of memory to store and execute code. You have to compare price,
performance, and power consumption when deciding upon specific memory characteristics, such as hierarchy,
width, and type.
Hierarchy
The fastest memory, the cache, is physically located nearest the ARM processor core, and the slowest, secondary
memory, is set farthest away. Generally, the closer memory is to the processor core, the more it costs and the
smaller its capacity. The cache is placed between main memory and the core. It is used to speed up data transfer
between the processor and main memory.
Width
The memory width is the number of bits the memory returns on each access, typically 8, 16, or 32 bits. The
memory width has a direct effect on the overall performance and cost ratio.
Types
Flash ROM can be written to as well as read. Its main use is for holding the device firmware or storing
long-term data that needs to be preserved after power is off.
SRAM cell                                       DRAM cell
Made up of 6 CMOS transistors (MOSFETs)         Made up of a MOSFET and a capacitor
Doesn't require refreshing                      Requires refreshing
Low capacity (less dense)                       High capacity (highly dense)
More expensive                                  Less expensive
Fast in operation; typical access time 10 ns    Slow in operation due to refresh requirements;
                                                typical access time 60 ns
                                                Write operation is faster than read operation
Finally, an application performs one of the tasks required for a device. The software components can run
from ROM or RAM. ROM code that is fixed on the device (for example, the initialization code) is called
firmware.
Initialization Code
It is common for ARM-based embedded systems to provide for memory remapping because it allows the system to start
the initialization code from ROM at power-up. The initialization code then redefines or remaps the memory map to
place RAM at address 0x00000000—an important step because then the exception vector table can be in RAM and
thus can be reprogrammed.
The initialization code handles a number of administrative tasks prior to handing control over to an operating
system image. We can group these different tasks into three phases: initial hardware configuration, diagnostics,
and booting.
Operation Modes
The ARM processor has two operation modes and two privilege levels. The operation modes (thread mode and handler mode)
determine whether the processor is running a normal program or running an exception handler like an interrupt
handler or system exception handler.
Software in the privileged access level can switch the program into the user access level using the control register
(CPSR). When an exception takes place, the processor will always switch back to the privileged state and return
to the previous state when exiting the exception handler. A user program cannot change back to the privileged
state by writing to the control register. It has to go through an exception handler that programs the control register
(CPSR) to switch the processor back into the privileged access level when returning to thread mode.
The processor mode determines which registers are active and the access rights to the cpsr register itself. Each
processor mode is either privileged or non-privileged: A privileged mode allows full read-write access to the cpsr.
Conversely, a non-privileged mode only allows read access to the control field in the cpsr but still allows
read-write access to the condition flags.
There are seven processor modes in total: six privileged modes (abort, fast interrupt request, interrupt request,
supervisor, system, and undefined) and one nonprivileged mode (user).
The processor enters abort mode when there is a failed attempt to access memory.
Fast interrupt request and interrupt request modes correspond to the two interrupt levels available on the
ARM processor.
Supervisor mode is the mode that the processor is in after reset and is generally the mode that an operating
system kernel operates in.
System mode is a special version of user mode that allows full read-write access to the cpsr.
Undefined mode is used when the processor encounters an instruction that is undefined or not supported
by the implementation.
User mode is used for programs and applications.
Interrupt Masks:
Interrupt masks are used to stop specific interrupt requests from interrupting the processor. There are two
interrupt request levels available on the ARM processor core—interrupt request (IRQ) and fast interrupt request
(FIQ).
The cpsr has two interrupt mask bits, 7 and 6 (or I and F ), which control the masking of IRQ and FIQ,
respectively. The I bit masks IRQ when set to binary 1, and similarly the F bit masks FIQ when set to binary 1.
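As a sketch of how these mask bits might be tested in C (the macro and function names here are assumptions, not ARM-defined symbols; on real hardware the cpsr is read with an MRS instruction, not passed around as a C value):

```c
#include <stdint.h>

/* Assumed mask names for cpsr bit 7 (I) and bit 6 (F). */
#define CPSR_I_BIT (1u << 7)   /* set to 1: IRQ masked */
#define CPSR_F_BIT (1u << 6)   /* set to 1: FIQ masked */

/* Return nonzero if the corresponding interrupt request is
   masked in the given cpsr value. */
int irq_masked(uint32_t cpsr) { return (cpsr & CPSR_I_BIT) != 0; }
int fiq_masked(uint32_t cpsr) { return (cpsr & CPSR_F_BIT) != 0; }
```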
‘T’ State:
The ARM instruction set is only active when the processor is in ARM state (T=0). Similarly the Thumb
instruction set is only active when the processor is in Thumb state (T=1).
Registers
General-purpose registers hold either data or an address. They are identified with the letter r prefixed to the
register number; there are up to 37 registers in total. The ARM processor has three registers assigned to a particular task
or special function: r13, r14, and r15. They are frequently given different labels to differentiate them from the
other registers.
Register r13 is used as the stack pointer (sp) and stores the head of the stack in the current processor
mode.
Register r14 is called the link register (lr) and is where the core puts the return address whenever it calls a
subroutine.
Register r15 is the program counter (pc) and contains the address of the next instruction to be fetched by
the processor.
Every processor mode except user mode can change mode by writing directly to the mode bits of the cpsr. All
processor modes except system mode have a set of associated banked registers that are a subset of the main 16
registers. A banked register maps one-to-one onto a user mode register. If you change processor mode, a banked
register from the new mode will replace an existing register.
For example, when the processor is in the interrupt request mode, the instructions you execute still access
registers named r13 and r14. However, these registers are the banked registers r13_irq and r14_irq. The user
mode registers r13_usr and r14_usr are not affected by the instruction referencing these registers. A program still
has normal access to the other registers r0 to r12.
Saved program status register (spsr) stores the previous mode’s cpsr. You can see in the diagram the cpsr being
copied into spsr_xxx. Note that the spsr can only be modified and read in a privileged mode. There is no spsr
available in user mode.
Condition flags
Condition flags are updated by comparisons and the result of ALU operations. Most ARM instructions can be
executed conditionally on the value of the condition flags. These flags are located in the most significant bits in
the cpsr. These bits are used for conditional execution.
Conditional execution controls whether or not the core will execute an instruction. Most instructions have a
condition attribute that determines if the core will execute it based on the setting of the condition flags. Prior to
execution, the processor compares the condition attribute with the condition flags in the cpsr. If they match, then
the instruction is executed; otherwise the instruction is ignored.
The condition attribute is postfixed to the instruction mnemonic and is encoded into the instruction. The table
below lists the conditional execution code mnemonics.
Pipeline
Using a pipeline speeds up execution by fetching the next instruction while other instructions are being decoded
and executed.
The figure shows a sequence of three instructions being fetched, decoded, and executed by the processor. Each instruction
takes a single cycle to complete after the pipeline is filled.
As the pipeline length increases, the amount of work done at each stage is reduced, which allows the processor to
attain a higher operating frequency. This in turn increases the performance.
The memory map address 0x00000000 is reserved for the vector table, a set of 32-bit words. On some processors
the vector table can be optionally located at a higher address in memory (starting at the offset 0xffff0000).
When an exception or interrupt occurs, the processor suspends normal execution and starts loading instructions
from the exception vector table (see Table). Each vector table entry contains a form of branch instruction pointing
to the start of a specific routine:
Reset vector is the location of the first instruction executed by the processor when power is applied. This
instruction branches to the initialization code.
Undefined instruction vector is used when the processor cannot decode an instruction.
Software interrupt vector is called when you execute a SWI instruction. The SWI instruction is frequently
used as the mechanism to invoke an operating system routine.
Interrupt request vector is used by external hardware to interrupt the normal execution flow of the
processor. It can only be raised if IRQs are not masked in the cpsr.
Core Extensions
The hardware extensions covered in this section are standard components placed next to the ARM core. They
improve performance, manage resources, and provide extra functionality and are designed to provide flexibility in
handling particular applications. Each ARM family has different extensions available.
Von Neumann–style cores combine both data and instructions into a single unified cache, as shown in the figure. For
simplicity, we have called the glue logic that connects the memory system to the AMBA bus logic and control.
Tightly coupled memory (TCM) is fast SRAM located close to the core and guarantees the clock cycles required
to fetch instructions or data—critical for real-time algorithms requiring deterministic behavior. TCMs appear as
memory in the address map and can be accessed as fast memory. An example of a processor with TCMs is shown
in Figure.
By combining both technologies, ARM processors can have both improved performance and predictable real-time
response. The figure below shows an example core with a combination of caches and TCMs.
MODULE 2
ARM INSTRUCTION SET
Prepared by: Mr. Chetan R, Sr. Asst. Professor, ECE Dept.
Move Instructions
Arithmetic Instructions
Logical Instructions
Comparison Instructions
Multiply Instructions
Branch Instructions
Load-Store Instructions
Single-Register Load-Store Addressing Modes
Loading Constants
Multiple-Register Transfer
Swap Instruction
Coprocessor Instructions
Shift and Rotate Instructions
ARM7 ASSEMBLY LEVEL PROGRAMS
PREPARED BY: MR. CHETAN R, SR. ASST. PROFESSOR
; TO READ FROM MEMORY
        AREA ARMPGM,CODE,READONLY
        ENTRY
        LDR R0,=MEMORY
        LDR R1,[R0]
HERE    B HERE
MEMORY  DCD 0X12345678
        END

; TO READ FROM MEMORY AND WRITE TO MEMORY (ANOTHER TYPE)
        AREA ARMPGM,CODE,READONLY
        ENTRY
        LDR R0,=MEMORY
        LDR R1,[R0]
        LDR R0,DEST
        STR R1,[R0]
HERE    B HERE
MEMORY  DCD 0X12345678
DEST    DCD 0X40000000
        END

; TO READ FROM MEMORY AND WRITE TO MEMORY (USING PRE-INDEXED ADDRESSING MODE)
        AREA ARMPGM,CODE,READONLY
        ENTRY
        LDR R0,=0X40000000
        LDR R1,[R0]
        STR R1,[R0,#4]
HERE    B HERE
        END

; TO READ FROM MEMORY AND WRITE TO MEMORY
        AREA ARMPGM,CODE,READONLY
        ENTRY
        LDR R0,=0X40000000
        LDR R1,[R0]
        LDR R0,=0X40000010
        STR R1,[R0]
HERE    B HERE
        END
C COMPILERS & OPTIMIZATION
Basic C Data Types, C Looping Structures, Register Allocation, Function Calls, Pointer Aliasing
In Table 5.1, loads that act on 8- or 16-bit values extend the value to 32 bits before writing to an ARM register.
Unsigned values are zero-extended, and signed values are sign-extended. This means that the cast of a loaded value
to an int type does not cost extra instructions.
The following code checksums a data packet containing 64 words. It shows why you should avoid using char
for local variables.
At first sight it looks as though declaring i as a char is efficient. You may be thinking that a char uses less
register space or less space on the ARM stack than an int. On the ARM, both these assumptions are wrong. All
ARM registers are 32-bit and all stack entries are at least 32-bit.
Case i: Consider the compiler output for this function. Case ii: Now compare this to the compiler output where
instead we declare i as an unsigned int.
In the first case, the compiler inserts an extra AND instruction to reduce i to the range 0 to 255 before the
comparison with 64. This instruction disappears in the second case.
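The two versions being compared are not reproduced in these notes; a minimal sketch (the function names follow the book's checksum_v1-style numbering but are assumptions here) might look like:

```c
/* char counter: on ARM the compiler must insert an extra AND to
   wrap i into the range 0..255 before every comparison with 64. */
int checksum_v1(int *data)
{
    char i;
    int sum = 0;
    for (i = 0; i < 64; i++)
        sum += data[i];
    return sum;
}

/* unsigned int counter: the extra AND disappears, since all ARM
   registers and stack entries are 32-bit anyway. */
int checksum_v2(int *data)
{
    unsigned int i;
    int sum = 0;
    for (i = 0; i < 64; i++)
        sum += data[i];
    return sum;
}
```

Both versions compute the same checksum; only the compiled code differs.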
The armcc output for add_v1 shows that the compiler casts the return value to a short type, but does not cast the
input values. It assumes that the caller has already ensured that the 32-bit values r0 and r1 are in the range of the
short type. This shows narrow passing of arguments and return value.
Whatever the merits of different narrow and wide calling protocols, you can see that char or short type function
arguments and return values introduce extra casts. These increase code size and decrease performance. It is
more efficient to use the int type for function arguments and return values, even if you are only passing an 8-bit
value.
C LOOPING STRUCTURES
This shows how the compiler treats a
loop with incrementing count i++.
■ An ADD to increment i
■ A compare to check if i is less than 64
■ A conditional branch to continue the loop if i < 64
This is not efficient. The decrementing loop needs only:
■ A subtract to decrement the loop counter, which also sets the condition code flags on the result
■ A conditional branch instruction
The key point is that the loop counter should count down to zero rather than counting up to some arbitrary limit.
Then the comparison with zero is free since the result is stored in the condition flags. Since we are no longer
using i as an array index, there is no problem in counting down rather than up.
Below example shows the improvement if we switch to a decrementing loop rather than an incrementing loop
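The decrementing loop itself is not reproduced in these notes; a sketch of the two usual forms (function names assumed) is:

```c
/* Decrementing for loop: the SUBS that decrements N sets the flags,
   so the comparison with zero is free and the separate CMP disappears. */
int checksum_v3(int *data, unsigned int N)
{
    int sum = 0;
    for (; N != 0; N--)
        sum += *data++;
    return sum;
}

/* do-while variant: also removes the initial test for N == 0,
   assuming the caller guarantees a nonempty array. */
int checksum_v4(int *data, unsigned int N)
{
    int sum = 0;
    do {
        sum += *data++;
    } while (--N != 0);
    return sum;
}
```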
Notice that the compiler checks that N is nonzero on entry to the function. Often this check is unnecessary since
you know that the array won’t be empty.
In this case a do-while loop gives better performance and code density than a for loop.
Use a do-while loop to remove the test for N being zero that occurs in a for loop.
LOOP UNROLLING
Each loop iteration costs two instructions in addition to the body of the loop: a subtract to decrement the
loop count and a conditional branch. We call these instructions the loop overhead. On ARM7 or ARM9
processors the subtract takes one cycle and the branch three cycles, giving an overhead of four cycles per
loop.
You can save some of these cycles by unrolling a loop—repeating the loop body several times, and reducing
the number of loop iterations by the same proportion. For example, let’s unroll our packet checksum
example four times.
The following code unrolls our packet checksum loop by four times. We assume that the number of words in the
packet N is a multiple of four.
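The unrolled loop is not reproduced above; a sketch (name assumed) of the four-times-unrolled checksum, valid when N is a nonzero multiple of four:

```c
int checksum_unrolled(int *data, unsigned int N)
{
    int sum = 0;
    do {
        /* four accumulates share one subtract and one branch */
        sum += *data++;
        sum += *data++;
        sum += *data++;
        sum += *data++;
        N -= 4;
    } while (N != 0);
    return sum;
}
```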
We have reduced the loop overhead from 4N cycles to (4N)/4=N cycles. On the ARM7TDMI, this
accelerates the loop from 8 cycles per accumulate to 20/4 = 5 cycles per accumulate, nearly doubling the
speed! For the ARM9TDMI, which has a faster load instruction, the benefit is even higher.
Unrolling loops that are not performance-critical, however, increases the code size with little performance
benefit, and may even reduce performance by evicting more important code from the cache.
REGISTER ALLOCATION
The compiler attempts to allocate a processor register to each local variable you use in a C function. It will try to
use the same register for different local variables if the uses of the variables do not overlap. When there are more
local variables than available registers, the compiler stores the excess variables on the processor stack. These
variables are called spilled or swapped out variables since they are written out to memory (in a similar way
virtual memory is swapped out to disk). Spilled variables are slow to access compared to variables allocated to
registers.
To ensure good assignment to registers, you should try to limit the internal loop of functions to using at most 12
local variables.
FUNCTION CALLS
The ARM Procedure Call Standard (APCS) defines how to pass function arguments and return values in
ARM registers.
The first point to note about the procedure call standard is the four-register rule. Functions with four or
fewer arguments are far more efficient to call than functions with five or more arguments. For functions
with four or fewer arguments, the compiler can pass all the arguments in registers. For functions with more
arguments, both the caller and callee must access the stack for some arguments.
The first four integer arguments are passed in the first four ARM registers: r0, r1, r2, and r3. Subsequent
integer arguments are placed on the full descending stack, ascending in memory as in Figure 5.1. Function
return integer values are passed in r0.
If a C function needs more than four arguments, or your C++ method more than three explicit arguments, then
it is almost always more efficient to use structures. Group related arguments into structures, and pass a
structure pointer rather than multiple arguments.
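A hypothetical illustration of the four-register rule (all names invented for this sketch): the six-argument call must spill two arguments to the stack, while the structure-pointer version passes everything through r0.

```c
typedef struct {
    int x0, x1, x2, x3, x4, x5;
} params_t;

/* Six arguments: the first four go in r0-r3, the last two on the
   stack, so both caller and callee must touch memory. */
int sum_args(int a, int b, int c, int d, int e, int f)
{
    return a + b + c + d + e + f;
}

/* One structure pointer: the single argument travels in r0 and the
   callee loads the fields with cheap base-plus-offset accesses. */
int sum_struct(const params_t *p)
{
    return p->x0 + p->x1 + p->x2 + p->x3 + p->x4 + p->x5;
}
```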
POINTER ALIASING
Two pointers are said to alias when they point to the same address. If you write to one pointer, it will affect the
value you read from the other pointer. In a function, the compiler often doesn’t know which pointers can alias
and which pointers can’t.
Note that the compiler loads from step twice. Usually a compiler optimization called common subexpression
elimination would kick in so that *step was only evaluated once, and the value reused for the second
occurrence. However, the compiler can’t use this optimization here. The pointers timer1 and step might alias
one another. In other words, the compiler cannot be sure that the write to timer1 doesn’t affect the read from
step. In this case the second value of *step is different from the first and has the value *timer1. This forces the
compiler to insert an extra load instruction.
The same problem occurs if you use structure accesses rather than direct pointer access. The following code also
compiles inefficiently:
The compiler evaluates state->step twice in case state->step and timers->timer1 are at the same memory
address. The fix is easy: Create a new local variable to hold the value of state->step so the compiler only
performs a single load.
In the code for timers_v3 we use a local variable step to hold the value of state->step. Now the compiler does
not need to worry that state may alias with timers.
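The timers example discussed above might look like the following sketch (structure layouts assumed; the book's actual listing may differ):

```c
typedef struct { int timer1, timer2; } timers_t;
typedef struct { int step; } state_t;

/* Aliasing-prone version: after writing timers->timer1 the compiler
   must reload state->step, since the two pointers might overlap. */
void timers_v1(timers_t *timers, state_t *state)
{
    timers->timer1 += state->step;
    timers->timer2 += state->step;
}

/* Fixed version: step is read exactly once into a local variable,
   so no reload is needed whatever the pointers alias. */
void timers_v3(timers_t *timers, state_t *state)
{
    int step = state->step;
    timers->timer1 += step;
    timers->timer2 += step;
}
```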
MODULE 3
C COMPILERS & OPTIMIZATION
Structure Arrangement, Bit-fields, Unaligned Data and Endianness, Division, Floating Point, Inline Functions and Inline
Assembly, Portability Issues.
The way you lay out a frequently used structure can have a significant impact on its performance and code density.
There are two issues concerning structures on the ARM: alignment of the structure entries and the overall size of the
structure.
ARM compilers will automatically align the start address of a structure to a multiple of the largest access width used
within the structure (usually four or eight bytes) and align entries within structures to their access width by inserting
padding. For example, consider the structure
For a little-endian memory system the compiler will lay this out, adding padding to ensure that the next
object is aligned to the size of that object:
This reduces the structure size from 12 bytes to 8 bytes, with the following new layout:
Therefore, it is a good idea to group structure elements of the same size, so that the structure layout doesn’t contain
unnecessary padding. The armcc compiler does include a keyword __packed that removes all padding. For example,
the structure
However, packed structures are slow and inefficient to access. The compiler emulates unaligned load and store
operations by using several aligned accesses with data operations to merge the results. Only use the __packed
keyword where space is far more important than speed and you can't reduce padding by rearrangement. Also use it for
porting code that assumes a certain structure layout in memory.
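The padding behavior described above can be checked directly with sizeof; a sketch (field names invented, sizes assuming a typical ABI with a 4-byte, 4-byte-aligned int):

```c
/* char, int, char: the int forces 4-byte alignment, so the compiler
   inserts 3 bytes of padding after each char - 12 bytes in total. */
struct badly_ordered {
    char a;
    int  b;
    char c;
};

/* Grouping the two chars lets them share one word: 8 bytes. */
struct well_ordered {
    char a;
    char c;
    int  b;
};
```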
BIT-FIELDS
The compiler can choose how bits are allocated within the bit-field container. Different compilers can assign the same
bit-field different bit positions in the container. It is also a good idea to avoid bit-fields for efficiency.
Bit-fields are structure elements and usually accessed using structure pointers; consequently, they suffer from the
pointer aliasing problems described in Section 5.6. Every bit-field access is really a memory access.
Possible pointer aliasing often forces the compiler to reload the bit-field several times. Compilers also do not
tend to optimize bit-field testing very well.
You can generate far more efficient code by using an integer rather than a bit-field. Use enum or #define masks to
divide the integer type into different fields.
Now that a single unsigned long type contains all the bit-fields, we can keep a copy of their values in a single local
variable stages, which removes the memory aliasing problem. The compiler generates the following code giving a
saving of 33% over the previous version using ANSI bit-fields:
You can also use the masks to set and clear the bit-fields, just as easily as for testing them. The following code shows
how to set, clear, or toggle bits using the STAGE masks:
These bit set, clear, and toggle operations take only one ARM instruction each, using ORR, BIC, and EOR
instructions, respectively. Another advantage is that you can now manipulate several bit-fields at the same time, using
one instruction. For example:
stages |= (STAGEA | STAGEB); /* enable stages A and B */
stages &= ∼(STAGEA | STAGEC); /* disable stages A and C */
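Putting the pieces together (the STAGE mask values are assumptions for this sketch):

```c
/* Assumed mask definitions replacing the ANSI bit-fields. */
#define STAGEA (1UL << 0)
#define STAGEB (1UL << 1)
#define STAGEC (1UL << 2)

/* Each operation below compiles to a single ARM instruction:
   ORR to set, BIC to clear, EOR to toggle. */
unsigned long demo_stages(void)
{
    unsigned long stages = 0;
    stages |= STAGEA;               /* set stage A           */
    stages &= ~STAGEA;              /* clear stage A         */
    stages ^= STAGEC;               /* toggle stage C        */
    stages |= (STAGEA | STAGEB);    /* set A and B in one go */
    return stages;
}
```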
Unaligned data and endianness are two issues that can complicate memory accesses and portability. Is the array
pointer aligned? Is the ARM configured for a big-endian or little-endian memory system?
The ARM load and store instructions assume that the address is a multiple of the type you are loading or storing. If
you load or store to an address that is not aligned to its type, then the behavior depends on the particular
implementation. The core may generate a data abort or load a rotated value. For well-written, portable code you
should avoid unaligned accesses.
You are likely to meet alignment problems when reading data packets or files used to transfer information between
computers. Network packets and compressed image files are good examples. Two- or four-byte integers may appear at
arbitrary offsets in these files. Data has been squeezed as much as possible, to the detriment of alignment.
Endianness (or byte order) is also a big issue when reading data packets or compressed files. The ARM core can be
configured to work in little-endian (least significant byte at lowest address) or big-endian (most significant byte at
lowest address) modes. Little-endian mode is usually the default.
The endianness of an ARM is usually set at power-up and remains fixed thereafter. Tables 5.6 and 5.7 illustrate how
the ARM’s 8-bit, 16-bit, and 32-bit load and store instructions work for different endian configurations. We assume
that byte address A is aligned to the size of the memory transfer. The tables show how the byte addresses in memory
map into the 32-bit register that the instruction loads or stores.
SMVITM,UDUPI Page 2
C COMPILERS & OPTIMIZATION
What is the best way to deal with endian and alignment problems? If speed is not critical, then use functions like
readint_little and readint_big in Example 5.10, which read a four-byte integer from a possibly unaligned address in
memory. The address alignment is not known at compile time, only at run time.
If you’ve loaded a file containing big-endian data such as a JPEG image, then use readint_big. For a byte stream
containing little-endian data, use readint_little. Both routines work correctly regardless of the memory endianness
the ARM is configured for.
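A sketch along the lines of the book's Example 5.10: reading byte by byte makes the result independent of both the pointer's alignment and the ARM's configured endianness.

```c
/* Read a 32-bit value from a possibly unaligned address, byte by byte. */
unsigned int readint_little(const unsigned char *p)   /* data stored little-endian */
{
    return (unsigned int)p[0]
         | ((unsigned int)p[1] << 8)
         | ((unsigned int)p[2] << 16)
         | ((unsigned int)p[3] << 24);
}

unsigned int readint_big(const unsigned char *p)      /* data stored big-endian */
{
    return ((unsigned int)p[0] << 24)
         | ((unsigned int)p[1] << 16)
         | ((unsigned int)p[2] << 8)
         |  (unsigned int)p[3];
}
```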
DIVISION
The ARM does not have a divide instruction in hardware. Instead the compiler implements divisions by calling
software routines in the C library.
There are many different types of division routine that you can tailor to a specific range of numerator and denominator
values. The standard integer division routine provided in the C library can take between 20 and 100 cycles, depending
on implementation, early termination, and the ranges of the input operands.
Division and modulus (/ and %) are such slow operations that you should avoid them as much as possible. However,
division by a constant and repeated division by the same denominator can be handled efficiently. This section
describes how to replace certain divisions by multiplications and how to minimize the number of division calls.
Circular buffers are one area where programmers often use division, but you can avoid these divisions completely.
Suppose you have a circular buffer of size buffer_size bytes and a position indicated by a buffer offset. To advance the
offset by increment bytes you could write
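The two versions can be sketched as follows (both assume 0 ≤ offset < buffer_size and increment < buffer_size):

```c
/* Version 1: a modulus, which costs a division call on every update */
unsigned int advance_mod(unsigned int offset, unsigned int increment,
                         unsigned int buffer_size)
{
    return (offset + increment) % buffer_size;
}

/* Version 2: no division; at most one wrap is ever needed */
unsigned int advance_sub(unsigned int offset, unsigned int increment,
                         unsigned int buffer_size)
{
    offset += increment;
    if (offset >= buffer_size)
    {
        offset -= buffer_size;
    }
    return offset;
}
```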
The first version may take 50 cycles; the second will take 3 cycles because it does not involve a division.
If you can’t avoid a division, then try to arrange that the numerator and denominator are unsigned integers. Signed
division routines are slower since they take the absolute values of the numerator and denominator and then call the
unsigned division routine. They fix the sign of the result afterwards.
Many C library division routines return the quotient and remainder from the division. In other words a free remainder
operation is available to you with each division operation and vice versa.
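For example, the remainder falls out of the quotient with a cheap multiply-subtract, so a second division call is never needed (a small sketch):

```c
/* One division call yields both quotient and remainder. */
unsigned int quotient(unsigned int n, unsigned int d, unsigned int *rem)
{
    unsigned int q = n / d;      /* single call into the division routine */
    *rem = n - q * d;            /* remainder for free: no second call */
    return q;
}
```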
The following routine, scale, shows how to convert divisions to multiplications in practice. It divides an array of N
elements by the denominator d. We first calculate the scaling value s = (2^32 − 1)/d. Then we replace each divide by d
with a multiplication by s, taking the top 32 bits of the product. The 64-bit multiply is cheap because the ARM has an
instruction, UMULL, which multiplies two 32-bit values, giving a 64-bit result.
Here we have assumed that the numerator and denominator are 32-bit unsigned integers. Of course, the algorithm works
equally well for 16-bit unsigned integers using a 32-bit multiply, or for 64-bit integers using a 128-bit multiply. You
should choose the narrowest width for your data. If your data is 16-bit, then set s = (2^16 − 1)/d and estimate q using a
standard integer C multiply.
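A sketch of the scale routine described above, assuming 32-bit unsigned data. The estimate q never overshoots the true quotient, so a small correction loop fixes it.

```c
/* Divide each of the N elements of x[] by the same denominator d
   by multiplying with s = (2^32 - 1)/d, computed once. */
void scale(unsigned int *x, unsigned int N, unsigned int d)
{
    unsigned int s = 0xFFFFFFFFu / d;            /* one division, up front */
    while (N--)
    {
        unsigned int n = *x;
        /* q = (s * n) >> 32: this 64-bit multiply maps to a single UMULL */
        unsigned int q = (unsigned int)(((unsigned long long)s * n) >> 32);
        unsigned int r = n - q * d;              /* estimate never overshoots */
        while (r >= d)                           /* correct the estimate */
        {
            q++;
            r -= d;
        }
        *x++ = q;
    }
}
```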
FLOATING POINT
The majority of ARM processor implementations do not provide hardware floating-point support, which saves on
power and area when using ARM in a price-sensitive, embedded application.
With the exceptions of the Floating Point Accelerator (FPA) used on the ARM7500FE and the Vector Floating Point
accelerator (VFP) hardware, the C compiler must provide support for floating point in software.
In practice, this means that the C compiler converts every floating-point operation into a subroutine call. The C library
contains subroutines to simulate floating-point behavior using integer arithmetic.
This code is written in highly optimized assembly. Even so, floating-point algorithms will execute far more slowly
than corresponding integer algorithms.
If you need fast execution and fractional values, you should use fixed-point or block floating-point algorithms.
Fractional values are most often used when processing digital signals such as audio and video.
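For instance, fractional values in the range [0, 1) can be held in a 16-bit Q15 fixed-point format and multiplied using only integer arithmetic (an illustrative sketch, not from the text):

```c
typedef short q15_t;            /* Q15: real value = raw / 32768.0 */

/* Multiply two Q15 fractions: the 32-bit product is in Q30 format,
   so shift right by 15 to return to Q15. */
q15_t q15_mul(q15_t a, q15_t b)
{
    return (q15_t)(((int)a * b) >> 15);
}
```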
You can remove the function call overhead completely by inlining functions. Additionally many compilers allow you
to include inline assembly in your C source code. Using inline functions that contain assembly you can get the
compiler to support ARM instructions and optimizations that aren’t usually available.
The inline assembler is part of the C compiler. The C compiler still performs register allocation, function entry, and
exit. The compiler also attempts to optimize the inline assembly you write, or deoptimize it for debug mode. Although
the compiler output will be functionally equivalent to your inline assembly, it may not be identical.
The main benefit of inline functions and inline assembly is to make accessible in C operations that are not usually
available as part of the C language. It is better to use inline functions rather than #define macros because the latter
doesn’t check the types of the function arguments and return value.
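A small illustration of that pitfall (hypothetical names):

```c
#define SQUARE_MACRO(x) ((x) * (x))   /* textual expansion: no type checking,
                                         and the argument appears twice */

static inline int square(int x)       /* argument and result are type-checked,
                                         and x is evaluated exactly once */
{
    return x * x;
}

int compare(void)
{
    int i = 3;
    int s = square(i++);   /* well-defined: s = 9, i becomes 4 */
    /* SQUARE_MACRO(i++) would expand to ((i++) * (i++)): undefined behaviour */
    return s + i;
}
```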
PORTABILITY ISSUES
■ The char type. On the ARM, char is unsigned rather than signed as on many other processors. A common problem concerns
loops that use a char loop counter i with the continuation condition i ≥ 0: since an unsigned value is always ≥ 0, the loop never
terminates. In this situation, armcc produces a warning of unsigned comparison with zero. You should either use a compiler
option to make char signed or change loop counters to type int.
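The effect can be reproduced on any compiler by declaring the counter unsigned char explicitly (a host-side sketch; the cap exists only so the demonstration terminates):

```c
/* Count loop iterations; use_int selects the portable int counter. */
int iterations(int use_int)
{
    int count = 0;

    if (use_int)
    {
        for (int i = 3; i >= 0; i--)        /* terminates after 4 passes */
        {
            count++;
        }
    }
    else
    {
        /* unsigned char mimics ARM's default char: i >= 0 is always true,
           because decrementing past 0 wraps around to 255 */
        for (unsigned char i = 3; i >= 0; i--)
        {
            if (++count > 1000)             /* safety cap for the demo */
            {
                break;
            }
        }
    }
    return count;
}
```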
■ The int type. Some older architectures use a 16-bit int, which may cause problems when moving to ARM’s 32-bit int type
although this is rare nowadays. Note that expressions are promoted to an int type before evaluation.
■ Unaligned data pointers. Some processors support the loading of short and int typed values from unaligned addresses. A C
program may manipulate pointers directly so that they become unaligned, for example, by casting a char * to an int *. ARM
architectures up to ARMv5TE do not support unaligned pointers. To detect them, run the program on an ARM with an
alignment checking trap.
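One portable way to read through a potentially unaligned pointer is to go through memcpy, which the compiler lowers to whatever access the target supports (a sketch):

```c
#include <string.h>

/* Safe unaligned read: never dereferences a misaligned int pointer. */
int read_int_at(const char *p)
{
    int v;
    memcpy(&v, p, sizeof v);   /* compiler emits byte or word accesses as needed */
    return v;
}
```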
■ Endian assumptions. C code may make assumptions about the endianness of a memory system, for example, by casting a
char * to an int *.
■ Function prototyping. The armcc compiler passes arguments narrow, that is, reduced to the range of the argument type. If
functions are not prototyped correctly, then the function may return the wrong answer. Other compilers that pass arguments
wide may give the correct answer even if the function prototype is incorrect.
■ Use of bit-fields. The layout of bits within a bit-field is implementation and endian dependent. If C code assumes that bits are
laid out in a certain order, then the code is not portable.
■ Inline assembly. Using inline assembly in C code reduces portability between architectures. You should separate any inline
assembly into small inlined functions that can easily be replaced. It is also useful to supply reference, plain C implementations
of these functions that can be used on other architectures, where this is possible.
MODULE-4
Embedded System Components: Embedded Vs General computing system, History of embedded systems,
Classification of Embedded systems, Major application areas of embedded systems, Purpose of embedded
systems.
Core of an Embedded System including all types of processor/controller, Memory, Sensors, Actuators, LED, 7
segment LED display, stepper motor, keyboard, Push button switch, Communication Interface (onboard and
external types), Embedded firmware, Other system components.
Textbook 2: Chapter 1 (Sections 1.2 to 1.6), Chapter 2 (Sections 2.1 to 2.6)
Text book: Shibu K V, “Introduction to Embedded Systems”, Tata McGraw Hill Education Private Limited, 2nd
Edition
Every embedded system is unique, and the hardware as well as the firmware is highly specialized to the
application domain. Embedded systems are becoming an inevitable part of any product or equipment in all
fields including household appliances, telecommunications, medical equipment, industrial control, consumer
products, etc.
EMBEDDED SYSTEM COMPONENTS
1. On generation
First generation (1G):
Built around 8-bit microprocessors & microcontrollers.
Simple hardware circuits & firmware.
Examples: Digital telephone keypads.
Second generation (2G):
Built around 16-bit μp & 8-bit μc.
They are more complex & powerful than 1G μp & μc.
Examples: SCADA systems
Third generation (3G):
Built around 32-bit μp & 16-bit μc.
Concepts like Digital Signal Processors (DSPs) and Application Specific Integrated Circuits (ASICs)
evolved.
Examples: Robotics, Media, etc.
Fourth generation:
Built around 64-bit μp & 32-bit μc.
The concept of System on Chips (SoC), Multicore Processors evolved.
Highly complex & very powerful.
Examples: Smart Phones.
3. On deterministic behaviour
This classification is applicable for “Real Time” systems.
The task execution behaviour for an embedded system may be deterministic or non-deterministic.
Based on execution behaviour, Real Time embedded systems are divided into Hard and Soft.
4. On triggering
Embedded systems which are “Reactive” in nature can be classified based on triggering.
Reactive systems can be:
Event triggered
Time triggered
The application areas and the products in the embedded domain are countless.
1. Data collection/Storage/Representation
2. Data communication
3. Data (signal) processing
4. Monitoring
5. Control
6. Application specific user interface
1. Data Collection/Storage/Representation:
An embedded system designed for the purpose of data collection performs acquisition of data from the
external world.
Data collection is usually done for storage, analysis, manipulation and transmission.
Data can be analog or digital.
Embedded systems with analog data capturing techniques collect data directly in the form of analog signals,
whereas an embedded system with a digital data collection mechanism converts the analog signal to a
digital signal using analog to digital converters.
2. Data communication:
Embedded data communication systems are deployed in applications ranging from complex satellite
communication to simple home networking systems.
The transmission of data is achieved either by a wire-line medium or by a wire-less medium.
Data can either be transmitted by analog means or by digital means.
Wireless modules-Bluetooth, Wi-Fi.
Wire-line modules-USB, TCP/IP.
Network hubs, routers, switches are examples of dedicated data transmission embedded systems.
4. Monitoring:
Almost all embedded products in the medical domain have monitoring functions.
An electrocardiogram (ECG) machine monitors the heartbeat of a patient but cannot
impose control over the heartbeat.
Other examples with monitoring functions are digital CROs, digital multimeters, and logic analyzers.
5. Control:
A system with control functionality contains both sensors and actuators.
Sensors are connected to the input port for capturing changes in the environmental variables, and the
actuators connected to the output port are controlled according to the changes in the input variables.
An air conditioner, which controls the room temperature to a specified limit, is a typical example
of the CONTROL purpose.
Embedded systems are basically designed to regulate a physical variable (such as the temperature in an air
conditioner) or to manipulate the state of some devices by sending signals to the actuators or devices connected
to the output ports (such as a microwave oven), in response to input signals provided by the end users or by
sensors connected to the input ports.
Embedded systems are domain and application specific and are built around a central core. The core of the
embedded system falls into any of the following categories:
1. General purpose and Domain Specific Processors
Microprocessors
Microcontrollers
Digital Signal Processors
2. Application Specific Integrated Circuits. (ASIC)
3. Programmable Logic Devices (PLDs)
4. Commercial off-the-shelf Components (COTS)
1.1 Microprocessors
A microprocessor is a silicon chip representing a central processing unit. A microprocessor is a
dependent unit: it requires the combination of other hardware like memory, a timer unit, an interrupt
controller, etc. for proper functioning. Developers of microprocessors:
o Intel – Intel 4004 – November 1971(4-bit).
o Intel – Intel 4040.
o Intel – Intel 8008 – April 1972.
o Intel – Intel 8080 – April 1974(8-bit).
o Motorola – Motorola 6800.
o Intel – Intel 8085 – 1976.
o Zilog - Z80 – July 1976.
1.2 Microcontrollers
A microcontroller is a highly integrated chip that contains a CPU, scratch pad RAM, special and general
purpose register arrays, on-chip ROM/FLASH memory for program storage, timer and interrupt control
units and dedicated I/O ports.
Texas Instruments’ TMS 1000 is considered the world’s first microcontroller.
Some embedded system applications require only 8-bit controllers, whereas some requiring superior
performance and computational capability demand 16/32-bit controllers.
The instruction set of a microcontroller can be RISC or CISC.
Microcontrollers are designed for either general purpose application requirements or domain specific
application requirements.
Microprocessors/controllers based on the Von Neumann architecture share a single common bus for fetching
both instructions and data. Program instructions and data are stored in a common main memory. Von Neumann
architecture based processors/controllers first fetch an instruction and then fetch the data to support the
instruction from code memory. The two separate fetches slow down the controller’s operation. The Von Neumann
architecture is also referred to as the Princeton architecture, since it was developed at Princeton University.
Microprocessors/controllers based on the Harvard architecture will have separate data bus and instruction bus.
This allows the data transfer and program fetching to occur simultaneously on both buses. With Harvard
architecture, the data memory can be read and written while the program memory is being accessed. These
separated data memory and code memory buses allow one instruction to execute while the next instruction is
fetched (“pre-fetching”). The pre-fetch theoretically allows much faster execution than the Von Neumann
architecture. Since some additional hardware logic is required to generate the control signals for this type
of operation, it adds silicon complexity to the system. Fig 2.2 explains the Harvard and Von Neumann
architecture concepts.
Endianness specifies the order in which data is stored in memory by processor operations in a multibyte
system (processors whose word size is greater than one byte). Suppose the word length is two bytes; then data can
be stored in memory in two different ways:
(1) The higher-order byte of the data at the higher memory address and the lower-order byte at the location just
below it.
(2) The lower-order byte of the data at the higher memory address and the higher-order byte at the location just
below it.
Little-endian (Fig. 2.3) means the lower-order byte of the data is stored in memory at the lowest address, and the
higher-order byte at the highest address. (The little end comes first.)
For example, a 4 byte long integer Byte3 Byte2 Byte1 Byte0 will be stored in the memory as shown below:
Big-endian (Fig. 2.4) means the higher-order byte of the data is stored in memory at the lowest address, and the
lower-order byte at the highest address. (The big end comes first.) For example, a 4 byte long integer Byte3
Byte2 Byte1 Byte0 will be stored in the memory as follows :
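The two layouts can be observed directly in C by viewing an integer through a byte pointer (a host-side sketch):

```c
/* Returns 1 on a little-endian machine, 0 on a big-endian machine. */
int is_little_endian(void)
{
    unsigned int word = 0x03020100u;            /* Byte3..Byte0 = 03 02 01 00 */
    unsigned char *bytes = (unsigned char *)&word;
    /* little-endian: bytes[0] holds Byte0 (0x00) at the lowest address;
       big-endian:    bytes[0] holds Byte3 (0x03) at the lowest address */
    return bytes[0] == 0x00;
}
```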
Advantages of PLDs:
1) PLDs offer customers much more flexibility during the design cycle.
2) PLDs do not require long lead times for prototypes or production parts, because PLDs are already on a
distributor’s shelf and ready for shipment.
Stepper Motor:
A stepper motor is an electromechanical device which generates discrete displacement (motion) in response
to DC electrical signals.
It differs from the normal DC motor in its operation: a DC motor produces continuous rotation on applying
DC voltage, whereas a stepper motor produces discrete rotation in response to the DC voltage applied to it.
Stepper motors are widely used in industrial embedded applications, consumer electronic products and
robotics control systems.
The paper feed mechanism of a printer/fax makes use of stepper motors for its functioning.
Based on the coil winding arrangements, a two phase stepper motor is classified into
Unipolar
Bipolar
Unipolar:
A unipolar stepper motor contains two windings per phase. The direction of rotation (clockwise or
anticlockwise) of a stepper motor is controlled by changing the direction of current flow. Current in one
direction flows through one coil and in the opposite direction through the other coil. It is easy to reverse the
direction of rotation by just switching the terminals to which the coils are connected.
Bipolar:
A bipolar stepper motor contains a single winding per phase. For reversing the motor rotation, the current flow
through the windings is reversed dynamically. This requires complex circuitry for current flow reversal.
In the wave step mode only one phase is energized at a time, and the coils are energized one after another in sequence.
The coils A, B, C, and D are energized in the order A, B, C, D.
The rotation of the stepper motor can be reversed by reversing the order in which the coils are energised.
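A sketch of the wave-drive sequence in C; the assignment of coils A–D to port bits 0–3 is an assumption, and the actual port write is hardware-specific:

```c
/* Wave drive: one coil energised at a time, in the order A -> B -> C -> D.
   Coils A..D are assumed to sit on port bits 0..3. */
static const unsigned char wave_seq[4] = { 0x01, 0x02, 0x04, 0x08 };

/* Pattern to write to the (hypothetical) motor port for step number n.
   Reversing the motor means stepping n downwards instead of upwards. */
unsigned char wave_step(unsigned int n)
{
    return wave_seq[n & 3u];
}
```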
Two-phase unipolar stepper motors are the popular choice for embedded applications. The current requirement for a
stepper motor is relatively high, and hence the port pins of a microcontroller/processor may not be able to drive them
directly. Also, the supply voltage required to operate a stepper motor normally varies in the range 5V to 24V. Depending
on the current and voltage requirements, special driving circuits are required to interface the stepper motor with
microcontrollers/processors.
Commercial off-the-shelf stepper motor driver ICs are available in the market and can be directly interfaced to the
microcontroller port. The ULN2803 is an octal peripheral driver array, available from Texas Instruments and
STMicroelectronics, suitable for driving a 5V stepper motor. A simple driving circuit can also be built using transistors.
The circuit diagram in Fig. 2.20 illustrates the interfacing of a stepper motor through a driver circuit connected to the
port pins of a microcontroller/processor.
Keyboard
A keyboard is an input device for user interfacing. If the number of keys required is very limited, push
button switches can be used and they can be directly interfaced to the port pins for reading.
However, there may be situations demanding a large number of keys for user input. In such situations it
may not be possible to interface each key to a port pin due to the limitation in the number of general
purpose port pins available for the processor/controller in use; moreover, it wastes port pins.
Matrix keyboard is an optimum solution for handling large key requirements.
It greatly reduces the number of interface connections.
In a matrix keyboard, the keys are arranged in matrix fashion (i.e. they are connected in a row and column
style).
For detecting a key press, the keyboard uses the scanning technique, where each row of the matrix is pulled
low and the columns are read.
After reading the status of each column corresponding to a row, that row is pulled high, the next row is
pulled low, and the status of the columns is read. This process is repeated until the scanning of all rows is
completed.
When a row is pulled low, if a key connected to that row is pressed, reading the column to which the key
is connected will give logic 0.
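The scanning technique described above can be sketched as follows; read_cols stands in for the hardware-specific port access, and a 4x4 matrix is assumed:

```c
/* Scan a 4x4 matrix keypad. read_cols(row_mask) drives the rows with
   row_mask (exactly one row pulled low) and returns the column lines;
   a pressed key reads back 0 on its column. Returns 0..15, or -1. */
int scan_keypad(unsigned char (*read_cols)(unsigned char row_mask))
{
    for (int row = 0; row < 4; row++)
    {
        unsigned char mask = (unsigned char)~(1u << row);  /* this row low */
        unsigned char cols = read_cols(mask);
        for (int col = 0; col < 4; col++)
        {
            if (!(cols & (1u << col)))                     /* key press: 0 */
            {
                return row * 4 + col;
            }
        }
    }
    return -1;                                             /* no key pressed */
}

/* Simulated port read for illustration: pretend the key at row 2,
   column 1 (key index 9) is held down. */
static unsigned char sim_read_cols(unsigned char row_mask)
{
    if (!(row_mask & (1u << 2)))          /* row 2 is the one pulled low */
    {
        return (unsigned char)~(1u << 1); /* its column 1 reads back 0 */
    }
    return 0xFF;                          /* all columns high otherwise */
}
```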
MEMORY
Memory is an important part of a processor/controller based embedded system. Some
processors/controllers contain built-in memory, referred to as on-chip memory. Others do not
contain any memory inside the chip and require external memory to be connected to the controller/processor
to store the control algorithm; this is called off-chip memory. Also, some working memory is required for
holding data temporarily during certain operations. This section deals with the different types of memory used
in embedded system applications.
The code memory retains its contents even after the power to it is turned off. It is generally known as
non-volatile storage memory. Depending on the fabrication, erasing, and programming techniques,
non-volatile memories are classified into the following types.
Masked ROM (MROM) Masked ROM is a one-time programmable device. Masked ROM makes use of
the hardwired technology for storing data. The device is factory programmed by masking and metallisation
process at the time of production itself, according to the data provided by the end user. The primary
advantage of this is low cost for high volume production. They are the least expensive type of solid state
memory. Different mechanisms are used for the masking process of the ROM, like
(1) Creation of an enhancement or depletion mode transistor through channel implant.
(2) By creating the memory cell using either a standard transistor or a high threshold transistor. In the high
threshold mode, the supply voltage required to turn ON the transistor is above the normal ROM IC
operating voltage. This ensures that the transistor is always off and the memory cell always stores logic 0.
Masked ROM is a good candidate for storing the embedded firmware for low cost embedded devices.
Once the design is proven and the firmware requirements are tested and frozen, the binary data (The firmware
cross compiled/assembled to target processor specific machine code) corresponding to it can be given to the
MROM fabricator. The limitation with MROM based firmware storage is the inability to modify the device
firmware against firmware upgrades. Since the MROM is permanent in bit storage, it is not possible to alter
the bit information.
Read/Write Memory (RAM)
Static RAM (SRAM) Static RAM stores data in the form of voltage. SRAM cells are made up of flip-flops.
Static RAM is the fastest form of RAM available. In a typical implementation, an SRAM cell (bit) is realised
using six transistors (or 6 MOSFETs). Four of the transistors are used for building the latch (flip-flop) part
of the memory cell and two for controlling the access. SRAM is fast in operation due to its resistive
network and switching capabilities. In its simplest representation an SRAM cell can be visualised as
shown in Fig. 2.10.
[Fig. 2.10: SRAM cell implementation — cross-coupled transistors Q2/Q4, access transistors Q5/Q6, Vcc, Word Line]
This implementation in its simpler form can be visualised as two cross-coupled inverters with read/write
control through transistors. The four transistors in the middle form the cross-coupled inverters. This can be
visualised as shown in Fig. 2.11.
[Fig. 2.11: Visualisation of the SRAM cell — two cross-coupled inverters with write control and read control
transistors, and data-to-write and data-to-read lines]
From the SRAM implementation diagram, it is clear that access to the memory cell is controlled by the
Word Line, which controls the access transistors (MOSFETs) Q5 and Q6. The access transistors control the
connection to the bit lines B & B\. In order to write a value to the memory cell, apply the desired value to the
bit control lines (for writing 1, make B = 1 and B\ = 0; for writing 0, make B = 0 and B\ = 1) and assert the
Word Line (make the Word Line high). This operation latches the written bit in the flip-flop. For reading the
content of the memory cell, assert both the B and B\ bit lines to 1 and set the Word Line to 1.
The major limitations of SRAM are low capacity and high cost. Since a minimum of six transistors
are required to build a single memory cell, imagine how many memory cells we can fabricate on a silicon
wafer.
Dynamic RAM (DRAM) Dynamic RAM stores data in the form of charge. DRAM cells are made up of MOS
transistor gates. The advantages of DRAM are its high density and low cost compared to SRAM. The
disadvantage is that, since the information is stored as charge, it leaks away with time; to prevent this, DRAM
cells need to be refreshed periodically. Special circuits called DRAM controllers are used for the refresh
operation, which is performed periodically at millisecond intervals. Figure 2.12 illustrates the typical
implementation of a DRAM cell.
[Fig. 2.12: DRAM cell implementation — a single MOSFET and a storage capacitor connected to bit line B and
the word line]
The MOSFET acts as the gate for the incoming and outgoing data, whereas the capacitor acts as the bit
storage unit. The table given below summarises the relative merits and demerits of SRAM and DRAM
technology.
NVRAM Non-volatile RAM is a random access memory with battery backup. It contains static RAM based
memory and a minute battery for supplying power to the memory in the absence of an external power supply.
The memory and battery are packed together in a single package. NVRAM is used for the non-volatile
storage of results of operations or for setting up flags, etc. The life span of NVRAM is expected to be
around 10 years. The DS1744 from Maxim/Dallas is an example of a 32KB NVRAM.
COMMUNICATION INTERFACES
I2C (Inter-Integrated Circuit) Bus
Developed and patented by Philips for connecting low speed peripherals to a motherboard, embedded
system or cell phone.
Two-wire bus, half duplex, serial, synchronous communication, with data rates up to 100 kbits/sec.
Serial data line (SDA)
Serial clock line (SCL)
The master controls the clock for the slaves.
Each connected slave has a unique 7-bit address.
BLUETOOTH
1. Low cost, low power, short range wireless technology for data and audio communication.
2. Proposed by Ericsson in 1994.
3. Operates at 2.4 GHz and uses frequency hopping spread spectrum (FHSS).
4. Data rates from 1 Mbps to 24 Mbps.
5. Range from 30 to 100 feet.
6. Two layers: physical layer & protocol layer (user defined protocols).
7. Each Bluetooth device has a 48-bit unique identification number.
8. Supports point-to-point (master–slave) and point-to-multipoint (piconet, with a maximum of 7 slaves) connections.
9. Applications: file transfer between mobiles, the medical sector, etc.
For communicating with devices over a Wi-Fi network, a device, when its Wi-Fi radio is turned ON,
searches for the available Wi-Fi networks in its vicinity and lists the Service Set Identifiers (SSIDs) of the
available networks. If a network is security enabled, a password may be required to connect to a particular
SSID. Wi-Fi employs different security mechanisms like Wired Equivalent Privacy (WEP), Wi-Fi
Protected Access (WPA), etc. for securing the data communication.
ZIGBEE
ZigBee Coordinator (ZC)/Network Coordinator: The ZigBee coordinator acts as the root of the ZigBee
network. The ZC is responsible for initiating the ZigBee network and it has the capability to store
information about the network.
ZigBee Router (ZR)/Full Function Device (FFD): Responsible for passing information from one device to
another device or to another ZR.
ZigBee End Device (ZED)/Reduced Function Device (RFD): An end device containing ZigBee
functionality for data communication. It can talk only with a ZR or ZC and doesn’t have the capability to act
as a mediator for transferring data from one device to another. The diagram shown in Fig. 2.34 gives an
overview of ZC, ZED and ZR in a ZigBee network.
Universal Asynchronous Receiver Transmitter (UART) based data transmission is an asynchronous form
of serial data transmission.
The serial communication settings (baud rate, number of bits per byte, parity, number of start and stop
bits, and flow control) for both transmitter and receiver should be set identically.
The start and stop of communication is indicated through inserting special bits in the data stream. While
sending a byte of data, a start bit is added first and a stop bit is added at the end of the bit stream. The least
significant bit of the data byte follows the ‘start’ bit.
The ‘start’ bit informs the receiver that a data byte is about to arrive. The receiver device starts polling its
‘receive line’ as per the baudrate settings. If the baudrate is ‘x’ bits per second, the time slot available for
one bit is 1/x seconds. If parity is enabled for communication, the UART of the transmitting device adds a
parity bit (bit value is 1 for odd number of 1s in the transmitted bit stream and 0 for even number of 1s).
The UART of the receiving device calculates the parity of the bits received and compares it with the
received parity bit for error checking.
The UART of the receiving device discards the ‘Start’, ‘Stop’ and ‘Parity’ bit from the received bit stream
and converts the received serial bit data to a word (In the case of 8 bits/byte, the byte is formed with the
received 8 bits with the first received bit as the LSB and last received data bit as MSB).
For proper communication, the ‘Transmit line’ of the sending device should be connected to the ‘Receive
line’ of the receiving device.
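The framing described above can be sketched as bit manipulation (an 8N1 frame is assumed; the parity helper shows the rule from the text but is not inserted into this frame):

```c
/* Build an 8N1 frame: start bit (0), eight data bits LSB first, stop bit (1).
   Bit 0 of the result is transmitted first. */
unsigned int uart_frame(unsigned char data)
{
    unsigned int frame = 0;                /* bit 0: start bit = 0 */
    frame |= (unsigned int)data << 1;      /* bits 1..8: data, LSB first */
    frame |= 1u << 9;                      /* bit 9: stop bit = 1 */
    return frame;
}

/* Parity bit as described: 1 when the data byte has an odd number of 1s. */
unsigned char parity_bit(unsigned char b)
{
    b ^= b >> 4;                           /* fold the 1-count down to bit 0 */
    b ^= b >> 2;
    b ^= b >> 1;
    return b & 1u;
}
```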
RS-232 C
RS-232 C (Recommended Standard number 232, revision C) is a full duplex, wired, asynchronous serial
communication interface.
The RS-232 interface was developed by the Electronic Industries Association (EIA) during the early 1960s.
RS-232 extends the UART communication signals for external data communication.
UART uses the standard TTL/CMOS logic (Logic ‘High’ corresponds to bit value 1 and Logic ‘Low’
corresponds to bit value 0) for bit transmission whereas RS-232 follows the EIA standard for bit
transmission.
As per the EIA standard, a logic ‘0’ is represented with a voltage between +3 and +25V, and a logic ‘1’ is
represented with a voltage between −3 and −25V.
RS-232 is a point-to-point communication interface and the devices involved in RS-232 communication are
called ‘Data Terminal Equipment (DTE)’ and ‘Data Communication Equipment (DCE)’.
If no data flow control is required, only TXD and RXD signal lines and ground line (GND) are required for
data transmission and reception. The RXD pin of DCE should be connected to the TXD pin of DTE and vice
versa for proper data transmission.
As per the EIA standard, RS-232 C supports baudrates up to 20Kbps (upper limit 19.2Kbps). The commonly
used baudrates are 300bps, 1200bps, 2400bps, 9600bps, 11.52Kbps and 19.2Kbps; 9600 is the
popular baudrate setting used for PC communication. The maximum operating distance supported by RS-
232 is 50 feet at the highest supported baudrate.
Embedded devices contain a UART for serial communication and generate signal levels conforming to
TTL/CMOS logic. A level translator IC like the MAX232 from Maxim/Dallas Semiconductor is used for
converting the signal lines from the UART to RS-232 signal lines for communication. On the receiving side,
the received data is converted back to digital logic levels by a converter IC. Converter chips contain
converters for both the transmitter and the receiver.
EMBEDDED FIRMWARE
Embedded firmware refers to the control algorithm (program instructions) and/or the configuration settings
that an embedded system developer dumps into the code (program) memory of the embedded system.
There are various methods available for developing the embedded firmware. They are listed below.
(1) Write the program in high level languages like Embedded C/C++ using an Integrated Development
Environment. (2) Write the program in Assembly language using the instructions supported by your
application’s target processor/controller.
The process of converting the program written in either a high level language or processor/controller
specific Assembly code to machine readable binary code is called ‘HEX File Creation’.
The method used for ‘HEX File Creation’ depends on the programming technique used. If
the program is written in Embedded C/C++ using an IDE, the cross compiler included in the IDE converts it
into corresponding processor/controller understandable ‘HEX File’.
If you are following the Assembly language based programming technique (method 2), you can use the
utilities supplied by the processor/controller vendors to convert the source code into a ‘HEX File’. Third
party tools, some of which are free of cost, are also available for this conversion.
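For concreteness, the ‘HEX File’ is commonly in Intel HEX format, where every record ends in a checksum byte: the two’s complement of the low byte of the sum of all the record’s bytes. A small sketch of that checksum computation (the sample record below is illustrative, not from any particular build):

```c
#include <stdint.h>

/* Value of one hexadecimal digit, or -1 if the character is invalid. */
static int hex_val(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Checksum byte of an Intel HEX record: pass the characters after the
 * leading ':' but WITHOUT the final two checksum digits. The result is
 * the two's complement of the low byte of the sum of the record bytes. */
static uint8_t ihex_checksum(const char *record) {
    unsigned sum = 0;
    for (int i = 0; record[i] && record[i + 1]; i += 2)
        sum += (unsigned)(hex_val(record[i]) * 16 + hex_val(record[i + 1]));
    return (uint8_t)((0x100 - (sum & 0xFFu)) & 0xFFu);
}
```

For example, the data record `:0300300002337A1E` carries checksum 0x1E, and the standard end-of-file record `:00000001FF` carries 0xFF.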
For a beginner in the embedded software field, it is strongly recommended to use the high level language
based development technique. The reasons are that writing code in a high level language is easy, and code
written in a high level language is highly portable.
The embedded software development process in assembly language is tedious and time consuming. The
developer needs to know about all the instruction sets of the processor/controller or at least s/he should carry
an instruction set reference manual with her/him.
Two types of control algorithm design exist in embedded firmware development.
o The first type of control algorithm development is known as the infinite loop or ‘super loop’ based
approach, where the control flow runs from top to bottom and then jumps back to the top of the
program in a conventional procedure. It is similar to the while (1) { }; based technique in C.
o The second method deals with splitting the functions to be executed into tasks and running these
tasks using a scheduler which is part of a General Purpose or Real Time Embedded Operating
System (GPOS/RTOS).
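The ‘super loop’ approach can be sketched in C as below. The hardware access functions here are simulated stand-ins (real firmware would read and write port and peripheral registers), and the loop body is factored into run_one_cycle() only so it can be exercised off target:

```c
/* Simulated hardware hooks -- on real hardware these would touch
 * port registers, ADCs, timers, etc. (names are illustrative only). */
static int sensor_value = 0;                    /* simulated input  */
static int actuator_value = 0;                  /* simulated output */

static void init_hardware(void)       { sensor_value = 0; }
static int  read_inputs(void)         { return ++sensor_value; }
static int  control_algorithm(int in) { return in * 2; }
static void update_outputs(int o)     { actuator_value = o; }

/* One pass through the loop body, separated out so that it can be
 * unit tested without running forever. */
static int run_one_cycle(void) {
    int out = control_algorithm(read_inputs());
    update_outputs(out);
    return actuator_value;
}

/* The super loop itself: control flows top to bottom, then jumps back
 * to the top -- the while (1) { } technique. Never returns. */
void super_loop(void) {
    init_hardware();
    while (1)
        run_one_cycle();
}
```

On target, main() would simply call super_loop() after power-on initialisation.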
RESET CIRCUIT
The reset circuit is essential to ensure that, during system power ON, the device does not operate at a voltage
level where it is not guaranteed to operate.
The reset signal brings the internal registers and the different hardware systems of the processor/ controller
to a known state and starts the firmware execution from the reset vector (Normally from vector address
0x0000 for conventional processors/controllers).
The reset signal can be either active high (The processor undergoes reset when the reset pin of the processor
is at logic high) or active low (The processor undergoes reset when the reset pin of the processor is at logic
low).
Since the processor operation is synchronised to a clock signal, the reset pulse should be wide enough to
give time for the clock oscillator to stabilise before the internal reset state starts.
The Zener diode Dz and transistor Q form the heart of this circuit. The transistor conducts whenever the
supply voltage Vcc is greater than the sum of VBE and Vz (the Zener voltage).
The transistor stops conducting when the supply voltage falls below the sum of VBE and Vz. Select the
Zener diode with required voltage for setting the low threshold value for Vcc.
Microprocessor supervisor ICs like the DS1232 from Maxim Dallas provide brown-out protection.
OSCILLATOR UNIT
Oscillator unit of the embedded system is responsible for generating the precise clock for the processor.
Certain processors/controllers integrate a built-in oscillator unit and simply require an external ceramic
resonator/quartz crystal for producing the necessary clock signals.
The speed of operation of a processor is primarily dependent on the clock frequency. However we cannot
increase the clock frequency blindly for increasing the speed of execution.
The total system power consumption is directly proportional to the clock frequency. The power consumption
increases with increase in clock frequency.
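This proportionality follows from the first-order CMOS dynamic power model P = C·V²·f (static leakage ignored). The model itself is standard, though it is not spelled out in these notes; a sketch with convenient units:

```c
/* First-order CMOS dynamic power: P = C_eff * Vdd^2 * f.
 * Units chosen so that nF * V^2 * MHz comes out in mW
 * (1e-9 F * 1e6 Hz = 1e-3 W). Leakage power is ignored. */
static double dynamic_power_mw(double c_eff_nf, double vdd, double f_mhz) {
    return c_eff_nf * vdd * vdd * f_mhz;
}
```

For example, a hypothetical 1 nF effective switched capacitance at 3.3 V and 16 MHz dissipates about 174 mW, and doubling the clock frequency doubles the dissipation.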
The accuracy of program execution depends on the accuracy of the clock signal.
WATCHDOG TIMER
In desktop Windows systems, if we feel our application is behaving in an abnormal way or if the system
hangs up, we have the ‘Ctrl + Alt + Del’ to come out of the situation.
In embedded systems, we have a watchdog to monitor the firmware execution and reset the system
processor/microcontroller when the program execution hangs up.
A watchdog timer is a hardware timer for monitoring the firmware execution.
Depending on the internal implementation, the watchdog timer increments or decrements a free running
counter with each clock pulse and generates a reset signal if the counter reaches its overflow/underflow
value before the firmware refreshes (‘kicks’) it.
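The mechanism can be modelled in plain C. This is a simulation only: a real watchdog is a hardware counter kicked through device-specific registers, and the names and timeout value below are invented:

```c
#include <stdbool.h>
#include <stdint.h>

/* Software model of a down-counting watchdog: the counter is loaded
 * with a timeout value and decremented on every clock tick; if it
 * reaches zero, the watchdog asserts a processor reset. Live firmware
 * must periodically reload ("kick") the counter before it expires. */
#define WDT_RELOAD 1000u

static uint32_t wdt_counter = WDT_RELOAD;
static bool reset_asserted = false;

static void wdt_kick(void) { wdt_counter = WDT_RELOAD; }

static void wdt_clock_tick(void) {
    if (wdt_counter == 0 || --wdt_counter == 0)
        reset_asserted = true;          /* firmware hung: reset the CPU */
}

/* Run 'ticks' clock pulses, kicking the watchdog every 'kick_period'
 * ticks (0 = never kick, i.e. a hung program). Returns true if the
 * watchdog fired. */
static bool simulate(uint32_t ticks, uint32_t kick_period) {
    wdt_counter = WDT_RELOAD;
    reset_asserted = false;
    for (uint32_t t = 1; t <= ticks; ++t) {
        if (kick_period && t % kick_period == 0)
            wdt_kick();
        wdt_clock_tick();
    }
    return reset_asserted;
}
```

A program that kicks the watchdog regularly runs forever; one that stops kicking gets reset once the counter expires.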
MODULE 5
The operating system acts as a bridge between the user applications/tasks and the underlying system resources
through a set of system functionalities and services.
The OS manages the system resources and makes them available to the user applications/tasks on a need basis.
A normal computing system is a collection of different I/O subsystems, working memory and storage memory.
Figure 10.1 gives an insight into the basic components of an operating system and their interfaces with rest of the
world.
The Kernel
The kernel is the core of the operating system and is responsible for managing the system resources and the
communication among the hardware and other system services.
Kernel acts as the abstraction layer between system resources and user applications.
Kernel contains a set of system libraries and services.
For a general purpose OS, the kernel contains different services for handling the following.
Process Management: It includes setting up the memory space for the process, loading the process’s code into the
memory space, allocating system resources, scheduling and managing the execution of the process, setting up and
managing the Process Control Block (PCB), Inter Process Communication and synchronisation, process termination/
deletion, etc.
Primary Memory Management: The term primary memory refers to the volatile memory (RAM) where processes are
loaded and variables and shared data associated with each process are stored.
The Memory Management Unit (MMU) of the kernel is responsible for
Keeping track of which part of the memory area is currently used by which process
Allocating and De-allocating memory space on a need basis (Dynamic memory allocation).
File System Management: File is a collection of related information. A file could be a program (source code or executable),
text files, image files, word documents, audio/video files, etc. Each of these files differ in the kind of information they
hold and the way in which the information is stored. The file operation is a useful service provided by the OS.
The file system management service of the kernel is responsible for the creation, deletion and alteration of
files and directories, saving files in the secondary storage, and providing automatic allocation of file space
based on the amount of free space available.
I/O System (Device) Management: Kernel is responsible for routing the I/O requests coming from different user applications
to the appropriate I/O devices of the system. In a well-structured OS, direct access to I/O devices is not allowed;
access to them is provided through a set of Application Programming Interfaces (APIs) exposed by the kernel.
The kernel maintains a list of all the I/O devices of the system. This list may be available in advance, at the time of
building the kernel. Some kernels dynamically update the list of available devices as and when a new device is installed
(e.g. Windows NT kernel keeps the list updated when a new plug ‘n’ play USB device is attached to the system).
The service ‘Device Manager’ of the kernel is responsible for handling all I/O device related operations. The kernel talks
to the I/O device through a set of low-level system calls, which are implemented in services called device drivers.
The device drivers are specific to a device or a class of devices. The Device Manager is responsible for
Loading and unloading of device drivers
Exchanging information and the system specific control signals to and from the device
Secondary Storage Management The secondary storage management deals with managing the secondary storage memory
devices, if any, connected to the system. Secondary memory is used as backup medium for programs and data since the
main memory is volatile. In most of the systems, the secondary storage is kept in disks (Hard Disk).
Protection Systems Most of the modern operating systems are designed in such a way to support multiple users with
different levels of access permissions (e.g. Windows 10 with user permissions like ‘Administrator’, ‘Standard’,
‘Restricted’, etc.).
Protection deals with implementing the security policies to restrict the access to both user and system resources by
different applications or processes or users. In multiuser supported operating systems, one user may not be allowed to
view or modify the whole or portions of another user’s data or profile details. In addition, some applications may not be
granted permission to make use of some of the system resources. This kind of protection is provided by the
protection services running within the kernel.
Interrupt Handler The kernel provides a handler mechanism for all external/internal interrupts generated by the system.
These are some of the important services offered by the kernel of an operating system; it does not mean that a kernel
contains no more than the components/services explained above.
Depending on the type of the operating system, a kernel may contain fewer or more components/services. In addition
to the components/services listed above, many operating systems offer a number of add-on system
components/services: network communication, network management, user-interface graphics, timer services (delays,
timeouts, etc.), error handling, database management, etc. are examples of such components/services.
The kernel exposes the interface to the various applications/services it hosts to the user applications through
a set of standard Application Programming Interfaces (APIs). User applications can use these API calls to access the
various kernel applications/services.
2. Differentiate between hard real time and soft real time operating system with an example for each.
Hard Real-Time:
Real-Time Operating Systems that strictly adhere to the timing constraints for a task are referred to as ‘Hard Real-Time’
systems.
A Hard Real-Time system must meet the deadlines for a task without any slippage.
Missing any deadline may produce catastrophic results for Hard Real-Time Systems, including permanent data loss
and irrecoverable damage to the system or its users.
A system can have several such tasks and the key to their correct operation lies in scheduling them so that they meet
their time constraints.
Air bag control systems and Anti-lock Brake Systems (ABS) of vehicles are typical examples for Hard Real-Time
Systems.
The air bag control system should swing into action and deploy the air bags when the vehicle meets with a severe
accident. Ideally, the time for triggering the air bag deployment task, when an accident is sensed by the
air bag control system, should be zero, and the air bags should be deployed exactly within the time frame
predefined for the air bag deployment task.
Soft Real-Time:
Real-Time Operating Systems that do not guarantee meeting deadlines, but offer the best effort to meet them,
are referred to as ‘Soft Real-Time’ systems.
Missing deadlines for tasks is acceptable for a Soft Real-Time system if the frequency of deadline misses is within
the compliance limit of the Quality of Service (QoS).
Automatic Teller Machine (ATM) is a typical example for Soft-Real-Time System.
If the ATM takes a few seconds more than the ideal operation time, nothing fatal happens.
An audio-video playback system is another example for Soft Real-Time system.
The term ‘task’: It is defined as the program in execution and the related information maintained by the operating
system for the program. Task is also known as ‘Job’. A program or part of it in execution is also called a ‘Process’. The
terms ‘Task’, ‘Job’ and ‘Process’ refer to the same entity in the operating system and most often they are used
interchangeably.
A ‘Process’ is a program, or part of it, in execution. Process is also known as an instance of a program in execution.
Multiple instances of the same program can execute simultaneously. A process requires various system resources like
CPU for executing the process, memory for storing the code corresponding to the process and associated variables, I/O
devices for information exchange, etc. A process is sequential in execution.
A thread is the primitive that can execute code. A thread is a single sequential flow of control within a process. ‘Thread’
is also known as lightweight process. A process can have many threads of execution. Different threads, which are part of
a process, share the same address space; meaning they share the data memory, code memory and heap memory area.
Threads maintain their own thread status (CPU register values), Program Counter (PC) and stack.
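The shared-address-space point can be seen directly with POSIX threads (assuming a pthreads platform): both workers below update one shared global variable, while each keeps its private loop counter on its own stack. The function names are illustrative:

```c
#include <pthread.h>
#include <stdint.h>

/* Two threads of one process share data memory (globals) but each has
 * its own stack and program counter. A mutex serialises the access. */
static long shared_counter = 0;                  /* shared data memory */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg) {
    long n = (long)(intptr_t)arg;    /* private: lives on this thread's stack */
    for (long i = 0; i < n; ++i) {
        pthread_mutex_lock(&lock);
        shared_counter++;            /* visible to every thread of the process */
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

static long run_two_threads(long n_each) {
    pthread_t t1, t2;
    shared_counter = 0;
    pthread_create(&t1, NULL, worker, (void *)(intptr_t)n_each);
    pthread_create(&t2, NULL, worker, (void *)(intptr_t)n_each);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return shared_counter;
}
```

Both threads see every increment made by the other, which is exactly what two separate processes would not get without explicit shared memory.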
The concept of ‘Process’ leads to concurrent execution (pseudo parallelism) of tasks and thereby the efficient
utilisation of the CPU and other system resources.
Concurrent execution is achieved through the sharing of CPU among the processes.
A process mimics a processor in properties and holds a set of registers, process status, a Program Counter (PC) to
point to the next executable instruction of the process, a stack for holding the local variables associated with the
process and the code corresponding to the process. This can be visualised as shown in Fig. 10.4.
A process which inherits all the properties of the CPU can be considered as a virtual processor, awaiting its turn to
have its properties switched into the physical processor. When the process gets its turn, its registers and the program
counter register becomes mapped to the physical registers of the CPU.
From a memory perspective, the memory occupied by the process is segregated into three regions, namely, Stack
memory, Data memory and Code memory (Fig. 10.5).
The ‘Stack’ memory holds all temporary data such as variables local to the process.
Data memory holds all global data for the process.
The code memory contains the program code (instructions) corresponding to the process.
The state at which a process is being created is referred as ‘Created State’. The Operating System recognises a
process in the ‘Created State’ but no resources are allocated to the process.
The state, where a process is incepted into the memory and awaiting the processor time for execution, is known as
‘Ready State’. At this stage, the process is placed in the ‘Ready list’ queue maintained by the OS.
The state wherein the instructions corresponding to the process are being executed is called ‘Running
State’. Running state is the state at which the process execution happens.
‘Blocked State/Wait State’ refers to a state where a running process is temporarily suspended from execution and
does not have immediate access to resources. The blocked state might be invoked by various conditions like: the
process enters a wait state for an event to occur (e.g. Waiting for user inputs such as keyboard input) or waiting for
getting access to a shared resource.
A state where the process completes its execution is known as ‘Completed State’.
The transition of a process from one state to another is known as ‘State transition’.
When a process changes its state from ready to running, from running to blocked or terminated, or from blocked to
running, the CPU allocation for the process may also change.
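The states and legal transitions above can be captured in a small table-driven checker. This follows the common five-state model, in which a blocked process first re-enters the ready queue when its awaited event occurs:

```c
#include <stdbool.h>

/* Process life-cycle states and their legal transitions. */
typedef enum { CREATED, READY, RUNNING, BLOCKED, COMPLETED } proc_state_t;

static bool transition_allowed(proc_state_t from, proc_state_t to) {
    switch (from) {
    case CREATED:   return to == READY;                  /* incepted into memory   */
    case READY:     return to == RUNNING;                /* scheduler dispatch     */
    case RUNNING:   return to == READY || to == BLOCKED  /* preempted / waiting    */
                        || to == COMPLETED;              /* finished execution     */
    case BLOCKED:   return to == READY;                  /* awaited event occurred */
    case COMPLETED: return false;                        /* terminal state         */
    }
    return false;
}
```

An OS scheduler enforces exactly this kind of rule set when it moves a Process Control Block between its queues.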
5. Explain multithreading
A process/task in embedded application may be a complex or lengthy one and it may contain various suboperations
like getting input from I/O devices connected to the processor, performing some internal calculations/operations,
updating some I/O devices etc. If all the sub functions of a task are executed in sequence, the CPU utilisation may
not be efficient. For example, if the process is waiting for a user input, the CPU enters the wait state for the event,
and the process execution also enters a wait state.
Instead of this single sequential execution of the whole process, if the task/process is split into different threads
carrying out the different subfunctionalities of the process, the CPU can be effectively utilised: when the thread
corresponding to the I/O operation enters the wait state, other threads which do not require the I/O event for their
operation can be switched into execution. This leads to speedier execution of the process and efficient
utilisation of the processor time and resources.
The multithreaded architecture of a process can be better visualised with the thread-process diagram shown in Fig.
10.8.
If the process is split into multiple threads, which executes a portion of the process, there will be a main thread and
rest of the threads will be created within the main thread.
Better memory utilisation. Multiple threads of the same process share the address space for data memory. This
also reduces the complexity of inter thread communication since variables can be shared across the threads.
Since the process is split into different threads, when one thread enters a wait state, the CPU can be utilised by
other threads of the process that do not depend on the event being waited for. This speeds up the execution of
the process.
Efficient CPU utilisation. The CPU is kept engaged all the time.
Multiprocessing:
Systems which are capable of performing multiprocessing, are known as multiprocessor systems. Multiprocessor
systems possess multiple CPUs and can execute multiple processes simultaneously.
Multitasking:
The ability of the operating system to have multiple programs in memory, which are ready for execution, is referred
as multiprogramming. In a uniprocessor system, it is not possible to execute multiple processes simultaneously.
However, it is possible for a uniprocessor system to achieve some degree of pseudo parallelism in the execution of
multiple processes by switching the execution among different processes.
The ability of an operating system to hold multiple processes in memory and switch the processor (CPU) from
executing one process to another process is known as multitasking.
Context Switching
Multitasking creates the illusion of multiple tasks executing in parallel. Multitasking involves the switching of the CPU
from executing one task to another.
In a multitasking environment, when task/process switching happens, the virtual processor (task/process) gets its
properties converted into that of the physical processor.
The switching of the virtual processor to the physical processor is controlled by the scheduler of the OS kernel. Whenever
a CPU switch happens, the current context of execution should be saved, to be retrieved at a later point of time when
the CPU resumes the process which was interrupted due to the execution switching.
The context saving and retrieval is essential for resuming a process exactly from the point where it was interrupted
due to CPU switching.
The act of switching CPU among the processes or changing the current execution context is known as ‘Context
switching’.
The act of saving the current context which contains the context details (Register details, memory details, system
resource usage details, execution details, etc.) for the currently running process at the time of CPU switching is
known as ‘Context saving’.
The process of retrieving the saved context details for a process, which is going to be executed due to CPU
switching, is known as ‘Context retrieval’. Multitasking involves ‘Context switching’ (Fig. 10.11), ‘Context
saving’ and ‘Context retrieval’.
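Standard C’s setjmp()/longjmp() pair gives a miniature demonstration of context saving and retrieval within a single process: setjmp stores the CPU context (registers, program counter, stack pointer) in a jmp_buf, and longjmp restores it so execution resumes exactly at the saved point — the same save-and-resume idea a scheduler applies to whole tasks:

```c
#include <setjmp.h>

static jmp_buf saved_context;   /* holds the saved execution context */
static int resume_count;

static void interrupted_work(void) {
    longjmp(saved_context, 1);  /* "context retrieval": jump back */
}

static int run_with_context_save(void) {
    resume_count = 0;
    if (setjmp(saved_context) == 0) {   /* "context saving": returns 0 on the save */
        interrupted_work();             /* never returns normally */
    } else {
        resume_count++;                 /* resumed from the saved point */
    }
    return resume_count;
}
```

A real context switch also saves memory-management and resource-usage state and is performed by the kernel, not by library calls, but the save/restore principle is the same.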
7. Explain the concept of deadlock with a neat diagram and mention how to avoid deadlock
In a multiprogramming environment several processes may compete for a finite number of resources. A process
requests resources; if the resources are not available at that time, the process enters a waiting state. A waiting
process may never change its state again, because the resources it has requested are held by other waiting
processes. This situation is known as deadlock.
Deadlock Characteristics: In a deadlock, processes never finish executing and system resources are tied up. A deadlock
situation can arise if the following four conditions hold simultaneously in a system.
Mutual Exclusion: Only one process can use a resource at a time. If another process requests that resource, the
requesting process must wait until the resource has been released.
Hold and wait: A process must be holding at least one resource and waiting to acquire additional resources that are
currently held by other processes.
No Preemption: Resources allocated to a process cannot be forcibly taken away from it; a resource is released only
voluntarily by the process holding it, after completing its task.
Circular Wait: A set {P0, P1, …, Pn} of waiting processes must exist such that P0 is waiting for a
resource that is held by P1, P1 is waiting for a resource that is held by P2, …, P(n – 1) is waiting for a
resource that is held by Pn, and Pn is waiting for a resource that is held by P0.
Deadlock Handling: A smart OS may handle deadlocks in one of the following ways.
Ignore deadlocks: Pretend the system is deadlock free and do nothing (the ‘Ostrich algorithm’).
Detect and recover: Let deadlocks occur, detect them and break them, similar to a traffic policeman clearing a traffic jam.
Avoid deadlocks: Allocate resources carefully so that the system never enters an unsafe state.
Prevent deadlocks: Negate one of the four necessary conditions, for instance through sleep and wakeup based resource access.
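One concrete prevention technique is to break the circular-wait condition by imposing a global lock ordering: every thread acquires shared resources in the same fixed order, so no cycle of waiting threads can form. A POSIX-threads sketch (assuming a pthreads platform; names are illustrative):

```c
#include <pthread.h>

/* Deadlock prevention by lock ordering: every thread takes resource A
 * before resource B, so a circular wait (A-holder waiting for B while
 * a B-holder waits for A) is impossible. */
static pthread_mutex_t res_a = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t res_b = PTHREAD_MUTEX_INITIALIZER;
static int work_done = 0;

static void *task(void *unused) {
    (void)unused;
    for (int i = 0; i < 1000; ++i) {
        pthread_mutex_lock(&res_a);   /* always A first...           */
        pthread_mutex_lock(&res_b);   /* ...then B: no circular wait */
        work_done++;
        pthread_mutex_unlock(&res_b);
        pthread_mutex_unlock(&res_a);
    }
    return NULL;
}

static int run_tasks(void) {
    pthread_t t1, t2;
    work_done = 0;
    pthread_create(&t1, NULL, task, NULL);
    pthread_create(&t2, NULL, task, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return work_done;                 /* both tasks complete: no deadlock */
}
```

If one task instead locked B before A, the two tasks could each grab one mutex and wait forever for the other — exactly the circular wait described above.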
Message passing is a
synchronous/asynchronous information exchange mechanism used for Inter Process/Thread Communication.
The major difference between shared memory and message passing technique is that, through shared memory lots of
data can be shared whereas only limited amount of info/data is passed through message passing.
Also message passing is relatively fast and free from the synchronisation overheads compared to shared memory.
Based on the message passing operation between the processes, message passing is classified into
Message Queues: A process which wants to talk to another process posts its message to a First-In-First-Out (FIFO)
queue called a ‘Message queue’, which stores the messages temporarily in a system defined memory object, to pass
it to the desired process. Messages are sent and received through send (name of the process to which the message is
to be sent, message) and receive (name of the process from which the message is to be received, message) methods.
The messages are exchanged through a message queue. The implementation of the message queue and of the send
and receive methods is OS kernel dependent.
Mailbox: A mailbox is a special implementation of a message queue, usually used for one way communication. Only a
single message is exchanged through a mailbox, whereas a ‘message queue’ can be used for exchanging multiple
messages. One task/process creates the mailbox and other tasks/processes can subscribe to this mailbox to get
message notifications. The implementation of the mailbox is OS kernel dependent. The MicroC/OS-II RTOS
implements the mailbox as a mechanism for inter task communication.
Signalling: Signals are used as an asynchronous notification mechanism, mainly for the execution
synchronisation of processes/tasks. Signals do not carry any data and are not queued. The implementation of
signals is OS kernel dependent; the VxWorks RTOS kernel implements ‘signals’ for inter process communication.
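The message queue mechanism above reduces to a small FIFO model. The sketch below is single threaded for clarity — a real kernel implementation would also handle blocking, wakeup and per-process addressing:

```c
#include <stdbool.h>

/* A minimal First-In-First-Out message queue: send() posts a message
 * at the tail, receive() takes the oldest message from the head. */
#define QUEUE_SIZE 8

static int msg_queue[QUEUE_SIZE];
static int head = 0, tail = 0, count = 0;

static bool mq_send(int msg) {
    if (count == QUEUE_SIZE) return false;   /* queue full */
    msg_queue[tail] = msg;
    tail = (tail + 1) % QUEUE_SIZE;
    count++;
    return true;
}

static bool mq_receive(int *msg) {
    if (count == 0) return false;            /* nothing to receive */
    *msg = msg_queue[head];
    head = (head + 1) % QUEUE_SIZE;
    count--;
    return true;
}
```

A mailbox, as described above, is the degenerate case of this structure with a capacity of one message.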
A semaphore is a sleep and wakeup based mutual exclusion implementation for shared resource access.
A semaphore is a system resource; a process which wants to access a shared resource first acquires this
system object, to indicate to the other processes wanting the shared resource that it is currently
acquired.
Resources shared among processes can be either for exclusive use by one process at a time or for use by a number
of processes at a time.
The display device of an embedded system is a typical example for the shared resource which needs exclusive
access by a process.
The Hard disk (secondary storage) of a system is a typical example for sharing the resource among a limited
number of multiple processes. Various processes can access the different sectors of the hard-disk concurrently.
Based on the implementation of the sharing limitation of the shared resource, semaphores are classified into two;
namely ‘Counting Semaphore’ and ‘Binary Semaphore’.
The ‘Counting Semaphore’
It limits the access of resources by a fixed number of processes/threads.
It maintains a count between zero and a maximum value.
It limits the usage of the resource to the maximum value of the count supported by it.
A real world example for the counting semaphore concept is the dormitory system for accommodation (Fig.
10.34). A dormitory contains a fixed number of beds (say 5) and at any point of time it can be shared by the
maximum number of users supported by the dormitory. If a person wants to avail the dormitory facility, he/she
can contact the dormitory caretaker for checking the availability. If beds are available in the dorm the caretaker
will hand over the keys to the user. If beds are not available currently, the user can register his/her name to get
notifications when a slot is available. Those availing the dormitory share the dorm facilities like TV,
telephone, toilet, etc. When a dorm user vacates, he/she gives the keys back to the caretaker. The caretaker informs
the users who booked in advance about the dorm availability.
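The dormitory analogy maps directly onto a counting semaphore. A single-threaded model of the count behaviour (the function names here are illustrative, not the POSIX sem_* API, and blocking is reduced to a failed acquire):

```c
#include <stdbool.h>

/* Counting semaphore modelled on the dormitory: the count starts at
 * the number of beds; each successful acquire takes one, each release
 * returns one. When the count is zero, an acquire fails -- a real
 * semaphore would put the caller to sleep until a release. */
#define BEDS 5

static int available = BEDS;

static bool sem_acquire(void) {      /* "check in" (P operation) */
    if (available == 0) return false;   /* dorm full: must wait   */
    available--;
    return true;
}

static void sem_release(void) {      /* "check out" (V operation) */
    if (available < BEDS)
        available++;
}
```

A binary semaphore is simply the special case with a maximum count of one.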
Functional Requirements
Processor Support It is not necessary that all RTOSs support all kinds of processor architectures. It is essential to
ensure processor support by the RTOS.
Memory Requirements The OS requires ROM for holding the OS files, normally stored in a non-volatile memory
like FLASH, and working memory (RAM) for loading the OS services. Since embedded systems are memory
constrained, it is essential to evaluate the minimal ROM and RAM requirements of the OS under consideration.
Real-time Capabilities It is not mandatory that the operating system for every embedded system be real-time, and not
all embedded operating systems are ‘Real-time’ in behaviour. The task/process scheduling policies play an important
role in the ‘Real-time’ behaviour of an OS. Analyse the real-time capabilities of the OS under consideration and the
standards met by the operating system for real-time capabilities.
Kernel and Interrupt Latency The kernel of the OS may disable interrupts while executing certain services and it may
lead to interrupt latency. For an embedded system whose response requirements are high, this latency should be
minimal.
Inter Process Communication and Task Synchronisation The implementation of Inter Process Communication and
Synchronisation is OS kernel dependent. Certain kernels may provide a bunch of options whereas others provide very
limited options. Certain kernels implement policies for avoiding priority inversion issues in resource sharing.
Modularisation Support Most operating systems provide a bundle of features, not all of which may be necessary for
the functioning of an embedded product. It is very useful if the OS supports modularisation, wherein the developer
can choose the essential modules and re-compile the OS image. Windows CE is an example of a highly modular
operating system.
Support for Networking and Communication The OS kernel may provide stack implementation and driver support for a
bunch of communication interfaces and networking. Ensure that the OS under consideration provides support for all
the interfaces required by the embedded product.
Development Language Support Certain operating systems include the run time libraries required for running
applications written in languages like Java and C#. A Java Virtual Machine (JVM) customised for the Operating
System is essential for running java applications. Similarly the .NET Compact Framework (.NETCF) is required for
running Microsoft® .NET applications on top of the Operating System. The OS may include these components as
built-in component, if not, check the availability of the same from a third party vendor for the OS under consideration.
Non-functional Requirements
Custom Developed or Off the Shelf Depending on the OS requirement, it is possible to go for the complete development
of an operating system suiting the embedded system needs or use an off the shelf, readily available operating system,
which is either a commercial product or an Open Source product, which is in close match with the system
requirements. Sometimes it may be possible to build the required features by customising an Open source OS. The
decision on which to select is purely dependent on the development cost, licensing fees for the OS, development time
and availability of skilled resources.
Cost The total cost for developing or buying the OS and maintaining it in terms of commercial product and custom
build needs to be evaluated before taking a decision on the selection of OS.
Development and Debugging Tools Availability The availability of development and debugging tools is a critical decision
making factor in the selection of an OS for embedded design. Certain Operating Systems may be superior in
performance, but the availability of tools for supporting the development may be limited. Explore the different tools
available for the OS under consideration.
Ease of Use How easy it is to use a commercial RTOS is another important feature that needs to be considered in the
RTOS selection.
After Sales For a commercial embedded RTOS, after sales in the form of e-mail, on-call services, etc. for bug fixes,
critical patch updates and support for production issues, etc. should be analysed thoroughly.
11. Explain the role of Integrated Development Environment (IDE) for Embedded Software Development
In embedded system development context, Integrated Development Environment (IDE) stands for an integrated
environment for developing and debugging the target processor specific embedded firmware.
An IDE is a software package which bundles a ‘Text Editor (Source Code Editor)’, a ‘Cross-compiler’ (for cross
platform development; a compiler for same platform development), a ‘Linker’ and a ‘Debugger’.
IDEs used in embedded firmware development are slightly different from the generic IDEs used for high level
language based development for desktop applications.
In Embedded Applications, the IDE is either supplied by the target processor/controller manufacturer or by third party
vendors or as Open Source.
MPLAB is an IDE tool supplied by microchip for developing embedded firmware using their PIC family of
microcontrollers.
Keil µVision5 from ARM Keil is an example of a third party IDE, which is used for developing embedded
firmware for ARM family microcontrollers.
CodeWarrior Development Studio is an IDE for ARM family of processors/MCUs and DSP chips from Freescale.
It should be noted that in embedded firmware development each IDE is designed for a specific family of
controllers/processors, and it may not be possible to develop firmware for all families of controllers/processors
using a single IDE.
However, there is a rapid move happening towards the open source Eclipse IDE for embedded development. Most of
the processor/controller manufacturers and third party IDE providers are trying to build their IDEs around the popular
Eclipse open source platform.
12. Explain boundary scanning for hardware testing with diagram
As the complexity of the hardware increases, the number of chips present on the board and the interconnections among
them also increase.
The device packages used in the PCB become miniature to reduce the total board space occupied by them and
multiple layers may be required to route the interconnections among the chips. With miniature device packages and
multiple layers for the PCB it will be very difficult to debug the hardware using magnifying glass, multimeter, etc. to
check the interconnection among the various chips.
Boundary scan is a technique used for testing the interconnection among the various chips, which support JTAG
interface, present in the board.
Chips which support boundary scan associate a boundary scan cell with each pin of the device.
A JTAG port which contains the five signal lines namely TDI, TDO, TCK, TRST and TMS form the Test Access
Port (TAP) for a JTAG supported chip. Each device will have its own TAP.
The PCB also contains a TAP for connecting the JTAG signal lines to the external world. A boundary scan path is
formed inside the board by interconnecting the devices through JTAG signal lines.
The TDI pin of the TAP of the PCB is connected to the TDI pin of the first device.
The TDO pin of the first device is connected to the TDI pin of the second device. In this way all devices are
interconnected and the TDO pin of the last JTAG device is connected to the TDO pin of the TAP of the PCB.
The clock line TCK and the Test Mode Select (TMS) line of the devices are connected to the clock line and Test
mode select line of the Test Access Port of the PCB respectively. This forms a boundary scan path. Figure 13.41
illustrates the same.
As mentioned earlier, each pin of the device has a boundary scan cell associated with it. The boundary scan cell is a
multipurpose memory cell. The boundary scan cells associated with the input pins of an IC are known as ‘input cells’ and
the boundary scan cells associated with the output pins of an IC are known as ‘output cells’.
The boundary scan cells can be used for capturing the input pin signal state and passing it to the internal circuitry,
capturing the signals from the internal circuitry and passing it to the output pin, and shifting the data received from the
Test Data In pin of the TAP.
The boundary scan cells associated with the pins are interconnected and they form a chain from the TDI pin of the
device to its TDO pin.
The boundary scan cells can be operated in Normal, Capture, Update and Shift modes.
In the Normal mode, the input of the boundary scan cell appears directly at its output.
In the Capture mode, the boundary scan cell associated with each input pin of the chip captures the signal from the
respective pin to the cell, and the boundary scan cell associated with each output pin of the chip captures the signal
from the internal circuitry.
In the Update mode, the boundary scan cell associated with each input pin of the chip passes the already captured data to
the internal circuitry, and the boundary scan cell associated with each output pin of the chip passes the already captured
data to the respective output pin.
In the Shift mode, data is shifted from the TDI pin to the TDO pin of the device through the boundary scan cells. ICs
supporting boundary scan contain additional boundary scan related registers for facilitating the boundary scan
operation. Instruction Register, Bypass Register, Identification Register, etc. are examples of boundary scan related
registers.
Disassembler is a utility program which converts machine codes into target processor specific Assembly
codes/instructions. The process of converting machine codes into Assembly code is known as ‘Disassembling’.
De-compilers reproduce the code in a high level language. Frequently, this high level language is C, because C is
simple and primitive enough to facilitate the decompilation process. Decompilation does have its drawbacks: much of
the data and many readability constructs are lost during the original compilation process, and they cannot be reproduced.
Since the science of decompilation is still young, the results are "good" but not "great".
Debugging in embedded application is the process of diagnosing the firmware execution, monitoring the target
processor’s registers and memory while the firmware is running and checking the signals from various buses of the
embedded hardware.
Debugging process in embedded application is broadly classified into two, namely; hardware debugging and
firmware debugging.
Hardware debugging deals with the monitoring of various bus signals and checking the status lines of the target
hardware.
Firmware debugging deals with examining the firmware execution, execution flow, and changes to various CPU
registers and status registers on execution of the firmware to ensure that the firmware is running as per the design.
Emulator is a self-contained hardware device which emulates the target CPU. The emulator hardware contains
necessary emulation logic and it is hooked to the debugging application running on the development PC on one end
and connects to the target board through some interface on the other end.
Simulator is a software application that precisely duplicates (mimics) the target CPU and simulates the various
features and instructions supported by the target CPU.