Module 5.5

The document discusses errors caused by rounding and truncation in floating point representation of numbers, explaining the concepts of truncation and rounding with examples. It outlines the general representation of floating point numbers and provides mathematical inequalities related to errors in both truncation and rounding cases. Additionally, it includes references for further reading on the topic.

Uploaded by

soujath048

ERRORS CAUSED DUE TO ROUNDING AND TRUNCATION FOR FLOATING POINT REPRESENTATION OF NUMBERS

THINGS TO NOTE:
❑ The word length of the data is assumed to be 'b' bits by default.
❑ An error will always satisfy the inequality

❑ The general floating point representation of a number is given by:

where 'M' represents the mantissa and is the quantity subject to change.
What is TRUNCATION???
Truncation is a type of quantization where extra bits get truncated. Basically, in the truncation process,
all bits less significant than the desired LSB (Least Significant Bit) are discarded.
Take as an example '10.201562387542', which is a 15-character input.
NOTE: The decimal point is also counted as one position, since the number is treated as it is displayed.
When the example value is truncated to 10 characters, the output is displayed as '10.2015623'.
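The truncation above can be reproduced with a short sketch. It assumes the "10 digits" count includes the decimal point, as the note says, so 7 decimal places are kept; the variable names are illustrative only.

```python
from decimal import Decimal, ROUND_DOWN

value = "10.201562387542"   # 15 characters, counting the decimal point

# Character-level truncation: keep the first 10 characters, discard the rest.
truncated_text = value[:10]

# The same result numerically: keep 7 decimal places and drop all lower digits.
truncated_number = Decimal(value).quantize(Decimal("0.0000001"), rounding=ROUND_DOWN)

print(truncated_text)     # 10.2015623
print(truncated_number)   # 10.2015623
```

Both views agree because truncation never looks at the discarded digits.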

What is ROUNDING???
Rounding is a quantization method where a number is rounded to the nearest value that fits in the desired number of bits.
Rounding is the process of reducing the size of a binary number to some desirable finite size. This is done
in such a way that the rounded off number is as close to the original unquantized number as possible.
Interestingly, the rounding process is a combination of truncation and addition.
Taking the above mentioned example, rounding of the number to 10 digits will display the output as
'10.2015624'
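The remark that rounding combines truncation and addition can be sketched as follows: add half of the last kept digit, then truncate. This is a minimal illustration for a positive value, assuming 7 decimal places are kept as in the example; the names `lsb`, `rounded_by_add` and `rounded_direct` are hypothetical.

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_UP

value = Decimal("10.201562387542")
lsb = Decimal("0.0000001")   # weight of the last digit that is kept

# Rounding as "addition then truncation": add half an LSB, then drop the extra digits.
rounded_by_add = (value + lsb / 2).quantize(lsb, rounding=ROUND_DOWN)

# Direct rounding to the nearest representable value, for comparison.
rounded_direct = value.quantize(lsb, rounding=ROUND_HALF_UP)

print(rounded_by_add)   # 10.2015624
print(rounded_direct)   # 10.2015624
```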
FLOATING POINT REPRESENTATION
The general representation of a floating point number is given by =>
where 'M' is the mantissa and is the quantity subject to change during truncation or rounding.
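As a rough illustration of the mantissa and exponent split, the sketch below assumes the common normalized form x = M · 2^E with 1/2 ≤ |M| < 1. The exponent symbol E and this normalization are assumptions, not taken from the source. Python's `math.frexp` returns exactly this pair.

```python
import math

x = 0.12890625

# frexp splits x into a mantissa M and an integer exponent E
# such that x == M * 2**E and 0.5 <= |M| < 1.
M, E = math.frexp(x)

print(M, E)              # 0.515625 -2
print(x == M * 2**E)     # True
```

Only M is shortened by truncation or rounding; E is left as it is.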
To understand the effect of each operation, let us consider the two cases:

CASE 1: TRUNCATION
The floating point number is given by =>

The truncated floating point number is given by =>

Thus the error introduced is obtained as =>

Case a: 2's complement representation

For the 2's complement representation, the truncation error of the mantissa is bounded by =>

Which when simplified =>

Substituting the definition of the relative error into the known inequality and simplifying, we get =>

Substituting for the value of 'x', the inequality becomes =>
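The chain of steps for Case a can be written out as below. This is a sketch of the standard textbook derivation, not the source's own equations. It assumes x = M · 2^E for the number, x_T and M_T for the truncated number and mantissa, ε for the relative error, b for the number of mantissa bits kept, and a mantissa normalized so that 1/2 ≤ |M| < 1.

```latex
\begin{align*}
e &= x_T - x = 2^{E}\,(M_T - M), \qquad \varepsilon = \frac{x_T - x}{x} \;\Rightarrow\; e = \varepsilon x \\
-2^{-b} &< M_T - M \le 0 && \text{(2's complement truncation)} \\
-2^{E}\,2^{-b} &< \varepsilon x \le 0 \\
-2^{-b} &< \varepsilon M \le 0 && \text{(using } x = M \cdot 2^{E}\text{)} \\
-2 \cdot 2^{-b} &< \varepsilon \le 0 && \text{(worst case, } M = \tfrac{1}{2}\text{)} \\
0 &\le \varepsilon < 2 \cdot 2^{-b} && \text{(worst case, } M = -\tfrac{1}{2}\text{)}
\end{align*}
```

Under these assumptions the sign of the relative error follows the sign of the mantissa, which is why the two worst cases give ranges on opposite sides of zero.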

Case b: 1's complement representation

(i) Positive values of mantissa
The 1's complement representation for positive values of the mantissa is =>
=>

=>
(ii) Negative values of mantissa
The 1's complement representation for negative values of the mantissa is =>
=>

Although the inequalities for the mantissa differ with its sign, the relative error for the 1's complement representation ultimately falls in the same negative range in both cases.
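The two sign cases can be sketched side by side. This follows the usual textbook treatment and uses assumed notation: M_T for the truncated mantissa, ε for the relative error, b for the number of mantissa bits kept, and the worst case |M| = 1/2.

```latex
\begin{align*}
\text{(i) } M > 0:&\quad -2^{-b} < M_T - M \le 0
  \;\Rightarrow\; -2^{-b} < \varepsilon M \le 0
  \;\Rightarrow\; -2 \cdot 2^{-b} < \varepsilon \le 0 \quad (M = \tfrac{1}{2}) \\
\text{(ii) } M < 0:&\quad 0 \le M_T - M < 2^{-b}
  \;\Rightarrow\; 0 \le \varepsilon M < 2^{-b}
  \;\Rightarrow\; -2 \cdot 2^{-b} < \varepsilon \le 0 \quad (M = -\tfrac{1}{2})
\end{align*}
```

In case (ii) the mantissa error is positive but M is negative, so dividing by M flips the inequality and gives the same negative range as case (i).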
Probability Distribution Function for Floating Point
NOTE: The area enclosed by the boundary of the graph is unity, and the error is taken to be uniformly distributed over its range (constant probability density).
CASE 2: ROUNDING
The floating point number is given by =>
The rounded floating point number is given by =>
NOTE: Some materials denote the rounded quantities with a subscript 'R'.
Thus the error introduced is obtained as =>

In contrast to truncation, rounding is described by a single inequality =>

=> =>

Substituting M = ½, the value at which the relative error is largest, the range obtained is
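The rounding case can be sketched in the same way. This is again the standard textbook chain under assumed notation: x = M · 2^E, x_R and M_R for the rounded number and mantissa, ε for the relative error, and b for the number of mantissa bits kept.

```latex
\begin{align*}
e &= x_R - x = 2^{E}\,(M_R - M) = \varepsilon x \\
-\frac{2^{-b}}{2} &\le M_R - M \le \frac{2^{-b}}{2} \\
-2^{E}\,\frac{2^{-b}}{2} &\le \varepsilon x \le 2^{E}\,\frac{2^{-b}}{2} \\
-\frac{2^{-b}}{2} &\le \varepsilon M \le \frac{2^{-b}}{2} \\
-2^{-b} &\le \varepsilon \le 2^{-b} \qquad (M = \tfrac{1}{2})
\end{align*}
```

The range is symmetric about zero, unlike the one-sided ranges obtained for truncation.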
Probability Distribution Function for Floating Point

EXAMPLE (truncation)
Let us take the number '0.12890625', whose binary equivalent is '0.00100001'.

When the binary form is truncated to 4 bits it becomes '0.0010', which is 0.125 in decimal.

Therefore the error produced is 0.125 - 0.12890625 = -0.00390625 =>
Computing the RHS of the inequality we get
Since the inequality is satisfied, the error in the truncated value lies within the expected range.
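The example can be checked numerically with the sketch below. It assumes "4-bits" means 4 fractional bits are kept, and it compares the error against the truncation bound -2^-b < e ≤ 0 and the relative error against -2·2^-b < ε ≤ 0. These bounds are the usual textbook ones and are an assumption about which inequality the example uses.

```python
import math

x = 0.12890625   # binary 0.00100001
b = 4            # number of fractional bits kept (assumed meaning of "4-bits")

# Truncate: discard every bit less significant than 2**-b.
x_t = math.floor(x * 2**b) / 2**b   # binary 0.0010

error = x_t - x
relative_error = error / x
bound = 2.0 ** -b

print(x_t)                                  # 0.125
print(error)                                # -0.00390625
print(-bound < error <= 0)                  # True
print(-2 * bound < relative_error <= 0)     # True
```

All values here are exact in binary floating point, so the comparison is not affected by the rounding of Python's own floats.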
REFERENCES
https://www.youtube.com/playlist?list=PLsdgy6o6gsRz8cIsQmjH-k1PYi4ZD45jw -Source

https://www.technobyte.org -Write-up material

https://technobyte.org/quantization-truncation-rounding/#:~:text=What%20is%20Truncation%3F,where%20extra%20bits%20get%20'truncated.&text=Basically%2C%20in%20the%20truncation%20process,bit%20number%20to%204%2Dbits.

https://technobyte.org/quantization-truncation-rounding/#:~:text=Basically%2C%20rounding%20is%20the%20process,combination%20of%20truncation%20and%20addition.

https://youtu.be/P9NVIheNOdw -Truncation and Rounding on arithmetic grounds

THANK YOU
RICHARD JOSEPH
S5 ECE GAMMA
19
