IEEE Recommended Practice on Software Reliability
IEEE Std 1633™-2016
(Revision of IEEE Std 1633-2008)

Sponsored by the Standards Committee of the IEEE Reliability Society
Abstract: The methods for assessing and predicting the reliability of software, based on a life-
cycle approach to software reliability engineering (SRE), are prescribed in this recommended
practice. It provides information necessary for the application of software reliability (SR)
measurement to a project, lays a foundation for building consistent methods, and establishes the
basic principle for collecting the data needed to assess and predict the reliability of software. The
recommended practice prescribes how any user can participate in SR assessments and
predictions.
IEEE is a registered trademark in the U.S. Patent & Trademark Office, owned by The Institute of Electrical and Electronics
Engineers, Incorporated.
Capability Maturity Model Integrated and CMMI are registered in the U.S. Patent and Trademark Office by Carnegie Mellon University.
Excel is a registered trademark of Microsoft Corporation in the United States and/or other countries.
Java is a trademark of Sun Microsystems, Inc. in the United States and other countries.
Important Notices and Disclaimers Concerning IEEE Standards Documents
IEEE documents are made available for use subject to important notices and legal disclaimers. These
notices and disclaimers, or a reference to this page, appear in all standards and may be found under the
heading “Important Notices and Disclaimers Concerning IEEE Standards Documents.” They can also be
obtained on request from IEEE or viewed at http://standards.ieee.org/IPR/disclaimers.html.
IEEE Standards do not guarantee or ensure safety, security, health, or environmental protection, or ensure
against interference with or from other devices or networks. Implementers and users of IEEE Standards
documents are responsible for determining and complying with all appropriate safety, security,
environmental, health, and interference protection practices and all applicable laws and regulations.
IEEE does not warrant or represent the accuracy or content of the material contained in its standards, and
expressly disclaims all warranties (express, implied and statutory) not included in this or any other
document relating to the standard, including, but not limited to, the warranties of: merchantability; fitness
for a particular purpose; non-infringement; and quality, accuracy, effectiveness, currency, or completeness
of material. In addition, IEEE disclaims any and all conditions relating to: results; and workmanlike effort.
IEEE standards documents are supplied “AS IS” and “WITH ALL FAULTS.”
Use of an IEEE standard is wholly voluntary. The existence of an IEEE standard does not imply that there
are no other ways to produce, test, measure, purchase, market, or provide other goods and services related
to the scope of the IEEE standard. Furthermore, the viewpoint expressed at the time a standard is approved
and issued is subject to change brought about through developments in the state of the art and comments
received from users of the standard.
In publishing and making its standards available, IEEE is not suggesting or rendering professional or other
services for, or on behalf of, any person or entity nor is IEEE undertaking to perform any duty owed by any
other person or entity to another. Any person utilizing any IEEE Standards document, should rely upon his
or her own independent judgment in the exercise of reasonable care in any given circumstances or, as
appropriate, seek the advice of a competent professional in determining the appropriateness of a given
IEEE standard.
IN NO EVENT SHALL IEEE BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO:
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS;
OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR
OTHERWISE) ARISING IN ANY WAY OUT OF THE PUBLICATION, USE OF, OR RELIANCE
UPON ANY STANDARD, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE AND
REGARDLESS OF WHETHER SUCH DAMAGE WAS FORESEEABLE.
Translations
The IEEE consensus development process involves the review of documents in English only. In the event
that an IEEE standard is translated, only the English version published by IEEE should be considered the
approved IEEE standard.
Official statements
A statement, written or oral, that is not processed in accordance with the IEEE-SA Standards Board
Operations Manual shall not be considered or inferred to be the official position of IEEE or any of its
committees and shall not be considered to be, or be relied upon as, a formal position of IEEE. At lectures,
symposia, seminars, or educational courses, an individual presenting information on IEEE standards shall
make it clear that his or her views should be considered the personal views of that individual rather than the
formal position of IEEE.
Comments on standards
Comments for revision of IEEE Standards documents are welcome from any interested party, regardless of
membership affiliation with IEEE. However, IEEE does not provide consulting information or advice
pertaining to IEEE Standards documents. Suggestions for changes in documents should be in the form of a
proposed change of text, together with appropriate supporting comments. Since IEEE standards represent a
consensus of concerned interests, it is important that any responses to comments and questions also receive
the concurrence of a balance of interests. For this reason, IEEE and the members of its societies and
Standards Coordinating Committees are not able to provide an instant response to comments or questions
except in those cases where the matter has previously been addressed. For the same reason, IEEE does not
respond to interpretation requests. Any person who would like to participate in revisions to an IEEE
standard is welcome to join the relevant IEEE working group.
Comments on standards should be submitted to the following address:
Users of IEEE Standards documents should consult all applicable laws and regulations. Compliance with
the provisions of any IEEE Standards document does not imply compliance to any applicable regulatory
requirements. Implementers of the standard are responsible for observing or referring to the applicable
regulatory requirements. IEEE does not, by the publication of its standards, intend to urge action that is not
in compliance with applicable laws, and these documents may not be construed as doing so.
Copyrights
IEEE draft and approved standards are copyrighted by IEEE under U.S. and international copyright laws.
They are made available by IEEE and are adopted for a wide variety of both public and private uses. These
include both use, by reference, in laws and regulations, and use in private self-regulation, standardization,
and the promotion of engineering practices and methods. By making these documents available for use and
adoption by public authorities and private users, IEEE does not waive any rights in copyright to the
documents.
Photocopies
Subject to payment of the appropriate fee, IEEE will grant users a limited, non-exclusive license to
photocopy portions of any individual standard for company or organizational internal use or individual,
non-commercial use only. To arrange for payment of licensing fees, please contact Copyright Clearance
Center, Customer Service, 222 Rosewood Drive, Danvers, MA 01923 USA; +1 978 750 8400. Permission
to photocopy portions of any individual standard for educational classroom use can also be obtained
through the Copyright Clearance Center.
Every IEEE standard is subjected to review at least every ten years. When a document is more than ten
years old and has not undergone a revision process, it is reasonable to conclude that its contents, although
still of some value, do not wholly reflect the present state of the art. Users are cautioned to check to
determine that they have the latest edition of any IEEE standard.
In order to determine whether a given document is the current edition and whether it has been amended
through the issuance of amendments, corrigenda, or errata, visit the IEEE Xplore at
http://ieeexplore.ieee.org/ or contact IEEE at the address listed previously. For more information about the
IEEE-SA or IEEE’s standards development process, visit the IEEE-SA Website at http://standards.ieee.org.
Errata
Errata, if any, for all IEEE standards can be accessed on the IEEE-SA Website at the following URL:
http://standards.ieee.org/findstds/errata/index.html. Users are encouraged to check this URL for errata
periodically.
Patents
Attention is called to the possibility that implementation of this standard may require use of subject matter
covered by patent rights. By publication of this standard, no position is taken by the IEEE with respect to
the existence or validity of any patent rights in connection therewith. If a patent holder or patent applicant
has filed a statement of assurance via an Accepted Letter of Assurance, then the statement is listed on the
IEEE-SA Website at http://standards.ieee.org/about/sasb/patcom/patents.html. Letters of Assurance may
indicate whether the Submitter is willing or unwilling to grant licenses under patent rights without
compensation or under reasonable rates, with reasonable terms and conditions that are demonstrably free of
any unfair discrimination to applicants desiring to obtain such licenses.
Essential Patent Claims may exist for which a Letter of Assurance has not been received. The IEEE is not
responsible for identifying Essential Patent Claims for which a license may be required, for conducting
inquiries into the legal validity or scope of Patents Claims, or determining whether any licensing terms or
conditions provided in connection with submission of a Letter of Assurance, if any, or in any licensing
agreements are reasonable or non-discriminatory. Users of this standard are expressly advised that
determination of the validity of any patent rights, and the risk of infringement of such rights, is entirely
their own responsibility. Further information may be obtained from the IEEE Standards Association.
Participants
At the time this IEEE recommended practice was completed, the IEEE 1633 Working Group had the
following membership:
The following members of the individual balloting committee voted on this recommended practice.
Balloters may have voted for approval, disapproval, or abstention.
When the IEEE-SA Standards Board approved this recommended practice on 22 September 2016, it had
the following membership:
Introduction
This introduction is not part of IEEE Std 1633-2016, IEEE Recommended Practice on Software Reliability.
Software is, from a materials viewpoint, both malleable and ductile. This means there are multiple ways to
introduce failures, intentional and unintentional. Fixing a software defect can itself introduce a new defect.
In many cases, the failures that result from software defects are both predictable and avoidable, but they still
occur because of the following:
a) Lack of available calendar time/resources to find all of the defects that can result in failures
b) Exceedingly complex event driven systems that are difficult to conceptualize and therefore
implement and test
c) Organizational culture that neglects to support sufficient rigor, skills, or methods required to find
the defects
d) Technical decisions that result in an incorrect architecture or design that cannot support the
stakeholders' specifications
e) Insufficient project or risk management that leads to schedule delays and, in turn, less time for
reliability testing
f) Operational issues, such as contract problems and interoperability failures caused by poor
specifications and stakeholder communications
Even a small number of software failures can lead to monetary catastrophes such as a cancelled project.
Hardware (HW) failures can be random, due to wear-out, or the result of a systematic design flaw.
Reliability, maintainability, and availability (RMA) engineering is used to prevent and deal with hardware
failures. Software failures may result from systematic flaws in the requirements, design, code, or interfaces.
Hence, a software failure does not call for an RMA response but instead for a corrective action to the
existing installation. Software failures can be common cause failures in that the same failure mode can
cause multiple failures in more than one part of the software.
Software reliability engineering (SRE) is an established discipline that can help organizations improve the
reliability of their products and processes. It is important for an organization to have process discipline if
it is to produce high-reliability software. SRE comprises specific practices and recommendations, each of
which has a context within the software engineering life cycle. A specific practice may be implemented or
used in a particular stage of the life cycle or used across several stages. Figure 1 shows how the focus of
SRE shifts as a project progresses from inception to release. The size of each bubble in the figure
corresponds to how much the particular SRE practice is exercised during each phase of development or
operation. For example, in software engineering projects, the failure modes and effects analysis (FMEA) is
typically performed earlier in the life cycle.
Figure 1 —SRE focus by stage
The scope of this recommended practice is to address software reliability (SR). It does not specifically
address systems reliability, software safety, or software security. However, it does recognize that safety and
security requirements are part of the initial risk assessment. The recommended practice only briefly
addresses software quality. This recommended practice provides a common baseline for discussion and
prescribes methods for assessing and predicting the reliability of software. The recommended practice is
intended to be used in support of designing, developing, and testing software and to provide a foundation
on which practitioners can build consistent methods for assessing the reliability of software. It is intended
to meet the needs of software practitioners and users who are confronted with varying terminology for
reliability measurement and a plethora of models and data collection methods. This recommended practice
contains information necessary for the application of SR measurement to a project. This includes SR
activities throughout the software life cycle (SLC) starting at requirements generation by identifying the
application, specifying and analyzing requirements, and continuing into the implementation.
The recommended practice includes the following:
— Common terminology
— Assessment of software reliability risks that pertain to the software or project
— Software failure mode analyses that can help to identify and reduce the types of defects most likely to result in a system failure
— Models for predicting software reliability early in development
— Models for estimating software reliability in testing and operation
— Test coverage and test selection
— Data collection procedures to support SR estimation and prediction
— Determining when to release a software system, or to stop testing the software and implement corrections
— Identifying elements in a software system that are leading candidates for redesign to improve reliability
Revisions to the document and notes
This recommended practice contains six clauses and seven annexes.

Content appearing in Clause 4 (Roles, approach, concepts), including all subclauses; in 5.4.3, Measure test
coverage; and in 5.5, Support release decision, is adapted with permission of Robert V. Binder, Beware of
Greeks bearing data, 2014.

Every effort has been made to secure permission to reprint borrowed material contained in this document.
If omissions have been made, please bring them to our attention.
Content and tables appearing in 5.1.1.1, 5.1.1.3, 5.1.3.4, 5.1.3.5, 5.3.1, 5.3.2, 5.3.3, 5.3.5.2, 5.3.5.3, 5.3.8,
5.4.7, 6.1, 6.2.1.1, 6.2.1.2, 6.2.2.1, B.2.1, B.2.3, B.3, Table 12 “Keywords associated with common root
causes for defects,” Annex D, F.3, F.4.4, F.4.5, Table 48 “Average defect densities by application type
(EKSLOC),” Table 45 “Factors in determining root cause inaccuracies” reprinted with permission of Ann
Marie Neufelder, Softrel, LLC “Software Reliability Toolkit” © 2015.
Content and tables appearing in 5.4.5, 6.3.2, 6.3.3, F.3, F.6 reprinted with permission of Ann Marie
Neufelder, Softrel, LLC “Advanced Software Reliability” © 2015.
Figure 27 “SFMEA process,” 5.2.2, F.4.3, and all tables in Annex A reprinted with permission from Ann
Marie Neufelder, Softrel, LLC “Effective Application of Software Failure Modes Effects Analysis” ©
2014.
Table 8 “Relationship between risks and outcome” reprinted with permission of Softrel, LLC. “Four things
that are almost guaranteed to reduce the reliability of a software intensive system,” Huntsville Society of
Reliability Engineers RAMS VII Conference © 2014.
A portion of 5.5.1 has been reprinted with permission from Lockheed Martin Corporation article entitled
“Determine Release Stability” © 2015 Lockheed Martin Corporation. All rights reserved.
A portion of 5.5.4 has been reprinted with permission from Lockheed Martin Corporation article entitled
“Perform a Reliability Demonstration Test (RDT)” © 2015 Lockheed Martin Corporation. All rights
reserved.
Table 47 Shortcut Model Survey and Table F.9 Example of the Shortcut Model Survey reprinted with
permission of Softrel, LLC, “A Practical Toolkit for Predicting Software Reliability” © 2006.
Table 49 reprinted with permission from Capers Jones, “Software Industry Blindfolds: Invalid Metrics and
Inaccurate Metrics,” Namcook Analytics, 2005.
“Elevator Example” in F.4.2 reprinted with permission from Peter B. Lakey, Operational Profile Testing
© 2016.
Contents
1. Overview
1.1 Scope
1.2 Purpose
2. Normative references
Annex B (informative) Methods for predicting software reliability during development
B.1 Methods for predicting code size
B.2 Additional models for predicting defect density or defects
B.3 Factors that have been correlated to fielded defects
Annex C (informative) Additional information on software reliability models used during testing
C.1 Models that can be used when the fault rate is peaking
C.2 Models that can be used when the fault rate is decreasing
C.3 Models that can be used with increasing and then decreasing fault rate
C.4 Models that can be used regardless of the fault rate trend
C.5 Models that estimate remaining defects
C.6 Results of the IEEE survey
Annex F (informative) Examples
F.1 Examples from 5.1
F.2 Examples from 5.2
F.3 Examples from 5.3
F.4 Examples from 5.4
F.5 Examples from 5.5
F.6 Examples from 5.6
IEEE Recommended Practice on
Software Reliability
1. Overview
1.1 Scope
This recommended practice defines the software reliability engineering (SRE) processes, prediction
models, growth models, tools, and practices of an organization. This document and its models and tools are
useful to any development organization to identify the methods, equations, and criteria for quantitatively
assessing the reliability of a software or firmware subsystem or product. Organizations that acquire
software subsystems or products developed with consideration to this recommended practice will benefit by
knowing the reliability of the software prior to acquisition. This document does not seek to certify either
the software or firmware or the processes employed for developing the software or firmware.
1.2 Purpose
The purpose for assessing the reliability of a software or firmware subsystem or product is to determine
whether the software has met an established reliability objective and facilitate improvement of product
reliability. The document defines the recommended practices for predicting software reliability (SR) early
in development so as to facilitate planning, sensitivity analysis and trade-offs. This document also defines
the recommended practices for estimating SR during test and operation so as to establish whether the
software or firmware meets an established objective for reliability.
2. Normative references
The following referenced documents are indispensable for the application of this document (i.e., they must
be understood and used, so each referenced document is cited in text and its relationship to this document is
explained). For dated references, only the edition cited applies. For undated references, the latest edition of
the referenced document (including any amendments or corrigenda) applies.
IEEE Std 12207™-2008, ISO/IEC/IEEE Standard for Systems and Software Engineering—Software Life
Cycle Processes.

The IEEE standards or products referred to in this clause are trademarks of The Institute of Electrical and
Electronics Engineers, Inc. IEEE publications are available from The Institute of Electrical and Electronics
Engineers (http://standards.ieee.org/).
3.1 Definitions
agile (software development): (A) IEEE definition: software development approach based on iterative
development, frequent inspection and adaptation, and incremental deliveries, in which requirements and
solutions evolve through collaboration in cross-functional teams and through continuous stakeholder
feedback. (B) Agile Consortium definition: software development approach based on iterative
development, frequent inspection and adaptation, and incremental deliveries, in which requirements and
solutions evolve through collaboration in cross-functional teams and through continuous stakeholder
feedback.
assessment: (A) An action for applying specific documented criteria to a specific software module,
package, or product for the purpose of determining acceptance or release of the software module, package,
or product. (ISO/IEC/IEEE 24765™-2010) (B) Determining what action to take for software that fails to
meet goals (e.g., intensify inspection, intensify testing, redesign software, and revise process).
NOTE—The formulation of test strategies is also part of assessment. Test strategy formulation involves the
determination of priority, duration, and completion date of testing, allocation of personnel, and allocation of computer
resources to testing.
calendar time: Chronological time, including time during which a computer may not be running.
clock time: Elapsed wall-clock time from the start of program execution to the end of program execution.
defect: (A) A problem that, if not corrected, could cause an application to either fail or to produce incorrect
results. (ISO/IEC/IEEE 24765-2010)
NOTE—For the purposes of this standard, defects are the result of errors that are manifest in the system requirements,
software requirements, interfaces, architecture, detailed design, or code. A defect may result in one or more failures. It
is also possible that a defect may never result in a fault if the operational profile is such that the code containing the
defect is never executed.
defect pileup: The condition in which residual defects in the software are not removed and, over time, increase in
number to the point of adversely affecting the reliability and schedule of the software releases.
error: A human action that produced an incorrect result, such as software containing a fault. Examples
include omission or misinterpretation of user requirements in a software specification, incorrect translation,
or omission of a requirement in the design specification. (ISO/IEC/IEEE 24765-2010)
NOTE—For the purposes of this standard, an error can also include incorrect software interfaces, software
architecture, design, or code.
evolutionary development: Developing and delivering software in iterative drops to the customer with a
concentration on developing the most important or least understood requirements first, and once those are
working and approved by the customer, improve the requirements, design, and testing to increase the
functionality to meet the stakeholders’ desired functionality and performance. This is especially useful for
development of new products. See also: incremental development.
NOTE—See Larman [B51].
IEEE Standards Dictionary Online is available at: http://ieeexplore.ieee.org/.

Notes in text, tables, and figures of a standard are given for information only and do not contain requirements needed
to implement this standard.

The numbers in brackets correspond to those of the bibliography in Annex G.
execution time: (A) The amount of actual or central processor time used in executing a program. (B) The
period of time during which a program is executing.
NOTE—Processor time is usually less than elapsed time because the processor may be idle (for example, awaiting
needed computer resources) or employed on other tasks during the execution of a program.
failure: (A) The inability of a system or system component to perform a required function within specified
limits. (B) The termination of the ability of a product to perform a required function or its inability to
perform within previously specified limits. (ISO/IEC/IEEE 24765-2010) (C) A departure of program
operation from program requirements. A failure may be produced when a fault is encountered and a loss of
the expected service results.
NOTE 1—A failure may be produced when a fault is encountered and a loss of the expected service to the user results.
NOTE 2—There may not be a one-to-one relationship between faults and failures. This can happen if the system has
been designed to be fault tolerant. It can also happen if a fault does not result in a failure either because it is not severe
enough to result in a failure or does not manifest into a failure due to the system not achieving that operational or
environmental state that would trigger it.
failure intensity: Total failures observed over total operational hours experienced.
failure rate: (A) The ratio of the number of failures of a given category or severity to a given period of
time; for example, failures per second of execution time, failures per month. Syn: failure intensity. (B) The
ratio of the number of failures to a given unit of measure, such as failures per unit of time, failures per
number of transactions, failures per number of computer runs.
failure severity: A rating system for the impact of every recognized credible software failure mode.
fault: (A) A defect in the code that can be the cause of one or more failures. (B) A manifestation of an error
in the software. (ISO/IEC/IEEE 24765-2010)
NOTE—There may not necessarily be a one-to-one relationship between faults and failures if the system has been
designed to be fault tolerant or if a fault is not severe enough to result in a failure.
fault tolerance: (A) The survival attribute of a system that allows it to deliver the required service after
faults have manifested themselves within the system. (B) The ability of a system or a component to
continue normal operation despite the presence of hardware or software faults. (ISO/IEC/IEEE 24765-
2010)
firmware: The combination of a hardware device, software, and data that are incorporated into the
hardware device.
NOTE—For compliance with this standard, firmware is treated as software in a programmable device. (Adapted from
IEEE Std 610.12™-1990)
function point: (A) A unit of measurement to express the amount of business functionality an information
system (as a product) provides to a user. (B) A unit that expresses the size of an application or of a project.
(ISO/IEC/IEEE 24765-2010) Function points measure software size. The functional user requirements of
the software are identified, and each one is categorized into one of five types: outputs, inquiries, inputs,
internal files, and external interfaces.
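As a purely illustrative sketch (not a procedure of this recommended practice), an unadjusted function point count can be formed by weighting the counts of the five function types named above; the weights shown are the commonly cited average-complexity weights and should be treated as assumptions for this example.

```python
# Illustrative sketch: unadjusted function point (UFP) count from the five
# function types named in the definition above. The weights are assumed
# "average complexity" weights; the counts are hypothetical.

AVERAGE_WEIGHTS = {
    "inputs": 4,
    "outputs": 5,
    "inquiries": 4,
    "internal_files": 10,
    "external_interfaces": 7,
}

def unadjusted_function_points(counts: dict) -> int:
    """Sum each function-type count multiplied by its assumed weight."""
    return sum(AVERAGE_WEIGHTS[kind] * n for kind, n in counts.items())

# Hypothetical counts for a small application.
example_counts = {"inputs": 12, "outputs": 8, "inquiries": 5,
                  "internal_files": 4, "external_interfaces": 2}
print(unadjusted_function_points(example_counts))  # -> 162
```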
integration: The process of combining software elements, hardware elements, or both into an overall
system.
life-cycle model: A framework containing the processes, activities, and tasks involved in the development,
operation, and maintenance of a software product, spanning the life of the system from the definition of its
requirements to the termination of its use. (ISO/IEC/IEEE 24765-2010) Contrast: software development
cycle.
maximum likelihood estimation: A form of parameter estimation in which selected parameters maximize
the probability that observed data could have occurred.
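To illustrate the definition, the sketch below (an assumption-laden example, not a prescribed method) performs maximum likelihood estimation of a constant failure rate from observed inter-failure times; assuming the times are exponentially distributed, the MLE has the closed form lambda_hat = n / sum(t_i). The data are hypothetical.

```python
import math

# Illustrative sketch: maximum likelihood estimation of a constant failure
# rate lambda from observed inter-failure times, assuming the times are
# exponentially distributed. The data below are hypothetical.

inter_failure_times = [12.0, 28.5, 56.5, 63.0, 60.5]  # hours between successive failures

n = len(inter_failure_times)
lambda_hat = n / sum(inter_failure_times)  # the rate that maximizes the likelihood

# The maximized log-likelihood, shown only to illustrate what is being maximized:
log_likelihood = n * math.log(lambda_hat) - lambda_hat * sum(inter_failure_times)

print(f"MLE failure rate: {lambda_hat:.4f} failures/hour")
print(f"Log-likelihood at the MLE: {log_likelihood:.2f}")
```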
module: (A) A program unit that is discrete and identifiable with respect to compiling, combining with
other units, and loading; for example, input to or output from an assembler, compiler, linkage editor, or
executive routine. (B) A logically separable part of a program.
operational: (A) Pertaining to the status given a software product once it has entered the operation and
maintenance phase. (B) Pertaining to a system or component that is ready for use in its intended
environment.
reliability growth: (A) The amount the software reliability improves from operational usage and other
stresses. (B) The improvement in reliability that results from correction of faults.
requirement reliability risk: The probability that requirements changes will decrease reliability.
scrum: (A) An iterative and incremental agile software development methodology for managing product
development. (B) The iterative project management framework used in agile development, in which a team
agrees on development items from a requirements backlog and produces them within a short duration of a
few weeks.
sprint: (A) The basic unit of development in agile/scrum. The sprint is restricted to a specific duration. The
duration is fixed in advance for each sprint and is normally between one week and one month, with two
weeks being the most common. (B) The short time frame, in which a set of software features is developed,
leading to a working product that can be demonstrated to stakeholders.
software development cycle: (A) The period of time that begins with the decision to develop a software
product and ends when the software is delivered. (ISO/IEC/IEEE 24765-2010) For the development part of
the software life-cycle processes this would include practices for planning, creating, testing, and deploying
a software system.
software engineering: (A) The application of a systematic, disciplined, quantifiable approach to the
development, operation, and maintenance of software; that is the application of engineering to software. (B)
The systematic application of scientific and technological knowledge, methods, and experience to the
design, implementation, testing, and documentation of software. (ISO/IEC/IEEE 24765-2010)
software quality: (A) The totality of features and characteristics of a software product that bear on its
ability to satisfy given needs, such as conforming to specifications. (B) The degree to which software
possesses a desired combination of attributes. (C) The degree to which a customer or user perceives that
software meets the user’s composite expectations. (D) The composite characteristics of software that
determine the degree to which the software in use will meet the expectations of the customer. (E)
Capability of the software product to satisfy stated and implied needs when used under specified
conditions. (ISO/IEC/IEEE 24765-2010)
software reliability (SR): (A) The probability that software will not cause the failure of a system for a
specified time under specified conditions. (B) The ability of a program to perform a required function under
stated conditions for a stated period of time.
NOTE—For definition (A), the probability is a function of the inputs to and use of the system, as well as a function of
the existence of defects in the software. The inputs to the system determine whether existing defects, if any, are
encountered.
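Under the simplifying assumption of a constant failure rate lambda, definition (A) reduces to the familiar exponential form R(t) = exp(-lambda * t); the short sketch below is an illustration only, with a hypothetical failure rate.

```python
import math

# Illustration of definition (A) under a constant-failure-rate assumption:
# R(t) = exp(-lambda * t) is the probability of operating for t hours without
# causing a system failure. The failure rate below is hypothetical.

failure_rate_per_hour = 0.002   # assumed constant failure rate (failures/hour)
mission_time_hours = 100.0      # the "specified time" in the definition

reliability = math.exp(-failure_rate_per_hour * mission_time_hours)
print(f"R({mission_time_hours:.0f} h) = {reliability:.3f}")  # ~0.819
```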
software reliability engineering (SRE): (A) The application of statistical techniques to data collected
during system development and operation to specify, estimate, or assess the reliability of software-based
systems. (B) The application of software reliability best practices to enhance software reliability
characteristics of software being developed and integrated into a system.
software reliability estimation: The application of statistical techniques to observed failure data collected
during system testing and operation to assess the reliability of the software.
software reliability model: A mathematical expression that specifies the general form of the software
failure process as a function of factors such as fault introduction, fault removal, and the operational
environment.
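One widely cited example of such a model is the Goel-Okumoto (GO) non-homogeneous Poisson process model listed among the acronyms in this document. The sketch below, using illustrative parameter values only, evaluates its mean value function mu(t) = a(1 - exp(-b*t)) and its failure intensity lambda(t) = a*b*exp(-b*t).

```python
import math

# Illustrative sketch of one well-known software reliability model, the
# Goel-Okumoto NHPP model: mu(t) is the expected cumulative number of
# failures by time t, lambda(t) is the failure intensity. The parameter
# values below are hypothetical.

a = 120.0   # expected total number of failures that would eventually occur
b = 0.01    # per-hour rate at which the remaining failures are exposed

def expected_failures(t_hours: float) -> float:
    return a * (1.0 - math.exp(-b * t_hours))

def failure_intensity(t_hours: float) -> float:
    return a * b * math.exp(-b * t_hours)

for t in (50, 200, 500):
    print(t, round(expected_failures(t), 1), round(failure_intensity(t), 3))
```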
software reliability prediction: A forecast or assessment of the reliability of the software based on
parameters associated with the software product and its development environment.
NOTE—Reliability predictions are a measure of the probability that the software will perform without failure over a
specific interval, under specified conditions.
CR capture/recapture
DD defect density
FD fault days
FI fault injection
FW firmware
GO Goel-Okumoto
HW hardware
IR intermediate representations
JM Jelinski-Moranda
kB kilobytes
OP operational profile
OS operating system
RG reliability growth
RT requirements traceability
SR software reliability
SW software
UI user interface
WG Working Group
4. Roles, approach, concepts

IEEE Std 610.12-1990 defines reliability as: “The ability of a system or component to perform its required
functions under stated conditions for a specified period of time.” Software reliability engineering (SRE)
supports cost-effective development of high-reliability software.
While reliability is a critical element of software quality, it is not the only one. SRE does not directly
address modeling or analysis of other important quality attributes, including usability, security, or
maintainability. There are many proven practices and technologies that address the full spectrum of
software quality as well as specific aspects. SRE should be applied in concert with them.
SRE can include analyses to prevent or remove software defects that lead to failures. SRE can also provide
unambiguous and actionable information about likely operational reliability throughout the software
development cycle. Just as the accuracy of weather forecasts is generally better for the very near term, SRE
predictions are more accurate as the software development nears completion.
4.2 Strategy
SRE allows developers to have confidence that a software system will meet or exceed its quantitative and
qualitative reliability requirements. SRE uses a data-driven strategy to achieve this, which includes but is
not limited to the following process areas, analyses, and metrics:
a) Planning—Determine the scope of the software reliability (SR) program based on the availability
of resources and the needs of the project. Identify the software line replaceable units (LRUs) that
will be included in the SRE.
b) Analyzing—Assess the software related risks and perform the failure mode analyses early to
develop requirements and design that support more reliable software.
c) Prediction—Predict SR in accordance with the system operational profile (OP) and perform the
sensitivity analysis early to identify appropriate goals for the SR and identify weaknesses in the
planned development activities that could affect the reliability of the software. Define requirements
for acceptable failure rate, residual defects, defect backlog, availability, or other quantitative
measures, taking into account the criticality and the interface and architectural dependencies.
d) Testing—Conduct testing that is representative of field usage while also covering the requirements,
design, and code so that the software is tested as per its OP. Develop test suites that achieve
representative sampling of the operations, modes, and stresses. Count the number of defects
discovered (each event is a fault) and corrected, and monitor the discovery rate. As needed revise
the predictions. Estimate the system’s operational reliability from the frequency and trend of
observed failures using a reliability growth model or failure intensity graph. Determine if the
estimated number of residual defects is acceptable.
e) Release—Use SRE to support a release decision. Analyze trends in faults and failures to support
transitioning from software milestone decisions. Report the release readiness of the software based
on its estimated operational reliability and estimated number of residual defects.
f) Operation—Monitor and model the number and rate of software failures so as to improve upon
future SR analyses.
Each process area is presented in Clause 5. The SR procedures shown in Clause 5 can be applied regardless
of the life-cycle model. The tasks are performed whenever the associated development activity is
performed unless specified otherwise. The clauses of this document are aligned with the six activities
shown in Figure 2.
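As a hedged illustration of the estimation step described under d) Testing above, the sketch below fits a Goel-Okumoto-style growth curve to hypothetical cumulative failure counts and uses the fitted parameters to gauge residual defects and the current failure intensity. It assumes SciPy is available and is not a prescribed procedure of this recommended practice.

```python
import numpy as np
from scipy.optimize import curve_fit

# Illustrative sketch of the Testing activity described above: fit a simple
# Goel-Okumoto-style reliability growth curve, mu(t) = a * (1 - exp(-b * t)),
# to cumulative failure counts observed during test, then estimate how many
# defects remain. The observations below are hypothetical.

test_hours = np.array([50, 100, 150, 200, 250, 300, 350, 400], dtype=float)
cumulative_failures = np.array([18, 31, 42, 50, 56, 61, 64, 67], dtype=float)

def go_mean_value(t, a, b):
    return a * (1.0 - np.exp(-b * t))

# Initial guesses: eventual total a somewhat above the last count, b small.
(a_hat, b_hat), _ = curve_fit(go_mean_value, test_hours, cumulative_failures, p0=[80.0, 0.005])

observed = cumulative_failures[-1]
estimated_residual = a_hat - observed                                  # defects believed to remain
current_intensity = a_hat * b_hat * np.exp(-b_hat * test_hours[-1])    # failures/hour at the last point

print(f"Estimated eventual failures a: {a_hat:.1f}")
print(f"Estimated residual defects:    {estimated_residual:.1f}")
print(f"Current failure intensity:     {current_intensity:.4f} failures/hour")
```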
a) System functions and features typically are added, revised, or dropped. Are the operations and
usage assumptions of the OP the same? If not, have the OP and the reliability test plan been revised
accordingly?
b) Has the reliability test suite been maintained so that there is no loss of coverage owing to changes
in the system under test?
c) If any of the following have been changed, have the related models, plans, or test design been
changed accordingly?
1) Reliability requirements
2) Identified risks
3) Identified failure modes and effects
4) Critical operating modes
5) Security, performance, vulnerability, and other related risks
6) Component dependencies
7) Components sourced from a different supplier
d) How do the assumed risk factors (e.g., code size, code complexity, number of interfaces, or team
capability) compare to actual risk metrics in the present and upcoming increment/sprint?
e) For completed components with adequate reliability testing, how does the actual failure data
compare with predicted failure intensity?
f) If component testing was limited or blocked, in what future increment or sprint will it be resolved?
g) Does the reliability model appear to be accurately tracking the observed failure discovery rate?
The purpose of SRE is to assure that released software meets its reliability requirements using data-driven
quantitative modeling and analysis as well as qualitative analyses. All of the practices in this
recommended practice support that goal. Some of them are essential, some are typically useful, and some
are project specific. SRE tailoring is the process of deciding which recommended practices to apply to a
specific project. Following the tailoring process defined in IEEE Std 12207™-2008, Annex A, is
recommended (information on references can be found in Clause 2). Taking these general conditions into
consideration, the practitioner should answer the questions in Figure 4.
Table 1 lists the recommended practices defined in Clause 5 of this recommended practice, indicating the
tailorability of each, as follows:
Essential activities should be performed to achieve the basic goals of SRE. If an essential activity
is omitted, the risk of an incorrect prediction and inadequate operational reliability is significant.
Typical activities are usually performed to support essential activities and should be included
unless there is a good reason to omit them. For example, in a system with just a single software
component, multi-component allocation is not necessary.
Project specific activities are usually performed only when a specific condition warrants their
inclusion. For example, in a non-critical system, developing and testing with a separate OP of
critical operations is not needed.
Table 1 also illustrates the SRE activities based on the role of the personnel who will typically perform or
assist with the activity. Organizations vary with regard to engineering roles, hence the following are the
typical roles and responsibilities.
Reliability engineers—These engineers typically have a background in hardware reliability but not
necessarily a background in software engineering or SR. Their role is typically to do predictions
and merge the predictions into the reliability block diagram (RBD) to yield a system reliability
prediction. They also perform allocations that include both software and hardware (see the sketch
following this list).
Software management—These engineers manage the day-to-day development activities, which
also include scheduling of personnel. Software management uses SR predictions to predict the
effort required to maintain the software, schedule the spacing of the releases so as to avoid “defect
pileup,” schedule the testing and corrective action resources necessary to determine that the
software meets the required objective, and perform a sensitivity analysis to determine the
development practices that are most and least related to reliable software.
Software quality assurance or testing—These engineers are responsible for audits and
assessments of the software deliverables as well as testing of the software. They are typically the
persons who use the SR growth models during testing to determine how many more defects are
remaining and the test effort required to discover them.
Acquisitions—Acquisitions personnel can be either commercial or government employees. They
are purchasing or acquiring software components or an entire system containing software. They
may use SR assessments to select or evaluate subcontractors, contractors, or suppliers. They may
also use the predictions to establish a system reliability objective to be used for contractual
purposes. They also use the SR models to monitor the progress of the software and system
reliability.
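A minimal sketch of the RBD merge mentioned for the reliability engineer role, under the common simplifying assumptions of a series reliability block diagram and constant failure rates (the component names and values are hypothetical), is shown below.

```python
import math

# Minimal sketch of merging software and hardware LRU predictions into a
# series reliability block diagram (RBD): in series, failure rates add and
# reliabilities multiply. Constant failure rates and the values themselves
# are simplifying assumptions for this example.

lru_failure_rates_per_hour = {
    "hw_processor_board":  25e-6,
    "hw_power_supply":     40e-6,
    "sw_application":     120e-6,   # from a software reliability prediction
    "sw_operating_system": 30e-6,
}

system_failure_rate = sum(lru_failure_rates_per_hour.values())
system_mtbf_hours = 1.0 / system_failure_rate

mission_hours = 1000.0
system_reliability = math.exp(-system_failure_rate * mission_hours)

print(f"System failure rate: {system_failure_rate:.2e} failures/hour")
print(f"System MTBF:         {system_mtbf_hours:,.0f} hours")
print(f"R({mission_hours:.0f} h):           {system_reliability:.3f}")
```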
Reliability engineering and software engineering historically have not had defined relationships. Hence, an
SR liaison may be defined to coordinate the efforts of the software organization with the reliability
engineering organization. The SR liaison may be a reliability engineer who is knowledgeable about
software or a software engineer who is knowledgeable about reliability engineering or a system engineer
who is knowledgeable about both.
Typically the SR work is performed at the beginning of a program and at every milestone during the
program. It is important that the reliability engineers schedule their efforts to work with all involved parties
in preparation for those milestones. Depending on the life-cycle model (LCM), the milestones can vary,
hence in 4.4, the SRE effort is illustrated for several different software LCMs.
The following figure illustrates a summary of the roles of the software managers, reliability engineers,
software quality engineers, and acquisitions personnel. Clause 5 describes the data flow in more detail. Figure 5
illustrates some of the key data that are input to the reliability models as well as the key metrics that
are retrieved from the SRE models and data.
Role
— RE—Reliability engineer or systems engineer
— SQA—Software quality assurance or testing
— SM—Software management
— ACQ—Acquisitions personnel who are purchasing or acquiring software from another organization

Tailoring
— E—Essential
— T—Typical
— PS—Project specific
5.1 Planning for software reliability—These are SRE activities that need to be performed prior to using any SRE
models or analyses
5.3 Apply SR during development—All of these activities can be performed well before the code is written or tested. (Reference standards: IEC 61014:2003 [B29]; IEEE P24748-5 [B30])
5.3.1 Identify/obtain the initial system reliability objective—The required or desired MTBF, failure rate, availability, or reliability for the entire system. (Essential)
5.3.2 Perform a SR assessment and prediction—Predict the MTBF, failure rate, availability, reliability, and confidence bounds for each software LRU. (Essential)
5.3.3 Sanity check the prediction—Compare the prediction to established ranges for similar software LRUs. (Typical)
5.3.4 Merge the predictions into the overall system predictions—Various methods exist to combine the software and hardware predictions into one system prediction. (Essential)
5.3.5 Determine an appropriate overall SR requirement—Now that the system prediction is complete, revisit the initial reliability objective and modify as needed. (Essential)
5.3.6 Plan the reliability growth—Compute the software predictions during and after a specific level of reliability growth testing. (Essential)
5.3.7 Perform a sensitivity analysis—Identify the practices, processes, techniques, and risks that are most sensitive to the predictions. Perform trade-offs. (Project specific)
5.3.8 Allocate the required reliability to the software LRUs—The software and hardware components are allocated a portion of the system objective based on the predicted values for each LRU. (Typical; see NOTE 2)
5.3.9 Employ SR metrics for transition to testing—These metrics determine if the software is stable enough to be tested. (Typical; see Software and Systems Engineering Vocabulary (SEVOCAB): http://pascal.computer.org/sev_display/index.action)
5.4 Apply SR during testing—These SRE activities are used once the software LRUs have been integrated. (Reference standards: ISO/IEC 29119-1:2013, Concepts and Definitions (published September 2013); ISO/IEC 29119-2, Test Processes (published September 2013); ISO/IEC 29119-3, Test Documentation (published September 2013); ISO/IEC 29119-4, Test Techniques (at DIS stage, anticipating publication in late 2014))
5.5 Support release decision—SR results can be used to determine if the software is ready for release. If it is not, the models can determine how much more testing time is required and how many more defects need to be removed before release. (Reference standard: IEEE Std 1012-2012 [B33])
5.5.1 Determine release stability. (Essential; see IEEE Std 1012-2012 [B33], 7.2.2, particularly 7.2.2.3.6, and NOTE 3)
5.5.2 Forecast additional test duration—If the required reliability is not met, determine how many more testing hours are required to meet it. (Typical)
5.5.3 Forecast remaining defects and the effort required to correct them—If the required reliability is not met, determine how many more defects must be removed and how much staffing is required to meet it. (Essential)
5.5.4 Perform a Reliability Demonstration Test (RDT)—Statistically determine whether a specific reliability objective is met. (Project specific)
5.6 Apply SR in operation—All of these activities are employed once the software is deployed.
5.6.1 Employ SR metrics to monitor operational SR—It is important to monitor operational reliability to know how the reliability in the field compares to the estimated and predicted reliability. (Typical)
5.6.2 Compare operational reliability to predicted reliability. (Typical)
5.6.3 Assess changes to previous characterizations or analyses—Adjust the reliability model inputs and assumptions accordingly to improve the next prediction. (Typical)
5.6.4 Archive operational data—Organize the data for use in future predictions. (Typical)
6.2, Annex B Methods for predicting defect density and fault profile. (Essential)
6.3, Annex C SR growth models. (Essential)
NOTE 1—Within the framework of the standards harmonized with IEEE Std 12207-2008, the concept of reliability is often bundled with the concept of integrity and expanded in the discussion of integrity levels.
NOTE 2—This step may be implemented within the software requirements analysis process in 7.1.2 of IEEE Std 12207-2008.
NOTE 3—This step may be implemented within the decision management and measurement processes in 6.3.3 and 6.3.7, respectively, of IEEE Std 12207-2008. See also details for verification and validation processes in IEEE Std 1012-2012 [B33].
Annex D contains a relative ranking of the culture change, effort, calendar time, and automation of each of
the preceding tasks. Several of the preceding tasks can be merged with and sometimes even replace an
already existing software development or reliability engineering task.
The general SRE activities of planning, analysis, testing, and acceptance may be applied to any type of
software product and in any kind of LCM. Although the underlying practices are independent of a process
model, considerations for applying SRE practices vary among these processes.
a) Phased: Often called “waterfall” development, in a phased project all requirements are defined
first. Then design, programming, and testing follow in sequence, with backtracking as necessary.
The US DoD standard DOD-STD-2167A provides an example.
b) Incremental: Most requirements are defined first, then partitioned into subsets. Requirement
subsets are designed, programmed, and tested, typically in a sequence of subsets ("increments").
Later increments typically extend or revise the requirements and artifacts of earlier increments. The
Rational Unified Process is a widely used incremental process (see [B74]).
c) Evolutionary: Often called “agile” development, an evolutionary project begins with a general
notion of a solution. This is divided into subsets to be explored and implemented in short cycles,
typically spanning about one month. In each such cycle, usage examples are identified and
expressed as test conditions, which are prerequisite for programming. Other work products
antecedent to delivered software (e.g., line-item requirements, design documentation) are eschewed
in favor of direct interaction with system users or customers. Programming and unit testing are
conducted to implement the usage examples selected for a cycle. There are many variations.
In practice, the preceding incremental and evolutionary LCMs are adapted and refined to meet practical
considerations.
Being data-driven and working from high-level models to actual implementation measurements, most SRE
practices require the results of one or more activities as an input to another. For example, defect predictions
require certain codebase metrics and system-scope reliability predictions require either an actual or
assumed system structure. Because incremental and evolutionary processes produce a series of partial
antecedents and implementations, SRE activities should be adjusted accordingly.
The charts in the following subclause show how the software development processes defined in
IEEE Std 12207-2008 may be achieved under phased (waterfall), incremental, and evolutionary (agile)
development strategies. The IEEE 12207 processes are depicted as horizontal lines. The non-dimensional
arrow of time points left to right. The solid area above a line indicates the relative effort allocated to that
process. Roughly, the total colored area indicates the total effort expended on the process. The implicit
vertical axis for each process is the percent of the process’ total effort. Upward and downward ramps
suggest increasing and decreasing work. These charts are intended to suggest how SRE practices align with
these basic process patterns. The charts are notional and suggestive. They are not intended to be
prescriptive. They show that with the necessary changes, SRE may be applied in a wide range of LCMs.
All of the procedures in this document can be used whether there is a waterfall life-cycle model or an
incremental life-cycle model. Table 2 summarizes how the procedures are applied for each.
5.4, 6.3, and Annex C—Waterfall: Reliability growth models can be used as soon as the software is integrated with other software and with the hardware. Incremental: Reliability growth models can be used for each increment; the results of each increment can be merged for a final reliability growth estimation.
Figure 6 shows that the processes in a phased life cycle are sequential (a phase may not begin until its
antecedent phase is substantially complete) but not mutually exclusive. The long tails indicate that work to
accommodate revisions and debugging typically continues for the duration of a project, in most of the
processes. The Integration segments assume that partial integration testing is performed as some slice of the
system is complete.
This follows common practical use of the phased model, not the naïve assumption of strictly sequenced
phases.
cycle. The height and ramps of solid areas closely follow Jacobson et al. Where the UPM defines
“implementation” and “test” workflows, these are interpreted as the IEEE 12207 processes “construction,”
“integration,” and “qualification testing.”
An incremental project seeks to identify most of the requirements early on. A provisional design is
produced and revised in parallel. The requirements and architecture that drive SRE planning, analysis, and
prediction are typically substantially complete after the elaboration phase. The system is then constructed in
increments, typically with their own integration and end-to-end testing. Revision and elaboration to all
work products are expected in the construction phase. As a result, SRE predictions should be revised as
new information becomes available. Figure 7 suggests how the focus of SRE activities varies as an
incremental project progresses.
Inception—This is a short phase in which the following things are established: justification for the project;
project scope and boundary conditions; use cases and key requirements that will drive the design trade-offs;
one or more candidate architectures; identified risks; preliminary project schedule and cost estimate.
Elaboration—In this phase the majority of the requirements are captured, and the use case, conceptual, and package diagrams are created.
Construction—The largest phase in which system features are implemented in a series of short, timed
iterations.
As a result, a system is constructed in many small steps. These steps can produce the software metrics and
failure data that drive SRE practices, but until the final sprint, information needed for SRE planning,
modeling, and analysis is necessarily incomplete. Refer to Figure 8.
Some evolutionary approaches call for one or more planning sprints and one or more transitional sprints
(Ambler and Lines [B2]). A key difference of these sprints is that they are not necessarily intended to
produce working software. Planning sprints may be used to initiate SRE planning and analysis. Transitional
sprints may be used to apply SRE testing and release evaluation to a release candidate in its entirety. For
evolutionary projects, the recommended practice is therefore:
Include one or more planning sprints at the start of a project to establish a minimal system-scope
baseline of SRE planning and analysis artifacts.
Include one or more reliability sprints at the end of a project to conduct system-scope testing on a
completed release candidate so that meaningful reliability predictions may be developed and
evaluated. Construction and integration should be limited to making and verifying corrective action
fixes, followed by rerunning the system-scope reliability test suite.
Prior to implementing SRE, some planning tasks should be performed to increase the effectiveness of the SRE activities. Figure 10 illustrates the inputs, outputs, and flow of the SRE planning tasks.
Table 3 illustrates the benefits and applicability for incremental development models. Note that all of the
planning tasks are applicable for incremental or evolutionary LCMs.
Before the SR metrics and models can be used, the following things need to be identified:
This task is essential and is a prerequisite for performing a SFMEA (5.2.2), performing a reliability
assessment and prediction (5.3.2), and allocating the required SR to all software LRUs (5.3.8). The
software manager(s) have the primary responsibility for identifying the software LRUs and making them
available to the reliability engineers and other stakeholders. The reliability engineers need to have the list of
software LRUs prior to performing any predictions. The list of software LRUs will determine the scope of
the SRE effort, which ultimately affects the SRPP in 5.1.6. If the software LRUs are all developed by
different organizations, more effort is required for the SRE activities than if all LRUs are developed by the
same organization.
A software LRU is the lowest level of the architecture at which the software can be compiled and object code generated. Software configuration items are commonly referred to as computer software configuration items (CSCIs). However, a CSCI may be composed of more than one LRU; hence the term LRU is used in this document. While hardware components are replaced with identical hardware components, software components are updated whenever software defects are corrected. The lowest replacement unit for software will be either a dynamically linked library (DLL) or an executable. Even in a small system there will usually be more than one software LRU. In medium or large systems there may be dozens of software LRUs. If the software has been designed cohesively, each LRU can fail independently of the others. For example,
if the GPS in a car is not functioning properly due to software, one should still be able to drive the car, use
the stereo, retract the roof, use the rear camera when in reverse, etc. It is a common practice to predict the
reliability of all of the software LRUs combined. However, this is not a recommended practice unless there
is only one replaceable unit, which is rarely the case with large complex software intensive systems. The
practitioner should identify the software LRUs so that they can be properly added to the RBD or other
system reliability model.
To identify the software LRUs one should look at the entire system and identify all components and then
identify which components are applicable for a SR prediction. This includes looking at all associated
hardware units on which software may reside, including central processing units (CPUs), application
specific integrated circuits (ASICs), or field programmable gate arrays (FPGAs).
From the hardware point of view there is either an ASIC or an FPGA. FPGAs are applicable for SR because they are programmable. However, not all need the same level of scrutiny or a SR prediction. Take, for instance, the Basic Input Output System (BIOS), a configurable firmware-level component. The BIOS is typically a bootstrap component that generally either works or does not work upon startup. Because it can usually be verified as working or not working during testing, it may not be applicable for inclusion in SR from a cost/benefit perspective. There may also be sensor firmware at this level or other firmware that interfaces directly with the FPGA or ASIC. Firmware is the combination of a hardware device, such as an integrated circuit, and the computer instructions and data that reside as read-only software on that device. While software may be updated on a regular basis over the life of a system, firmware is typically updated less often over the life of the device. Above the firmware level is the operating system (OS).
It is possible that different OSs are installed on different processing units. Above the OS level is the application level for the processing unit, and above that are the applications that make the system under consideration do what it is meant to do. Government furnished software (GFS), commercial-off-the-shelf software (COTS), and free open source software (FOSS) are examples of applications.
The advent of rapid application development methodologies such as agile, DevOps, etc., has led to increased dependency on third-party software in the application development process and, with it, an increase in the use of FOSS. Not only can the third-party software have defects in it, but systems can fail because of mismatched interfaces to the third-party software. For example, the wrong driver might be used with a particular device, or there could be incompatible Application Programmer Interfaces (APIs). Worse, there could be multiple versions of the same application in a larger system [e.g., the Java Virtual Machine (JVM)], all running simultaneously in support of different system features.
Glueware is the software that connects the COTS, GFS, or FOSS with the rest of the software system. An adapter is the portion of the glueware that handles the actual interface; however, the glueware may have additional functionality over and above the adapter. If there is no COTS, GFS, or FOSS, then there is no need for glueware. In nearly any system there will be newly developed application software, even if the system is largely composed of COTS, GFS, or FOSS. Software development organizations often underestimate both the newly developed software and the glueware when the system is composed of a significant amount of COTS, GFS, or FOSS.
Middleware allows different components to communicate, distributes data, and provides for a common
operating environment. Middleware is generally purchased commercially. Middleware is not needed for
systems with a small number of components or computers. There could be several software configuration
items, each serving a different purpose such as interfacing to hardware, managing data, or interfacing to the
user. At the COTS or middleware layer there could be one or more databases. There could also be a user
interface at the application software or COTS layer.
The software architecture diagram(s) should describe all of the software configuration items and their interactions with each other. Any software configuration item that is an independent program or executable should be considered to be an LRU. Figure 11 contains the checklist for identifying software LRUs.
a) Software predictions are conducted at the program or executable level. The predictive models in
5.3.2 are not performed on the modules or units of code because the lowest replaceable unit is the
application, executable, or DLL. Usually each LRU has a one-to-one relationship with a CSCI.
However, in some cases CSCIs are composed of more than one software LRU. Hence, a listing of
CSCIs does not always provide a listing of LRUs.
b) Identify all in-house developed applications, executables, and DLLs, and add to list of software
LRUs.
c) Identify all COTS software and add it to the list of software LRUs. Examples include OSs and
middleware, as well as others. Do not include COTS software that is not going to be deployed with
the system (e.g., development tools).
d) Identify all GFS and add them to the list of software LRUs. Do not include GFS that will not be
deployed with the system.
e) Identify all FOSS and add them to the list of software LRUs. Do not include FOSS that is not going
to be deployed with the system (e.g., development tools).
f) Identify all glueware and middleware and add them to the list of software LRUs.
g) BIOS is a simple bootstrap that is probably not relevant for a reliability prediction.
h) FPGAs are applicable for SR but may already be part of hardware predictions.
i) For each LRU listed in steps b) through f), establish the following items (a minimal sketch of recording these items follows this checklist):
Names of each executable
Appropriate development organization
Size of each LRU (see Annex A)
Expected duty cycle of each software configuration item (see 5.3.2.3 Step 3)
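The following is a minimal sketch, in Python, of one way to record the LRU attributes listed in step i) so that they can later feed the software BOM discussed below. The class, field names, and values are hypothetical illustrations, not items prescribed by this recommended practice.

```python
# Minimal sketch of capturing the LRU attributes from step i) as records that
# can later feed a software BOM. Field names and values are hypothetical.
from dataclasses import dataclass

@dataclass
class SoftwareLRU:
    name: str              # name of the executable or DLL
    origin: str            # e.g., "in-house", "COTS", "GFS", "FOSS", "glueware"
    organization: str      # development organization responsible for the LRU
    size_ksloc: float      # size estimate (see Annex A)
    duty_cycle: float      # expected fraction of operating time the LRU runs

lru_list = [
    SoftwareLRU("nav_app.exe", "in-house", "Supplier A", 42.0, 0.85),
    SoftwareLRU("gps_driver.dll", "COTS", "Vendor B", 10.5, 0.85),
    SoftwareLRU("map_adapter.dll", "glueware", "Supplier A", 3.2, 0.60),
]

# The list of LRUs bounds the scope of the SRE effort (see 5.1.6): more
# development organizations generally means more SRE coordination effort.
print(len(lru_list), "software LRUs from",
      len({lru.organization for lru in lru_list}), "organizations")
```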
Constructing the software bill of materials (BOM) is a project-specific task. Any system that comprises several software LRUs, particularly systems with COTS, FOSS, firmware, or drivers, should have a software BOM. Today's software development methodologies are progressing rapidly to match the required innovation in time to market of services and applications. New development approaches such as agile have forced developers and companies to become more dependent on COTS and FOSS in their development process.
It is typical for organizations to identify the final product software image as a single element on the main product BOM, regardless of how many software LRUs comprise the software product. The rationale is that the current software development life cycle can manage all dependencies adequately. Unfortunately, this approach can cause problems when hardware and software come together during product manufacturing and final system assembly if there are multiple software LRUs. Therefore, there should be a process whereby hardware and software BOM development and components are managed and tracked. This capability allows for identifying any key dependencies between the hardware and software components and flagging any reliability issues. This capability goes well beyond software configuration management because it addresses the configuration of the software as a whole and not just the configuration of each of the individual software LRUs.
By connecting the application(s), COTS, FOSS, drivers, and firmware with their suppliers through a software BOM as part of the overall software image, and aligning it to the hardware components in the overall product BOM, any reliability issues with COTS, FOSS, drivers, firmware, and hardware can be narrowed down to the right source. This capability provides companies with the ability to improve their management of software and hardware reliability and quality. See F.1.1 for an example.
This is an essential task that is a prerequisite for developing a reliability test suite (5.4.1), using the SR growth models (see 5.4.5), and measuring test coverage (5.4.3). This subclause is also a highly recommended prerequisite for using the SRE predictive models (see 5.3.2.3 Step 3).
Software does not fail due to wear out or other physical deterioration. It also does not necessarily fail
as a function of calendar time. The discovery rate of software failures and the deployed latent defects
are directly related to how much the software is used and the manner in which it is exercised during
testing and operation.
A profile is a set of disjoint alternatives in which the sum of the probabilities of each alternative is
equal to 1. An OP for software yields the most likely usage of the software in advance of development
and testing. Without an established OP, it is possible that the software development and testing might
focus on less frequent features and modes, which could then result in unexpected field support and
reliability issues once deployed operationally. Additionally, the SR growth models discussed in 5.4.5
are accurate if and only if the software is being stressed as per its OP.
Figure 13 contains the steps for characterizing the OP. See F.1.2 for a complete example of an OP.
a) Locate the customer profile. A customer is any group or organization that is acquiring the system.
The customer can be internal or external to the development organization. Identify the percentage
of usage by each customer group.
b) Identify the user profile. The user is a person or group of people who are using the system for a
specific goal. Identify the percentage of usage by each user group within each customer group.
c) Define the system mode profile. A system mode profile is a set of operations that are grouped based
on the state or behavior of the system. Identify the percentage of usage by each system mode in
each user group and customer group.
d) Determine the functional profile. A function in this context is a set of tasks or operations that can
be performed by the system. A functional profile identifies the usage of each of these tasks based
on the system mode, user, and customer. Identify the percentage of usage by each function in each
system mode by each user group and customer group.
e) Compute the OP by multiplying each of the previous percentages. See F.1.2 for a complete
example; a minimal computational sketch also follows Figure 13.
f) Subclause 5.4.1.1 illustrates how to use the OP to develop a reliability test suite.
Figure 13 —Steps for characterizing the operational profile (see Musa [B60])
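The following is a minimal sketch, in Python, of the multiplication in step e) of Figure 13. The customer, user, system mode, and function names and all percentages are hypothetical placeholders; the complete worked example remains F.1.2.

```python
# Minimal sketch of computing an operational profile (OP) by multiplying the
# usage percentages from steps a) through d). All names and numbers below are
# hypothetical placeholders, not values from this recommended practice.

# Each level is expressed as probabilities that sum to 1.0 within its parent.
customer_profile = {"fleet_operator": 0.7, "single_owner": 0.3}
user_profile = {"fleet_operator": {"dispatcher": 0.6, "driver": 0.4},
                "single_owner": {"driver": 1.0}}
mode_profile = {"dispatcher": {"planning": 0.8, "reporting": 0.2},
                "driver": {"navigation": 0.9, "diagnostics": 0.1}}
function_profile = {"planning": {"route_optimization": 1.0},
                    "reporting": {"usage_report": 1.0},
                    "navigation": {"turn_guidance": 0.7, "map_update": 0.3},
                    "diagnostics": {"fault_readout": 1.0}}

op = {}
for customer, p_cust in customer_profile.items():
    for user, p_user in user_profile[customer].items():
        for mode, p_mode in mode_profile[user].items():
            for func, p_func in function_profile[mode].items():
                key = (customer, user, mode, func)
                # Step e): multiply the percentages down the hierarchy.
                op[key] = op.get(key, 0.0) + p_cust * p_user * p_mode * p_func

assert abs(sum(op.values()) - 1.0) < 1e-9  # the OP is a profile: it sums to 1
for key, prob in sorted(op.items(), key=lambda kv: -kv[1]):
    print(key, round(prob, 3))
```

The highest-probability entries in such a sketch are the ones an OP-driven reliability test suite (5.4.1.1) would exercise most heavily.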
5.1.1.4 Identify the impact of software design on software and system design
This task is essential for developing and testing the software. It is a prerequisite for developing a reliability
test suite (5.4.1).
Software design can impact systems design and vice versa. Hence, an overall system understanding is
needed to assess the best design approaches to build reliability into that system. Fault and failure analyses
(see 5.2) are needed at the functional and design level to determine the weak areas and then assess and
choose the potential trade-offs for a more robust design looking at hardware, software, and combined
solutions. As a result of these analyses, additional software is often added to the system to perform fault
detection, isolation and response (FDIR), but the software cannot perform efficient FDIR if the hardware is
not instrumented or designed for it as shown in the example in F.1.3. Figure 14 is the checklist for
identifying the impact of the software design on the system and system design.
This is an essential task. This task is typically executed as a joint effort among acquisitions, software management, reliability engineering, and software quality assurance. In some cases, however, the acquisitions organization may define the failures and criticality in the statement of work.
This task is a prerequisite for performing a software failure modes and effects analysis (SFMEA) (see 5.2.2), collecting failure and defect data (see 5.4.4), applying SR metrics during testing (see 5.4.6), selecting and using the SR growth models (see 5.4.5), and deciding whether to deploy the software (see 5.5).
If a defect is encountered under the right conditions when the product is put to use, it may cause the
product to fail to meet the user’s legitimate need. Serious consequences may follow from a user
experiencing a software failure; for example, a defect may compromise business reputation, public safety,
business economic viability, business or user security, and/or the environment.
[ISO/IEC/IEEE 29119-1:2013]
a) Determine where the defects probably lie, the weak areas of the software system, and the best way
to mitigate them. Two techniques generally used for hardware and adapted for software are the
failure modes and effects analysis (FMEA) in 5.2.2 and the fault tree analysis in 5.2.3.
b) Understand the modes and states of operation and the environments and conditions the system
should operate under in order to target what should be protected and to what extent.
1) What should be done by hardware and what is best done by software depends on what needs
to be detected, what can be detected, at what level it needs to be detected, and what the
appropriate response is.
2) Overdesigning either the hardware or the software may have the opposite effect on the
system's reliability.
3) Software may need to detect and recover from failures. Hardware needs to have sufficient
monitoring and control at the appropriate places for software to provide a closed-loop response
to hardware failures.
4) Use Table 4 as a guide.
c) Find the right balance between the hardware and software solutions. Overly complex software may
be unreliable, while software that fails to detect system failures may also be unreliable.
Reliability consideration and applicable hardware methods:
Protection against inadvertent software command: hardware interlocks
Available memory: shield memory areas
It is a common, but incorrect, myth that compiling the code will catch most of the software defects. Compiling is merely the act of producing the executable from the implemented code; it can, in itself, introduce defects (build issues are quite common). Compiling has no knowledge of what the expected system behavior is—it only knows how to build the executable. Even the newer static and dynamic analyzers that detect bugs in code cannot identify whether the requirements are complete, the design is sufficient, or the interfaces are correct.
There are two categories that partition all software defects (i.e., each software defect belongs to exactly one
of the two categories) (Grottke, Trivedi [B23]):
Mandelbug—A defect whose activation and/or error propagation are complex. This is the case if
the propagation of the error generated by the defect involves several error states or several
subsystems before resulting in a failure, causing a time lag between the fault activation and the
failure occurrence. Another source of complexity in fault activation and error propagation is the
influence of indirect factors, such as interactions of the software application with its system-
internal environment (hardware, operating system, other applications), the timing of inputs and
operations (relative to each other), and the sequencing of inputs and operations. As a consequence,
the behavior of a Mandelbug may appear chaotic or even nondeterministic, because the same set of
input data seems to make it cause a failure at some times, but not at others. Race conditions are
classic examples of problems caused by Mandelbugs.
Bohrbug—A repeatable defect; one that manifests reliably under a possibly unknown but well-defined set of conditions, because its fault activation and error propagation lack complexity. These include defects due to incorrect software requirements, software architecture, software detailed design, and code. Failures from these defects result in a predictable system failure when the inputs are such that the faulty code (which may be due to faulty requirements or faulty design) is executed, that is, when a particular juncture of the software code is reached. They are repeatable: if one knows the particular conditions that resulted in the failure, the failure can be reproduced with 100% probability, and the defect can be fixed by changing and/or rewriting the code. If the faulty code is due to a faulty specification, the specification can be fixed and then the design and code. If the faulty code is due to a faulty design, the design can also be fixed. Bohrbugs are very similar to quality-related hardware failures; once found, they can be fixed, and they do not leave any residual failure rate in the system. At component and system level testing, they contribute to the infant mortality part of the bathtub curve. These software defects can be found with a high level of confidence at subsystem-level testing and prior to total system integration.
Due to their elusive nature, Mandelbugs are typically difficult to find in and remove from the code. The
following (overlapping) classes of defects are subtypes of Mandelbugs:
Heisenbug—This kind of defect seems to disappear or alter its behavior when one attempts to
probe or isolate it. For example, failures that are caused by improper initialization may go away as
soon as a debugger is turned on, because many debuggers initialize unused memory to default
values. (See Grottke, Trivedi [B23].)
Aging-related defect (Grottke, Matias, Trivedi [B25])—A defect that is able to cause an increasing
failure rate and/or a degrading performance while the software is running continuously. Usually,
this is due to the accumulation of internal error states. An example is a memory leak: as more and
more memory areas are claimed but—erroneously—never released during execution of the software, there is a growing risk of a failure because of insufficient free memory (a minimal illustration follows this list). Special techniques are used during subsystem-level testing to rigorously exercise the memory management system and debug the software subsystem. The aging-related defects left in the code in the operational phase can be dealt with by so-called software rejuvenation techniques (Huang et al. [B28]).
Software-hardware interface defects—Software can fail due to a lack of a robust interface with the hardware. Failures can occur when hardware is temporarily operating outside of its specification while interfacing with software, causing both to be outside their operational ranges. Consequently, the system can fail or its performance is substantially degraded, but there is no specific hardware failure that can be identified as the root cause. The usual result is a trouble not identified (TNI), which is very similar to a tolerance stackup. Tolerance stackups are used in engineering to describe events in which each element of the system is in specification, but their interaction causes the overall system performance to be in the failure region. To reduce the occurrence of such failures when software and hardware interface, attention needs to be given to the subsystem level of the software requirements.
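The following is a minimal, hypothetical Python sketch of the memory-leak style of aging-related defect described in the list above: a cache that grows on every request and is never pruned, so available memory degrades the longer the process runs. The function and variable names are illustrative only.

```python
# Minimal, hypothetical sketch of an aging-related defect: a per-request cache
# that is never pruned, so memory consumption grows for as long as the process
# runs, eventually causing failures unrelated to any single input.
_result_cache = {}

def handle_request(request_id: int, payload: bytes) -> int:
    # Defect: entries are keyed by a value that is never reused, and nothing
    # ever evicts old entries, so _result_cache grows without bound.
    _result_cache[(request_id, payload)] = len(payload)
    return _result_cache[(request_id, payload)]

for i in range(3):
    handle_request(i, b"payload")
print(len(_result_cache))  # grows with every distinct request, never shrinks

# A bounded cache (eviction policy) or periodic software rejuvenation
# (planned restart) are typical mitigations for this class of defect.
```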
Each project should agree upon failure definition and scoring criteria (FDSC), as a means of standardizing
1) which behaviors are to be categorized as reliability failures, and 2) how to assign severity scores to those
failures. The scoring criteria for software failures are project and product specific and the definition of the
scoring criteria usually involves the customer or customers. Some criteria might not, for example,
recognize degradation of a function as a failure. Further, the relative severity of a failure might depend on
identifying the criticality of a failed function or component.
Recognizing the relative consequence of a given system failure is one half of any risk-based approach to development or appraisal efforts. If the frequencies of various failures (the other half of the risk equation) can be assigned quantitative values, then it helps to also assign quantitative consequence values. This is often accomplished by estimating a monetary value, or cost, of failure. The values are likely to be influenced by the type of the underlying defect. For example, there is evidence indicating that failures caused by Bohrbugs tend to be more severe than those caused by Mandelbugs (Grottke, Nikora, Trivedi [B21]). Also, the complexity involved in the fault activation of Mandelbugs suggests that these defects may result in a lower failure rate.
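As a minimal sketch of the quantitative risk view described above (failure frequency combined with an estimated monetary consequence), the following Python fragment ranks a few failure categories. The categories, rates, and costs are hypothetical and illustrative only.

```python
# Minimal sketch of combining failure frequency with an estimated monetary
# consequence to compare failure categories quantitatively. All numbers are
# illustrative placeholders, not values from this recommended practice.
failure_categories = [
    # (description, expected failures per year, estimated cost per failure)
    ("loss of all processing", 0.5, 200_000.0),
    ("degraded GPS function", 6.0, 2_000.0),
    ("cosmetic display defect", 40.0, 50.0),
]

for name, failures_per_year, cost_per_failure in failure_categories:
    # risk exposure = frequency x consequence
    expected_annual_cost = failures_per_year * cost_per_failure
    print(f"{name}: expected annual cost = ${expected_annual_cost:,.0f}")
```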
Project-specific definitions for failures should be identified. These definitions are usually negotiated by the
customers, testers, developers, and users. These definitions should be agreed upon well before the
beginning of testing. There are often commonalities in the definitions among similar products (i.e., most
people agree that a software defect that stops all processing is a failure). The important consideration is that
the definitions be consistent over the life of the project. There are a number of considerations relating to the
interpretation of these definitions. The analyst should determine the answers to the questions found in
Figure 15:
a) Identify the system level failures that can be caused by software. The SFTA (see 5.2.3) and the
SFMEA (see 5.2.2) can be used to brainstorm this.
b) For each of the preceding system failures, what is the relative criticality? (i.e., catastrophic, critical,
moderate, negligible)
c) Determine whether faults caused by defects which will not be removed are counted.
d) Determine how to count defects that result in more than one failure (e.g., one pattern failure
represents several failure occurrences of the same failure type).
e) Determine what a failure is in a fault-tolerant system.
f) Generate the failure definition and scoring criteria and verify that the failure reporting and defect
tracking systems use these scoring criteria.
This is an essential task and is a prerequisite for determining the SR plan (see 5.1.6), determining an
initial system reliability objective (see 5.3.1), and performing a SR assessment and prediction (see
5.3.2). The software management is typically the lead for this task. While this task is typically
performed by the organization responsible for developing the software, it can also be performed by the
organization acquiring the software.
Reliability risk assessment can start as early as concept planning and project start up with the
assessment of the development environment and continue through requirements development, design,
reviews, verifications, changes, and testing.
From a product standpoint, the risks that can affect the reliability of the software include, but are not limited to, safety considerations, security and vulnerability considerations, and the current maturity of the product. Even though safety, security, and vulnerability are not within the scope of this document, they can be risks with regard to SR. There are times when trade-offs are required between reliability and safety, security, and vulnerability. For example, sometimes the best way to protect against a vulnerability is for the software to stop processing. Software products that are not mature can be a reliability risk, but products that are very mature can be risky as well if the product is too difficult to maintain or is at risk of obsolescence. A checklist for creating a reliability risk assessment is found in Figure 16.
a) Identify product-related risks such as safety, security, vulnerability, and product maturity. See 5.1.3.1,
5.1.3.2, and 5.1.3.3.
b) Identify project and schedule related risks such as grossly underestimated size prediction, grossly
overestimated reliability growth, defect pileup. See 5.1.3.4.
c) Identify if there are too many risks for one release (brand new product, hardware, technology,
processes, or people). See 5.1.3.5.
d) Generate a listing of risks that affect SR from steps a) to c).
e) Consider these risks when determining the system reliability objectives
f) Consider these risks when performing the SR assessment and prediction (see 5.3.2), which will
ultimately be used to establish the reliability growth needed to achieve the prediction and reliability
objective
It is very hard for software to be safe if it is not reliable. That said, reliability and safety are both system design parameters that usually need to be balanced against each other. Table 5 shows that the same analyses can be used for both safety and reliability when the focus is adjusted accordingly.
Since safety and reliability are both system design parameters, they can sometimes conflict with each other, just as performance and reliability requirements can conflict. Software safety and reliability can conflict in that a simple, straightforward software design is often the most reliable, as more complex code can be less deterministic. However, from a critical-function perspective, multiple software and/or hardware backups might be the design solution chosen to try to assure that a function will work despite multiple failures. System reliability often relies on similar or dissimilar redundancy. When looking at a more autonomous system, which needs to operate in the face of faults and failures without human interaction, it can be difficult to make the design trades, and a design can quickly become extremely complex. The question is: has the design introduced more potential failure modes than it fixed? There is a possibility that in making the system more reliable the software becomes less reliable and more complex. The reliability practitioner needs to consider this possibility.
System safety is focused on what leads to a hazard. Software safety delves into how software can
contribute to those hazards and can provide “must work,” “must not work” design solutions. SR
practitioners should inform the designers of potential defects in the software and software process
itself. Then the organization can provide a balance in design trade-offs, as well as assure sufficient
testing to find any defects in the software/system safety design. The checklist for assessing the safety
risks is in Figure 17:
d) Identify the "must-work" functions, such as firing multiple pyro devices simultaneously within a
rocket stage transition, as well as the "must-not-work" situations, such as not inadvertently firing
those same pyros at any other time. A software FMEA (see 5.2.2) to determine the critical
events, inputs, and outputs of the function can be performed to provide ideas for the design solution
space to increase both reliability and safety.
e) Once the software has been designed to mitigate the identified critical software or system element,
perform a reliability analysis to demonstrate whether the chosen design is reliable from a systems
perspective. The software contributions to the system hazards and their mitigations and controls
need to be cross-checked with the reliability critical items list and the software functions that
may impact those items. Quantitative SR predictions can also be adapted to predict and monitor the
mean time between safety-related failures.
f) Make design trades between SR and safety and consider the following:
Similar and dissimilar redundancies
Voting or monitoring logic to switch and knowing when to switch
Error and status reporting logic
Multiple sensors and effectors
How many system conditions need to be known, and how current they need to be, for a backup to work
Whether the storage/memory is trustworthy
When and where to employ watchdog timers (a minimal sketch follows this list)
Common failure modes in both the software and hardware and knowing how the software
can monitor those modes
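The following is a minimal, hypothetical Python sketch of the watchdog-timer consideration in the preceding list: a monitored task must periodically signal progress, and a recovery action runs if it stops doing so. The class and recovery action are illustrative, not a prescribed design.

```python
# Minimal, hypothetical sketch of a software watchdog: a monitored task must
# "pet" the watchdog periodically; if it stops doing so, a recovery action runs.
import threading
import time

class Watchdog:
    def __init__(self, timeout_s: float, on_expire) -> None:
        self.timeout_s = timeout_s
        self.on_expire = on_expire          # recovery action (e.g., restart an LRU)
        self._timer = None
        self.pet()                          # arm the watchdog

    def pet(self) -> None:
        """Called by the monitored task to signal it is still making progress."""
        if self._timer is not None:
            self._timer.cancel()
        self._timer = threading.Timer(self.timeout_s, self.on_expire)
        self._timer.daemon = True
        self._timer.start()

def recover() -> None:
    print("watchdog expired: initiating fault response (e.g., restart the LRU)")

wd = Watchdog(timeout_s=1.0, on_expire=recover)
for _ in range(3):
    time.sleep(0.5)
    wd.pet()                                # a healthy task keeps petting in time
time.sleep(1.5)                             # simulated hang: the recovery action fires
```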
As with safety considerations, the security of a system depends partly on the reliability of the software. One
of the sources of vulnerability is software that is not coded according to accepted coding standards.
Improving the reliability of the software may help to make it less vulnerable, but it is not sufficient for
making it secure. Security issues may be analyzed with techniques such as top-down FTA and bottom-up
FMEA.
Security risks relate to system failure modes in which the confidentiality, integrity, or availability of the
system or its information is compromised. Vulnerabilities are defects (introduced in requirements, design,
or implementation) that could be exploited to initiate security failures.
Many security risks can be addressed through the same practices that reduce the risk of other types of
failure. Additional attention may be appropriate to address unique causes of vulnerabilities. These measures
may include security-aware requirements, design, and implementation practices as well as specific
appraisal techniques (such as penetration testing) to detect vulnerabilities not otherwise identified.
As with any engineering trade-off, the extent of security and of security assurance has to be balanced against other system attributes, such as usability or efficiency, that might be diminished by security enhancements. There are now several static and dynamic code analyzers that search for sets of vulnerabilities, and other tools that help determine the pedigree of COTS software and open source software. Security is now a great concern in many software projects, from medical devices to cars to factory operations.
Software code that is newly developed will generally be riskier than software that is reused. On the other hand, software products that have been fielded for many years and are nearing obsolescence are generally riskier than those that are not. Any software product that is not "throw away" is subject to obsolescence, because the environment around the software, such as the OS, continually changes. The software can and will become obsolete if it is not maintained to stay up to date with its supporting environment. Software can also become problematic after several years if the software code is not maintained properly. For example, "copy and paste" code is typical in aging systems. "Copy and paste" code is the opposite of object-oriented code in that nearly identical code is copied and pasted one or more times, which can ultimately be problematic. As part of the planning process, the practitioner will need to identify the maturity of each of the software components in the system. Table 6 shows how product maturity is considered quantitatively in the SR modeling:
There are certain development practices that have been associated with more reliable software as discussed
in 5.3.2 and 6.2. However, there are also certain factors that can negatively affect the reliability of the
software. Table 7 provides insight into what risk to check for and why.
The size estimates for the software are a primary factor that drives the schedule. So, if the size is grossly underestimated, then so is the schedule. If the schedule is insufficient, then the reliability growth will be affected, as discussed next. It is a chain reaction that begins with a few faulty assumptions. There are several ways that the size of a software system can be grossly underestimated, as follows:
a) Reused components really are not reusable, or the work required to reuse them is as much as
developing new code.
b) Size estimates are based on old history that does not take new technology into consideration.
Following are some indicators that reused components really are not reusable:
Size estimates are often based on past history. Unfortunately, if the size history is not recent, it can lead to gross underestimates of size because software has been increasing in size steadily since the 1960s. If size estimates are based on past projects that are more than a few years old, this should be identified as a risk. (See US GAO [B92].)
There are at least two reasons why SR growth can be grossly overestimated, as follows:
a) Immovable deadlines—When the deadline for the completion of the software or system is
immovable, any schedule slippage will probably be compensated for with shortened
reliability growth cycles.
b) Reliability growth plans neglect to consider that when new features are added to the software
baseline, the reliability growth resets. Refer to Figure 18.
Reliability growth has a significant impact on the reliability of the software. If the reliability growth is underestimated, then so are the reliability predictions and estimations. When reliability growth is cut short, it can and will cause defect pileup, as discussed in the next subclause. Figure 18 is an example of the defect pileup that can occur when reliability growth is grossly overestimated.
a) Software engineers are spending a considerable amount of unplanned time supporting prior
releases.
b) There are many defect reports from previous releases that have not yet been scheduled or corrected.
5.1.3.5 Assess whether there are too many risks for one software release
An inherent risk is a risk that is often difficult to avoid. Some of these include the following:
This is a brand new software product or technology (if the software is at version 1 that is indicative
of a new software product)
Any specialized target hardware has not been developed yet
Brand new processes or procedures
Brand new personnel or personnel who are new to the product or technology or company
Research (Neufelder [B65]) has shown a correlation between the number of risks on a software
project/release and the outcome of that release. The outcome of each project was known to be either
1) successful, or 2) distressed, or 3) neither. The third category is referred to as “mediocre.”
A successful project is defined as having a defect removal efficiency (DRE) of at least 75% at
deployment.
A distressed project is defined as having ≤ 40% defect removal at deployment.
NOTE 1—Other research by Jones [B9] found that DRE was much higher for successful projects.
NOTE 2—Two independent bodies of research found that the maximum DRE observed was 99.9%. See Jones [B40]
and Neufelder [B68].
The DRE is simply the percentage of the total defects found over the life of a particular software release that were found prior to deployment. DRE is also referred to as defect purity (Tian [B91], [B90]). Table 8 shows that the projects with none of the risks shown in this subclause were most likely to be successful. In the referenced study, there were no successful releases with more than two of these risks. See Neufelder [B65].
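The following is a minimal Python sketch of the DRE calculation as defined above, using hypothetical defect counts.

```python
# Minimal sketch of the defect removal efficiency (DRE) calculation described
# above: the percentage of all defects found over the life of a release that
# were found prior to deployment. The counts are hypothetical.
defects_found_before_deployment = 450
defects_found_after_deployment = 90

dre = 100.0 * defects_found_before_deployment / (
    defects_found_before_deployment + defects_found_after_deployment)

print(f"DRE = {dre:.1f}%")  # 83.3%, above the 75% threshold cited for a successful project
```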
The following inherent risks generally cannot be changed within one software release/version. However, the risks can often be mitigated by making several concurrent releases, each of which has only one or two risks applicable to it. In summary, as part of the risk assessment, the practitioner should analyze whether a particular release has too many risky objectives and whether or not these risks can be mitigated.
The first major release of a particular product, or a product that is based on technology that is new to the particular organization, is riskier from a reliability standpoint than subsequent releases. This is because during the first release there are many unknowns that are difficult to quantify. This can lead to schedule slippage, which can then lead to the other risks previously shown, such as insufficient reliability growth and defect pileup.
5.1.3.5.2 The right people are not available to develop or test the software
Several major studies (Neufelder [B68], SAIC [B77]) that have correlated SR to certain key indicators have
found a strong relationship between the experience of the software engineers and the reliability of the
software. When there is high turnover or when there are software engineers who do not have the industry or
domain experience to develop and test the software, there is a risk.
When the target system hardware (over and above a computer) is evolving during the software
development process, this can be a risk to the SR. This is because the software cannot be fully tested until
the hardware system design is stable. This is also because any design changes in the hardware also result in
design changes to the software.
The world around a software system can evolve faster than the software organization can keep up with. The OS, interfacing hardware, drivers, and third-party software evolve over time. If the software is not kept up to date with the technology of its environment, the software can become prematurely obsolete. During the planning phase, all technologies should be identified and analyzed to determine any potential risks due to brand new technology or aging technology.
This is a typical task and is performed jointly by software quality assurance and software management.
The acquisitions organization can review the data collection system. Prior to using any of the SR models or
analyses, the data collection system should be assessed to verify that it supports the selected SR tasks. In
setting up a reliability program, the following goals should be achieved:
Clearly defined data collection objectives.
Collection of appropriate data (i.e., data geared to making reliability assessments and predictions).
See 5.3.2.1 and 5.4.4.
Prompt review of the collected data to verify that it meets the objectives.
A process should be established addressing each of the steps in Figure 20:
a) Establish the objectives for the data collection (which metrics are appropriate?)
b) Identify the data that needs to be collected from the steps in 5.3.2.1 and 5.4.4.
c) Set up a plan for the data collection process.
d) Use applicable tools to collect the data.
e) Evaluate the data as the process continues.
f) Provide feedback to all parties.
It is often difficult to gather SR test data if no provision was made in the project plan at the outset. Often one is faced with accumulating the data that already exists. It is common in many projects to use α-testing (in-house system testing of a mostly complete project) or β-testing (outside testing by trusted potential users of the mostly complete project). These test results, which are generally well documented in notebooks, often yield adequate reliability test data.
Hardware failures (cases where equipment repair or replacement was required to continue testing)
Operator error
Failures induced by test equipment
New feature or changed feature requests
The operating time between failures and the nature of each failure (new failure or repeated occurrence of a previously identified failure) should be investigated and recorded. All failure occurrences, whether unique failures or repeated occurrences, should be counted.
The following are useful tips for planning for the SR data collection:
Include a flag in the problem report that indicates whether the problem is related to reliability.
Include a counter in each problem report that is incremented every time the same defect causes a failure (a minimal record sketch follows these tips).
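The following is a minimal Python sketch of a problem-report record that supports the tips above: a reliability-related flag, an occurrence counter, the operating time at failure, and an FDSC severity. The field names are hypothetical and not prescribed by this recommended practice.

```python
# Minimal sketch of a problem-report record supporting the data collection
# tips above: a flag for reliability-related problems, an occurrence counter,
# operating time at failure, and severity per the project's FDSC (see 5.1.2).
# Field names are hypothetical.
from dataclasses import dataclass

@dataclass
class ProblemReport:
    report_id: str
    description: str
    reliability_related: bool        # flag: counts toward SR failure data
    severity: str                    # per the project's FDSC
    operating_hours_at_failure: float
    occurrences: int = 1             # incremented when the same defect recurs

    def record_recurrence(self) -> None:
        """Increment the counter when the same defect causes another failure."""
        self.occurrences += 1

pr = ProblemReport("PR-0042", "navigation task hang", True, "critical", 128.5)
pr.record_recurrence()
print(pr.report_id, pr.occurrences)  # PR-0042 2
```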
This is a typical SRE task. Reviewing the available SRE tools is a joint effort, mainly because each of the tools automates tasks performed by each of the stakeholders. The reliability engineers will be mostly interested in the tools needed for SR prediction and failure mode analysis, while the software quality assurance and test engineers will be interested in the tools required for SR growth modeling during testing. The software manager(s) should be involved in reviewing the tools for reliability predictions, assessments, and sensitivity analysis.
The automation considerations for the SRE tasks are described in Clause 5. It is not the goal of this
document to prescribe particular commercial tools but rather to indicate the features needed, whether these
tools exist in industry, and any refinements that need to be made to existing tools. Table 9 lists generic
types of tools that should be used and their automation capabilities. A list of available tools can be found in
Annex C. That list does not represent an endorsement of any particular tool.
This is an essential task. The inputs to this task are all of the other tasks in the planning clause of the
document as well as identification of the appropriate SRE tasks from 4.3. This task can be executed by the
acquisitions personnel and included as part of the contract. However, it is typically performed by the
development organization and delivered to the acquisitions organization for approval. The reliability
engineer typically produces this document with inputs from software management and software quality
assurance.
Implementation of robust reliability engineering efforts, starting in the requirements phase and continuing throughout the Design, Development, Integration, Test, Deployment, and Operations and Support phases, should take into account this document and other related IEEE documents, applicable ISO 9000 standards, as well as other industry standards. Software Engineers coordinate development of the Reliability Engineering Program Plan with Reliability Engineering, System Engineering, Test and Evaluation, Safety, Logistics, and Program Managers so that Reliability Engineering is executed in compliance with applicable policies and processes, is coordinated across disciplines, and so that reliability objectives are met.
Reliability engineering processes and tasks defined in the Reliability Engineering program plan should be
integrated within the Systems and Software Engineering processes and documented in the program’s
Systems Engineering Plan, Software Engineering Plan, Test and Evaluation Master Plan and Life-Cycle
Sustainment Plan. Reliability engineering should be assessed during system and software engineering
technical reviews, Test and Evaluation, and Program Management Reviews as requested.
The Software Reliability Program Plan (SRPP) incorporates guidance from the Reliability Engineering
Program Plan and guidance to invoke SRE tasks, tailored to the program, during all program lifecycle
phases to demonstrate confidence in achieving the program's reliability requirements during integration,
developmental and operational testing. The period of performance addressed by the SRPP extends through
all lifecycle phases. The SRPP represents the primary technical planning document for implementing SRE
process practices and principles throughout the program’s lifecycle. The objective of the SRPP is to:
Define the method to manage Reliability engineering for the program that includes software, a list
of expected deliverables, and the schedule of activities for all efforts.
Establish processes to manage the Reliability requirements for systems and subsystems and
software.
Demonstrate software reliability is achieved through modeling, simulation, or analysis by each
subsystem or system.
Demonstrate software reliability is achieved by each system and subsystem to meet operational Reliability requirements.
The SRPP identifies the overarching SRE task activities expected to be performed in order to increase the software reliability. The SRPP is the primary program management tool used to plan, monitor, and control program SRE tasks. The Reliability Engineering Plan defines the key stakeholders of the Reliability Engineering effort. The SRPP identifies scheduling of SRE tasks relative to the lifecycle schedule so that SRE functions are an integral part of the requirements, design, development, and integration processes and so that SRE activities are coordinated efficiently with other project disciplines, such as systems engineering, software engineering, requirements, design, development, integration, test and evaluation, logistics planning, safety engineering, and human systems interface.
The following procedures are defined for three possible audiences. The first audience is an Acquisitions
person whose organization is acquiring but not developing a software system. This person will be defining
deliverables for software and system contractors. The second audience is the contractor who is providing
software and system deliverables and has contractual obligations for software reliability. The third audience is an organization that is not under contract to perform software reliability but wishes to establish an SRPP to increase the confidence that the software meets internally identified reliability requirements. A checklist for creating an SRPP for each of the three audiences is found in Figure 21 through Figure 23.
a) The SRPP should be developed during or prior to the requirements phase. Organize a group of
subject matter experts from software management, software quality, reliability engineering,
systems engineering, safety engineering, logistics, program management, etc., and review 4.3 of
this document. Review the results of the other planning tasks in 5.1.
b) Select the SRE activities that apply to the program based on the criticality of the software and the
resources available. The essential tasks are typically those tasks that can be combined with existing
development practices to reduce cost.
c) Develop the contract RFP statement of work tasks and deliverables. Verify that reliability engineering tasks, deliverables, and requirements are included in the contract to be awarded (if applicable). The reliability engineering team conducts predictions and allocations on systems and subsystems in accordance with the life-cycle strategy to validate the reliability thresholds and objectives that can be placed into the contract specifications prior to release of the request for proposals.
d) Specify that the development organization perform the SRE tasks identified in step a), prepare an
SRPP, provide status against it at every milestone review, and update it throughout the life cycle.
a) Review the SRE requirements and the required SRE tasks for the program.
b) Create an SRPP that includes the required SRE tasks.
c) Arrange the SRPP in order of planning, development, deployment decision, and operation.
d) Add to each section the specified SRE tasks.
e) Identify who will perform each of the tasks.
f) Identify the cross-functional relationships and teams required to support the SRE. The SRE
practitioner should also be identified as a team member with regard to the software
engineering activities. Communication paths between the SRE practitioner and software
management should be clearly identified.
g) Invite all stakeholders to review and approve the plan.
h) Plan updates to the SRPP on a continuous basis and review it at every major milestone to verify that all project support activities are properly managing reliability engineering.
i) Update the SRE status for any planned system engineering technical reviews, program reviews, or major milestones to verify adequate consideration of the interfaces and dependencies among these acquisition program activities. Updates are made to provide more detail in the Reliability Engineering Plan in the form of reliability growth plans, analyses, and reporting growth curve(s).
j) Update the SRPP during the Development Phase. Changes may include updates to the reliability growth planning, analysis, and reporting, and identification of the systems and software with demonstrated low reliability. Test and Evaluation activities are revised and more detail on the plans for reliability testing is provided. Updates should provide more detail on Reliability Engineering status at system and software engineering reviews to verify adequate consideration of the interfaces and dependencies among these acquisition program activities. The Reliability engineering team provides more reliability growth detail in the Reliability Engineering Plan in the form of reliability growth plans, analyses, and reporting growth curve(s) that show reliability growth progress on systems and software that have been predicted or demonstrated to have low reliability.
k) Update the SRPP during Operation and Support. During this phase the Reliability engineering team's level of engagement with the system engineering and software engineering processes depends on several factors that determine the Reliability engineering activities required. Systems/subsystem upgrades that are determined to have no observed decrease in reliability require a less comprehensive Reliability engineering program, which may consist of the initial Reliability engineering prediction. Systems/subsystems with a new design, implemented in a new environment, and/or determined to cause degradation of reliability or reliability growth require a comprehensive Reliability engineering plan, activities, and products. The Reliability engineering team may need to revisit reliability growth for system upgrades and provide a strategy to address new requirements or new functions that are being integrated into the design, along with details on how they will be tested and validated.
l) Anytime modifications are made to the SRPP, the program should prepare and document
any Reliability inputs to the system engineering plan, software engineering plan and test and
evaluation plans.
The checklist for the organizations that are not developing software under contract is shown in Figure 23.
a) Execute all of the steps under the acquisition personnel except for step c).
b) Execute all of the steps under the previous section except for step a).
Figure 23 —Checklist for creating an SRPP—for those organizations not under contract
There are a variety of software analyses related to outsourcing, requirements, design, code, and testing.
While nearly all analyses have some impact on the reliability of the software, there are some analyses
that are directly related to defects and therefore to SR:
Software defect root cause analysis (RCA)—What causes most of the defects?
Software failure modes effects analysis (SFMEA)—What kind of effect will relevant failure modes have on the system?
Software fault tree analysis (SFTA)—How can a system hazard be caused by software?
Figure 24 illustrates the flow of data between the failure modes analyses. All analyses can be and are employed to drive the development and test strategies discussed later in 5.4.1, 5.4.2, and 5.4.3. The defect RCA can be conducted regardless of whether there is a waterfall or incremental LCM. Early in
software development, defect reports from a prior release can be used for the analysis. The SFMEA is a
bottom-up analysis that starts at the failure modes and works up to the system events that could be
caused by software failures. The SFMEA can be repeated in each increment if there is an incremental
development process. The SFTA is a system analysis that can be conducted at any time during
development or test regardless of whether there is a waterfall or incremental development.
Table 10 illustrates the purpose of each of the preceding analyses as well as the applicability for
incremental development. The failure modes analyses are not affected by the development model. They
are performed whenever a particular artifact is available during development, regardless of whether
there is a waterfall development or an incremental development. With an incremental development, the
failure modes analyses can be and usually are revisited in each increment. The analyses can be used
together to improve effectiveness. The software defect RCA, for example, can be used whenever there is not a budget for the SFMEA. It can also be used prior to the SFMEA to increase the confidence that the SFMEA focuses on the most important failure modes.
The SFMEA and SFTA can be used together when there is a brand new system with unknown failure modes and unknown system hazards. In that case, the SFMEA and the SFTA can be performed so that they meet in the middle. That means the analyses are performed until the SFTA identifies software failure modes that were not captured on the SFMEA and the SFMEA captures system level hazards that were not previously known. Table 11 provides a checklist for choosing which of the preceding analyses to use and when to use them; the applicable analysis for each criterion is shown in parentheses.
There is a desire to make requirements, design, and code reviews more effective by combining them with the failure mode analyses. (SFMEA)
The software requirements specification does not describe very well how the software should handle negative behavior or hazardous events. (SFMEA, SFTA)
Table 11—When and how to perform the failure mode analyses (continued)
Criteria for selection
The technology or the product is brand new. System level hazards are not completely understood. (SFMEA, SFTA)
There are a small number of well-known top-level hazards, but it is unclear how or if the software can cause those. (SFTA)
There is a need to identify failures that are due to a combination of events. (SFTA)
A serious but intermittent event has occurred and it is urgent that the failure mode(s) associated with it be identified. (SFTA)
This task is a recommended but not required prerequisite for the SFMEA (task 5.2.2). This task is also
recommended if the goal of SRE is to make improvements in the development activities that are
causing specific types of defects to occur more often than others. The lead for this task is usually the
software quality assurance organization because this organization typically has access to the failure
data. This task does require cooperation and inputs from the software development engineers. This task
is most efficient when the software defect and failure reporting system has been defined such that
software root causes are required input for closing software problem reports.
All software defects are ultimately caused by mistakes made during the development activities. The
causes can be anything from workmanship-type mistakes to poor requirements to misunderstanding of
the operational constraints. Many software defects are caused by humans who are not actually writing
the code. For example, if the specification is erroneous then the code will be erroneous. If the interface
design is erroneous then the interface code will be erroneous. The purpose of the defect RCA, as well as the other failure mode analyses, is to understand the development activity or activities that are introducing most of the software defects.
The sources of the defects can be and usually are unique for each organization and product. Hence, a
defect RCA on one software project does not necessarily provide value for another project. The steps for performing a defect RCA are given in Figure 25, while Table 12 provides keywords commonly associated with common defect root causes.
a) If the software LRU(s) are not yet in the post-integration testing phase, collect defect reports from requirements and design reviews, inspections, and walkthroughs. As a point of reference, look at the defect reports from a recent, similar version of the software. If the software is in the testing phase, collect the defect reports generated during testing.
b) Review each defect reported. In some cases the software engineer may have recorded the
root cause for the defect. Otherwise use Table 12 to identify the most likely root cause based
on the keywords in the defect report.
c) Count up the number of defect reports in each of the categories in Table 12.
d) Generate a Pareto diagram that shows the most common root causes from left to right.
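The tabulation in steps b) through d) can be automated with a simple keyword search over the defect report text. The following is a minimal sketch for illustration only; the keyword lists are an assumed subset of Table 12, and reports that already record an explicit root cause would not need keyword matching.

    from collections import Counter

    # Assumed subset of the Table 12 keywords, mapped to root-cause categories.
    KEYWORDS = {
        "functionality": ["specification", "required", "functionality"],
        "timing": ["timing", "synchronization", "timeout", "race"],
        "data": ["corrupt", "overflow", "underflow", "unit of measure"],
        "exception handling": ["exception", "recovery", "retry", "try/catch"],
        "memory": ["memory", "allocate", "deallocate", "leak"],
    }

    def classify(report_text: str) -> str:
        # Return the most likely root-cause category based on keyword matches.
        text = report_text.lower()
        scores = {cause: sum(text.count(k) for k in kws) for cause, kws in KEYWORDS.items()}
        best = max(scores, key=scores.get)
        return best if scores[best] > 0 else "unclassified"

    def pareto(reports: list) -> list:
        # Count defect reports per root cause, ordered most common first (step d)).
        counts = Counter(classify(r) for r in reports)
        return counts.most_common()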
Table 12 contains some keywords that are associated with common software defect root causes. The
software engineer who corrects a defect knows what the root cause(s) is/are in order to correct it.
Hence, if the software engineers record the root cause for each corrected defect, then the reliability
practitioner does not need to search for common keywords such as those listed in Table 12. Notice that
the keywords are listed in order of the phase in which the defect is introduced: requirements, design, and code. A requirements defect is a defect in which the required functionality is not
implemented. It is possible that the design and code execute the wrong requirement perfectly. In that
case, the software is still faulty because it does not address the requirements. It is also possible that the
requirements are captured and understood correctly but the design does not support the requirements.
Finally, it is possible that the requirements and design are correct but the code has not been implemented to meet either or both.
The last column of Table 12 shows the recommended resolutions if the particular root cause happens to be the most common. For example, if the functionality root cause is the most common root cause, then the recommended resolutions for functionality are appropriate.
Example: Several defect reports are collected and analyzed. The keywords from Table 12 are searched,
parsed, and tabulated. The results are shown in Figure 26. In this example, exception handling is the most common root cause. This is a design-related issue. The design document templates should be reviewed to verify that exception handling is covered in the design. The design review templates should also be reviewed to verify that exception handling is inspected as part of the design review. Note that Figure 26 is only an example; it should not be construed that all systems will have these root causes.
Table 12 —Keywords associated with common root causes for defects (a)

Requirements
Functionality
  Keywords: specification, required, functionality, desired, functional requirements.
  Recommended resolutions: Employ more rigorous requirements-related reviews. Use the SFMEA prior to designing and coding. See 5.2.2.2.

Design
Timing
  Keywords: timing, synchronization, slow, fast, timeout, race condition.
  Recommended resolutions: Employ timing diagrams during design; include timing-related tests in the reliability test suite in 5.4.1.4.
Sequence/logic/state
  Keywords: sequence, order, logic, state, or persistent use of if, else, or otherwise.
  Recommended resolutions: Use state transition tables and diagrams, and logic diagrams during design. Verify that the test plans cover all state transitions as per 5.4.1.3. Employ unit testing procedures that require a specific level of coverage. See 5.3.9.2. Employ the interface SFMEA during architectural design and the detailed SFMEA during detailed design and coding. See 5.2.2.2.
Data
  Keywords: corrupt, output, input, unit of measure, results, overflow, underflow.
  Recommended resolutions: Use data flow diagrams and interface design diagrams during design. Specify the data types, formats, unit of measure, and default, minimum, and maximum values. Employ the detailed SFMEA during design and code reviews. See 5.2.2.2.
Exception handling
  Keywords: exception, detection, error, recovery, fault, failure, retry, hardware, try/catch.
  Recommended resolutions: Employ a more rigorous review of the requirements, top-level design, and interface design to verify that hardware and software failures are adequately handled.
Interfaces
  Keywords: interfaces, parameters.
  Recommended resolutions: Employ a system-wide interface design specification. Review the design for each subsystem with regard to the interface contracts. Employ an interface SFMEA. See 5.2.2.2.

Coding
Memory
  Keywords: memory, resources, free, allocate, deallocate.
  Recommended resolutions: Review the code specifically for memory allocation issues and/or employ automated tools that search for common memory leaks.
Algorithm
  Keywords: algorithm, formula, divide, multiply, etc.
  Recommended resolutions: Verify that all algorithms are documented in a detailed design document, are coded to conform to that design, and are unit tested by the developer. Employ a detailed SFMEA that focuses on what can go wrong with the algorithms. See 5.2.2.2.

a Reprinted with permission of A.M. Neufelder [B64].
This task is recommended if any of the items in Table 11 indicate that it should be performed. The SFMEA can be labor intensive. However, if properly tailored, planned, and executed, its cost can be outweighed by the savings from reduced defects and from avoiding rework of the requirements, design, and code late in development. The lead role for the SFMEA depends on the viewpoint. The functional, interface, and usability SFMEAs can be led by reliability engineering and supported by the appropriate software engineers and designers. The detailed, vulnerability, and serviceability SFMEAs should be led by the software designers and software engineers but monitored or facilitated by someone who is knowledgeable of the FMEA process, such as a reliability engineer. The production SFMEA is typically led by software management.
The goal of the SFMEA is to identify key failure modes in the requirements, design, and code and to describe the appropriate method by which the defect can be isolated and mitigated. The SFMEA can be performed in different phases of product development. For example, for ease of execution it can be incorporated as part of a software code review, where the reviewers discuss any applicable failure modes for the software. The main element that enables a design team to have a successful and fruitful SFMEA
is a failure mode taxonomy. Without this information, it is difficult to bring any consistency into
the SFMEA process.
Traditionally, the Failure Modes and Effects Analysis is a reliability analysis technique used to assess the risk of a system failing during operation. For software, it can also be used to identify software defects that can lead to system/subsystem failures that would be difficult to trigger during testing. Unexpected data or
behavior of software are examples. SFMEA identifies key software failure modes for data and software
actions and analyzes the effects of abnormalities on the system and other components. This can be used as a
systems approach to analyzing the software's response to hardware failures and the effect on the hardware
of anomalous software actions by identification of:
The SFMEA is applicable to any type of software or firmware application and is also applicable to any type
of development LCM. The reason is that there is a core set of failure modes that applies to all application
types and all development models. In other words, the core set of failure modes apply to any software
regardless of whether it is developed incrementally or not. For example, all software and firmware systems
have logic and data. Therefore all software and firmware systems are susceptible to faulty logic and faulty
data.
The FMEA process identified in existing FMEA standards is applicable to software (MIL-HDBK-338B
[B54], MIL-STD 1629A [B56], SAE [B87]). However, most practitioners are not able to easily apply the
FMEA to software because these references are lacking the failure modes or the viewpoints for the
software analysis. This subclause will provide the information needed to apply the FMEA to software. The
following are the steps for performing a SFMEA (Neufelder [B64]):
Prepare the SFMEA—The effectiveness of the SFMEA is highly dependent on up-front preparation and
planning. It is important to make sure that the SFMEA is focused on the most critical aspects of the
software or firmware and the most likely failure modes. It is also important that all participants in the
SFMEA have a common understanding of how to complete the SFMEA. In this step the analysts identify
where the SFMEA applies, set some ground rules for the SFMEA, identify applicable viewpoints, identify
the riskiest parts of the software, identify and gather documentation required for the analysis, identify
personnel resources needed for the analyses, identify the likelihood and severity thresholds for mitigating
risks, and define the template and tools to be used for the SFMEA.
Analyze software failure modes and root causes—One of the most critical aspects of the SFMEA is
determining the failure modes that are most applicable for a particular system. Overlooking one failure
mode could result in missing an entire class of software defects. Sometimes the simplest of failure modes
are involved with the most serious failure events. To reduce the possibility that applicable failure modes and root causes are overlooked, common software failure modes and root causes for each of the eight software viewpoints are analyzed.
Identify consequences—Once the lists of failure modes and root causes are complete, the next step is
identifying the effects on the software itself (local), subsystem, and system. If there is a user interface, the
effects on the user will also be identified. Software engineers are often able to identify the local effects.
However, connecting the local events to the subsystem and system effects often requires creative thinking
and system level expertise. Consequently, this step usually needs to be completed by more than one person.
Once the effects are identified, the compensating provisions, preventive measures and severity, and
likelihood are assessed.
NOTE—SFMEA process reprinted with permission from Softrel, LLC “Effective Application of Software Failure Modes Effects
Analysis” © 2014.
The steps for preparing the SFMEA are shown in Table 13.
Once the SFMEA has been planned and prepared, the next step is to construct the failure modes and root
causes section of the SFMEA table as per Figure 29.
a) Research past failure modes and root causes (Beizer [B4], Common Weakness Enumeration [B10],
Neumann [B69]) from similar systems developed in the past. Use the defect RCA from 5.2.1 if
available.
b) Brainstorm additional failure modes and root causes that pertain to this software system.
c) If the selected SFMEA is a functional SFMEA, copy the software requirements that are in scope for the SFMEA into Table A.5. Select the failure modes that are applicable for this requirement. Brainstorm the root causes for the applicable failure modes.
d) If the selected SFMEA is an interface SFMEA, copy the software interfaces that are in scope for the SFMEA into Table A.6. Select the failure modes that are applicable for this interface. Brainstorm the root causes for the applicable failure modes.
e) If the selected SFMEA is a detailed SFMEA, copy the template in Table A.7. Identify the applicable functions selected for the analysis. Inventory the selected functions and determine which of the failure modes is applicable for this detailed design. Functionality, data, and exception handling are almost always applicable, while sequences, algorithms, memory management, and input/output are not necessarily applicable to every function. Brainstorm the root causes for the applicable failure modes.
f) If the selected SFMEA is a maintenance SFMEA, copy in all of the corrective actions,
implemented since the last baseline, into the template in Table A.8. For each corrective action,
inventory the selected function and determine which of the failure modes is applicable for this
detailed design and which is affected by the corrective action. Brainstorm the root causes for the
applicable failure modes.
g) If the selected SFMEA is a usability SFMEA, copy in all of the use cases into Table A.9. For each
use case, identify the applicable failure modes. Brainstorm the root causes for the applicable failure
modes for each use case.
h) If the selected SFMEA is a serviceability SFMEA, collect the installation scripts for the software.
Identify the applicable failure modes. Brainstorm the root causes for the applicable failure modes.
Use Table A.10.
i) If the selected SFMEA is a vulnerability SFMEA, the steps are similar to the detailed SFMEA.
However, the focus is on the vulnerability design and coding issues. Identify the failure modes that
apply to the design or code under analysis. For each applicable failure mode, identify the common
weakness enumeration that pertains to each failure mode. Use Table A.11.
j) If the selected SFMEA is a production SFMEA, use Table A.12. This viewpoint is the only
viewpoint that is process versus product related. This viewpoint focuses on why the organization is
deficient at detecting software failure modes prior to deployment. Every failure mode will have at
least one associated product related cause and at least one production related cause.
For each row of the SFMEA table identify the consequences as per Figure 30.
a) Continue using the template that was selected in the previous step (5.2.2.2). Proceed to the consequences section of that template. See Table A.13.
b) Identify the local effect of each failure mode and root cause on the software LRU itself. Usually the
software engineering subject matter experts can identify this.
c) Identify the effect of each failure mode on the subsystem. Usually the engineers most experienced
with the subsystem or system can identify this.
d) Identify the effect on the system. Usually the engineers most experienced with the subsystem or
system can identify this.
e) Identify any preventive measures for this effect. Examples of preventive measures are increasing
bandwidth or memory.
f) Identify the severity and likelihood of each failure mode. The risk priority number (RPN) is
calculated from the severity and likelihood ratings. The RPN of a particular failure mode is
compared against the RPN matrix to determine which failure modes are to be mitigated, should be
mitigated, etc.
NOTE—A failure modes effects and criticality analysis (FMECA) is an FMEA that has a quantitative assignment for the
criticality such as a ranking from 1 to 10 so that the probability of the failure mode can be computed in addition to the
RPN.
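As a simple illustration of step f), the sketch below computes an RPN from the severity and likelihood ratings and flags rows above a threshold for mitigation. The 1-to-5 scales, the product form, and the threshold value are assumptions; a program should use the rating scales and RPN matrix defined in its own SFMEA ground rules.

    from dataclasses import dataclass

    @dataclass
    class SfmeaRow:
        failure_mode: str
        severity: int      # assumed scale: 1 (negligible) to 5 (catastrophic)
        likelihood: int    # assumed scale: 1 (remote) to 5 (frequent)

        @property
        def rpn(self) -> int:
            # Risk priority number computed here as a simple severity x likelihood product.
            return self.severity * self.likelihood

    MITIGATION_THRESHOLD = 12  # assumed cutoff taken from the program's RPN matrix

    def rows_to_mitigate(rows: list) -> list:
        # Return the SFMEA rows whose RPN meets or exceeds the threshold, highest risk first.
        return sorted((r for r in rows if r.rpn >= MITIGATION_THRESHOLD),
                      key=lambda r: r.rpn, reverse=True)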
5.2.2.4 Mitigate
This section of the SFMEA identifies the applicable corrective actions and compensating provisions. If
there are corrective actions or compensating provisions, then the RPN is revised. Corrective actions include
changes to the requirements, design, code, test plan, user’s manuals, installation guides, etc. Compensating
provisions are applicable when an action other than a corrective action can mitigate the failure. For
example, in some cases an end user can mitigate a software failure if they are aware of it soon enough to
avoid it. Figure 31 provides a checklist for mitigation.
The critical items list (CIL) is the final SFMEA ranked so that only the highest RPN items are placed into
the CIL. The software CIL is then merged with the hardware CIL so as to establish the system CIL. A
complete example of a SFMEA can be found in F.2.1.
5.2.2.6 Understand the differences between a hardware and software failure modes effects
analysis
The items not relevant for an SFMEA are shown in Table 14.
This SRE task is recommended when any of the items in Table 11 indicates applicability or whenever there
is a system FTA being conducted and the system is software intensive. This task should be a joint effort
between reliability engineering and software management. FTAs, traditionally used for system hazard
and/or safety analyses, provide a top-down look as follows:
If the software is part of a hardware/software system, it should not be analyzed in a vacuum. This is because many failures are related to interfaces and interactions between software and hardware.
Software failure events are added to the system level tree and analyzed with the same fault tree connectors
and diagramming that is used for hardware. The software fault tree is integrated with the system fault tree
as opposed to being a standalone fault tree. If the system is software only, then the system FTA is
equivalent to a software FTA.
The part of the FTA (Vesely et al. [B93]) that is unique to software is brainstorming the software related failure modes. Table 15 can be used to facilitate such brainstorming (Neufelder [B64], SAE [B87]). The following is the checklist for including software on a system FTA.
a) Verify that knowledgeable engineers are involved in the construction of the system FTA.
b) Identify system level events that have been caused by software in the past.
c) For each failure event on the system FTA, brainstorm how that failure mode could be caused by
one of the software failure events in Table 15.
d) Brainstorm system level events that can be caused by software but not hardware.
e) Add each viable software failure event to the tree just as the events due to hardware are added to
the tree.
f) Use the appropriate connector such as AND, OR, Exclusive OR to connect the failure event(s) to
the tree.
g) At each level of the fault tree, repeat the preceding steps.
h) When the fault tree has reached the lowest failure mode, the failure modes are reviewed as a whole and investigated or tested to determine whether they have caused or can cause the top-level event. If the event has already happened and the FTA is being used to isolate the root causes, then each of the failure modes at the bottom of the tree is investigated in ranked order of likelihood to isolate the event at the top of the tree. If the analysis is being performed prior to any field events, then each of the failure modes at the bottom of the tree is further investigated to determine appropriate mitigation and to verify that the test plans include testing of the failure mode.
“Development” within the context of this document includes the tasks relating to software requirements,
software architecture and design, software detailed design and coding, implementation, and software unit
testing. These models can be used as early as the proposal or concept phase because they do not require defect or testing data, which is not available until later in the development cycle. The models can be used when there is an incremental LCM, and they can also be applied to COTS and FOSS software LRUs as well
as firmware.
The SRE activities for the development phase are illustrated in Figure 33. The identification of the system
reliability objective is typically done first either by an acquisitions person or by marketing. The SR is
assessed and predicted as a parallel activity with the hardware reliability prediction activities. Once the
predictions are complete they can and should be sanity checked against typical SR values. The predictions
may be updated depending on the results of the sanity check. If within the scope of the SRE plan, the
assessment results are analyzed for sensitivity. This analysis identifies both the strengths and gaps in the development activities that will ultimately affect the system prediction. Once the SR predictions are
finalized they are merged with the hardware predictions into the system reliability model or RBD. After
that the total SR (from all software LRUs) that is needed to meet the system objective is determined. The
SR growth needed to meet the system objective is then determined based on schedule and resources
available. The sensitivity analysis may be revisited to determine how to optimize the reliability growth of
the predictions. It may be necessary to update the system objective if it cannot be met with the current
schedule. For some projects there may be many software LRUs developed by several different
organizations. In that case, it may be necessary to allocate the top-level software requirement down to each
of the LRUs so that it can be tracked by the appropriate software engineering teams. Within each software
LRU it may be necessary to perform a sensitivity analysis in order to meet the particular LRU allocation. In
parallel to all of the SRE modeling the software is being tested by each software engineer and the white box
test coverage is monitored. When development for the increment or the entire system is nearing completion, the reliability growth requirements and the measured white box test coverage are analyzed to determine whether to transition to the system level testing phase.
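The merge and allocation steps described above can be illustrated with a minimal sketch, assuming a series RBD in which the predicted failure rates of the hardware and each software LRU simply add; the rates shown are placeholder values, not predictions.

    # Predicted failure rates in failures per operating hour (placeholder values).
    sw_lru_failure_rates = {"LRU_A": 1.0e-4, "LRU_B": 5.0e-5}
    hw_failure_rate = 2.0e-4

    # Series RBD: any element failure fails the system, so the rates add.
    system_failure_rate = hw_failure_rate + sum(sw_lru_failure_rates.values())
    system_mtbf = 1.0 / system_failure_rate

    # Software share of the system failure rate, useful when allocating the
    # top-level software requirement down to the individual LRUs.
    sw_share = sum(sw_lru_failure_rates.values()) / system_failure_rate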
5.3.6 Plan the SR growth: Increases the confidence that the prediction can be met for the software given the current schedule. Not affected by development cycle or LCM.
5.3.7 Perform a sensitivity analysis: Identifies gaps, strengths, and development practices that are least/most sensitive to reliability. This is performed in an early increment or sprint.
5.3.8 Allocate the required reliability to the software LRUs: If there are several software organizations this step may be necessary for tracking. Any LRU developed in any increment is subject to an allocation.
5.3.9 Employ SR metrics for transition to system testing: Identifies whether the code is stable enough to progress to verification testing. Performed at each increment or sprint as well as the final increment.
This is an essential task and is a prerequisite for several other tasks. A system reliability objective is a
reliability figure of merit for the entire system including hardware and software. It is usually initially
developed during the concept or proposal phase of the project. The initial system reliability objective may
be and often is refined once the reliability predictions for the system and the engineering efforts begin. The
system reliability objective may be determined by the acquisitions organization, systems engineering, or
marketing depending on the type of system being developed. The reliability objective may be and often is
specified in a contract and hence will also be called a system reliability specification.
The potential issues with the system reliability objective with regards to SRE that are addressed by this
subclause are as follows:
The checklist shown in Figure 34 is recommended to provide a system reliability figure of merit that reflects the software part of the system and is clearly defined and clearly communicated. Table 17 lists some general guidance for identifying the appropriate reliability figures of merit.
a) Use Table 17 to determine which reliability figures of merit are most applicable.
b) Verify that the specification clearly indicates that the requirement applies to both hardware and
software.
c) Verify that the objective considers the impact of software failures. Review any past similar
systems and determine the actual reliability figures of merit and the impact of the software on
those systems prior to establishing the specification. Remember that systems rarely lose functionality from one generation to the next. If a past system had system failures due to software, a future system will likely have more, because successive systems have more functions performed by the software.
d) Specify particular milestones for when the objective is to be met. SR will grow when there is
reliability growth testing and no new features are added. Once new features are added, the
reliability growth resets. Consider this when establishing the milestones for the objective. Some
typical milestones are as follows:
End of engineering test
End of acceptance test
Average of first year of usage
e) Verify that the initial system reliability objective clearly defines the word failure. An FDSC is
one of the best ways to do this. See 5.1.2. Make sure that the FDSC includes examples of
expected failures due to software and assigns an appropriate criticality to those failures.
f) Derive the quantitative objective itself based on the recommendations in Figure 35, Figure 36, and Figure 37.
There are three possibilities for deriving a quantitative objective. First, the objective can be derived from a
predecessor system. See Figure 35 for how to derive the objective in that case. If there is no predecessor
system then there is another alternative for deriving the objective. For instructions for this case see
Figure 36. If the system is mass deployed there is a third means of deriving the reliability objective. See
Figure 37 for instructions for how to derive a quantitative objective for mass deployed systems.
a) Identify the actual figure of merit from field data. For example, if the selected figure of merit is
MTBSA then calculate the actual MTBSA for a predecessor system. Separate the software and
hardware related failures in the data.
b) Determine how long it has been since the software was developed for this predecessor. Multiply
the number of software failures by at least 110% for each year since that predecessor software
has been deployed. This accounts for the fact that systems on average increase in size by 10%
each year. (See US GAO [B92].)
c) Determine how much the hardware has changed. Keep in mind that some hardware may be
replaced by software. Adjust the historical hardware failure count by this percentage. The result
is an adjusted system failure count. Compute the relative difference between the adjusted
system failure count and the historical system failure count.
d) Determine the MTBSA objective by adjusting the historical MTBSA by the relative difference from step c). Determine the MTBF, MTBEFF, and MTBCF objectives similarly.
e) Determine the typical mission time for the new system. Use the MTBCF objective determined in step d) and the typical mission time to determine the required reliability for the new system.
f) Determine the typical MTTR for the hardware and the MTSWR for the software. (See 5.3.2.3 Step 5.) Average the MTTR and MTSWR, weighted by the percentages of adjusted failures expected from hardware and software, respectively. This is the average repair and restore time. The availability objective is computed using the average restore time and the predicted MTBCF from step d).
Figure 35 —Derive the quantitative objective when system has at least one predecessor
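The arithmetic in Figure 35 can be illustrated with a small worked sketch. All of the numbers below (failure counts, operating hours, repair times, and the hardware change fraction) are assumptions chosen only to show the calculation, not recommended values.

    # Step a): historical failure counts from the predecessor system, split by cause.
    hist_sw_failures = 20
    hist_hw_failures = 10
    hist_operating_hours = 50_000.0

    # Step b): grow software failures by at least 10% per year since deployment.
    years_since_deployed = 4
    adj_sw_failures = hist_sw_failures * (1.10 ** years_since_deployed)

    # Step c): adjust hardware failures for the fraction of hardware retained.
    hw_retained_fraction = 0.80
    adj_hw_failures = hist_hw_failures * hw_retained_fraction
    relative_change = (adj_sw_failures + adj_hw_failures) / (hist_sw_failures + hist_hw_failures)

    # Step d): adjust the historical MTBSA by the relative change in failure count.
    hist_mtbsa = hist_operating_hours / (hist_sw_failures + hist_hw_failures)
    objective_mtbsa = hist_mtbsa / relative_change

    # Step f): availability from a weighted average restore time and an MTBCF objective
    # (assumed here, for illustration only, to equal the MTBSA objective).
    mttr_hw, mtswr_sw = 2.0, 0.5   # hours
    hw_weight = adj_hw_failures / (adj_sw_failures + adj_hw_failures)
    avg_restore = hw_weight * mttr_hw + (1.0 - hw_weight) * mtswr_sw
    objective_mtbcf = objective_mtbsa
    availability = objective_mtbcf / (objective_mtbcf + avg_restore)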
a) Use one of the models in 5.3.2.3, 6.2, and Annex B to predict the SR.
b) Use an industry accepted model to predict the hardware reliability.
c) Combine the predictions as per 5.3.4.
d) The objective MTBF, MTBEFF, MTBCF, MTBSA, or failure rate is established based on the
achievable failure rates of the hardware and software.
e) Determine the typical mission time for the new system. Use the MTBCF objective determined
in step d) and the typical mission time to determine the required reliability for the new system.
f) Determine the typical MTTR for the hardware and the MTSWR for the software. (See 5.3.2.3 Step 5.) Average the MTTR and MTSWR, weighted by the percentages of adjusted failures expected from hardware and software, respectively. This is the average repair and restore time. The availability objective is computed using the average restore time and the predicted MTBCF from step d).
a) Determine the maximum acceptable number of maintenance actions that require a field service
engineer to perform.
b) Determine the expected average duty cycle of the system.
c) Using established methods such as a Reliability Demonstration Test (RDT), determine the
maximum failure rate that will support the maximum number of maintenance actions from step
a).
d) Invert the result of step c) to determine the objective MTBF. Adjust it by the expected
percentage of failures that will result in a system abort to yield the MTBSA objective. A system
abort means that the software is no longer performing its function. Adjust the objective MTBF
by the expected percentage of failures that will be critical to yield the MTBCF objective. Adjust
the objective MTBF by the expected percentage of failures that will be in essential functions to
yield the MTBEFF objective.
e) Determine the typical mission time for the new system. Use the MTBCF objective determined
in step d) and the typical mission time to determine the required reliability for the new system.
f) Determine the typical MTTR for the hardware and the MTSWR for the software. (See 5.3.2.3 Step 5.) Average the MTTR and MTSWR, weighted by the percentages of adjusted failures expected from hardware and software, respectively. This is the average repair and restore time. The availability objective is computed using the average restore time and the predicted MTBCF from step d).
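The sketch below illustrates steps c) and d) for a mass-deployed system. The maintenance-action limit, duty cycle, and percentage splits are assumptions for illustration only, and a real program would derive the maximum failure rate from an established method such as an RDT.

    # Step c): maximum failure rate that keeps field service actions within the limit.
    max_service_actions_per_year = 4
    duty_cycle_hours_per_year = 3_000.0
    max_failure_rate = max_service_actions_per_year / duty_cycle_hours_per_year

    # Step d): invert to an objective MTBF, then adjust by the expected fraction of
    # failures in each category to yield the other objectives.
    objective_mtbf = 1.0 / max_failure_rate
    frac_system_abort, frac_critical, frac_essential = 0.30, 0.10, 0.50
    objective_mtbsa = objective_mtbf / frac_system_abort
    objective_mtbcf = objective_mtbf / frac_critical
    objective_mtbeff = objective_mtbf / frac_essential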
This is an essential task that is a prerequisite for tasks 5.3.3 through 5.3.8. This task can also be used to
help define the reliability objective. See 5.3.1. The lead role for initiating the SR assessment is typically the
reliability engineer or software quality engineer. The assessment requires that software engineers provide
inputs. The results of those inputs are provided to the reliability engineer who will perform the SRE
calculations.
An SR assessment is an evaluation of the practices employed on a software project to predict either the risk level of the software or an SR figure of merit such as defect density. All SR assessments have some sort of survey or questionnaire. The survey is completed and scored; the score determines the defect density prediction, which is then used to predict the reliability figures of merit. The survey is also used for sensitivity analysis. One can determine what the predicted defect density would be, for example, by instituting a particular change in the planned development practices. The predicted change can then be compared with the cost of implementing that change and the cost reduction of having fewer defects as predicted by the sensitivity analysis.
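A hypothetical cost/benefit comparison for one candidate practice change is sketched below. Every number (densities, size, and costs) is an assumption; in practice the two densities would come from re-running the assessment model with and without the changed answer.

    baseline_defect_density = 0.60    # predicted defects per KSLOC with current practices
    improved_defect_density = 0.45    # predicted defects per KSLOC if the practice change is adopted
    effective_ksloc = 200.0
    cost_per_defect = 4_000.0         # assumed average cost to find and fix one late defect
    cost_of_change = 80_000.0         # assumed cost of instituting the practice change

    defects_avoided = (baseline_defect_density - improved_defect_density) * effective_ksloc
    net_benefit = defects_avoided * cost_per_defect - cost_of_change  # adopt if positive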
SR assessments also allow a reliability engineer to sanity check the predicted SR results against an actual range of values. Assessments are also useful for establishing the SR or risk level of a software vendor.
Figure 38 contains the checklist for performing a SR assessment and prediction.
a) Collect data about the project and product such as size and software LRUs. See 5.3.2.1.
b) Select a model to predict software defects or defect density. See 5.3.2.2.
c) Once the model is selected as per 5.3.2.2, proceed to 5.3.2.3, and then to 6.2 or Annex B for
instructions on how to use the selected models.
d) If the software is being developed with an incremental or evolutionary LCM, apply the models as
per 5.3.2.4.
e) If the software is being developed by more than one organization, use the assessment to qualify a
subcontractor, COTS, or FOSS vendor as per 5.3.2.5.
The prediction models for SR assessment require three types of data as follows:
Project data contains information to identify and characterize each system. Project data allow users to
categorize projects based on application type, development methodology, and operational environment.
Typically a Software Development Plan (SDP) has most of the information needed for the predictions. Size data is necessary for predicting the defects. Smaller systems will usually have fewer defects than larger
systems. The practitioner also needs to know other characteristics about the LRU such as who is developing
the LRU. Figure 39 is a checklist for collecting the data required for the SR prediction and assessment.
There are several models available for predicting SR before the code is complete as follows. Each model
has different inputs and hence will usually produce different outputs. The process for selecting the best
prediction model is based first on eliminating models that require information that is not available or require more inputs than the schedule or budget can accommodate. Once the models that cannot be used are eliminated, the remaining models are assessed to determine which ones are used most in a
particular industry and which ones are the most current with technology. Table 18 is a summary of the
prediction models.
Key:
Number of inputs—Some models have only one input while others have many inputs. Generally speaking,
the more inputs to the model, the more accurate the prediction. However, more inputs also means that more
effort is required to use the model.
Predicted output—The models predict either defect density or defects. Later in 5.3.2.3 Steps 2 and 3, it
will be shown how to convert defects/defect density to failure rate.
Industry supported—All early prediction models are based on empirical data from real historical projects.
Ideally the empirical data is from software systems that are similar to the software system that is
undergoing the prediction. As an example, if one is predicting the defect density of an airborne system, one
would want to use a model that has empirical data from this type of system.
Effort required to use the model—This is directly related to the number of inputs and the ease of
acquiring the data required for the model.
Relative accuracy—The model accuracy is a function of the number of inputs for the model, how similar
the historical data is to the system under analysis, how many sets of data comprised the model, and how
current the historical data is.
Model | Number of inputs | Predicted output | Industry supported | Effort required to use the model | Relative accuracy | Year developed/last updated | Reference
Industry tables, Neufelder [B68], SAIC [B77] | 1 | Defect density | Several | Quick | Varies | 1992, 2015 | 6.2.1.2
CMMI tables, Neufelder [B68] | 1 | Defect density | Any | Quick | Increases with CMMI level | 1997, 2012 | 6.2.1.3
Shortcut model, Neufelder [B63] | 23 | Defect density | Any | Moderate | Medium | 1993, 2012 | 6.2.1.1
Full-scale model, Neufelder [B67] | 94 to 299 | Defect density | Any | Detailed | Medium-high | 1993, 2012 | B.2.1
Metric based models, Smidts [B85] | Varies | Defects | Any | Varies | Varies | NA | B.2.2
Historical data | A minimum of 2 | Defect density | Any | Moderate | High | NA | B.2.3
Rayleigh model, Putnam [B71] | 3 | Defects | Any | Moderate | Medium | NA | B.2.4
RADC TR-92-52, SAIC [B77] | 43 to 222 | Defect density | Aircraft | Detailed | Obsolete | 1978, 1992 | B.2.5
Neufelder model, Quanterion [B72] | | Defect density | Any | Detailed | Medium to high | 2015 | B.2.6
Year developed/updated—Ideally, the model should be as current with modern technology as possible.
The Rome Laboratory Model is an example of a model that is partially outdated and specific to one
industry. When the Rome Laboratory Model was developed, object-oriented development was just
emerging, Ada was the default programming language, and waterfall development was the standard. The
parts of the model that are not outdated have been adapted and used as a framework for other models such
as the Shortcut and Full-scale models. Parts of the Rome Laboratory Model can be used to calibrate
historical defect data that has been collected either internally or from any of the lookup tables shown in
Table 18. There are several industry tables that map the industry type or CMMI level to defect density.
Several of these lookup tables are outdated. See 6.2.1 for the most current tables.
Figure 40 is a checklist for selecting a SR prediction model based on the preceding characteristics.
Software reliability prediction models are used during the requirements and design phases for the following purposes:
During the requirements phase, system and SR prediction models verify that the specified reliability requirements are achievable prior to defining contractual requirements or prior to entering into the design phase of the system life cycle.
During the design phase, system and SR prediction models verify that the preliminary and critical designs being proposed can meet the reliability requirements prior to commencing system production.
The process for predicting the reliability figures of merit early in development is shown in Figure 41.
The steps for predicting SR early in development are shown in Table 19.
The steps for predicting defects via defect density are shown in Figure 42.
a) Predict testing defect density using the model(s) selected in Table 18. The specific instructions
for the defect density models are found in 6.2.1 and B.2.
b) Predict operational defect density using the model(s) selected in Table 18. The specific
instructions for the defect density models are found in 6.2.1 and B.2.
c) Predict the effective size as discussed in B.1.
d) Multiply the result of step a) by the result of step c) to yield the predicted number of testing defects.
e) Multiply the result of step b) by the result of step c) to yield the predicted number of defects found in operation, post production release (see the sketch following this list).
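As a minimal illustration of the arithmetic in steps a) through e), the following Python sketch multiplies assumed defect densities by an assumed effective size. The defect density values and the effective size shown here are hypothetical placeholders for the outputs of the models in 6.2.1, B.1, and B.2.

# Hedged sketch of the defect prediction arithmetic; all input values are hypothetical.
testing_defect_density = 5.0       # predicted defects per EKSLOC found in testing (from 6.2.1 or B.2)
operational_defect_density = 1.2   # predicted defects per EKSLOC found in operation (from 6.2.1 or B.2)
effective_size_eksloc = 40.0       # effective size predicted as discussed in B.1

predicted_testing_defects = testing_defect_density * effective_size_eksloc          # step d)
predicted_operational_defects = operational_defect_density * effective_size_eksloc  # step e)
print(predicted_testing_defects, predicted_operational_defects)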
In order to convert a defect prediction to a failure rate prediction, one needs to predict when the defects will
be discovered over usage time. Recall that a discovered defect is essentially a fault. One of the following
approaches is used to determine when the defects will be manifested as faults. All of the methods in Table 20 can be and are used for forecasting reliability growth. These models will be used to forecast when the predicted defects will be discovered.
Both Duane and AMSAA-PM2 have MTBF curves that will tend to flatten out over test time. The MTBF will still increase, but much more slowly than with the Exponential model. This is illustrated in Figure 43.
Figure 43 —Trend with the Exponential model and AMSAA PM2 Model
The shape parameters for the Exponential model are defined based on the type of system. See 6.2.2.1 for
procedures. The Duane and AMSAA PM2 models can be used as predictors if one has historical data from
similar programs to derive the shape parameters. See 6.2.2.2 for procedures.
Figure 44 is an example of a fault profile that results from Step 2 using the Exponential model. The
following fault profile will be used to predict the failure rate in Step 3. Note that even though faults are
measured as integers, fractional values are used for the predicted faults to allow for more accuracy for the
other metrics in Steps 3 to 5.
Figure 44 —Step 2
Step 3—Predict failure rate and MTBF
In the previous step, the defects that are expected to be found over usage time were predicted. However, to
predict a point in time failure rate or a failure rate profile, one needs to establish how many hours the
software will be operating for each monthly interval. The duty cycle is how much the software is operating
on a daily or monthly basis. Some software configuration items may operate continuously while others may
operate infrequently. The goal of this step is not to identify the duty cycle of every individual function but
rather to identify the duty cycle of each software LRU that will appear on the RBD. The checklist of items to consider when predicting the duty cycle is shown in Table 21:
Once deployed, will the typical system, and therefore the software, be operating as a function of a particular mission? (Examples: dishwashers, aircraft, spacecraft, military vehicles, etc.)
Example #1—Duty cycle for a system related to work hours: A manufacturer of commercial lighting
systems knows that the typical customer will have office hours spanning from 7 am to 6 pm Monday
through Friday. The predicted duty cycle is therefore 55 h per week or 232 h per month.
Example #2—Duty cycle for a mission-oriented system: A manufacturer of dishwashers knows that the
average customer for this particular model will run the dishwasher once every day. It is also known that the
dishwasher cycle is 1 h long. Hence the duty cycle estimate is 1 h per day or 30 h per month.
Figure 45 is the checklist for predicting the failure rate, MTBF, MTBSA, and MTBEFF profiles.
a) Consider one “typical” deployed system as opposed to a system that is in development. For example,
hundreds of military vehicles may be deployed but the goal is to estimate the typical duty cycle of one of
those vehicles.
b) Compute the duty cycle as per Table 21.
c) The failure rate is computed by dividing the fault profile predicted in the preceding step by the duty cycle of the software for each month of operation.
Predicted λ (month i) = Faults predicted for that month / Ti
Predicted MTBF(month i) = Ti/ Faults predicted for that month
d) Identify, either from industry averages or from past historical data, the fraction of faults expected to result in a system abort, essential function failure, or a critical failure. Adjust the estimated MTBF by dividing by these fractions.
Predicted MTBCF (month i) = MTBF(month i)/fraction of faults expected to be critical
Predicted MTBSA (month i) = MTBF(month i)/fraction of faults expected to result in a system abort
Predicted MTBEFF (month i) = MTBF(month i)/fraction of faults expected to result in an essential
function failure.
Where: Ti = operational duty cycle during one month for one instance of the system
e) Since this is an exponential model, the predicted MTBFi is simply the inverse of the predicted failure
rate λi
Figure 45 —Step 3: Predict the failure rate, MTBF, MTBSA, MTBEFF profile
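The following Python sketch illustrates the arithmetic of the Figure 45 checklist under an assumed monthly fault profile, an assumed duty cycle, and assumed severity fractions; none of these numbers come from this recommended practice.

# Hedged sketch of Step 3: convert a monthly fault profile into failure rate,
# MTBF, MTBCF, MTBSA, and MTBEFF profiles. All numeric inputs are hypothetical.
faults_per_month = [4.1, 3.2, 2.5, 1.9, 1.5]   # predicted faults discovered in each month (from Step 2)
duty_cycle_hours = 232.0                       # Ti: operational hours per month for one typical system
frac_critical = 0.10                           # assumed fraction of faults that are critical
frac_abort = 0.05                              # assumed fraction resulting in a system abort
frac_essential = 0.20                          # assumed fraction resulting in an essential function failure

for month, faults in enumerate(faults_per_month, start=1):
    failure_rate = faults / duty_cycle_hours   # predicted lambda for this month
    mtbf = duty_cycle_hours / faults           # predicted MTBF for this month
    mtbcf = mtbf / frac_critical
    mtbsa = mtbf / frac_abort
    mtbeff = mtbf / frac_essential
    print(month, round(failure_rate, 4), round(mtbf, 1), round(mtbcf, 1), round(mtbsa, 1), round(mtbeff, 1))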
The failure rate predictions take into account defects of every severity level since the defect density
prediction models take into consideration defects of every severity level. However, the model does assume
that every defect is significant enough to ultimately be corrected. The mean time between failures (MTBF)
is a prediction of any failure that is noticeable and ultimately needs to be removed. Those failures can range
from catastrophic to noticeable. Some failures may result in a system abort (the entire system is down for
some period of time). Some failures may result in an essential function failure (a loss of a required system
function but the system is still operating). If one wishes to predict the mean time between critical failure (MTBCF), mean time between system abort (MTBSA), or mean time between essential function failure (MTBEFF), one simply adjusts the predicted MTBF by the percentage of total faults that typically result in those three categories of failure.
Reliability is the probability of success over some specified mission time. Once the failure rate is predicted,
the reliability can be predicted by using Equation (1):
where mission time is the amount of usage time that corresponds to one mission of the software.
For example, if the system is a dishwasher, a mission would be running the dishwasher one time under
specific conditions. If the system is an aircraft a mission would be one typical flight under specified
conditions.
The MTBCF is the array of values predicted in Step 3. Note that the MTBCF is typically used in the
reliability estimates because typically only the critical defects affect reliability. The practitioner should
verify that the adjustment factor used to predict MTBCF is based on the percentage of faults that affect
reliability. Since the predicted MTBCF is an array of values over the reliability growth period then the
reliability prediction is also an array of values over the reliability growth period. Figure 46 is the checklist
for predicting reliability.
a) Determine the expected mission time for the software. The mission time is not the same as the
duty cycle. The mission time of an aircraft for example would be the average number of hours of
an average flight while the duty cycle would be the total number of flight hours per interval of
time such as a month.
b) Solve for the predicted reliability of that mission time as follows. Note that this is an array of values over the growth of the software, since the failure rate prediction is an array of values over the growth of the software, as shown in the following formula (see the sketch following this checklist):
Predicted reliability (mission time) = e^(–mission time/MTBCFi)
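A minimal sketch of step b) follows, assuming a hypothetical mission time and MTBCF profile; it simply evaluates the exponential expression above for each month of the growth period.

import math

# Hedged sketch: reliability over one mission for each month of reliability growth.
mission_time_hours = 1.0                          # e.g., one run of the system; hypothetical
mtbcf_profile = [580.0, 760.0, 930.0, 1100.0]     # predicted MTBCF per month; hypothetical
reliability_profile = [math.exp(-mission_time_hours / mtbcf) for mtbcf in mtbcf_profile]
print([round(r, 5) for r in reliability_profile])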
Availability can be computed for the software just as it is for any other component. Once the MTBF is
known, the availability for a continuous system can be predicted as shown in Equation (2). Note that this
formula is a limiting form. The actual availability over a short period may be different.
Since software does not wear out, MTTR does not apply. However, mean time to software restore
(MTSWR) does apply. MTSWR is computed as a weighted average of the possible restore activities as
shown in Table 22. (See Neufelder [B67].)
Since the MTBF is an array of values over the reliability growth period then the availability prediction is
also an array of values over the reliability growth period. Figure 47 is the checklist for predicting
availability.
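As a hedged sketch, the following Python fragment computes MTSWR as a weighted average of assumed restore activities and then applies the commonly used limiting-form expression MTBF/(MTBF + MTSWR). The restore activities, weights, and MTBF values are hypothetical, and the expression should be checked against Equation (2).

# Hedged sketch: MTSWR as a weighted average of restore activities (in the spirit
# of Table 22) and a limiting-form availability profile. All values are hypothetical.
restore_activities = [
    ("restart/reboot", 0.60, 0.2),        # (activity, fraction of restores, hours to restore)
    ("workaround", 0.30, 1.0),
    ("reload/update software", 0.10, 4.0),
]
mtswr = sum(fraction * hours for _, fraction, hours in restore_activities)

mtbf_profile = [58.0, 76.0, 93.0, 110.0]          # predicted MTBF per month of growth; hypothetical
availability_profile = [mtbf / (mtbf + mtswr) for mtbf in mtbf_profile]
print(round(mtswr, 2), [round(a, 4) for a in availability_profile])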
If the software is being developed in more than one increment, that will impact how the prediction models
are used. Table 23 lists the considerations pertaining to the incremental or evolutionary models and
guidance for how to apply the prediction models.
The prediction models have one thing in common. They predict the reliability figures of merit for a
particular software release. When there will be incremental development, one has two choices for
predicting the figures of merit as shown in Figure 48.
a) If the requirements are defined at the beginning of the increment and subsequent increments are
design/code/test then
1) Predict the estimated size of the features in a particular increment.
2) For each incremental release, predict the number of defects using the methods in 5.3.2.3 Steps 1 and 2.
3) Add together the defects predicted in step 2).
4) Estimate the duty cycle of a typical operational system as per 5.3.2.3 Step 3.
5) Divide the predicted defects by the predicted duty cycle for an operational system as per
5.3.2.3 Step 3 to yield a predicted failure rate for the final operational release.
b) If the requirements are evolving with each increment then
1) Predict the estimated size of the features in a particular increment.
2) Predict the number of defects for each incremental release using the methods in 5.3.2.3
Steps 1 and 2.
3) Predict when those defects will manifest into faults for each increment and plot this over
calendar time. Each of the fault profiles will generally overlap.
4) Determine the overall fault profile by combining the overlapped defect profiles for each
month of testing or operation.
5) Estimate the duty cycle of a typical operational system as per 5.3.2.3 Step 3.
6) Divide the total predicted defects by the predicted duty cycle for an operational system as per
5.3.2.3 Step 3 to yield a predicted failure rate for the final operational release.
c) Update the size predictions whenever there is an internal release.
d) Update the overall predictions whenever there is an external release.
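The following Python sketch illustrates choice b): overlapping fault profiles from several increments are placed on a common calendar, combined month by month, and divided by an assumed duty cycle. The increment profiles, start months, and duty cycle are hypothetical.

# Hedged sketch: combine overlapping fault profiles from several increments.
increment_profiles = {
    1: (0, [3.0, 2.2, 1.6, 1.1]),   # increment: (start month on the calendar, faults per month)
    2: (2, [2.5, 1.8, 1.3]),
    3: (4, [2.0, 1.4]),
}
horizon = max(start + len(profile) for start, profile in increment_profiles.values())
combined_faults = [0.0] * horizon
for start, profile in increment_profiles.values():
    for offset, faults in enumerate(profile):
        combined_faults[start + offset] += faults

duty_cycle_hours = 232.0                            # hours of operation per month; hypothetical
failure_rate_profile = [faults / duty_cycle_hours for faults in combined_faults]
print([round(f, 4) for f in failure_rate_profile])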
This task is applicable whenever the list of LRUs that are applicable for a SR prediction (from 5.1.1.1)
includes an LRU developed by a third party. There are three types of software vendors, as follows:
agreement with the organization developing the system. That working agreement may provide for exchange
of a software development plan, which typically describes the software development practices that are used
to predict defect density as well as the size estimates that are needed to convert the defect density to
defects, which is then converted to failure rate. If so, any of the models shown in 6.2 are feasible. Each
vendor should have a separate prediction. If the vendor is supplying multiple software configuration items
then there should be a separate prediction for each item. The practitioner should request the software development plan and the size estimates in thousands of source lines of code (KSLOC) well in advance to provide for a timely delivery of data.
Figure 49 is a procedure for qualifying a vendor as well as predicting the reliability of the vendor supplied
software LRUs. The procedure can be tailored based on the criticality and risk of the particular vendor
supplied LRU.
COTS LRUs
COTS are an essential ingredient in enterprise and embedded systems. Examples of COTS include
operating systems and middleware. Establishing a reliability model requires a commitment from the vendor
to provide reliability metrics. COTS manufacturers typically do not have such a vendor relationship with the acquirer because the software is purchased off the shelf. In some cases the COTS manufacturer may be willing to provide reliability data for a specific environment and COTS configuration. The following instructions assume that such data is not available.
The difficulty with including COTS in a prediction is that generally the practitioner does not have access to
the following:
The difficulty is in predicting the number of defects that will be attributed to the COTS component. Once the defects are predicted, the reliability figures of merit can be predicted using 6.2, Annex B, and 5.3.2.3 Steps 3 to 5. There are three approaches for predicting the number of defects from a COTS software LRU. An
example of the first approach can be found in F.3.2 step 1. The checklist for assessing SR of COTS LRUs
is shown in Figure 50.
a) Identify all subcontractors, COTS, and FOSS vendors. Include only the vendors that are supplying
software that will be deployed with the system. For example, do not include vendors that are
supplying development tools. Assess as much of the following as possible for each organization:
1) Management—Identify whether the vendor has any process in place to set the SR goals and
SR plans.
2) Organizational structure and development life cycle—Identify whether the roles and responsibilities are clearly defined and understood and whether there is an approved product development methodology.
3) Quality system—It is important to determine whether there is a culture of continuous process and product improvement in place and whether there is a closed feedback loop that takes the lessons learned and builds them back into the development process. Also assess the strength of the training program, since it is essential to make sure all members of the design team have similar exposure to the product development culture. Is there a defect tracking and review process in place?
4) Software development—The goal is to gain confidence that the team follows the software
development process. Determine if there is standardization around tools, RCA, defect
prediction and reduction, overall software robustness, and finally coding standards.
5) Testing—Investigate the vendor's testing strategy and the types of testing being performed, such as unit, integration, system, and solution testing. Is there a SR demonstration process in place? How was the target failure rate determined, and what types of issues were found during the system software testing? Are all issues being categorized properly in terms of severity? What is the time line to close issues, and what is the vendor's definition of closure?
6) Software development processes—Verify that the processes employed match the defined
processes and that those processes are sufficient for delivering reliable software.
b) If the results of step a) indicate deficiencies in management, organization structure and LCM,
quality system, software development, testing and processes, then proceed to steps c) to e) as
applicable and then proceed to the sensitivity analysis in 5.3.7 to determine the overall impact of
the vendor risks on the system prediction. If the vendor supplied software is relatively small in
comparison to the rest of the system it is possible that the vendor is acceptable even with risks. On
the other hand, if the vendor is supplying a relatively large amount of effective code, then
deficiencies may significantly impact the system reliability and hence may require selection of an
alternative vendor.
c) Identify all subcontractor developed components. Apply 5.3.2.2 and 5.3.2.3 to assess each of the
subcontractors and establish a reliability prediction for each subcontractor supplied LRU.
d) Identify all COTS LRUs and vendors. Assess each COTS vendor to establish a reliability prediction for each COTS LRU.
e) Identify all FOSS LRUs and vendors. Assess each FOSS vendor to establish a reliability prediction for each FOSS LRU.
f) The resulting assessments from these predictions are merged into the system model just like the
other software LRUs.
a) Assuming that the vendor development practices are unknown, use the industry or CMMI level
defect density prediction models. Assuming that the vendor’s code size in either effective
1000 source lines of code (EKSLOC) or function points is unknown, estimate the effective size via
the size of the executable using the procedure in B.1. Multiply the predicted defect density by the
predicted EKSLOC to yield the total predicted defects.
OR
b) Predict the defects based on past history with this COTS (i.e., how many defects were found in this
COTS component on a previous similar system?)
OR
c) If the particular COTS has been operational for at least 3 years and there are many installed sites
and the original manufacturer is still in business, it may be feasible to assume that the impact of the
COTS component is negligible.
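A minimal sketch of approach a) follows. The executable-to-EKSLOC conversion factor and the looked-up defect density are hypothetical placeholders for the procedures in B.1 and 6.2.1, not values taken from this recommended practice.

# Hedged sketch of COTS approach a): estimate effective size from the executable
# size and multiply by a looked-up defect density. All values are hypothetical.
executable_size_mb = 12.0
eksloc_per_mb = 6.0                  # hypothetical conversion factor standing in for the B.1 procedure
predicted_eksloc = executable_size_mb * eksloc_per_mb
industry_defect_density = 0.8        # defects per EKSLOC from an industry or CMMI lookup (6.2.1)
predicted_cots_defects = predicted_eksloc * industry_defect_density
print(round(predicted_cots_defects, 1))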
FOSS LRUs
The primary difference between COTS and FOSS is the vendor’s business model. The COTS vendor
assumes all risks with the product as part of the cost while the FOSS vendor does not. FOSS presents a
unique challenge to organizations. There is no standard LCM or a technical solution that considers quality
of service (or attributes or non-functional requirements).
From a prediction standpoint, the major difference between open-source software and COTS is the level of understanding of the development practices employed on the software, since the software may be written by several different people with differing development practices. On the other hand, the size of the software is often known for open-source software. The steps are shown in Figure 51.
a) Has the FOSS component ever been used for a previous system? If so, compute the actual number
of defects (even if it is zero) from that previous system, use that for the prediction and do not
proceed to the next step. Otherwise proceed to step b).
b) If the particular FOSS has been operational for at least 3 years and there are many installed sites
with no known software failures, it may be feasible to assume that the impact of the FOSS
component is negligible. Otherwise go to step c).
c) Assuming that the FOSS development practices are unknown, use the industry-level lookup model for defect density, since the application type is probably the only known characteristic.
d) Count the number of KSLOC directly from the code using the methods in B.1.1.
e) Multiply the result of step d) by 0.1 if the FOSS software has been deployed for at least 3 years.
f) Estimate the total number of installed sites for FOSS component. Multiply this by the appropriate
entry from Table B.5.
g) Multiply the result of step f) by the predicted defect density in step c).
This task is recommended if the SR practitioner does not have knowledge of past SR figures of merit to
sanity check the results of 5.3.2. The leader for this task is typically the reliability engineer. The
acquisitions organization may also sanity check the predictions as part of their monitoring and assessment
activity.
Despite the fact that SR is more than 50 years old, hardware reliability prediction has been employed in
industry much longer than SR prediction. That means that engineers have more resources and knowledge
available for sanity checking hardware predictions than for sanity checking software predictions. When a
reliability engineer performs a prediction on a software LRU, the engineer may experience some level of
angst if they do not have some actual reliability numbers to compare it against. If one makes a simple mathematical mistake, is there a way to detect it? If the software has 5 million lines of code, what is a reasonable prediction?
This subclause presents some typical reliability estimates based on how large the software is and how long it has
been deployed. If the practitioner has actual field data as discussed in 5.6, then that field data can be used
for the sanity check instead of Table 25. Be advised that Table 25 is based on the number of full-time
software engineers working on a particular release. Differences in productivity and product maturity and
inherent product risks can affect the actual MTBF. One should use Table 25 as a relative guideline for
sanity checking. Larger projects will have more faults than smaller projects when everything else is equal.
The number of software engineers working on the software is often a good indicator of the size of the
software.
The MTBF at initial deployment is the MTBF at the point in time in which the development organization
has completed all tests and are deploying the software to the field for the first time. The MTBF after 1 year
of deployment is the MTBF after 1 full year of reliability growth on real operational systems with real end
users and no new feature additions. Table 25 applies to each software LRU. A system may be composed of
several software LRUs so one should apply the sanity checking based on the number of people working on
each software LRU. (See Neufelder [B68].)
Table 25 —Typical MTBF in hours values for various software sized systems
Size range in software people years (include only those writing code) | Worst case MTBF at initial deployment | Average MTBF at initial deployment | Best case MTBF at initial deployment | Worst case MTBF after 1 year of 24/7 operation with no new feature drops | Average MTBF after 1 year of 24/7 operation with no new feature drops | Best case MTBF after 1 year of 24/7 operation with no new feature drops
One-person project | 70 | 2600 | 18 500 | 750 | 7500 | 52 000
Very small, 2–9 software people years | 14 | 550 | 3700 | 150 | 1500 | 10 500
Small to medium, 10–49 software people years | 2 | 100 | 625 | 25 | 250 | 1750
Medium, 50–99 software people years | 1 | 35 | 250 | 10 | 100 | 700
Large, 100–149 people years | 1 | 25 | 150 | 6 | 60 | 425
Very large, 200 or more people years | Very small | 15 | 100 | 4 | 40 | 275
NOTE—The people years apply only to those people writing the code. Worst case = deficient development
practices or many project risks, slow growth after deployment. Average case = average development practices with
1 or 2 major risks, average growth after deployment. Best case = superior development practices and no inherent
risks, fast growth after deployment. All columns are shown with the average number of people in that group. Note
that 1 year of operation means that the software is running continually.
Table 26 is a guideline for the percentage of software failures that will result in a reliability or availability
related failure.
a) Predict the MTBF for each LRU for the first month of deployment and for 12 months of
operational usage.
b) Identify the approximate number of full-time software engineers who are developing the code for that LRU. Do not include management, test engineers, or SQA engineers.
c) Locate the appropriate row in Table 25 and identify the best, worst, and average case for the MTBF
immediately upon deployment and after 12 months of usage.
d) Compare the predicted values from step a) to the values from step c).
e) If the predicted values are not within the associated best to worst case range, revisit 5.3.2 and verify
that the size, assessment inputs, and reliability growth inputs are valid. Revisit all computations to
verify that the correct units of measure have been used. For example, make sure that size estimates
in terms of SLOC have been properly converted to KSLOC.
f) Compare Table 26 to the estimates for the percentage of faults that impact availability as per Step 3
of 5.3.2.3. If the percentages are not in range, revisit Step 3 of 5.3.2.3 and then recalculate the
reliability and availability predictions in step d) and e).
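The range check in steps c) through e) can be sketched as follows. The (worst, average, best) values are the initial-deployment columns of Table 25; the team size band and the predicted MTBF are hypothetical.

# Hedged sketch of the Table 25 sanity check at initial deployment.
table_25_initial_mtbf = {
    "one-person project": (70, 2600, 18500),
    "very small (2-9 people years)": (14, 550, 3700),
    "small to medium (10-49 people years)": (2, 100, 625),
    "medium (50-99 people years)": (1, 35, 250),
    "large (100-149 people years)": (1, 25, 150),
    "very large (200+ people years)": (None, 15, 100),   # worst case listed only as "very small"
}
team_size_band = "small to medium (10-49 people years)"   # hypothetical
predicted_mtbf_at_deployment = 85.0                       # hypothetical result from 5.3.2

worst, average, best = table_25_initial_mtbf[team_size_band]
in_range = (worst is None or predicted_mtbf_at_deployment >= worst) and predicted_mtbf_at_deployment <= best
if not in_range:
    print("Out of range: revisit the size, assessment, and reliability growth inputs in 5.3.2.")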
5.3.4 Merge the software reliability predictions into the overall system prediction
This is an essential task if the system is comprised of elements other than software. This task is typically
performed by the reliability engineer. The acquisitions organization will typically review the result of this
task. Once the SR predictions are completed, they are merged with the hardware reliability predictions into
the system reliability model, which is usually a reliability block diagram (RBD). The RBD may have
redundancy, voting algorithms, or dependencies between hardware and software, which are described in
Table 27.
The steps for merging software predictions into the overall system predictions are shown in Figure 53.
a) Obtain the RBD for the hardware and/or system as well as the hardware reliability predictions
b) Obtain the SR predictions from 5.3.2
c) Add software LRUs to the RBD as per the following instructions.
d) Compute the system reliability as per 5.3.4.
e) Keep the RBD up to date throughout the life cycle and particularly whenever the size predictions or
the predictions from 5.3.2 change.
5.3.4.1 No redundancy
Consider the following example. A railroad boxcar will be automatically identified by scanning its serial
number (written in bar code form) as the car rolls past a major station on a railroad system. Software
compares the number read with a database for match, no match, or partial match. A simplified hardware
graph for the system is given in Figure 54, and the hardware reliability, R(HW), in Equation (3):
R(HW) = RS × RC × RD × RP (3)
where
The hardware and software models are shown in Figure 54 and Figure 55. The reliability equation of the
software is shown in Equation (4):
where
R(SYSTEM) = R(HW) × R(SW) (5)
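A minimal sketch of Equations (3) and (5) for this example follows; the four hardware reliabilities and the software reliability are hypothetical values, and the identities of RS, RC, RD, and RP are left as in Equation (3).

# Hedged sketch of a series hardware chain combined with the software (Equations (3) and (5)).
r_s, r_c, r_d, r_p = 0.999, 0.998, 0.997, 0.995   # hypothetical reliabilities of the four hardware elements
r_hw = r_s * r_c * r_d * r_p                      # Equation (3)
r_sw = 0.990                                      # hypothetical software reliability from the SR prediction
r_system = r_hw * r_sw                            # Equation (5)
print(round(r_system, 5))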
In a more complex case, the hardware and software are not independent and a more complex model is
needed. For example, consider a fault tolerant computer system with hardware failure probabilities C1, C2,
C3, the same software on each computer, with failure probabilities SW1″, SW2″, SW3″, respectively, and a
majority voter V (Lakey, Neufelder [B46]). In the example, the voting algorithm V would compare the
outputs of SW1″, SW2″, and SW3″ and declare a “correct” output based on two or three out of three outputs
being equal. If none of the outputs is equal, the default would be used, i.e., SW1″. Refer to Figure 56.
Assume that some of the failures are dependent; for instance a hardware failure in C1 causes a software
failure in SW2″, or a software failure in C1 causes a failure in SW2″. Some software failures [SW″ in
Equation (6)] are independent because this software is common to all computers. Therefore, failures in SW″
are not dependent on failures occurring in the non-common parts of the software. This is shown in Figure
56 and Equation (6) as SW″ in series with the parallel components.
[Figure 56 depicts computers C1, C2, and C3, each in series with its software SW1″, SW2″, and SW3″, feeding the voting algorithm, with the common software SW″ in series with the voted group.]
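The sketch below shows one common way to evaluate such a configuration under an independence approximation: each branch is a computer in series with its branch software, the branches are voted two-out-of-three, and the voter and common software are in series with the voted group. This is not Equation (6), which additionally accounts for the dependent failures described above; all numeric values are hypothetical.

# Hedged sketch (independence approximation only) of a 2-out-of-3 voted configuration
# with common software in series. Dependent failures are deliberately ignored here.
r_computer = 0.995        # hypothetical reliability of one computer Ci
r_branch_sw = 0.990       # hypothetical reliability of one branch software SWi''
p = r_computer * r_branch_sw                 # reliability of one branch
r_two_of_three = 3 * p**2 * (1 - p) + p**3   # at least two of the three branches succeed
r_voter = 0.9999                             # hypothetical reliability of the voting algorithm
r_common_sw = 0.992                          # hypothetical reliability of the common software SW''
r_system = r_common_sw * r_voter * r_two_of_three
print(round(r_system, 5))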
Examples of hybrid reliability models are found in medical, military, and commercial subsystems, systems,
system of systems, and enterprise systems. For example, a navigation subsystem reliability model with
designed redundancy of hardware or electronics (GPSHW1, GPSHW2, ComputerHW3, ComputerHW4,
DataStoreHW1, DataStoreHW2) and software (GPS SW1, GPS SW2, GUIchartingSW1, GUIchartingSW2) probabilities is shown in Figure 57.
In the case of redundant (parallel) components, the system reliability is given by Equation (7):
Rs = 1 − Qs = 1 − (Q1 Q2 ... Qn) = 1 − (1 − R1)(1 − R2)...(1 − Rn) = 1 − Π i=1..n (1 − Ri)  (7)
In the case of independent components the reliability of the system is then given by Equation (8):
Rs = Π i=1..n P(Xi)  (8)
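The two combinations can be sketched in a few lines of Python; the input reliabilities are hypothetical.

from math import prod

# Hedged sketch of Equations (7) and (8) for independent components.
reliabilities = [0.95, 0.97, 0.99]                      # hypothetical component reliabilities
r_parallel = 1 - prod(1 - r for r in reliabilities)     # Equation (7): redundant components
r_series = prod(reliabilities)                          # Equation (8): series components
print(round(r_parallel, 6), round(r_series, 6))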
This is an essential task. Typically the acquisitions organization provides a system level reliability
requirement. However, if the system is mostly or entirely software, the acquisitions organization has the
primary responsibility of performing this task. Otherwise, the reliability engineer is primarily responsible
for this task. The software manager(s) and software quality assurance and test engineer should review the
SR requirement as they will be responsible for developing and testing to that requirement.
The steps for determining an appropriate reliability requirement are shown in Figure 59:
This subclause determines what portion of an MTBF or MTBCF objective is applicable for the software
LRUs. It is assumed that the practitioner has the system MTBF objective. There are several methods to
determine what portion of the system MTBF or MTBCF is applicable to software, thereby providing a SR
objective for all software LRUs. The methods, benefits, and disadvantages are shown in Table 28.
Method | Description | Benefits/disadvantages | Formula
ARINC apportionment techniques (Von Alven [B95]) | Assumes series subsystems with constant failure rates, such that any subsystem failure causes system failure and that subsystem mission time is equal to system mission time. | Requires an analysis of each subsystem's OP. | See reference.
Feasibility of objectives technique (Eng. Des. [B15]) | Subsystem allocation factors are computed as a function of Delphi numerical ratings of system intricacy, state of the art, performance time, and environmental conditions. | Method of allocating reliability without repair for mechanical-electrical systems. | See reference.
Minimization of effort algorithm (MIL-HDBK-338B [B54]) | Assumes a system comprised of n subsystems in series. The reliability of each subsystem is measured at the present stage of development, and reliability is apportioned such that greater reliability improvement is demanded of the lower reliability subsystems. | Considers minimization of total effort expended to meet system reliability requirements. | See reference.
The steps for determining an appropriate software MTBF/MTBCF from a system MTBF/MTBCF are
shown in Figure 60.
Figure 62 provides an example breakdown of system reliability requirement and supporting documentation.
The steps for establishing a software availability objective are shown in Figure 63.
Figure 64 provides an example of a breakdown of a system availability requirement and its supporting documentation.
Mean time to repair (MTTR) applies to hardware. Mean time to software restore (MTSWR) applies to
software. The appropriate restore activities for software include reboot, reload, workaround, and restart. It
can also include correcting the underlying defect and updating the software. In that case, the new software
is not identical to the existing software. The weighted average of the MTTR and MTSWR times comprises
the system restore time, which is at the top of the diagram.
It is not uncommon for a system specification to be applied to the software such that the overall software objective cannot be met by any SR prediction model. This can happen when:
The preceding issues can be avoided by employing the steps shown in 5.3.1 to identify the initial objective
and then to revisit as the software and system design evolve.
This is an essential task. The primary responsibility for performing the task is software management. The
reliability engineer, however, uses the reliability growth to predict the SR. Reliability growth is one of the
most sensitive parameters in the SRE prediction models. Hence, the reliability engineer should not make
any assumptions about it without software management input.
Complex software intensive systems should have a separate SR Growth Plan in addition to a Reliability
Growth Plan. The plan should consist of feasible and executable steps that can be easily verified and
controlled. It should also provide a closed-loop management mechanism that relies on objective
measurements. The steps for planning the SR growth are shown in Figure 65.
Verify that there will be sufficient code coverage at the subsystem level prior to integration
with the hardware. See 5.3.9.2 on code coverage.
Verify that integration tests will be performed on all developmental subsystems (in prioritized
fashion) to verify the functionality of the subsystem and to confirm reliability and robustness
of subsystems in the presence of noise factors.
Verify that memory management will be exercised in the presence of noise factors, including
human factors, in order to verify lack of the memory leaks and other memory related defects
in the software code.
Verify that software will be operated for extended periods of time without a reboot or restart.
Verify that SR testing at the system level is an integral part of the testing for early prototypes
as well as the production systems. Given that the operational environment provides potential
important root causes of failures, functionality should be tested in all potential scenarios and
environment. Extend the duty cycle of subsystems during system level testing to verify the
robustness of the software in accordance with the system OP.
g) Verify that there will not be any new feature drops during the reliability growth phase as new
features “reset” the reliability growth. If a feature drop is scheduled during a reliability growth
period, communicate the effect on the reliability growth to management.
h) Verify whether defects identified from prior software releases will impact the reliability growth due
to defect pileup that spills into this release. See 5.1.3.4 and 5.3.2.3 Step 2.
i) Whenever there is a schedule slip or change for software, verify that the reliability growth plans are
still valid and adjust the duty cycle as needed to compensate for the shortened reliability growth
period.
This is a project specific task. If the software is in an early development phase, this task is relevant.
However, if the software has already been developed this task can provide value for future development
releases. The sensitivity analysis is performed whenever it is apparent that the overall system reliability
objective will not likely be met or when it has been decided that the software defects deployed need to be
reduced for purposes of reducing maintenance labor or defect pile up. The sensitivity analysis for the
software LRUs is performed by the software management. However, the sensitivity analysis at the system
level is performed by the reliability engineer since it requires knowledge of the overall system RBD.
In a sensitivity analysis, each of the inputs to the SR predictions is analyzed based on its best and worst case. Sometimes changes to the software, the organizational structure, the development practices, or the spacing between releases can have a positive impact on the reliability. Some changes may be costly and time-consuming while others may be simple and relatively inexpensive.
Sensitivity analysis is conducted on each software LRU and then at the system level. The methods in
5.3.7.1 can be executed as early as the completion of the SR predictions in 5.3.2.3. The methods in 5.3.7.2
can be executed once the SR predictions for each LRU are placed on the RBD as per 5.3.4. See F.3.5 for an
example.
SR prediction is highly sensitive to the following factors. For each factor, the practitioner should consider
how this factor can be reasonably and realistically adjusted so as to have a positive effect on the predicted
MTBF.
a) Effective size—The effective size is a function of the new and modified code. As discussed in B.1,
new code is 100% effective while modified code is less effective and reused and COTS code is the
least effective. Effective, in this sense, means “subject to newly introduced defects.” Typically,
individual engineers do not have the authority to decrease the functionality of a system (which in turn decreases the code size, which in turn decreases the total defects, which in turn increases the MTBF). However, the SRE practitioner can assess trade-offs and improvement scenarios for
reducing the effective size. One of the most effective ways to reduce EKSLOC is to “code a little
and test a little” instead of writing all of the code in one massive release. Refer to the example in
F.3.3. This example contrasts what happens if all of the code is written in one release versus spread
across three sequential releases. The difference in the resulting MTBF for the three small releases is
substantially better than one big release despite the fact that the total code written is the same and
the deadline for finishing the code is the same.
b) Reliability growth—This is operational time (in an operational environment) with no new feature
drops. As discussed earlier in this document, reliability growth is not unbounded for software. Once
there is a feature drop, the reliability figures of merit reset as a function of the increased feature
size. If one predicts the defect profile and the defect pileup for multiple releases as per the
document, one can determine if the releases are too close together. That is the first step in planning
for adequate reliability growth at the beginning of the project while there is still time to do so.
c) Inherent product risks discussed in 5.1.3—Research shows that the possibility of a failed project
increases with the number of these risks that are present on a particular software release.
Identifying those risks up front as early as possible is the best way to avoid a failed project. Several
of the risks cannot be avoided; however, it is often possible to spread the risks across different
releases. For example, an organization made the decision to design new hardware components, new
software environment, a new graphical user interface (GUI), and new mathematical models all in
one release. This many risks in one release is historically linked to a failed project. Instead of
having four major risks in one release, they can split the risks up into a few smaller releases.
Identifying the inherent risks before they cause a failed project is one purpose of the sensitivity
analysis. Identifying the risks in 5.1.3 does not necessarily mean that the project will be successful,
but avoiding a failure is the first step towards improvement.
d) Defect density reduction—According to the research conducted as per B.2.1, the difference
between the best observed defect density and the worst is a factor of approximately 1000. This
range makes defect density a key improvement area. Subclause 6.2.1.1 has the very short and quick
models for reliability engineers. The 22 parameters of the shortcut model are parameters that a
reliability engineer or someone in acquisitions usually has access to. Those 22 parameters do
provide some limited sensitivity analysis mainly because they contain several of the risks in 5.1.3.
Several of the models in B.2 have very detailed sensitivity analysis capabilities. The key parameters
of those detailed models are summarized in B.3. Subclause B.3 shows which parameters appear in the greatest number of SR assessment models from B.2. The table in B.3 is a starting point for anyone
to understand what development characteristics have already been correlated to fewer defects. The
detailed models in B.2 provide the trade-off scenario capabilities for minimizing the defect density.
Table 29 shows the four key sensitive SRE prediction parameters. With regard to the predicted MTBF, it illustrates the mathematical relationship, which is also shown in 5.3.2.3. The third column illustrates how each parameter can be adjusted so as to improve the MTBF without reducing features or capability. One easy way to improve MTBF is to reduce features or to test for a long time; this table, however, provides improvements that do not require a reduction in features or an infeasible amount of testing time. The final column indicates how each parameter can be predicted inaccurately. If the parameters are assessed regularly during development, the predictive models are better able to reflect the true development characteristics.
The relationship between each of these parameters such as size, reliability growth, product risks and defect
density, and reliability is shown in the formulas in 5.3.2.3 Steps 1 through 5. The sensitivity analysis at the
LRU level identifies which development practices have the potential to reduce the defects predicted to
occur in operation.
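As a hedged illustration of the kind of best-case/worst-case sweep described above, the following sketch varies the defect density and effective size and recomputes a first-month MTBF using the simplified proportional relationships of 5.3.2.3; the scenario values and the growth fraction are hypothetical.

# Hedged sketch of a best/worst-case sensitivity sweep over two of the parameters in Table 29.
duty_cycle_hours = 232.0                 # hours of operation in the first month; hypothetical
growth_fraction_first_month = 0.12       # assumed fraction of total defects surfacing in month 1

scenarios = {
    "best case":  {"defect_density": 0.5, "effective_eksloc": 30.0},
    "worst case": {"defect_density": 4.0, "effective_eksloc": 50.0},
}
for name, s in scenarios.items():
    total_defects = s["defect_density"] * s["effective_eksloc"]
    faults_first_month = total_defects * growth_fraction_first_month
    mtbf_first_month = duty_cycle_hours / faults_first_month
    print(name, round(mtbf_first_month, 1))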
The steps for conducting a sensitivity analysis at the software LRU level are shown in Figure 66.
3) Identify which gaps can be addressed and which are inherent risks. The gaps that are inherent
risks cannot be changed and are removed from the sensitivity analysis.
4) Starting from the list of gaps that are resolvable, change one gap at a time to be affirmative
and review the result of the revised or “what if” prediction.
5) Note the relative decrease in the defect density of resolving that particular gap.
6) When all gaps have been analyzed rank each gap in order of biggest to smallest decrease in
defect density.
7) Review the gaps at the top of the list and determine any prerequisites that should be in place
to resolve that gap. Prerequisites can include other development practices, hiring personnel,
buying tools, training, etc. Remove items from the list that have prerequisites that cannot be
addressed feasibly on this project. Examples of practices that, when the corresponding gap is resolved, reduce the predicted defect density include but are not limited to:
Maximizing the end user domain experience of the software development staff
Avoiding incomplete requirements
Monitoring progress against schedule
Reviewing the requirements, design, code and test plan
Using appropriate tools in an appropriate manner
Proper planning
Proper change management and defect tracking
Efficient techniques for developing requirements, design, code, and test cases
Avoiding obsolete technologies
Use of incremental development methods
g) Review all of the scenarios from steps c) to f). Estimate the relative cost of each scenario on the
list.
h) Rank the results of step g) from highest effectiveness and lowest cost to lowest effectiveness and highest cost.
i) Discuss the results of step h) with appropriate software management. Ideally the reliability engineer
is a key member of the software engineering team and this is specified in the SRPP.
5.3.7.2 Perform sensitivity analysis of the software LRUs effect on system reliability
Once the software has been analyzed from the software LRU perspective the next step is to analyze it from
a system perspective. The practitioner should review how the software fits on the system RBD. The
software LRUs should be designed to align cohesively with the subsystem hardware that they support.
Assume that the system is an automobile. One would expect to find GPS software, security software, rear
camera software, software to control a convertible or retractable top, etc. What one would not expect to
find is that all of this software is packaged in one large executable. If that is the case then it is possible that
a failure in the GPS software for example could affect the entire automobile instead of just the GPS.
Figure 67 shows one large software LRU supporting multiple hardware subsystems (on left) compared to
subsystems that each have their own associated software (right).
Figure 67 —One large software configuration item supporting multiple hardware components
versus one software configuration item for each hardware component
The practitioner should also look for SR predictions that are unusually large compared to the value of the
software. For example, if the least critical software LRU has the lowest predicted MTBF, one should
consider whether it needs to be in the system at all or possibly if that component can be purchased
commercially. Table 30 summarizes the sensitivity of the software LRUs with respect to the system
reliability.
The procedure for analyzing sensitivity at the system level is in Figure 68.
a) Review the overall RBD from step 5.3.4, which includes both the hardware and software LRUs.
b) Identify any software LRUs that are supporting more than one hardware configuration item.
c) Determine a "what if" scenario in which any offending LRUs from step b) are redesigned to be cohesive. An LRU is not cohesive when it performs multiple unrelated features or functions or supports multiple unrelated hardware items. Remember that this will result in a redesign that could affect both schedule and cost.
d) Rank each of the software LRUs in order of importance.
e) Rank each of the software LRUs in order of decreasing effective size.
f) Identify any components that are relatively large in effective size but relatively less important.
g) Identify any vendor supplied LRUs that are relatively large in predicted effective size and predicted
defect density. These LRUs may require an alternative vendor.
h) Determine “what if” scenarios by assuming that the offending software LRUs from step f) are
either removed or replaced with commercially available LRUs. Recompute the predictions for the
current and future releases using the assumption and compare to the original prediction.
i) Review all of the scenarios from the preceding steps. Estimate the relative cost of each scenario on
the list.
j) Rank the results of step i) from highest effectiveness and lowest cost to lowest effectiveness and highest cost.
k) Discuss the results of step j), and in particular, the scenario with the highest effectiveness and
lowest cost, with appropriate software management.
This is a typical SRE task. The primary responsibility for this task is the reliability engineer. In 5.3.4 the
overall SR objective is predicted. The software organization(s) can collectively work towards that overall
SR objective. However, if there are many software LRUs, and/or if there are multiple software organizations, and/or if each software LRU is a dramatically different size, then it is recommended to allocate the required SR objective down to the individual software LRUs.
The allocation of system reliability involves solving the basic inequality as follows:
f(R1, R2, ..., Rn) ≥ R*
where
Ri is the allocated reliability parameter for the ith subsystem
R* is the system reliability requirement parameter
f is the functional relationship between subsystem and system reliability
For a simple series system in which the Ri represent the probability of survival for t hours, use Equation (9):
As discussed in the previous subclauses, there are several reliability figures of merit. Software reliability
parameters may be probabilities, or software failure rates, or MTBF. These figures of merit could become
SR requirements, specified in requirements documents. They could also be used for SR predictions, or
demonstrated SR, or for SR allocations, to name a few forms or uses.
The allocated reliability for a simple subsystem of demonstrated high reliability should be greater than for a
complex subsystem whose observed reliability has been historically low. The allocation process is
approximate. The reliability parameters allocated to the subsystems are used as guidelines to determine
design feasibility. If the allocated reliability for a specific subsystem cannot be achieved with the current
plan, then the system design is to be modified and the reliability allocations redistributed. This procedure is
repeated until an allocation is achieved that satisfies the system level requirement, within all constraints,
and results in subsystems that can be designed within the state of the art. In the event that it is found that,
even with reallocation, some of the individual subsystem requirements cannot be met within the current
state of the art, the engineer can use one or more of the strategies outlined in 5.3.7.
The allocation process can, in turn, be performed at each of the lower levels of the system, subsystem
hierarchy, for hardware and electronic equipment and components and software modules. SR allocations
are derived and decomposed from requirements from the top level of the hierarchy down to the lowest level
where software specifications are written. These allocations are apportioned to the blocks of the model. The
software model is updated with SR predictions based on engineering analysis of the design to initially
validate the model. The SR predictions are then updated with actual test and field SR data to further
validate the model. Where software requirements do not include reliability, a reliability analysis and
allocation report may be prepared that shows how SR estimates and goals are set for the elements of the
software design hierarchy.
Functional complexity, effective software size, and hardware counts are some considerations in a typical
allocation process. In some cases, the process is iterative, requiring several attempts to satisfy allocating all
requirements. The requirements may require alternative design modifications to meet the reliability. In
other cases, when requirements cannot be satisfied since components are needed with unattainable levels of
reliability, trade-off discussions with the customer may be required. The hardware and software LRU
allocations can be derived from the top-level specification or the top-level specification can be defined by
the achievable failure rates of each of the system components. The former is a top-down approach while the
latter is a bottom-up approach as shown in Figure 69.
[Figure 69 contrasts top-down allocation, in which each component receives a reliability requirement (e.g., Component 1: 0.9999 through Component n: 0.9789), with bottom-up allocation, in which each component contributes a reliability prediction (e.g., Component 1: 0.9900 through Component n: 0.9999).]
Alternatively, SR allocations may not begin with SR requirements, so that the requirements at the system
level could be derived from the SR allocations. This type of SR allocation analysis is a bottom-up SR
allocation. Figure 70 is the checklist for allocating the required SR to the software LRUs.
Figure 70 —Checklist for allocating the required software reliability to the software LRUs
Once the system specification is defined there are two top-down alternatives to allocate that requirement to
the software LRUs. The first approach is to allocate from the system specification to the subsystems and
then directly to each component in the system without regard for whether it is hardware or software. There
is a tendency in industry to allocate the system specification to the hardware and software separately,
largely because hardware and software engineering are often handled by separate engineering groups.
However, for systems that are composed of several subsystems containing both hardware and software, it
may be preferable to allocate directly from the system to each of the subsystems and then to each
component. In this way the software as a whole does not have an allocation; rather, each of the software LRUs has
an allocation. This approach is shown in Figure 71.
Figure 72 —Allocation from system to HW or SW and then to HW or SW LRU
For either of the preceding allocation methods, the allocations at the software LRU level may be developed
from the effective size of each software LRU such that the software LRUs with the most new code are
weighted the heaviest. Other software design metrics besides effective KSLOC are also valuable for SR
allocation; metrics such as the expected duty cycle or the size of each software LRU can be used to
determine the allocation. See F.3.4 for an example of each.
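As an illustrative, non-normative sketch of the size-weighted allocation described above, the following Python fragment apportions a software failure rate to LRUs in proportion to their effective KSLOC. The LRU names, sizes, and the failure rate value are hypothetical.

    # Illustrative sketch: allocate a software failure rate requirement to
    # software LRUs in proportion to effective size (KSLOC). Hypothetical data.
    def allocate_by_size(software_failure_rate, lru_sizes_ksloc):
        """Return a per-LRU failure rate allocation weighted by effective size."""
        total = sum(lru_sizes_ksloc.values())
        return {lru: software_failure_rate * size / total
                for lru, size in lru_sizes_ksloc.items()}

    sizes = {"guidance": 12.0, "telemetry": 6.0, "display": 2.0}  # effective KSLOC
    for lru, rate in allocate_by_size(0.001, sizes).items():
        print(f"{lru}: allocated failure rate = {rate:.6f} failures/h")

A duty-cycle weighting would follow the same pattern, with the duty cycle replacing the size in the weighting term.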
The bottom-up allocation process involves predicting the reliability figures of merit for each software and
hardware LRU in the system. These predictions are then combined via a system level RBD to yield a
system level reliability figure of merit. The bottom-up allocation allows for each component to be allocated
a relatively accurate portion of the system specification. In the event that the system level prediction is not
acceptable for the mission or customer, each component of the system is provided with a reliability goal
that is proportional to its contribution to the system. The bottom-up allocation can also be used to derive the
system-level specification if none exists.
5.3.9 Employ software reliability metrics for transition to software system testing
This is a typical SRE task. The primary responsibility for this task lies with software management and the
software organization. The software quality assurance and test personnel are also involved in measuring the
traceability between test cases and requirements.
The reliability predictions discussed in 5.3.2 through 5.3.8 are established for the purposes of long-term
planning of reliability growth, maintenance staffing, and resources. There are other metrics that are useful
for determining when to transition to software system level testing. Table 31 summarizes the metrics that
can be used to transition to the testing activity. The following metrics can be used regardless of whether
there is a waterfall, incremental, or evolutionary LCM. Employing the following metrics reduces the
number of blocked tests that are encountered by software system testers. Blocked tests generally waste both
calendar and work hours.
Table 31 —Metrics that can be used during requirements, design, and construction
Software metric: Requirements traceability
  Definition: Degree to which the requirements have been met by the architecture, code, and test cases.
  Typical goal prior to transition to testing: All requirements for this increment can be traced to the requirements, the architecture, and the test cases. See 5.3.9.1.
Software metric: Structural coverage
  Definition: Degree to which the lines of code, paths, and data have been tested.
  Typical goal prior to transition to testing: See 5.3.9.2.
Traceability is defined as the degree to which the requirements have been met by the architecture, code, and
tests (IEEE Std 610™-1990 [B31]). The requirements traceability (RT) metrics aid in identifying
requirements that are either missing from, or in addition to, the original requirements. RT is defined as
shown in Equation (10):

RT = (R1 / R2) × 100%     (10)

where

RT is the value of the requirements traceability measure
R1 is the number of original requirements met by the architecture, code, and test cases
R2 is the number of original requirements
During implementation, the software engineers should be testing their own code prior to integration with
other software, software testing, system integration, and systems testing. Structural coverage is the degree
to which the design and code have been covered via structural or clear box testing. Structural or clear box
testing is done from the software engineer’s point of view. That means that the tester needs to have full
view of the code in order to do the testing. Structural coverage testing is usually performed with the help of
automated tools that identify the test cases and calculate the coverage results when the tests are run.
Software engineers perform a certain amount of testing prior to software testing and system testing. Their
testing can be more effective if they know how much control and data they are covering with their tests.
The following coverage metrics assist in demonstrating that the code is tested adequately prior to testing
from a system or end-user point of view. The defects found during structural testing can be difficult to
trigger or discover during operational system testing, and they are often far less expensive to fix during
development than in later phases of testing.
The structure of the software can be tested by control flow coverage and boundary value coverage. There
are five approaches to control flow testing that range in the effort and time required to test. As discussed
in the following, statement coverage is the bare minimum needed to cover the control flow, while multiple
condition decision coverage (MCDC) (Hayhurst et al. [B27]) covers every combination of the control flow
and conditions. Flight control systems, for example, are required to have MCDC coverage for FAA
certification (RTCA [B73]). Boundary value coverage covers the range of inputs such that there is a
minimal set of inputs that covers the possible data spaces. All of the following tests can be identified and
executed manually; however, on even medium-sized software projects, an automated tool is a necessity
for achieving the coverage in a reasonable amount of time. Table 32 illustrates the types of code
coverage, when they are used, and the output of the coverage metric.
Ex:
    if (A and B) or C then
        ...
    else
        ...
Possible test cases:
    A = True, B = True, C = False
    A = False, B = False, C = True
Note that the preceding covers the conditions but not the paths, since the else branch is never executed.
Using the previous example, some possible test cases for condition decision coverage are shown as follows.
This is also the minimum number of test cases to test both the conditions and the decisions.
Using the preceding example there are eight required test cases to test every combination of decisions and
conditions.
A B C
True True True
True True False
True False True
False True True
True False False
False True False
False False True
False False False
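The eight combinations above can be generated mechanically. The following Python fragment is only an illustrative sketch: it enumerates every combination of A, B, and C for the decision (A and B) or C and reports which branch each combination exercises.

    # Enumerate all 2^3 combinations of the conditions in "(A and B) or C"
    # and show which branch of the decision each combination exercises.
    from itertools import product

    def decision(a, b, c):
        return (a and b) or c

    for a, b, c in product([True, False], repeat=3):
        branch = "then" if decision(a, b, c) else "else"
        print(f"A={a!s:5} B={b!s:5} C={c!s:5} -> {branch} branch")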
Which type of coverage metric is appropriate? That depends on the criticality of the code, any
contractual or regulatory requirements, and the time available. As can be seen in the preceding example,
decision/condition coverage requires two test cases for each branch in the logic. Therefore, from a test
execution standpoint, it does not require more testing than statement coverage; the additional effort is in
identifying the tests. However, if one has an automated tool to identify the test cases, then the labor of
employing decision coverage testing over path testing is similar. MCDC does require more test cases than
the others. If the software is safety critical and/or regulated, MCDC may be necessary. Figure 73 is the
checklist for measuring code coverage; a minimal tracing sketch follows the checklist.
a) Decide which type of control flow testing is feasible as well as the desired percentage coverage.
b) Decide whether to implement the boundary value coverage tests.
c) Verify that the automated tools needed for the control flow testing and boundary value coverage
testing are available. Code-based adequacy criteria require appropriate instrumentation of the
application under test. These tools insert recording probes into the code, which in turn provide the
basis for coverage analyzers to assess which and how many entities of the flow graph have been
exercised and provide a measure against the selected test coverage criterion, such as statement,
branch, or decision coverage.
d) Verify that all software engineers know how to use the tool.
e) Verify that there are clear procedures in place to verify that all code is tested as per the decisions
made in steps a) and b).
f) Measure the test coverage.
g) Make a decision about transitioning to the next phase of testing based on the coverage results.
h) This task may be revisited in 5.4.3.
i) Once code coverage is measured in this subclause and black box coverage is measured in 5.4.1,
make a decision concerning the coverage in 5.5.
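As a minimal, non-normative illustration of the recording probes mentioned in item c), the sketch below uses Python's built-in sys.settrace hook to record which lines of a function execute. A real coverage analyzer instruments branches, threads, and reporting as well; the function names here are hypothetical.

    # Minimal statement-coverage sketch using Python's tracing hook.
    import sys

    executed_lines = set()

    def tracer(frame, event, arg):
        # Record every line event (statement executed) by function and line number.
        if event == "line":
            executed_lines.add((frame.f_code.co_name, frame.f_lineno))
        return tracer

    def decision(a, b, c):
        if (a and b) or c:
            return "then"
        return "else"

    sys.settrace(tracer)
    decision(True, True, False)   # exercises the "then" branch only
    sys.settrace(None)
    print("Lines executed:", sorted(executed_lines))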
This subclause discusses how to apply SR during software system level testing and beyond. It is not within the
scope of this document to discuss how to perform software and systems testing. The inputs and outputs for each
of the SRE tasks employed during software and system testing are shown in Figure 74.
The first task is to develop a reliability test suite based on the OP. Test coverage is continually measured
throughout testing based on the white box test results, black box test results, and, if applicable, the SFMEA
report from 5.2.2. The test effectiveness can be increased via fault insertion, which relies on the failure
modes defined in 5.2.2.
During system level testing, faults and failures are recorded as well as the amount of usage time
experienced by the software. This data is used by all of the SR growth models. The best SR model is the
one that fits the failure rate trend observed as per 5.4.4.
Other metrics are also applied during testing that support the results of the reliability growth models. The
results of the predictions from 5.3.2.3 and the reliability growth models from 5.4.5 are verified for accuracy
prior to making a release decision.
The decision to release the software is based on the results of the metrics, the reliability objective from step
5.3.1, the results of step 5.4.7 and the measured test coverage from step 5.4.3.
Table 33 —SRE Tasks performed during software and systems testing phase
SRE testing activity: 5.4.1 Develop a reliability test suite
  Purpose/benefits: Develops a test suite that can increase reliability.
  Applicability for incremental development: Not affected by development cycle or LCM.
SRE testing activity: 5.4.2 Increase test effectiveness via software fault insertion
  Purpose/benefits: Increases test coverage, particularly of the ability of the software to identify and recover from failures.
  Applicability for incremental development: Not affected by the LCM.
SRE testing activity: 5.4.3 Measure test coverage
  Purpose/benefits: Measures the percentage of the code and the requirements that have been tested.
  Applicability for incremental development: Test coverage is accumulated from increment to increment.
SRE testing activity: 5.4.4 Collect fault and failure data
  Purpose/benefits: This is a required prerequisite for all other SRE testing activities.
  Applicability for incremental development: Can be collected at each increment as well as totaled for all increments.
SRE testing activity: 5.4.5 Select reliability growth models
  Purpose/benefits: Estimate the current and future failure rate, MTBF, reliability, and availability that can be merged into the system reliability model in order to determine whether a reliability objective is met.
  Applicability for incremental development: See 5.4.5.6 for more information.
SRE testing activity: 5.4.6 Apply SR metrics
  Purpose/benefits: In order to determine whether the software should be released, additional metrics over and above the reliability estimations from step 5.4.5 should be used.
  Applicability for incremental development: Can be performed during each increment and at the final increment.
SRE testing activity: 5.4.7 Determine the accuracy of the predictive and reliability growth models
  Purpose/benefits: The models employed during development and test should be validated against actual failure, fault, and defect rates to allow the models to be updated as required based on actual results.
SRE testing activity: 5.4.8 Revisit the defect RCA
  Purpose/benefits: In 5.2.1 the defect RCA is performed on historical data from similar systems. This is updated with the root causes found during testing so as to drive the focus during testing and improve future analyses.
This is a typical but highly recommended SRE task. The individuals planning the reliability test suite
include software quality and testing, software management, and reliability engineering. The individuals
who will perform the tests themselves include the software engineers and software quality and test
personnel as well as the reliability engineers.
The reliability test suite differs slightly from the traditional test suite in that the goal is to measure the
reliability growth in an operational profile. In theory, SR testing is relatively straightforward. In practice,
the concept is far from trivial. To implement SR testing effectively the software test team will need to have
the inputs summarized for the five types of software system level tests. There are three types of black box
tests: operational profile, requirements based, and model based. There are two types of stress case tests:
timing and performance, and failure modes testing. It is assumed that the clear box testing is performed by
the developers as per 5.3.9.2. The inputs to the reliability test suite are shown in Table 34.
The checklist for developing a reliability test suite is shown in Figure 75.
a) Develop an OP as per 5.1.1.3 and proceed to 5.1.1.4 to identify how to incorporate that OP into the
reliability test suite.
b) Identify all of the software requirements and proceed to 5.4.1.2 to identify how to incorporate the
requirements into the reliability test suite.
c) Identify or construct a finite state machine for the software. Proceed to 5.4.1.3 to identify how to
include model based testing into the reliability test suite.
d) Locate timing diagrams and performance requirements. Proceed to 5.4.1.4 to identify how to
incorporate the timing and performance into the reliability test suite.
e) Locate the specific software failure modes that resulted from 5.2.2. Proceed to 5.4.1.5 to identify
how to incorporate the failure modes into the reliability test suite. Then proceed to 5.4.2 to identify
how to increase the effectiveness of this test via fault insertion.
OP testing should be performed in conjunction with the requirements testing and the model-based testing. Instructions
are provided in 5.1.1.3 for characterizing the OP. In this task, the practitioner develops a test suite that exercises that
OP. Recall that the profile is defined by the customer types, user types, system modes, and software
functions. The software test suite should be constructed so that the particular software functions are
exercised the most in the system mode used by most of the users at most of the customers. This may seem
obvious, but far too often the software test suite applies equal attention to every function regardless of its
probability of being used. See F.4.1 for an example. By using the OP, the developers and testers can be
prepared to develop and test as similarly as possible to the actual end users.
Requirements testing is when every software requirement is explicitly tested. Since software requirements
tend to be at a high level, this type of testing is typically combined with other tests such as OP testing. The
goal is to not only cover the OP but to make sure that every written requirement is also verified.
Model-based testing uses test models derived from requirements, specifications, and expectations. Test
models use many different representations; state machines are supported by most methods and tools. The
entire procedure is illustrated with a concrete example in F.4.2. The steps for developing a model-based
reliability test suite are summarized in Figure 76.
The inputs to the timing and performance testing are timing diagrams, scheduling diagrams, performance
requirements, and the results of 5.1.1.4. Timing diagrams visualize how data items change over time, and in
particular the concurrency, overlaps, and sequencing.
The purpose of testing for timing considerations is to identify the following faults (Binder [B5]):
Deadlock: A stalemate that occurs when two elements in a process are each waiting for the other to
respond.
Livelock: Similar to a deadlock except that the states of the processes involved in the livelock
constantly change with regard to one another, with none progressing.
Race conditions: The result of an operation depends on the sequence or timing of other
uncontrollable events; specifically, two or more threads attempt to access the same
data at the same time. This can happen when there are incorrect priorities for the tasks or processes,
or when resources are not locked (see the sketch following this list).
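The following Python fragment is a minimal, non-normative sketch of the race condition just described: two threads perform an unsynchronized read-modify-write on shared data, so updates can be lost, while the locked variant removes the race. The names and counts are hypothetical.

    # Race-condition sketch: two threads update shared data with and without a lock.
    import threading

    counter = 0
    lock = threading.Lock()

    def unsafe_increment(n):
        global counter
        for _ in range(n):
            counter += 1          # read-modify-write; interleaving can lose updates

    def safe_increment(n):
        global counter
        for _ in range(n):
            with lock:            # locking the shared resource removes the race
                counter += 1

    def run(worker, n=100_000):
        global counter
        counter = 0
        threads = [threading.Thread(target=worker, args=(n,)) for _ in range(2)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        return counter

    print("without lock:", run(unsafe_increment))   # may be less than 200000
    print("with lock:   ", run(safe_increment))     # always 200000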
Scheduling diagrams visualize the performance of the software. The purpose of testing for performance is
to identify faults related to any of the following (Laplante [B50], Mars [B52], Reinder [B75]):
CPU utilization: Percentage of time the CPU is executing does not exceed a predefined threshold
such as 70%.
Throughput: The number of processes completed per time unit is as required.
Turnaround time: Interval from time of process submission to completion time meets the
performance requirements.
Waiting time: Sum of the periods spent waiting is not excessive.
Response time: Time from submission of a request until the first response is produced is not
excessive.
Fairness: Each thread receives an equal share of CPU time and hence no thread starves.
It is not within the scope of this document to discuss the details of timing and performance testing.
However, it is important that the practitioner verify that timing and performance testing is being performed.
Failure modes testing and timing and performance testing are the only two tests that cover the stress points
of the software. While the timing and performance tests cover the stress in terms of loading, throughput,
and timing, the failure modes tests cover stress testing that is of a functional nature. The requirements and
OP define what the software is expected to do. But they rarely define what the software should not or
cannot do. The SFMEA focuses on the failure space. The result of the SFMEA is a list of mitigations that
usually require a change to the code, and sometimes require a change to the design and the software
requirements. It is commonly and incorrectly believed that the SFMEA yields results that would have been
found during testing. The reason why this reasoning is usually faulty is that “could have” and “would have”
are two very different things. While the mitigations that result from the SFMEA usually “could have” been
found in testing, they generally are not because the testing usually focuses on the success space and not the
failure space.
The checklist in 5.1.1.4 represents a summary of the things that should be included in a failure modes
testing suite. For each of the failure events listed following, verify that the software can detect, isolate, and apply the
applicable "R"—recovery, reduced operations, reset, partial reset, restart, reload, or repair.
Failures that should be identified by the software include but are not limited to hardware failures,
inadvertent operations, out-of-sequence commands, synchronization and interface failures, valid but wrong
inputs, invalid but accepted inputs, environment effects, etc. See the example in F.4.3.
This is a project-specific task because it typically requires specialized tools. The software failure
modes identified in 5.2 are the inputs to this task. The software organization and management have the
lead responsibility for this task.
Software fault injection (FI) is the process of systematically perturbing the program’s execution
according to a fault model and observing the effects. The intent is to determine the error resiliency of the
software program under faulty conditions. The user needs to apply a compiler that compiles programs in
high-level languages such as C and C++ to a low-level IR (intermediate representation). The IR
preserves type information from the source level, but at the same time, represents the detailed control
and data flow of the program.
The faults inserted represent off-nominal conditions from a real operational environment and are
introduced randomly. Fault insertion differs from fault seeding, in which the source code is modified to
be defective for the purpose of estimating the total number of defects in the software. Fault insertion
manipulates an image of the software during runtime so that specific types of faults can be
triggered. Fault seeding is not recommended under any circumstances, while fault insertion is.
Since fault insertion typically needs to be performed repetitively, it is desirable to avoid going through
the compile cycle every time a new fault is to be injected. Therefore, the ideal tool should insert special
functions into the IR code of the application, and defer the actual injection to runtime.
The FI infrastructure should provide both fault injection and tracing of the execution after injection.
This will allow the user to inject as many faulty conditions as one wants without recompiling the
application. Further, it is possible to defer the decision of the specific kind of faulty condition to inject
at runtime, depending on the runtime state of the application.
One of the main challenges in FI is to trace the propagation of the faulty condition in the program, and
to map it back to the source code. This is essential for understanding the weaknesses in the program’s
fault handling mechanisms and improving them. The FI infrastructure should allow users to trace the
propagation of the faulty condition after its injection as the program executes, and also to identify
specific kinds of targets or sinks in the program that should be compared with the fault-free run, to
determine whether they have been corrupted by the fault.
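As a minimal, non-normative sketch of the runtime-injection idea (at a much higher level than the IR instrumentation described here), the Python fragment below wraps a target function so that a chosen faulty condition is injected at run time with a given probability and its propagation is traced without recompiling anything. The function names, the bit-flip fault model, and the probability are hypothetical.

    # High-level fault injection sketch: perturb a function's result at runtime
    # according to a fault model and trace the injections for later comparison.
    import functools
    import random

    def inject_fault(fault, probability, trace):
        def decorator(func):
            @functools.wraps(func)
            def wrapper(*args, **kwargs):
                result = func(*args, **kwargs)
                if random.random() < probability:
                    corrupted = fault(result)
                    trace.append((func.__name__, result, corrupted))
                    return corrupted
                return result
            return wrapper
        return decorator

    trace_log = []

    @inject_fault(fault=lambda value: value ^ 0x1, probability=0.5, trace=trace_log)
    def read_sensor():
        return 42   # stand-in for the real computation

    readings = [read_sensor() for _ in range(10)]
    print("readings:", readings)
    print("injected faults (golden vs corrupted):", trace_log)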
The method of categorizing faults is an important aspect of fault tolerance and management. The potential
sources of defects are software requirements, architecture, and software code. The failure data regarding the
most common defects can be obtained from field or customer reports.
Defects can be classified via a number of taxonomies. For example, there are defects related to timing,
sequencing, data, error handling, and functionality. Software failure modes are derived from the software
defect classifications. For example, “faulty timing” is a software failure mode. A race condition is one of
the many root causes of faulty timing. Examples of failure modes and root causes can be found in 5.2.2.2.
Figure 77 is the fault insertion test process, which starts with identification of the various failure modes that the
software is susceptible to and ends with evaluation of the test results.
a) Collect customer field failure data and brainstorm failure modes applicable to the software. The
analyses in 5.2.1 can be used for this step.
b) Develop a software taxonomy for this system.
c) Identify which failure modes are most critical and applicable for each function of code (some
functions may have some failure modes that are applicable while other functions may have other
applicable failure modes).
d) Plan how to insert the faulty condition that causes that failure mode.
e) Insert the faulty condition by instrumenting the system.
f) Capture how the system behaves when this faulty condition is inserted.
g) Evaluate the test results and make modifications to the requirements, design, or code as applicable.
This is a typical but highly recommended SRE task. The software quality assurance and test organization
and the software organization have the primary responsibility for this task since some of the test coverage is
measured during development test and some is measured during a system level test.
Test coverage is a function of both black box coverage and structural coverage. Black box coverage is an
indication of the coverage of the requirements, the model, and the OP. Statistical testing, such as the RDT,
is considered black box testing because it is conducted without knowledge of the detailed design or
code.
Structural (or “clear box”) testing provides assurance based on visibility into, and analysis of, the actual
detailed design code. The goal is usually to execute every line of code and/or decision in the code at least
once. For even more coverage the goal might be to execute not only every line of code and decision but to
also exercise a large spectrum of inputs for each line of code or decision. Clear box coverage is a
measurement of the extent to which certain software units are executed, typically during testing of an
instrumented implementation. For example, if every statement in a program is executed at least once, 100%
statement coverage is achieved. Many coverage models have been proposed, used, and evaluated.
Does high code coverage equate to high reliability? Can high reliability be achieved without code
coverage? Can code coverage be used to improve the accuracy of reliability estimates? In summary, the
answers are “no,” “no,” and “maybe.”
The relationship between code coverage and SR has been studied since the early 1990s. (See Comparative
Study [B1].) Motivated by reliability estimates that were often overly optimistic, several studies evaluated
the addition of code coverage measurements to reliability growth models. Not surprisingly, increased
coverage is correlated with increased reliability as well as reduced estimate error. These studies
conclusively established that reliability estimates based on low-coverage test runs often overestimate
operational reliability.
A few things may be said with certainty. A software defect cannot be revealed unless faulty code is
executed (“covered”) and the related data is in a state that triggers an incorrect effect. If this occurs during
testing, the observable defect that caused the failure may be corrected and/or influence a reliability
estimate. If a defect escapes the development and testing process and then produces a field failure, a
prerelease reliability estimate based on that testing will over-estimate operational reliability, other things
being equal.
Escapes may result even if faulty code is covered. Although it is assumed that executing faulty code during
test is necessary to reveal a defect in that code, it is not sufficient. Coverage (of any kind) cannot be
sufficient because the data state size related to faulty code is typically astronomically large and is itself
typically the result of complex environmental configurations and execution traces.
Observed reliability, both during and post test, is determined by a system’s usage (operational profile). If a
certain pattern of usage does not cause faulty code to be executed, it will not fail given that usage pattern.
In general, simply executing a statement, path, or any other software unit is never sufficient to trigger a
failure. One important consideration is that testing longer does not necessarily increase test coverage
because it is possible to test the same inputs over and over again. With lower code coverage, the risk
increases that an undetected defect will escape and be triggered in field operation, resulting in worse than
estimated reliability.
In addition to these limitations, code coverage by itself is not a guarantee of adequate testing or failure-free
operation.
a) Identify the black box and structural coverage goals based on the criticality of the software,
availability of tools, and budget.
b) Software engineers should have already executed code coverage tests and metrics as per 5.3.9.2 and
5.4.1.
c) If the desired or required code coverage was not achieved during development test, it may need to
be measured during integration and systems testing. There are automated testing tools that measure
code coverage as a background task while the software is running.
d) It is essential that whenever changes are made to the code after the coverage has been measured
that the changed modules be retested. Otherwise, the coverage of the changed modules is
essentially zero since the change invalidates the previous test results as well as coverage measures.
e) Whenever a test case fails, the developers should identify and correct the defect and then retest to
confirm that 1) the defect has been properly corrected, and 2) no new defects were introduced (through
regression testing). In terms of incomplete coverage, there are also coverage gaps due to a lack
of understanding of how the system will behave under conditions that have not been tested.
f) Record which gaps were remediated and why, which ones were not and why, the rationale for the
remediated and non-remediated gaps, and whether the risk level was accepted or mitigated.
g) Go to 5.5 and make a decision concerning the adequacy of the code coverage.
This is an essential task because the SR growth models require fault and failure data as inputs. The software
quality assurance and test personnel typically have the lead role in collecting the fault and failure data.
However, the software engineers themselves also contribute to the fault and failure database.
During the testing activity, SR growth models are used to determine the current and forecasted SR figures
of merit. These models require two inputs: the date of the failure and the test or usage hours expended each
day. The date of the failure can be derived from the failure report.
Failure reports—During testing problem reports are generated when anomalies occur in testing. Generally,
when multiple failures occur because of the same defect it is common practice to generate a failure report
that is associated with the unique underlying defect associated with the failure. In order to understand the
relationship between faults and failures, one should either generate a failure report for each instance of the
failure, or record each instance of the failure in the report.
Before software system testing begins, determine where the database of software problem reports exists
within the organization. If there are multiple organizations involved, it is entirely possible that there is more than one
database. Once the database(s) are located, filter the data to ignore reports that are not software reliability
related, such as new feature recommendations or hardware-only reports.
Include a flag in the problem report that indicates that the problem is related to reliability.
Include a counter in each problem report to be incremented every time that same defect causes a
failure.
The software problem report typically has several items of information on it. These are the items that are
needed for the models.
Once the data is filtered, organize it. In 5.4.5.5 two types of data formats were discussed—inter-failure data
and time to failures. In the first case, it is not known exactly when the failure occurred, only that it occurred
during some time period. So, if all of the failure reports are logged at the end of the day, one would know
how many failures were detected per day but not when those failures occurred during the day. In the second
case, if the testing were automated such that the actual number of test hours between each failure event
is captured, that would be time-to-failure data.
Time—Since software does not fail due to wear out, its failure rate is not necessarily a function of calendar
time. It is important to measure the actual time that the software is executing to provide for data that is
normalized properly. Failure to normalize can dramatically reduce the accuracy of the models, particularly
if the software is not executing the same amount every day. For example, if 20 people are testing 8 h a day
one day and no one is testing the next day, execution time is much more accurate than calendar
time. Ideally, time for SR measurements is measured in CPU time. However, it is
often not possible to measure or collect CPU time during testing and particularly during operation. In that
case, the execution time can be approximated by estimating the number of operational test hours for every
individual who is using the software for its intended purpose (Musa et al. [B59]). Figure 79 summarizes the
steps to collecting failure data.
Once the data is collected, certain primitives should be computed from that data. These primitives are
useful for selecting the SR growth models and are also needed as inputs to the SR growth models. The
checklist for determining data primitives is shown in Figure 80.
a) Locate the software defect and failure reporting databases in the organization/project.
b) Export the software reports that are classified as defects (do not include any new feature
recommendations or hardware reports).
c) For each report identify the date and severity of the failure.
d) Identify which unit of measure is available—CPU time, usage hours, etc.
e) Organize the data—If the time of the failure is known then one can compute the time between
failures or failures in a period of time.
1) If the time to failure models are being used one will need at least one of the following:
The number of CPU hours or operational hours since the last failure. Note that wall clock
hours are usually not a normalized measure unless the testing effort per day remains constant.
The number of runs or test cases executed since the last failure.
Test labor hours since the last failure.
2) If the inter-failure data models are being used, one will need at least one of the following for
each day of testing and for every computer used for testing:
The number of CPU hours or operational hours per time interval such as a day or week.
The number of labor hours spent on testing this software during each interval.
f) Compute the failure data primitives as discussed in the next subclause.
Figure 79 —Checklist for collecting fault and failure data during testing
Example #1:
The data set in Table 35 is an example of when it is known how many failures have occurred during some
period of time, but the time between the failures is not known. In the following example, the failures
observed per day of testing are recorded. The number of faults and test hours expended during each day is
recorded as shown following. On the first day of testing April 28 one fault was observed in 8 h of testing.
On May 3, the next fault was found. There had been 16 h of testing since April 28. On May 5, the third,
fourth, and fifth faults were observed, and there had been 16 h of testing since the second fault was
observed, and so on. The n column is simply the sum of all f’s up to that point in time. The t column is
simply the sum of all x’s up to that point in time. In the last column, n/t is computed. Note that the
granularity of the data is in terms of days. If a larger interval such as a week is used, the data will have only 20%
of the granularity. The more granular the data, the more accurate the estimates that result from the methods
discussed in 5.4.5.
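A short, non-normative sketch of computing these primitives follows. The first three records reproduce the values described in the example; anything beyond that would be hypothetical.

    # Compute cumulative faults (n), cumulative test hours (t), and the
    # cumulative fault rate (n/t) from daily fault counts.
    records = [
        ("Apr-28", 1, 8),    # (date, faults f observed, test hours x that day)
        ("May-03", 1, 16),
        ("May-05", 3, 16),
    ]

    n, t = 0, 0.0
    print("date     f   x    n      t     n/t")
    for date, f, x in records:
        n += f                      # cumulative faults
        t += x                      # cumulative test hours
        print(f"{date}  {f:2d}  {x:3d}  {n:3d}  {t:6.1f}  {n / t:.3f}")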
The cumulative fault rate (n/t) is plotted against the cumulative faults (n) as shown in Figure 81.
As seen from Figure 81 and the table data, the cumulative fault rate is increasing until June 6. From
June 6 to the current point in time at August 30 the trend is decreasing. This information will come in
handy when selecting the reliability growth models.
Alternatively, one can determine if the fault rate is increasing or decreasing by plotting the non-cumulative
faults versus the operational time as shown in Figure 82. One can see from the following graph that the
observed faults peaked on May 26 and started to decrease after that time.
Example #2:
Figure 83 is a plot from a real software system. Initially, the trend appeared logarithmic. Then the fault rate
increased for a period of time. Then the trend started to decrease and appeared to be linear. The overall
trend, however, is a decreasing fault rate. Any of the models discussed in 5.4.5 are at least applicable with
the following data.
Example #3
Figure 84 is another real software fault data set that shows the fault rate is steadily increasing. If the testing
phase has just begun this is not unusual. However, if the testing phase is nearly complete this trend could be
an indication that the software is not stable. Most of the reliability growth models will not produce a result
in this case. Figure 84 is also an example of what should not be expected at the point in time at which
the software is about to deploy. In 5.4.5, notice that there are only two models that can be used for
this data set. On the other hand, all of the models are applicable for the example in Figure 83.
This is an essential task. Typically the software quality assurance and test engineer or the reliability
engineer will select the best reliability growth (RG) models.
For several decades it has been observed that the distribution of software faults over the life of a particular
development version typically resembles a bell-shaped (or Rayleigh) curve, as shown in Figure 85. (See
Putnam [B71].)
a) Faults are steadily increasing. This is very typical for the early part of testing.
b) Faults are peaking (statistically this happens when about 39% of total faults are observed, see
Putnam [B71]). There could be one peak or several peaks if there is incremental development.
c) Faults are steadily decreasing. If there are no new features and the software is tested and defects that
cause faults are removed, eventually the fault occurrences will decrease.
d) Faults are occurring relatively infrequently until there are no new observations.
Figure 85 plots the number of faults discovered against the normalized usage period, showing the increasing, peaking, decreasing, and stabilizing phases of the curve.
Figure 86 illustrates the steps for identifying the models that are applicable. The instructions for using each
of the models can be found in either 6.3 or Annex C. The model(s) are selected firstly based on its
assumptions concerning failure rate. Once the models that apply to the current failure rate trend are
identified, the model assumptions concerning defect count, defect detection rate per defect, and effort
required to use the model are analyzed next. The process for selecting the appropriate models is as much a
process of elimination as a process for selection in that the models that cannot be used on a particular data
set are eliminated first. The remaining models are used and the model that is trending the best against the
data is identified in 5.4.7.
Annex C and 6.3 discuss the steps for using the preceding SR models. The models are presented in order of
effort, with the models that require the least effort to use first and those that require the most effort last.
Examples are provided for the models in 6.3.
a) Review the fault data primitives as per 5.4.4. Recall that the primitives show whether the rate of
faults is increasing or decreasing.
b) Review 5.4.5.1. If the rate of faults is steadily increasing then remove all models that do not have
this assumption. If the rate of faults is increasing and then decreasing, then either model the most
recent data or employ a model that can handle both trends. Otherwise if the trend is decreasing
most of the models are applicable.
c) If the fault trend is decreasing, review 5.4.5.2. If the fault rate trend is nonlinear (i.e., logarithmic)
that is an indication that the defects are not equally probable to result in faults. In that case,
eliminate models with that assumption.
d) Review 5.4.5.3. If there is reason to believe that the inherent defect content is changing over time
then eliminate models with this assumption. The inherent defect count might increase if the
software engineers are injecting new defects while correcting the discovered defects. Refer to 5.4.6
concerning the corrective action effectiveness.
e) Review 5.4.5.4. Eliminate models that require more effort or automation than is feasible for this
project.
f) Review 5.4.5.5. Eliminate any models that require knowledge of the time of each failure if this
information is not available (most of the time, this is the case).
g) If more than one model is feasible, use the model that requires the least effort or use all applicable
models.
h) It is a good idea to be prepared to use more than one model. During testing the trend and
assumptions can change. The cost of using the models is largely in collecting the data. If the
models are automated it is generally not more expensive to use more than one model.
i) The model that is currently trending the best can be identified as per 5.4.7.
j) The models that are the easiest to use are described in 6.3, while the others that require automations
are described in Annex C. Use the model(s) identified in the preceding steps and proceed to either
6.3 or Annex C as appropriate.
k) If the software is being developed incrementally, refer to 5.4.5.6.
The fault rate during testing can be any of the following. Generally, the models will assume at least one of
the following failure rates:
a) Increasing
b) Peaking
c) Decreasing
d) First increasing and then decreasing
During the early part of testing it is very common for the fault rate to increase. Once the software stabilizes,
it may then begin to decrease. If new features are added to the code during testing, the fault rate will likely increase
again. Prior to selecting any SR growth model, the practitioner should graph the number of fault
occurrences over time as per 5.4.4. If the most recent trend is not decreasing, it is too early to use most of
the SR growth models. If the most recent trend is decreasing, all of the models may be applicable.
If the trend is initially increasing and then decreasing, the practitioner has two options, as follows:
a) The simplest option is to remove the portion of the data that is initially increasing and use the
appropriate SR growth model(s) on the most recent data.
b) The models that apply to an increasing and then decreasing fault rate, such as the S-shaped models or
the Weibull model, can be considered (a minimal curve-fitting sketch follows this list).
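The following Python fragment is a minimal, non-normative sketch of option b): it fits an exponential (Goel-Okumoto style) mean value function and a delayed S-shaped mean value function to cumulative fault counts by least squares so that the better-fitting shape can be compared. It is not one of the procedures in 6.3 or Annex C; the data values are hypothetical and scipy is assumed to be available.

    # Least-squares fit of two candidate mean value functions to cumulative
    # fault counts (hypothetical data). Illustrative only.
    import numpy as np
    from scipy.optimize import curve_fit

    def exponential_mvf(t, a, b):
        # Exponential (Goel-Okumoto style): m(t) = a * (1 - exp(-b t))
        return a * (1.0 - np.exp(-b * t))

    def s_shaped_mvf(t, a, b):
        # Delayed S-shaped: m(t) = a * (1 - (1 + b t) * exp(-b t))
        return a * (1.0 - (1.0 + b * t) * np.exp(-b * t))

    t = np.array([8, 24, 40, 80, 120, 200, 280, 360, 440, 520], dtype=float)  # cumulative test hours
    n = np.array([1, 2, 5, 9, 14, 20, 24, 26, 27, 28], dtype=float)           # cumulative faults

    for name, model in [("exponential", exponential_mvf), ("S-shaped", s_shaped_mvf)]:
        (a, b), _ = curve_fit(model, t, n, p0=[1.5 * n[-1], 0.01], maxfev=10000)
        rmse = float(np.sqrt(np.mean((model(t, a, b) - n) ** 2)))
        print(f"{name:12s} a={a:6.1f} b={b:.4f} RMSE={rmse:.2f}")

In these finite-defect models, the fitted parameter a corresponds to the total (inherent) defect content.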
This is applicable if the fault rate is decreasing. Review the fault rate trend from 5.4.4. If it appears to be
linear then consider the models that assume a linearly decreasing fault rate. The linearly decreasing fault
rate is typical when the software is operational. During testing the fault rate may be nonlinear if some
defects are resulting in faults before others. For example, the more obvious defects may be discovered early
in testing and the less obvious ones later in testing. If it is not obvious whether the trend is linear or nonlinear,
then consider both types of models.
Several of the models predict reliability by first predicting how many defects exist in the software.
a) Finite and fixed—This means that the total number of defects that exist in the software that are both
known and unknown is countable and does not increase over time. This means that defects will not
be generated when other defects are corrected.
b) Finite and not fixed—This is similar to Item a) except that it is possible that defects are introduced
while correcting other defects.
c) Infinite—These models assume that the inherent defect count is not measurable. These models will
instead measure the rate of defect detection.
The practitioner should research how many corrective actions have introduced new defects in the software.
If this percentage is relatively high then the models that assume a finite and fixed defect count may not be
suitable.
In 5.4.4 the examples illustrated fault count data. It was known what day the fault occurred but it was not
known exactly when the fault occurred within that day. This is called failure count data, and it is generally
not difficult to collect as long as the analyst knows how many people are testing each day or week and the
date of each software failure discovery is recorded.
Some of the models require a data format that is more granular than the failure count data. Data formats
that are relatively more difficult to come by include the following:
Failure time—The time that each software failure occurred is recorded. This data requires that the software
test harness is intelligent enough to detect a software failure without human intervention and record the
time of that failure.
Inter-failure time—The time in between each software failure is recorded. This can be derived from the
failure time data.
Failure time data assumes that testing commences at some initial time t0 = 0 and proceeds until n failures
have been observed at time tn. Thus, the time of the ith failure is ti. This is one of the data formats used by failure
counting models. An example of failure times in test hours follows:
T = <3, 33, 146, 227, 342, 351, 353, 444, 556, 571, 709, 759>
There are n = 12 failures. Here t1 = 3, t2 = 33, tn = 759. An alternative representation of failure time data is
inter-failure time data. As the name suggests, inter-failure time data represents the time between the (i–1)th
and ith failure. Example: The same data shown previously is expressed as inter-failure time data as follows:
X = <3, 30, 113, 81, 115, 9, 2, 91, 112, 15, 138, 50>
Example: The same data is shown as failure count data. There are 10 people testing 8 h per day. So, the
intervals are in 80 h, as follows:
During the first 80 h, 2 failures occurred. During the next 80 h, 1 failure occurred, etc.
While failure times and inter-failure times can be converted from one format to the other, to disaggregate
failure count data into failure times or inter-failure times it may be necessary to assume that all of
the failures during an interval were spaced equally. Thus, an organization wishing to apply a model requiring
inter-failure or failure time data should collect this data directly or make the simplifying assumption of
equally spaced defect discovery.
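A small, non-normative sketch of these conversions follows, using the failure times T from the example; the 80 h interval corresponds to the 10 testers working 8 h per day described above.

    # Convert failure times to inter-failure times and aggregate them into
    # failure counts per interval, using the example data from the text.
    failure_times = [3, 33, 146, 227, 342, 351, 353, 444, 556, 571, 709, 759]

    def inter_failure_times(times):
        # X_i = t_i - t_(i-1), with t_0 = 0
        return [t - prev for prev, t in zip([0] + times[:-1], times)]

    def failure_counts(times, interval_hours):
        # Count the failures falling in each consecutive block of test hours
        num_intervals = -(-times[-1] // interval_hours)   # ceiling division
        counts = [0] * num_intervals
        for t in times:
            counts[(t - 1) // interval_hours] += 1
        return counts

    print("inter-failure times:", inter_failure_times(failure_times))
    print("failures per 80 h interval:", failure_counts(failure_times, 80))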
5.4.5.6 Apply software reliability growth models with an incremental or evolutionary life
cycle model
If the software is being developed in more than one increment, that will impact how the reliability growth
models are employed. Table 37 illustrates the considerations pertaining to the incremental or evolutionary
models and guidance on how to apply the reliability growth models.
The steps for applying the SR growth models when there is an incremental or evolutionary LCM are
summarized in Figure 87.
Table 37 —Applying reliability growth models for incremental or evolutionary life cycle
Consideration with incremental or evolutionary development: If there is an incremental LCM, what do the increments consist of? Will the requirements be evolving in each of the increments or will the requirements be defined up front and the design and code and test activities evolve over several increments?
  How to apply the reliability growth models: Regardless of whether the requirements are evolving in each increment, do a separate estimation for each increment and then overlay the estimations for each increment.
Consideration with incremental or evolutionary development: How many internal releases will be made prior to the final release?
  How to apply the reliability growth models: The user will need to use the models on each internal increment as well as the final release.
Consideration with incremental or evolutionary development: How many external releases will be made prior to the final release?
  How to apply the reliability growth models: Each external release is subject to its own reliability growth estimation.
Figure 87 —Steps to apply SRG models for incremental or evolutionary life cycle
This task is either typical or essential depending on the metrics. See Table 38 for the primary role and
priority of each metric.
The SR growth models presented in 5.4.5, 6.3, and Annex C forecast one or more SR figures of merit based
on known failure/usage time data. As with any statistical model, the model can only forecast what is
observed. The accuracy of the SR models depends on both the black and white box coverage during testing.
If the OP is not being exercised and/or if the requirements of the software are not being verified, the SR
models will likely be optimistic since the models cannot measure what they do not know.
The practitioner should use the test coverage and requirements coverage in conjunction with the SR models
to verify that the assumptions of the models are being met.
The accuracy of the SR models presented in 5.4.5 also depends on whether and to what extent the defect
corrections introduce new defects. In most cases the models assume that the defects are removed without
introducing new ones. For legacy software systems, the practitioner should measure the corrective action
effectiveness. If the SR models indicate, for example, an estimated number of defects of 200 and the
corrective action effectiveness is 10%, then the practitioner should assume that the estimate of 200 is
optimistic.
The practitioner may want to compare the actual testing defect density to the predicted defect density. If the
actual testing defect density is below or above the expected bounds for the testing defect density, that may
be an indication that the software has many more defects than estimated.
Additionally, the SR models may indicate that a particular reliability objective has been achieved; however,
if the software is shipped as soon as the requirement is met, there may be an excess of
backlogged defects. The organization may decide to continue testing beyond the point of meeting the
reliability requirement to reduce defect pileup in the next software release. If the incoming defect rate is
greater than the fix rate, that is a sign of impending defect pileup regardless of whether the required
reliability objective has been met (see the sketch following this paragraph).
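The comparison of incoming and fix rates can be illustrated with a trivial, non-normative sketch; the weekly counts are hypothetical.

    # Flag impending defect pileup by comparing incoming defects to fixed defects.
    incoming_per_week = [14, 12, 15, 13]   # newly reported defects each week
    fixed_per_week = [9, 10, 11, 10]       # defects closed each week

    backlog_growth = sum(incoming_per_week) - sum(fixed_per_week)
    if backlog_growth > 0:
        print(f"Backlog grew by {backlog_growth} defects: pileup is likely even if "
              "the reliability objective has been met.")
    else:
        print("The fix rate is keeping pace with the incoming defect rate.")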
Hence, the SR models should be augmented with additional supporting metrics that, combined, can aid the
practitioner in determining when the software is ready for the next phase of testing or ready for
deployment. The metrics in Table 38 can be used during testing regardless of whether there is a waterfall,
incremental, or evolutionary LCM. The metrics fall into the four basic categories previously discussed: 1)
coverage of both the requirements and the software structure, 2) code stability, 3) process stability, and 4)
ability to correct defects in a timely fashion. (See IEEE Std 730™-2014 [B32], Smidts [B84], [B85].)
Requirements traceability during testing is computed as shown in Equation (11):

RT = (R1 / R2) × 100%     (11)

where

RT is the value of the requirements traceability measure
R1 is the number of requirements met by the completed test cases
R2 is the number of original requirements
During the requirements, design and coding phases the defect density can be predicted as per 5.3.2.3.
During testing, the actual testing defect density can be measured directly and can be used to validate the
accuracy of the predicted defect density as per 5.4.7. Defect density as measured during testing is given as
shown in Equation (12):
DD = (Σi Di) / KSLOC    (12)
where
Di is the number of unique defects detected during the ith day of testing
KSLOC is the number of source lines of code (LOC) in thousands
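As an informative illustration, the following minimal sketch computes Equation (12), assuming the unique defect counts per day of testing are available as a simple list; the function name, variable names, and example numbers are hypothetical.

```python
def testing_defect_density(daily_unique_defects, ksloc):
    """Testing defect density as in Equation (12).

    daily_unique_defects: unique defects detected during each day of testing (D_i)
    ksloc: source lines of code of the software under test, in thousands
    """
    total_defects = sum(daily_unique_defects)  # sum of D_i over all days of testing
    return total_defects / ksloc               # defects per KSLOC


# Hypothetical example: 10 days of testing on a 25 KSLOC component
print(testing_defect_density([3, 5, 4, 2, 2, 1, 1, 0, 1, 0], 25.0))  # 0.76 defects/KSLOC
```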
The checklist for applying SR metrics during testing is shown in Figure 88.
a) Review the requirements traceability and test coverage during testing. Both should be approaching 100%. If they are not, the SR growth models will likely not be accurate.
b) Review the corrective action effectiveness. If the percentage of corrective actions to the software that are not effective exceeds 5%, an RCA should be conducted to determine the cause.
c) If this metric is selected, review the actual testing defect density. Compare it to the predicted testing defect density as per 5.3.2.3 and 6.2.1. If the testing defect density is unusually high, it may be an indication of either better test coverage than expected or a bigger product than expected. Investigate both before assuming one or the other.
d) Review the number of backlogged defects. There should not be any high-priority backlogged
defects when transitioning to the next phase of testing or deployment regardless of whether the
reliability objective has been met.
e) If these metrics are selected, review the incoming defect rate against the defect fix rate. The rate of
defect fixing should be on par with the rate of incoming defects. This is another indicator of defect
pileup and impending schedule slippage.
f) If these metrics are selected, review the defect days number and the number of backlogged defects. If either number is very high, it could be an indication of defect pileup and impending schedule slippage.
This is an essential task. The reliability engineer has the primary responsibility for 5.4.7.1 and the software quality assurance and test engineer has the primary responsibility for 5.4.7.2. The purpose of this task is to
adjust the allocation of resources for additional reliability growth as necessary.
SR models can be used prior to testing (prediction models from 5.3.2.3) as well as during testing (reliability
growth models from 5.4.5). These models should be validated with actual failure data during testing and at
the end of testing to verify the accuracy of the selected models. The accuracy can be verified during any
major milestone during testing and particularly prior to making any release decisions as per 5.5. The
purpose of this task is to adjust the allocation of resources for additional reliability growth as necessary.
Several models can be applied early in development to predict SR. However, those models usually predict
reliability at the point in time of deployment. As a result, the accuracy of the early prediction models
cannot be validated until the software has been fielded for some period of time. Nevertheless, there are methods to validate prediction models during testing that establish a relationship between the volume of deployed defects and the volume of testing defects; this provides a way to validate any prediction model during testing. Table 39 illustrates some typical ratios between testing and fielded defect density. 9 Distressed projects had different ratios from testing to fielded defects, largely because of their approach to testing.
The steps for determining the accuracy of predicted defect density are shown in Figure 89.
9 Copyright Ann Marie Neufelder, Softrel, LLC. Reprinted with permission of author.
a) Determine the percentage of code that is currently complete and has been tested so far for this
release.
b) Multiply the result of step a) by the total normalized effective KSLOC (EKSLOC) predicted as per B.1. This is the total normalized EKSLOC developed so far.
c) Collect how many unique defects have been reported so far on this code.
d) If availability of hardware or other resources is blocking test cases, divide the result of c) by the
percentage of blocked test cases.
e) Compute the result of step d) over the result of step b). This is the current testing defect density.
f) Predict the fielded defect density using one of the methods in 6.2.
g) Review Table 39. Identify the range of testing defect densities for this project.
h) Compare the result of step e) to the bounds implied by steps f) and g). If the actual testing defect density falls within the upper and lower bounds, then the prediction is validated.
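A minimal sketch of the steps above is shown below, assuming the practitioner supplies the fraction of code completed and tested, the predicted total normalized EKSLOC, the unique defect count so far, the fraction of blocked test cases, the predicted fielded defect density, and the lower and upper testing-to-fielded ratios taken from Table 39. The names and numbers are hypothetical, and step d) is interpreted here as dividing by the fraction of test cases that could actually be executed.

```python
def validate_defect_density_prediction(
    fraction_code_complete_and_tested,  # step a): fraction of code completed and tested (0..1)
    total_normalized_eksloc,            # predicted total normalized EKSLOC as per B.1
    unique_defects_so_far,              # step c): unique defects reported on this code so far
    fraction_test_cases_blocked,        # fraction of test cases blocked by hardware/resources
    predicted_fielded_dd,               # step f): predicted fielded defect density
    ratio_low,                          # step g): lower testing-to-fielded ratio from Table 39
    ratio_high,                         # step g): upper testing-to-fielded ratio from Table 39
):
    # Step b): normalized EKSLOC developed and tested so far
    eksloc_so_far = fraction_code_complete_and_tested * total_normalized_eksloc

    # Step d), interpreted here as scaling the defect count up for blocked test cases
    adjusted_defects = unique_defects_so_far / (1.0 - fraction_test_cases_blocked)

    # Step e): current testing defect density
    testing_dd = adjusted_defects / eksloc_so_far

    # Steps g) and h): expected testing defect density range implied by the prediction
    lower_bound = predicted_fielded_dd * ratio_low
    upper_bound = predicted_fielded_dd * ratio_high
    is_validated = lower_bound <= testing_dd <= upper_bound
    return testing_dd, (lower_bound, upper_bound), is_validated
```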
Once it has been established that the data and estimations are as accurate as possible, one of the simplest
ways to determine the accuracy of any SR growth model is to compare the estimation of the MTBF at any
point in time to the actual next time to failure. For example, if the model is estimating that the current
MTBF is 20 h, one can validate the estimate when the next failure occurs. Figure 91 contains the
procedures for determining the relative accuracy of the SR growth models.
a) Review the fault and usage time data in 5.4.4. Is the usage time per time period normalized? If not,
the results are very unlikely to be accurate. Proceed back to 5.4.4, normalize the time data and
recompute all results. Otherwise proceed to step b).
NOTE—Example of usage time that is not normalized includes calendar time or test cases run.
b) Review the granularity of the data. Is the granularity in terms of faults per hour or day? If so, the data is relatively granular. If the data is in terms of faults per week or month, the data is not granular. Investigate whether the faults detected during each normalized usage hour or day during testing are available. If the answer is no, proceed to step c), but understand that the models may not be accurate with low granularity, particularly if there are fewer than 30 intervals of testing.
c) Review the sheer volume of data. If there are fewer than 30 observed faults or fewer than 30 time intervals, the estimations will have fairly wide confidence bands. The practitioner should compute the confidence bounds as per the instructions for the models. These bands are wider when there are fewer data points.
d) Revisit the beginning of 5.4.5. Verify the actual observed fault trend to date and determine whether
it is increasing, peaking, decreasing or stabilizing. Verify that the models selected are
recommended for that particular fault trend. If the model is not recommended for the actual fault
trend or it is not known what the fault trend is, stop and revisit 5.4.5. Otherwise proceed to step e).
e) Review the corrective action effectiveness. If this is more than 5% the results of the models could
be affected. Adjust the model results by the corresponding percentage and proceed to step f).
f) Review the requirements coverage and structural coverage. If it is less than 100%, the model results may be optimistic. Compute the actual coverage as per 5.4.3 and adjust the estimates accordingly. For example, if there is 70% coverage, assume that the total inherent defects are 30% higher than estimated. Proceed to the next checklist in Figure 91.
Figure 90 —Pre-checklist for determining the accuracy of the reliability growth models
a) Retrospectively compute the relative error of each of the models. Compare the estimated results
with the actual results. So, if the model estimated that the failure rate would be a particular number
in 100 h of operation then that estimate should be compared against the actual failure rate once
100 h of operation has passed. Relative error = (actual – estimated)/estimated.
b) Rank the models based on the lowest relative error. Note that the model with the closest fit may
have changed during testing as the data trend changes.
c) The relative error can be tracked on a regular basis to establish which of the models (if any) are
providing the most accurate trend. The model that is currently trending the most accurately should
be used in making any deployment decisions.
Figure 91 —Checklist for determining the accuracy of the reliability growth models
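A minimal sketch of steps a) and b) above, assuming each model's failure rate estimate at the same point in time and the actual observed failure rate are available; the model names and numbers are hypothetical.

```python
def rank_models_by_relative_error(estimated_failure_rates, actual_failure_rate):
    """Rank SR growth models by relative error, as in steps a) and b) of Figure 91.

    estimated_failure_rates: dict mapping model name to the failure rate it estimated
    for the point in time at which actual_failure_rate was later observed.
    Relative error = (actual - estimated) / estimated.
    """
    relative_errors = {
        name: (actual_failure_rate - estimate) / estimate
        for name, estimate in estimated_failure_rates.items()
    }
    # Rank by the smallest magnitude of relative error
    return sorted(relative_errors.items(), key=lambda item: abs(item[1]))


# Hypothetical example: three models estimated the failure rate at the 100 h mark
print(rank_models_by_relative_error(
    {"model A": 0.012, "model B": 0.015, "model C": 0.010},
    actual_failure_rate=0.011,
))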
It is important that a model be continuously rechecked for accuracy, even after selection and application, to
verify that the fit to the observed failure history is still satisfactory. In the event that a degraded model fit is
experienced, alternate candidate models should be evaluated. See F.4.5.2 for an example.
During testing the defect RCA discussed in 5.2.1 should be revisited now that there is testing defect and
failure data. The reliability growth models discussed in 5.4.5 estimate the volume of faults but do not
predict the types of defects that are needed to identify the development activity that is associated with the most software defects. The defect RCA identifies how these defects could have been prevented or identified earlier. For example, if the most common software defect is related to faulty logic, the software engineering organization should consider incorporating checklists for typical logic-related defects or possibly automated tools that aid in the logic design.
A release decision is made prior to any external software release. All of these tasks that support the release
decision are applicable for incremental or evolutionary development because this task is performed prior to
the customer release. Table 40 lists the considerations for making a release decision.
Most of the inputs are from SRE tasks performed during development and test. The inputs to the release
decision include as a minimum the test coverage from 5.4.3, the relative accuracy of the predictions and
estimations as well as the estimates themselves, the forecasted remaining defects and effort required to
correct them, the forecasted defect pileup, and the release stability as a function of the required SR objective. If the SRE plan includes these activities, the inputs also include the SFMEA report from 5.2.2, the accept/reject decision from the Reliability Demonstration Test (RDT) as per 5.5.4, and the forecasted additional test duration from 5.5.2. Figure 92 illustrates the activities that support the release decision, whether the activities are
essential, typical, or project specific, and the inputs and outputs of each activity.
At release time some decisions need to be made. The major decision is whether or when to release the
software into operation or into the next phase of the system qualification. Releasing the software too early
can result in defect pileup and/or missed reliability objectives and can also cause the next software release
to be late if software engineering staff is interrupted by field reports. The modeling, analysis, and testing
practices of SRE are intended to produce information that supports acceptance, the process of making a
release decision. The most accurate prediction of operational reliability and the lowest risk of defect escapes therefore require achieving adequate code coverage, discovering approximately as many defects as predicted, and executing a test suite that stresses the system. These criteria are based on the following observations. The criteria for acceptance are shown in Table 41.
Without minimal code coverage, there is no evidence about the reliability of uncovered code.
Without testing that stresses a system under test (SUT) with extreme scenarios interleaved with
representative usage, there is no evidence of robustness.
Without comparing actual revealed defects to a latent defect estimate, the risk of releasing a
product too early increases.
Without testing long enough to discover the predicted number of defects, it is likely that latent
defects exist, even if all code has been covered.
Without testing that achieves a representative sample of field usage, one cannot have confidence
that reliability observed in test is a meaningful indicator of operational reliability.
Figure 93 shows the recommended workflow to develop and use acceptance parameters. Prior to acceptance, the OP coverage, code coverage, and stress point coverage should have already been measured. The reliability growth models can be used to determine whether the required reliability has been met (release stability), as well as to estimate the remaining defects and the defect pileup.
This is an essential task that is typically performed by the software quality assurance and test engineer. The
results of this task are supplied to the reliability engineers.
During software development, the reliability of software will typically improve as the development process
progresses, that is, as capabilities or features are successfully developed and defects are removed from the
product. One measure of that overall software product quality is release stability. Release stability is the
degree to which the fully developed, released, and installed executing software is free of critical defects,
faults that when encountered will cause the software or computer system in which the software is installed
to stop execution, freeze critical interfaces until some action is taken, automatically restart, require an
operator initiated reset to restore expected operation, or otherwise prevent access to or execution of a
required critical capability of the system. The stability measure is a MTBF metric, a customer-oriented
metric.
An SRGM, either based on historical performance or making a current estimate and projecting the current
project stability measure toward the targeted release date, is a typical measurement approach for assessing
release stability during software development. By simply counting and accumulating critical fault counts during the development process, the software developer may model the arrival rate of these defects and
project forward the likely defect counts based on the SRGM. Periodic assessments of the model fit to actual
data may be necessary. Alternatively, for example, critical faults may be counted during an RDT and a
forecast of stability calculated using time between failure measurements as per 5.4.5. Through this software
data and metric study or review, the developer can estimate when the software reaches the required
reliability to release for further test or release for customer use. Oftentimes however, software release
decisions are made based on a number of factors (e.g., cost, schedule, quality); stability is only one of those
factors. Armed with this information and other in-process software development metrics as needed, the
software team has the methods and tools to assess software stability during development to make software
release decisions (or alternatively, to continue the test, analyze, and fix process).
The date at which a given reliability goal can be achieved is obtainable from the SR modeling process
illustrated in Figure 94. As achievement of the reliability target approaches, the adherence of the actual data
to the model predictions should be reviewed and the model corrected, if necessary. Figure 95 discusses
stability of the software for release.
NOTE—Reprinted with permission from Lockheed Martin Corporation article entitled “Determine Release Stability”
© Lockheed Martin Corporation. All rights reserved.
a) Examine the estimated current failure rate from 5.4.5, 6.3, and Annex C.
b) Compare it to the required failure rate from 5.3.1.
c) Has it been met? If not, use the procedures in 5.5.2 to estimate when it will be met.
This is a typical task that is typically performed by the software quality assurance and test engineer. The
results of this task are supplied to the reliability engineers.
Additional test duration should be predicted if the initial and objective failure rates and the parameters of the model are known. By the end of testing, the software should be stable enough to use an exponential model for forecasting the additional test duration; see Equation (13).
Once the ∆t is computed, it should be divided by the number of work hours per day or week to determine
how many more days or weeks of testing are required to meet the objective.
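Equation (13) is not reproduced above. As an illustration only, the following sketch assumes the common exponential decay form λ(t) = λcurrent × e^(−kt), under which the additional test time needed to reach the objective is Δt = ln(λcurrent/λobjective)/k; this assumed form is not necessarily the exact expression of Equation (13), and the names and numbers are hypothetical.

```python
import math

def additional_test_duration(lambda_current, lambda_objective, k, work_hours_per_week=40.0):
    """Additional test time needed to reach a failure rate objective.

    Assumes an exponential failure rate decay lambda(t) = lambda_current * exp(-k * t);
    the standard's Equation (13) may use a different but related form.
    Returns (additional test hours, approximate work weeks of testing).
    """
    delta_t = math.log(lambda_current / lambda_objective) / k  # additional test hours
    return delta_t, delta_t / work_hours_per_week


# Hypothetical example: current 0.02 failures/h, objective 0.005 failures/h, k = 0.001 per hour
hours, weeks = additional_test_duration(0.02, 0.005, 0.001)
print(round(hours), "more test hours, about", round(weeks, 1), "work weeks")
```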
This is an essential task that is typically performed by the software quality assurance and test engineer. The
results of this task are supplied to software management.
Additional defects required to be fixed to meet the objective can be solved for by using Equation (14).
Once the estimated defects required to meet the reliability objective is calculated, it can then be used to
estimate the number of engineers that need to be staffed to detect and correct them. For example, if a
typical defect requires 4 h of testing labor to detect and verify the corrective action and requires 4 h of
development labor to isolate and correct, then the number of people needed to test and perform corrective
action can be determined.
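A minimal sketch of the staffing arithmetic described above, using the 4 h of testing labor and 4 h of development labor per defect from the text; the number of defects, the schedule, and the function name are hypothetical.

```python
def staff_needed(defects_to_fix, test_hours_per_defect, dev_hours_per_defect,
                 weeks_available, hours_per_person_week=40.0):
    """Estimate the test and development staff needed to detect and correct
    the remaining defects within the available schedule."""
    test_effort = defects_to_fix * test_hours_per_defect   # total testing labor hours
    dev_effort = defects_to_fix * dev_hours_per_defect     # total development labor hours
    capacity_per_person = weeks_available * hours_per_person_week
    return test_effort / capacity_per_person, dev_effort / capacity_per_person


# Hypothetical example: 120 defects to detect and correct in 6 weeks, 4 h each for test and development
print(staff_needed(120, 4.0, 4.0, 6))  # (2.0, 2.0): roughly two testers and two developers
```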
The preceding indicates the risk of releasing this version when there are not sufficient people to correct the
defects. However, there is also the risk of releasing the software when there will be defects to correct in
future releases as well. The defect pileup should also be estimated as per the instructions in 5.4.6.
This is a project specific task that is typically performed by the reliability engineer in conjunction with an
existing hardware Reliability Demonstration Test.
A Reliability Demonstration Test (RDT) (MIL-HDBK 781A [B55]) uses statistics to determine whether a
system meets a specific reliability objective during the final testing. RDT has been used for hardware
demonstration for several decades. It is most useful for demonstrating the system reliability that includes
the software. MIL-HDBK 781A [B55] contains instructions for performing an RDT. The RDT can be
applied to software by simply tracking both the hardware and software failures against the system
reliability objective.
Software Reliability Engineering [B88] and The Handbook of Software Reliability Engineering [B89]
discuss RDT for assessing the reliability of software systems. Once all software development testing has
been accomplished (i.e., required capabilities developed, integrated, and verified, and defects corrected)
and the RDT system has been prepared, then RDT can begin. The method is to establish an OP for the RDT, select tests randomly, conduct the tests, and not fix discovered defects during testing. As test time is
accumulated and defects are discovered, these are plotted on a chart similar to the following. Progress
during RDT is noted on the chart until either an “accept” or a “reject” status is achieved. See Figure 96.
There are additional methods for conducting an RDT and obtaining a useable MTBF result. MIL-HDBK-781A [B55] discusses a variety of methods and associated mathematics. One in particular is a time-terminated acceptance test, where RDT is conducted until a particular amount of test time has been accumulated. Based on the accumulated test time and the number of faults discovered, an estimate of the MTBF
is computed. Note that repairing the discovered faults after the RDT does not guarantee a higher quality
software product without performing another RDT since the software release is a new (different) product
due to the change in the code.
NOTE—Reprinted with permission from Lockheed Martin Corporation article entitled “Perform a Reliability
Demonstration Test” © Lockheed Martin Corporation. All rights reserved.
a) Perform all developer, software, and system tests. The RDT should not be conducted until all tests
are run and all defects that impact reliability are removed.
b) Select the alpha (consumer's risk of accepting software that does not meet the objective) and beta (producer's risk of rejecting software that is acceptable) for the test, and identify the objective in terms of failure rate.
c) Start the test by using the software in an operational environment. If there is an RDT planned for
the hardware, the software is simply part of that test in that the failures due to hardware and
software are tracked.
d) Whenever a failure is discovered it is plotted regardless of whether it is due to the software or
hardware.
e) When the trend reaches the “reject” range, the test ends and the objective is determined to not be met with beta confidence.
f) When the trend reaches the “accept” range the test ends and the objective is determined to be met
with beta confidence.
g) Otherwise, if the trend is in the “continue testing” range, then the test continues.
h) If a failure due to software is encountered, the underlying defect has to be fixed to continue testing; the test also ends with a reject status.
During operation the SRE tasks shown in Figure 98 can and should be executed. The inputs and outputs for
each task described in the figure are evaluated in support of the release decision. None of the tasks in this
subclause are affected by the LCM since these activities take place once the software is deployed.
This is a typical task that is typically performed by the software quality assurance and test engineer. The
results of this task are supplied to the reliability engineers and software management.
Table 44 lists the metrics that can be used in operation regardless of whether there is a waterfall,
incremental, or evolutionary LCM. These metrics are categorized by the following:
Indication of accuracy in the size, defect density, and reliability growth assumptions
Ability to support the fielded software without impacting the future releases
Indication of software release success
Ability to compare multiple similar products
Recall that software failure rate is a function of the software's duty cycle and the size of the software. For that reason, software failure rates are difficult to compare across different types of systems. Therefore, the defect density metric is used, since it is a normalized metric. However, if one would like to compare systems that have a similar duty cycle and size, the SWDPMH metric (Smidts et al. [B85]) can be used as long as the systems under comparison are relatively the same size (in lines of code) and operate relatively the same amount with similar install base sizes.
This metric is useful for systems that will be mass deployed, since a million hours of usage is required. If the system is not mass deployed, the unit of measure can simply be converted to hours instead of millions of hours. The metric shows the rate at which software defects are occurring per cumulative million hours of user system/product usage. Each defect experienced by the user, regardless of its severity, is counted in calculating the SWDPMH. There will be a lag between the time software is released and the
start of calculating SWDPMH. This time is required to build an acceptable install base. The size of the
install base of any software release could be different, which creates a challenge for organizations to
compare products against each other; this is the reason why SWDPMH is normalized. As long as the
products that are being compared belong to the same family, SWDPMH can be used to determine the
superiority of one product over others.
One key benefit of SWDPMH is the ability to measure the reliability of the software by various released
versions. It is important to identify best practices that helped a version be more successful than the previous
versions. The calculation methodology is straightforward and utilizes the number of defects reported by the
customers, which includes all severities (1, 2, 3, etc.), and the number of units installed. There are approximately 730 h in a month, so assuming that the software is running continually [see Equation (15)]:
“Correction factor” = adjustment for percentage of customers who do not have the associated version. A
correction factor of 1.02 implies 2% of customer found defects did not have an associated version.
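Equation (15) is not reproduced above. The following sketch assumes a commonly used form in which the customer-found defect count, adjusted by the correction factor, is divided by the cumulative usage hours of the install base and scaled to one million hours; treat the exact formula as an assumption and use Equation (15) for the normative definition. The names and numbers are hypothetical.

```python
def swdpmh(customer_found_defects, installed_units, months_in_service,
           correction_factor=1.0, hours_per_month=730.0):
    """Software defects per million usage hours (SWDPMH), assumed form.

    Assumes the install base runs continually (about 730 h per month) and that the
    correction factor accounts for customer-found defects with no associated version;
    the standard's Equation (15) is the normative definition.
    """
    cumulative_usage_hours = installed_units * months_in_service * hours_per_month
    return (customer_found_defects * correction_factor) / cumulative_usage_hours * 1_000_000


# Hypothetical example: 45 defects of all severities, 500 installed units, 6 months in service,
# correction factor of 1.02 (2% of customer-found defects had no associated version)
print(round(swdpmh(45, 500, 6, correction_factor=1.02), 1))  # about 21 defects per million hours
```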
a) Compute the actual defect density and KSLOC for the software in operation as per Table 44. Verify that the actual defect density is within range of the predicted defect densities from 5.3.2.3 Step 1. If not, analyze the predicted versus actual size, defect density, or reliability growth to determine which of these is not within the predicted range. It is possible, for example, that the actual defect density is different from the predicted defect density because the KSLOC is much bigger than expected or the reliability growth is much smaller than expected.
b) Compute the immediate customer found defects as per Table 44. If the customer is finding a
substantial number of severity 1 and 2 defects in the first 6 months of operation, the test coverage
metrics in 5.4.3 and 5.3.9.2 should be revisited.
c) Compute the metrics related to field support. If any of these metrics indicate that the field issues are
not getting fixed as fast as they are occurring then the maintenance schedule should be revisited. This
can cause defect pileup for future releases.
d) If there are multiple customers, compute the software adoption rate. If the rate is high, that is usually
an indication of a successful release.
e) Compute the SWDPMH across several fielded software products. Compare similar products to
determine typical rates that can then be used in the prediction and estimation models for the next
release.
This is a typical task that is typically performed by the reliability engineer. During operation, the reliability
of the software can be measured by collecting the actual failure reports from end users and customers. The
failure data should be filtered and organized similarly to testing data as discussed in 5.4.4. The actual
operational failure rate is simply the failures recorded during some interval of time divided by the actual
operating hours during that time interval. The reliability and availability can also be computed from the
operational data similarly to how it is computed during testing.
Following a software product release for customer use, the customer or user of the software product may
track observed critical faults during operational use and may make software stability performance
measurements. This software performance information could be useful in sharing with the development
organization [and other customer(s)] providing useful validation feedback on the stability measure and the
software product and/or as input to a follow-on product upgrade or development.
The operational reliability figures of merit are straightforward to compute. The important part is monitoring
these metrics and archiving them in such a way that the actual field SR can be used to calibrate the
prediction models used during development as well as the estimative models used during testing. During
operation the actual number of defects discovered should be compared against the predicted and estimated.
If there are significant differences, the root causes for inaccuracy should be investigated as per Table 45.
Table 45 lists, for each potential root cause of inaccuracy, whether it pertains to the reliability prediction models, whether it pertains to the reliability growth models, its effect on accuracy, the rationale, and a recommendation:

The software is much bigger than predicted. Prediction models: yes. Growth models: no. Effect on accuracy: linear. Rationale: the more code there is, the more defects there will be. Recommendation: use the difference in size to establish confidence bounds for future predictions.

Insufficient test coverage. Prediction models: yes. Growth models: yes. Effect on accuracy: nonlinear. Rationale: SR growth models can underestimate defects if the test coverage during testing is relatively low. Recommendation: measure test coverage as per 5.4.3.

Optimistic assumptions about reliability growth. Prediction models: yes. Growth models: yes. Effect on accuracy: nonlinear. Rationale: if the software is late or if new features are introduced before the SR has grown, this can affect the actual reliability. Recommendation: see 5.3.2.3 Steps 2 and 3 concerning the defect pileup that can result from overestimating reliability growth.

The model assumes better/worse development practices than actually employed. Prediction models: yes. Growth models: no. Effect on accuracy: linear. Rationale: it is not uncommon for planned practices to be abandoned during development. Recommendation: the defect density models should be kept up to date during development, and the development practices should be closely monitored to verify that the model is capturing the true development capabilities.

The predictive model chosen does not have many inputs. Prediction models: yes. Growth models: N/A. Effect on accuracy: unpredictable. Rationale: the models with more inputs are usually more accurate than the models with fewer inputs, but only if the model is used properly and the inputs are correct. Recommendation: revisit 5.3.2.3, 6.2, and Annex B for selecting the best models; predict the confidence bounds when doing the predictions.

The model was used incorrectly. Prediction models: yes. Growth models: yes. Effect on accuracy: unpredictable. Rationale: small errors in using the models can result in big errors in accuracy. Recommendation: revisit 5.3.2.3, 6.2, and Annex B to verify that the predictive models are used correctly; revisit 5.4.4, 5.4.5, 6.3, and Annex C to verify that the reliability growth models are used properly.
a) Compare the actual effective size of the fielded software to the predicted effective size. Compute the relative error. That relative error is directly related to any error in the reliability objective.
b) Compute the actual test coverage during testing. If it is less than 100% that could be why the
predicted reliability is optimistic.
c) Compare the actual reliability growth in terms of operational hours to the predicted reliability
growth in terms of hours. If the actual reliability growth is less than the predicted reliability growth
that will result in an optimistic prediction.
d) Compare the actual development practices that were employed on this software version to those that were planned. Any differences will usually cause the predictions to be inaccurate, either optimistically or pessimistically.
e) Review the prediction models and inputs. If the model has only a few inputs or those inputs are not
correct that could cause the predicted reliability to be different from the actual reliability.
f) If all of the preceding have been investigated and there are no discrepancies between the predicted
and actual assumptions then note the difference between the predicted and actual reliability and
proceed to 5.6.3.
This is a typical task that is typically performed by the software quality assurance and test engineer. The
results of this task are supplied to the reliability engineers. In addition to making continuous improvements,
the actual reliability is measured so that previous characterizations and analyses can be changed
appropriately. Figure 101 is the checklist for assessing changes to previous characterizations or analyses.
g) If it is clear that some of the tasks that were not included in the SRE plan should have been
included then update the plan as per step 5.1.6.
h) If there is a failure found in operation that is intermittent and likely caused by software or an
interaction between software and hardware, revisit the task to put software on the system FTA.
i) To improve the accuracy of future objectives and predictions, revisit 5.3.1 (Determine system reliability objective), 5.3.3 (Sanity check the early predictions), and 5.4.7 (Validate the prediction and estimation models), and revisit 5.3.2.3 Step 2 to forecast defect pileup.
This is a typical task that is typically performed by the software quality assurance and test engineer. The
results of this task are supplied to the reliability engineers. The reliability prediction process is improved
over time by incorporating actual data with predictive data. Figure 102 is the checklist for archiving
operational data. For an example, see F.6.
a) Once there is at least 3 years of data from a particular release, archive the actual defects, failure
rate, defect profile, reliability and/or availability, and use these to predict the figures of merit for
the next operational release. Note that it is fairly typical for software releases to be spaced a few
months or a year apart. However, a particular version can still result in fielded defects well beyond
the next release because the end users have not yet uncovered all defects. Hence, 3 or more years of
data is to be collected for defects that originate in this release. The software engineer who corrects
the defect should know which version the defect originated in. Count all defects for at least 3 years
that originate with this particular release.
b) Keep in mind that the next operational release may be bigger or smaller or may have different
development practices or different people developing the software. Hence, even when actual field
data is available, it will still need to be calibrated. Several of the prediction models in 6.2 can be
used for such a calibration.
6.1 Overview
There are two basic types of software reliability (SR) models. SR prediction models are used early in
development for assessment and risk analysis. SR growth models are used during testing to forecast the
future failure rate or number of defects. Table 46 is a comparison of the prediction models versus the SR
growth models:
The prediction models are discussed in 6.2 and Annex B and the reliability growth models are discussed in
6.3 and Annex C.
All of the following models predict defect density and can be used prior to testing because they use
empirical versus observed inputs. Refer to 5.3.2.3 and in particular Figure 41 and Table 19 for instructions
on the steps needed for prediction models prior to using them.
This model (Neufelder [B63]) assumes that the defect density is a function of the number of risks versus
strengths regarding this release of the software. The steps are shown in Figure 103. The survey is shown in
Table 47. 10
10 Shortcut Model Survey reprinted with permission Softrel, LLC. “A Practical Toolkit for Predicting Software Reliability,” A. M. Neufelder, presented at ARS Symposium, June 14, 2006, Orlando, Florida © 2006.
a) Answer yes or no to each of the questions in Table 47. For questions 5, 12, and 13 in the Strengths
and 4 and 5 in the Risks, an answer of “somewhat” is allowable.
b) Count the number of yes answers in the Strengths. Assign 1 point for each yes answer. Assign 0.5
point for each “somewhat” answer.
c) Count the number of yes answers in the Risks. Assign 1 point for each yes answer. Assign
0.5 point for each “somewhat” answer.
d) Subtract the result of step c) from the result of step b).
e) If the result of step d) is ≥ 4.0, the predicted defect density = 0.110; if the result of step d) is ≤ 0.5, the predicted defect density = 0.647; otherwise, the predicted defect density = 0.239, in terms of defects per normalized EKSLOC.
f) The resulting defect density prediction is then multiplied by the normalized effective KSLOC
prediction.
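A minimal sketch of the scoring in steps a) through f) is shown below, assuming the practitioner has already answered the Table 47 survey and supplies the counts of “yes” and “somewhat” answers; the function name and example counts are hypothetical, and the thresholds are those given in step e).

```python
def shortcut_defect_density(strength_yes, strength_somewhat, risk_yes, risk_somewhat):
    """Predicted defect density (defects per normalized EKSLOC) from the Table 47 survey.
    Each "yes" answer scores 1 point and each "somewhat" answer scores 0.5 point."""
    strengths = strength_yes + 0.5 * strength_somewhat  # step b)
    risks = risk_yes + 0.5 * risk_somewhat              # step c)
    score = strengths - risks                           # step d)
    if score >= 4.0:                                    # step e)
        return 0.110
    if score <= 0.5:
        return 0.647
    return 0.239


# Hypothetical example: 7 strength "yes" answers, 1 "somewhat", 4 risk "yes" answers
dd = shortcut_defect_density(7, 1, 4, 0)   # score = 7.5 - 4.0 = 3.5, so 0.239
print(dd * 80)                             # step f): multiply by a predicted 80 normalized EKSLOC
```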
This model assumes that software defect density is directly related to the application type or industry. The
checklist for using this model is shown in Figure 104.
a) If the size estimates are in terms of KSLOC, select the application type from Table 48, which is the
closest to the software under analysis.
b) Otherwise if the size estimates are in terms of function points, select the application type from
Table 49 that most closely fits.
c) Select the associated defect density from the table.
d) If parts of the software have a different application type, then compute a weighted average of the application types based on the size of the software.
e) The resulting defect density prediction is then multiplied by the normalized effective KSLOC
prediction or the estimated number of function points.
Figure 104 —Predict defect density via industry/application type average lookup tables
Note that application type lookup tables have been developed in the past. Table 48 reflects the most current data and technology. (See Lakey, Neufelder [B46], Neufelder [B68], SAIC [B77].)
Example: Forty percent of the code is low-level software supporting a device. Sixty percent of the code is performing a healthcare function. Predicted defect density = (0.4 × 0.338) + (0.6 × 0.508) = 0.44 defects/normalized EKSLOC. The confidence bounds on the device prediction are ±0.259 and the confidence bounds on the medical function prediction are ±0.395. The weighted average is 0.1036 + 0.2340 = 0.3376. The upper and lower bounds for the combined prediction are therefore 0.44 defects/normalized EKSLOC ±0.3376 defects/normalized EKSLOC, which yields a range of 0.1024 to 0.7776 defects per normalized EKSLOC. See B.1 for more information on normalized EKSLOC.
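The weighted-average calculation in the example can be sketched as follows, assuming the per-application-type defect densities and confidence bounds are taken from the lookup tables; the helper name is hypothetical and the inputs shown mirror the example above.

```python
def weighted_defect_density(components):
    """Combine application-type defect density predictions for a mixed code base.

    components: list of (size_fraction, defect_density, bound) tuples, where bound is
    the +/- confidence bound on that application type's prediction.
    Returns (combined defect density, combined +/- bound) per normalized EKSLOC.
    """
    combined_dd = sum(fraction * density for fraction, density, _ in components)
    combined_bound = sum(fraction * bound for fraction, _, bound in components)
    return combined_dd, combined_bound


# Example: 40% device software, 60% healthcare function (values from the example above)
dd, bound = weighted_defect_density([(0.4, 0.338, 0.259), (0.6, 0.508, 0.395)])
print(dd, dd - bound, dd + bound)   # combined prediction with lower and upper bounds
```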
Table 49 shows average defect density for several different industries in terms of defects per function point. 11 The first column is the average defects per function point at delivery, while the last column is the average defects per function point prior to delivery. The average defect removal efficiency is also shown. Recall that this is the percentage of total defects that are removed prior to deployment. The typical removal efficiencies are also shown. See Jones [B41].
11 Typical defect densities by application type in terms of defects per function points reprinted with permission of Capers Jones, “Software Industry Blindfolds: Invalid Metrics and Inaccurate Metrics,” Namcook Analytics © 2015.
This model (Neufelder [B68]) assumes that software defect density is directly related to the SEI CMMI
level. The checklist for using this model is shown in Figure 105.
a) Select the CMMI level from Table 50 that is the closest to the software organization associated
with the software LRU under analysis.
b) Select the associated defect density from the table.
c) If parts of the software are developed by different organizations, then compute a weighted average of the CMMI levels based on the relative size of the software.
d) Compute the upper and lower bounds by adding/subtracting the associated value in the ± column.
e) The resulting defect density prediction is then multiplied by the normalized effective KSLOC
prediction.
Figure 105 —Predict defect density via capability maturity lookup tables
Note that several models based on the CMMI have been developed over the years (Keene [B43]). However,
Table 50 has data associated with modern development and systems. Example: 40% of the code is being
developed by an organization that has been assessed at CMMI level 2. Sixty percent of the code is being
developed by an organization that has been assessed at CMMI level 3. Fielded defect density prediction =
(0.4 × 0.182) + (0.6 × 0.101) = 0.1334 defects/KSLOC. The upper and lower bounds are (0.4 × 0.086) +
(0.6 × 0.081) = 0.083 defects/KSLOC. The upper bound is therefore = 0.1334 + 0.083 = 0.2164 and the
lower bound = 0.1334 – 0.083 = 0.0504 defects/KSLOC.
6.2.2 Models that can be used for planning the failure rate
The following exponential model, Equation (16), is used to determine when the predicted defects will become observed faults. Notice that Equation (16) is an array of values starting from i = 1, which is the first month of operational usage, and extending to i = n, which is the last month of growth. The fault rate is assumed to be trending downward because 1) the faults will either be circumvented and therefore not repeated again, or 2) they will be corrected in a maintenance release or patch prior to the next major release.
The growth rate and growth period vary as a function of each other. The bigger the growth rate, the faster
the SR grows or stabilizes. So, the bigger the growth rate, the shorter the growth period and vice versa. For
experimental systems in which the hardware is often not available until deployment, the growth rate of the
software may be very high. For systems that have staggered deployment over a very long period of time,
the growth rate might be relatively flat. See Figure 106 for an illustration of the growth periods and growth
rates.
The AMSAA PM2 utilizes planning parameters that are directly influenced by program management,
which include the following:
a) Mi , the planned initial system MTBF.
b) MS, the management strategy, which is the fraction of the initial failure rate addressable via corrective action; MS is defined as MS = λB / (λB + λA). In the definition of MS, λB and λA represent the portions of the initial system failure intensity that program management will and will not address via corrective action, respectively. The initial failure intensity is λi = λA + λB, and the failure modes comprising each part of the initial failure intensity are referred to as A-modes and B-modes, respectively. Note also that MS does not represent the fraction of corrected failure modes.
c) MG, the MTBF goal for the system to achieve at the conclusion of the reliability growth test.
d) µd, the planned average fix effectiveness factor (FEF) for corrective actions, which is defined as
the fractional reduction in the rate of occurrence for a failure mode after corrective action.
e) T, the duration of reliability growth testing.
f) The average lag time associated with corrective actions.
PM2 reliability growth planning curves primarily consist of two components—an idealized curve, and
MTBF targets for each test phase. The idealized curve may be interpreted as the expected system MTBF at
test time t that would be achieved if all corrective actions for B-modes surfaced by t were implemented
with the planned average FEF. The idealized curve extends from the initial MTBF, Mi, to the goal MTBF,
MG. The idealized curve is a monotonically increasing function whose rate of increase depends on the
levels of MS, µd, and the initial and goal MTBFs used to generate the curve.
The second component of the PM2 planning curve includes a sequence of MTBF steps. Since failure modes
are not found and corrected instantaneously during testing, PM2 uses a series of MTBF targets to represent
the actual (constant configuration) MTBF goals for the system during each test phase throughout the test
program. The rate of increase in the MTBF targets depends on the planning parameters used. The targets
are also conditioned explicitly on scheduled corrective action periods, which are defined as breaks in
testing during which corrective actions to observed B-modes can be implemented.
1) Initial failure rates for failure modes that will be addressed with corrective actions (B-modes)
constitute realizations of independent random samples from a Gamma distribution with
density [see Equation (17)].
p(λ) = [λ^a / (a! β^(a+1))] exp(−λ/β)    (17)
2) This assumption models mode-to-mode variation with respect to the initial rates of occurrence
for the modes. As a rule of thumb, the potential number of failure modes in the system should
be at least five times the number of failure modes that are expected to be surfaced during the
planned test period.
3) The rate of occurrence for each failure mode is constant both before and after any corrective
action;
4) Each failure mode occurs independently and causes system failure.
5) Corrective actions do not create new failure modes.
Both the idealized curve and the MTBF targets are generated by the equation for the expected system failure intensity at test time t given by Equation (18). The A-mode and B-mode portions of the initial failure intensity are given by Equation (19) and Equation (20):

λA = (1 − MS) λi = (1 − MS) / Mi    (19)

λB = MS λi = MS / Mi    (20)
h(t) is the rate of occurrence of new B-modes and is given by Equation (21):
h(t) = λB / (1 + βt)    (21)
where β is a scale parameter that arises from the scale parameter for the Gamma distribution. This
parameter solely determines the fraction of the initial failure intensity that is due to B-modes surfaced by
test time t, and can be represented in terms of the planning parameters by Equation (22):
β = (1/T) × (1 − Mi/MG) / [MS μd − (1 − Mi/MG)]    (22)
A number of additional metrics can also be calculated from the chosen planning parameters. These include
the expected number of correctable failure modes, the expected rate of occurrence of new failure modes,
the fraction of the initial failure intensity associated with the observed failure modes, and the growth
potential MTBF. These metrics provide useful information that can aid in a number of decisions involving
the reliability growth effort, such as determining if a reliability target is feasible or planning for sufficient
engineering staff to develop corrective actions for the failure modes that are expected to be observed in a
given test period.
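A minimal sketch of the PM2 planning relationships in Equations (19) through (22), assuming the planning parameters Mi, MG, MS, μd, and T are given; the function names and example values are hypothetical, and the idealized curve itself [Equation (18)] is not computed here.

```python
def pm2_planning_parameters(m_i, m_g, ms, mu_d, t_total):
    """PM2 planning quantities from Equations (19), (20), and (22).

    m_i: planned initial MTBF; m_g: goal MTBF; ms: management strategy;
    mu_d: planned average fix effectiveness factor; t_total: growth test duration T.
    """
    lambda_i = 1.0 / m_i                    # initial failure intensity
    lambda_a = (1.0 - ms) * lambda_i        # Equation (19): A-mode failure intensity
    lambda_b = ms * lambda_i                # Equation (20): B-mode failure intensity
    ratio = 1.0 - m_i / m_g
    beta = (1.0 / t_total) * ratio / (ms * mu_d - ratio)   # Equation (22)
    return lambda_a, lambda_b, beta


def new_b_mode_rate(lambda_b, beta, t):
    """Equation (21): expected rate of occurrence of new B-modes at test time t."""
    return lambda_b / (1.0 + beta * t)


# Hypothetical example: Mi = 40 h, MG = 90 h, MS = 0.95, mu_d = 0.8, T = 2000 h
lambda_a, lambda_b, beta = pm2_planning_parameters(40.0, 90.0, 0.95, 0.8, 2000.0)
print(lambda_a, lambda_b, beta, new_b_mode_rate(lambda_b, beta, 500.0))
```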
Do not use any of the following models before executing the instructions in 5.4.4 and 5.4.5. The Shooman
Constant Defect Removal Model is used during integration, which is generally when the defect profile is
peaking as discussed in 5.4.4. The Musa Basic and Logarithmic models can be used when the defect profile
is decreasing.
This model can be used during integration or early software systems testing. This model was applied to
Space Shuttle software. The model successfully predicted prior to the flight the number of software faults
discovered during the mission (Shooman, Richeson [B80]).
The assumption of this model is that if the removal rate stays constant after sufficient debugging (and no
new defects are introduced), all defects that were originally in the program will be removed. Since it is not
possible to remove all defects with 100% confidence, the model is only useful early in the integration
process where there are still many defects and the defect removal process is mainly limited by the number
of testing personnel. Removal of all the defects also means that the MTBF becomes infinite—another
impossibility. Still, the mathematics are simple and the model is usually satisfactory for a few months of
testing. For the latest model based on Bohr and Mandel defects, see Shooman [B83].
The estimated number of defects remaining after τ intervals of testing is

N̂0 − ρ0 τ    (23)

where ρ0 is an observed constant defect removal rate and N̂0 is the estimated number of inherent defects. The failure rate is then proportional to the number of remaining defects [Equation (24)]:

λ(τ) = k (N̂0 − ρ0 τ)    (24)

where k is a shape parameter to be estimated in the next subclause and τ is the number of hours, weeks, or months of development and testing.

MTBF = 1 / [k̂ (N̂0 − ρ̂0 τ)]    (25)

R(t) = exp[−k̂ (N̂0 − ρ̂0 τ) t]    (26)
Remember that t is operating time and τ is development time. Typically, the defect removal rate decreases with τ. Two possibilities are a linearly decreasing defect removal rate and an exponentially decreasing defect removal rate. Also, if the fault discovery rate is proportional to the number of defects present, the defect removal rate becomes exponential (Shooman [B82]).
The constant defect removal model has three parameters: k, N0, and ρ0.
The estimate of the constant defect removal rate, ρ0 , is simply the number of defects removed in the
interval divided by the length of the interval, τ. That leaves two remaining parameters, k, N0, which can be
evaluated from the test data by equating the MTBF function to the test data. Compute the actual MTBF1 in the first increment of time, such as a week or month, by dividing the total operational hours by the total defects found during that time interval. Set the actual MTBF1 equal to 1/{k[N0 − ρ0 × 1]}. Compute MTBF2 for the second increment of time in the same way. There are now two equations for MTBF in which the actual MTBF1, the actual MTBF2, and ρ0 are known. Dividing one MTBF equation by the other allows one to solve for N0. Substitution of this value into the first equation yields k.
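A minimal sketch of the two-interval estimation described above, assuming the observed MTBF values for the first two increments of testing and the constant removal rate ρ0 are supplied; the names are hypothetical, and the example inputs are chosen to be consistent with the worked example in 6.3.1.5.

```python
def estimate_constant_removal_parameters(mtbf_1, mtbf_2, rho_0):
    """Estimate N0 and k for the constant defect removal model.

    Uses MTBF_j = 1 / (k * (N0 - rho_0 * j)) for j = 1, 2; dividing the two
    equations cancels k and allows N0 to be solved for directly.
    """
    r = mtbf_2 / mtbf_1
    n0 = rho_0 * (2.0 * r - 1.0) / (r - 1.0)    # solve the ratio of the two MTBF equations for N0
    k = 1.0 / (mtbf_1 * (n0 - rho_0 * 1.0))     # substitute N0 back into the first equation
    return n0, k


# Hypothetical inputs consistent with the example in 6.3.1.5 (rho_0 = 90 defects per month)
n0, k = estimate_constant_removal_parameters(mtbf_1=5.0, mtbf_2=8.0, rho_0=90.0)
print(n0, k, 1.0 / k)   # N0 = 330, 1/k = 1200
```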
The confidence of the model is based on how complete and granular the data is; see 5.4.4. The confidence of the estimates also depends on how much data is available. As with any statistical model, the more data that is available, the higher the confidence of the estimates. The confidence bounds can be computed by estimating the confidence of the two parameters k and N0. With this model, one only needs to estimate the confidence of N0, which is the estimated number of inherent defects. If there are at least 30 data points, Z charts can be used to compute the confidence of the Y intercept estimate.
Plot each estimate of N0 for each time interval in which a fault was observed. There should be as many estimates of N0 as there are data points on the graph. For each estimate, do the following:
Establish the desired confidence, which is (1 − α). If 95% confidence is desired, then set α to 5%.
Using normal charts, determine Z(1−α)/2.
The estimates will have a higher confidence as the number of data points increases. So, the range on the
estimates used should become smaller during testing. It is recommended to use confidence intervals when
specifying or measuring reliability values.
6.3.1.5 Example
Fault data is collected during testing and the defect removal rate is calculated as a constant ρ0 = 90 faults
per month. Thus, one parameter has been evaluated. The other two parameters can be evaluated from the
test data by equating the MTBF function to two different intervals of the test data.
Dividing one equation by the other cancels k and allows one to solve for N0, yielding N̂0 = 330. Substitution of this value into the first equation yields k̂ = 0.0008333 and 1/k̂ = 1200.
The resulting functions are as follows. The following can be computed for an array of values of τ:

MTBF = 1/{k̂[N̂0 − ρ̂0 τ]} = 1200/[330 − 90τ]
6.3.2.1 Assumptions
This model class contains one of the simplest and most popular models. It assumes the following:
The estimates for failure rate, MTBF, and reliability are shown in Table 52.
Two of the models are “defect based,” which means the model results only change when another failure is observed. One model is time based, which means that the model results change as a function of each testing
hour. If the software is failing fairly regularly the time-based and defect-based models will produce similar
results. However, if the software is not failing very often, or if a substantial amount of test hours have
passed since the last fault was observed, the time-based models will take that into account while the defect-
based models assume that the failure rate estimate is unchanged until the next fault occurs. Most of the
work involved in using the models is in collecting the data and estimating the parameters. If the same
parameter estimation technique (such as the following one) is used then the amount of work required to use
all three of the preceding models may not be significantly more than using only one model. It is a good idea
to have at least one defect-based model and one time-based model.
The estimated remaining defects = N̂ 0 – n where N̂ 0 is the estimated inherent defects and n is the
observed cumulative number of faults found so far. Each model estimates the failure rate as a function of two out of three of the following:
N̂ 0 —the estimated inherent defects and n is the observed cumulative number of faults observed so
far
k̂ —the rate of change of the fault rate (the estimated per defect fault rate)
The estimated MTBF for all three models is the inverse of the estimated failure rate. The estimated reliability = e^{−(λ(n) × mission time)}, where mission time is the expected mission time of the software for one complete cycle.
The model parameters for all of the General Exponential Models can be easily estimated by employing the
fault rate graph discussed in 5.4.4. As shown in Figure 109, the estimated inherent defect N̂ 0 is the
estimated Y intercept of the graph. The estimated initial failure rate is the x intercept of that same graph.
MLE and LSE (Musa et al. [B59]) can also be used to estimate the parameters. The slope parameter k can
be estimated by the absolute value of the slope of the best straight line through the data points.
[Figure 109 —Fault rate plot: the Y intercept gives the estimated N0; k = abs(1/slope); the x intercept of the n/t axis gives the actual observed initial failure rate λ0; θ = rate of decay.]
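A least-squares version of this graphical estimation is sketched below (Python with numpy; the function and variable names are illustrative). It fits a straight line through the (n/t, n) points: the Y intercept estimates N0, the x intercept estimates the initial failure rate λ0, and k is the absolute value of 1/slope.

    import numpy as np

    def fit_general_exponential(fault_rate, cum_faults):
        """Least-squares line through (n/t, n) points on the fault rate graph.

        fault_rate -- observed n/t at the end of each interval
        cum_faults -- cumulative faults n at the end of each interval
        """
        slope, intercept = np.polyfit(fault_rate, cum_faults, 1)
        n0_hat = intercept                 # Y intercept: estimated inherent defects
        lambda0_hat = -intercept / slope   # x intercept: estimated initial failure rate
        k_hat = abs(1.0 / slope)           # estimated per-defect fault rate
        return n0_hat, lambda0_hat, k_hat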
The confidence of the model is based on how complete and granular the data are (see 5.4.4). The confidence of the estimates also depends on how much data is available. As with any statistical model, the more data that is available, the higher the confidence of the estimates. The confidence bounds can be computed by estimating the confidence of the two parameters. Since the two parameters of this model are proportional to each other, one only needs to estimate the confidence of the Y intercept, which is the estimated inherent defects. If there are at least 30 data points, Z charts can be used to compute the confidence of the Y intercept estimate.
The confidence intervals of an estimate such as N0 are determined by the following:
Plot each estimate of N0 for each time interval in which a fault was observed. There should be as many estimates of N0 as there are data points on the graph.
Establish the desired confidence, which is (1−α). If 95% confidence is desired then set α to 5%.
Using normal charts determine Z_(1−α)/2.
As shown in Figure 110, the estimates will have a higher confidence as the number of data points increases.
So, the range on the estimates used should become smaller during testing. Confidence intervals are
recommended when specifying or measuring reliability values.
6.3.2.5 Example
This example in Figure 111 uses the fault and usage data illustrated in 5.4.4, Figure 83.
[Figure 111 —Fault rate plot of the example data (x axis: Fault Rate n/t): Y intercept = 117.77; k = abs(1/slope) = 0.137225/117.77.]
Note that the estimated inherent defects were rounded up to provide a result that is an integer. The
percentage of estimated removal is therefore 84/118 = 71%.
The estimated failure rate and MTBF of each model are shown as follows. One can see that the two defect-based models yielded virtually the same result while the time-based model was more optimistic. The
expected mission time for this software is 8 h. That mission time is used to compute the estimated current
reliability. The results are shown in Table 53.
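For readers who want to reproduce numbers of the kind shown in Table 53, the sketch below (Python; the function and variable names are illustrative, and the inputs shown are only those recoverable from Figure 111 rather than the exact Table 53 inputs) evaluates the three model forms and converts a failure rate into MTBF and mission reliability.

    import math

    def musa_basic_rate(lambda0, n, n0):
        """Musa Basic: lambda(n) = lambda0 * (1 - n/N0)."""
        return lambda0 * (1.0 - n / n0)

    def jelinski_moranda_rate(k, n, n0):
        """Jelinski-Moranda: lambda(n) = k * (N0 - n)."""
        return k * (n0 - n)

    def goel_okumoto_rate(k, t, n0):
        """Goel-Okumoto: lambda(t) = N0 * k * exp(-k*t)."""
        return n0 * k * math.exp(-k * t)

    def mtbf_and_reliability(rate, mission_time):
        """MTBF is the inverse of the rate; reliability = exp(-rate * mission time)."""
        return 1.0 / rate, math.exp(-rate * mission_time)

    # Illustrative inputs: N0 rounded up to 118, k = 0.137225/117.77 and
    # lambda0 = 0.137225 read from Figure 111, n = 84 faults observed, 8 h mission.
    n0, k, lam0, n = 118, 0.137225 / 117.77, 0.137225, 84
    print(mtbf_and_reliability(musa_basic_rate(lam0, n, n0), 8.0))
    print(mtbf_and_reliability(jelinski_moranda_rate(k, n, n0), 8.0))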
This model is applicable when the testing is done according to an OP that has variations in frequency of
application functions and when early defect corrections have a greater effect on the failure rate than later
ones. Thus, the failure rate has a decreasing slope.
This model assumes that the total estimated defects are infinite. Hence, there is no estimate of remaining defects. From the model assumptions, the failure rate and MTBF can be estimated as follows:

λ̂(n) = λ̂0 × e^{−(θ̂ × n)} is the estimated failure rate at the point in time at which n faults have been observed.
Estimated MTBF = (1/ θˆ ) × ln(( λ̂0 × θˆ × t) + 1) where θˆ is the estimated rate of change of the faults
observed, t is the cumulative usage hours to date, and λ̂0 is the first observed fault rate.
The parameter λ0 is the initial failure rate parameter and θ is the failure rate decay parameter with θ > 0.
The model parameters can be determined graphically by using the defect rate plot in 5.4.4. Alternatively,
the MLE method (Musa et al. [B59]) can be used. Recall that the Musa Basic model uses the estimated inherent defects N0 and the slope value k. The logarithmic model assumes that the inherent defects are infinite. Its parameters are the rate of change θ and the observed initial failure rate λ0. λ̂0 is the first failure rate observed in the data, while θ̂ can be computed by plotting the natural logarithm of the fault rate versus the cumulative faults observed. The inverse of the slope of the best straight line through the data points is the rate of decay θ̂. Figure 112 compares the parameter estimation for this model with the parameter estimation for the Basic Model:
Figure 112 —Parameters estimation for the Musa Basic and Logarithmic models
The confidence bounds for this model can be derived similarly to the confidence bounds for the Musa Basic
model. Instead of estimating the confidence of the inherent defects, one will estimate the confidence of the
failure rate decay parameter θ. Plot each estimate of θ for each time interval in which a fault was observed. There should be as many estimates of θ as there are data points on the graph. For each estimate:
Establish the desired confidence, which is (1−α). If 95% confidence is desired, then set α to 5%.
Using normal charts determine Z_(1−α)/2.
The estimates will have a higher confidence as the number of data points increases. So, the range on the
estimates used should become smaller during testing. Confidence intervals should be used when specifying
or measuring reliability values.
6.3.3.5 Example
Using the same data as shown in 6.3.2.5, the plot of the natural log of the fault rate versus the cumulative
faults is shown in Figure 113:
[Figure 113 —Plot of cumulative faults (n) versus Ln(n/t) for the example data; the best straight line through the points is y = −80.907x − 157.86.]
λ̂ (84) = 0.125 × exp(–0.01236 × 84) = 0.04426 failures per hour at the point in time in which 84 faults
have been observed.
Estimated reliability = e^{−(0.04426 × 8)} = 0.701818, which is the probability that the software will be successful over an 8 h mission.
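A short check of this example is sketched below (plain Python; variable names are illustrative). It reproduces θ̂ from the slope of the Figure 113 regression line and then the failure rate, MTBF, and 8 h mission reliability quoted above.

    import math

    slope = -80.907                       # slope of n versus ln(n/t) from Figure 113
    theta_hat = abs(1.0 / slope)          # rate of decay, about 0.01236
    lambda0_hat = 0.125                   # first observed fault rate from this example
    n = 84                                # faults observed so far

    rate = lambda0_hat * math.exp(-theta_hat * n)   # about 0.0443 failures per hour
    mtbf = 1.0 / rate                               # about 22.6 h
    reliability = math.exp(-rate * 8.0)             # about 0.70 for an 8 h mission
    print(rate, mtbf, reliability)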
Annex A
(informative)
A.1 Templates for preparing the software failure modes effects analysis (SFMEA) 12
12
All tables in Annex A reprinted with permission from Ann Marie Neufelder, Softrel, LLC “Effective Application of Software
Failure Modes Effects Analysis” © 2014 [B64].
[Table (flattened in extraction): SFMEA viewpoints—Functional, Interface, Detailed, Maintenance, Usability, Serviceability, Vulnerability—and the artifacts associated with each.]
A.2 Templates for analyzing the failure modes and root causes
The following are the SFMEA tables for each of the seven product related viewpoints as referenced by
5.2.2.
[Table (flattened in extraction)—Interface SFMEA template. Columns: Interface pair; Interface parameter; Type; Size; Unit of measure; Default value; Min value; Max value; Network layers; Failure mode; Root cause. Guidance from the template: describe the direction of the interface pair; list each from/to; retrieve the Type, Size, Unit of measure, Default value, Min value, Max value, and Network layers from the IDS for each interface—this information is needed to analyze the root causes, and if the information is not available that, in itself, could be a process-related failure mode. Failure modes include faulty communications, faulty processing, faulty COTS interface, faulty OS interface, faulty database interface, faulty timing, faulty sequencing, faulty error handling, and faulty data. Create a row for each applicable root cause related to this failure mode.]
[Table (flattened in extraction)—SFMEA analysis template columns: Description, Failure mode, Root cause, Local effect, Effect on subsystem, Effect on system, Severity, Likelihood, RPN, Preventive Measures, Compensating Provisions, Corrective action, Revised RPN.]
[Table (flattened in extraction)—A second SFMEA analysis template with the same columns: Description, Failure mode, Root cause, Local effect, Effect on subsystem, Effect on system, Severity, Likelihood, RPN, Preventive Measures, Compensating Provisions, Corrective action, Revised RPN.]
Annex B
(informative)
The instructions for predicting SR before the software is in a testable state are discussed in 5.3.2.3. The
following procedures support the procedures in 5.3.2.
If the defect density models discussed in 6.2.1, B.2.1, B.2.2, B.2.3, B.2.5, or B.2.6 are used, then the
effective size should be predicted to yield a prediction of total defects. The effective size is the amount of
software code that is subject to having undetected defects. If the software is being developed by a
contractor or subcontractor then this method is applicable. If the software LRU is either COTS or FOSS the
method in B.1.2 is applicable.
The checklist for predicting size for in-house developed LRUs is shown in Figure B.1.
a) Determine whether size will be measured in KSLOC or function points. Function points are a
preferred measure to KSLOC. See B.1.1.1 for more information.
b) Select a method to predict the size in either KSLOC or function points as per B.1.1.2.
c) If the size is predicted in KSLOC it should be normalized for both language type and effectiveness
as per B.1.1.3.
d) If the unit of measure is function points then use the model in Table 49. If the unit of measure is normalized EKSLOC then the models in Table 48 and B.1.1.1 can be used.
There are two methods of size estimation used in industry: function points and KSLOC. Function points are
a measure of software size that does not require normalization. So, a function point on one software
program is comparable to a function point on another program. Function points have an advantage over
KSLOC size predictions in that they do not require normalization by language type or effectiveness (Jones [B39]).
Function points—A unit of size measurement to express the amount of functionality an information
system (as a product) provides to a user (Cutting [B11]).
KSLOC is a unit of measure that is not implicitly normalized. So, in order for it to be useful, particularly
across different software programs and applications, it should be normalized by both the language type and
the effectiveness. A line of reused code (code that has not been modified but is reused from a previous software project) will have a different exposure to defects than a line of code that is new and has not been
deployed or used operationally. Lines of code that are modified typically have a defect exposure that is less
than new code but more than reused code. Auto-generated code (code generated by a tool) typically has a
much lower defect exposure than code that is not. Hence, in order to have an accurate size prediction the
number of reused, modified, auto-generated, and new lines of code should be identified. Additionally, a line of code in C will perform a different amount of work than a line of code in C++ or C# or Java. Hence, in
order to compare KSLOC across different projects with different languages the language needs to be taken
into consideration.
EKSLOC—Effective KSLOC. This is a weighted average of new, modified, reused, and auto-generated
code.
Normalized EKSLOC—EKSLOC that has been normalized for the language(s) so that it can be directly
multiplied by the defect density predictions that are in terms of defects/normalized EKSLOC.
The practitioner will need to determine which unit of measure is being employed on each of the software
LRUs. The LRUs that have size estimates in terms of KSLOC will need to be normalized for both language
type and effectiveness while LRUs that have size estimates in terms of function points will need no
normalization. Function points are hence a preferred measure of size.
The defect density prediction models in this document are presented in either defects/function points or
defects/normalized EKSLOC. The practitioner should be cautious of using any KSLOC-based defect
density prediction models that have not been normalized. These models will penalize software developed in
higher order languages such as C#, Java, etc. The practitioner should also verify that the KSLOC
predictions are normalized prior to multiplying them by the predicted defect density tables in this
document.
There are several methods for predicting the size of the software in terms of either KSLOC or function
points or both, before the code exists. It is not the purpose of this recommended practice to cover all of
these or to make a recommendation. A summary is shown as follows: 13
In addition to the preceding methods, an organization can develop their own size prediction model as
follows:
13
COCOMO is a registered trademark of Barry W. Boehm. Price is a registered trademark of Price Systems, L.L.C. This information
is given for the convenience of users of this standard and does not constitute an endorsement by the IEEE of these products.
Equivalent products may be used if they can be shown to lead to the same results.
If the practitioner is predicting size in terms of KSLOC, the size predictions will require normalization for
both effectiveness and language type. If the size is predicted in terms of function points the following steps
are not required.
Reused code—Code that has been previously deployed on a previous or similar system without
modification.
New code—Code that has not been used in operation.
Modified code—Reused code that is being modified.
Auto generated code—Code that is generated by an automated tool.
The steps shown in Table B.1 (Fischman et al. [B18], Neufelder [B68]) are executed to predict the effective
KSLOC. Effective code is code that has not been deployed operationally and therefore has not experienced
reliability growth.
New code is 100% effective. Code that is reused without modification is fractionally effective as there may
still be a few latent defects in that code. Code that is reused with modifications will have an effectiveness
that is between these two extremes. Reused code that is subject to major modifications, for example, may
be almost as effective as new code. Reused code with minor cosmetic changes, for example, may be
slightly more effective than reused code with no modifications. Auto-generated code typically has an
effectiveness that is similar to reused and not modified code since it is generated by a tool that has been
fielded. Deleted lines of code are counted depending on whether the deletion is within a function of code or
whether an entire function has been deleted. If lines within a function are deleted, the deletion counts as a
modification. If entire functions, files, or classes are deleted, the deletion simply results in less KSLOC. A
variety of tools exist for size prediction as discussed in the next subclause.
Once the effective KSLOC is predicted the last step is to normalize the EKSLOC based on the language
used to develop the code. Some languages such as object oriented languages are denser than other
languages. This normalization allows for defect density to be applied across projects developed in different
languages. Multiply the corresponding conversion shown in Table B.2 to the predicted EKSLOC to yield
the normalized EKSLOC.
Example: There are two organizations developing software for a system. Both organizations are developing
their code in C++. Weightings are determined as follows: A - 0.40, B - 0.30, C - 0.15, D - 0.05, E - 0.01.
[Table B.3 (flattened in extraction)—Predicted KSLOC for each LRU broken out by category (reused, minor modified, moderate modified, major modified, and auto-generated code), with the total KSLOC and the resulting EKSLOC. The row for LRU A reads: 50, 0, 0, 0, 0, 0, 50 (EKSLOC).]
The total EKSLOC is then shown as follows with the language type and normalization. The normalized
EKSLOC is multiplied by the predicted defect density to yield the predicted defects as shown in Table B.4.
Note that even though defects are an integer value, the predictions retain the significant digits since the
predicted defects will be used in the MTBF calculations.
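As an illustration of how these quantities combine, the sketch below (plain Python; every input is a placeholder to be taken from Table B.1, Table B.2, and the selected defect density model rather than a value defined by this recommended practice) multiplies the predicted KSLOC in each category by its effectiveness, normalizes for language, and applies the predicted defect density.

    def predict_defects(ksloc_by_category, effectiveness, language_factor, defect_density):
        """EKSLOC -> normalized EKSLOC -> predicted defects.

        ksloc_by_category -- dict of predicted KSLOC per category (new, modified, reused, ...)
        effectiveness     -- dict of effectiveness weights per category (Table B.1 values)
        language_factor   -- conversion factor for the language used (Table B.2 value)
        defect_density    -- predicted defects per normalized EKSLOC
        """
        eksloc = sum(ksloc_by_category[c] * effectiveness[c] for c in ksloc_by_category)
        normalized_eksloc = eksloc * language_factor
        return normalized_eksloc * defect_density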
Figure B.2 shows the steps for predicting the code size when the source code is not available, which is
typically the case for COTS LRUs.
a) Is this COTS component part of the operational environment? If not it should not be included in the
predictions. If the COTS is part of the operational environment then proceed to step b).
b) Has the COTS component ever been used for a previous system? If so, compute the actual number
of defects (even if it is zero) from that previous system and use that for the prediction. Otherwise
proceed to step c).
c) Install the COTS software exactly as it will be installed at deployment. Count up the number of
kilobytes (kB) of all executables and DLLs.
d) Multiply the result of step c) by 0.051. This yields the estimated size in terms of KSLOC of C code (Hatton [B26]). Since each line of C code expands to approximately 3 lines of assembler code, the expansion ratio from assembler to C code is about 3:1. The normalized KSLOC is therefore 3 × 0.051 × number of kilobytes.
e) Multiply the result of step d) by 0.1 if the COTS software has been deployed for at least 3 years.
f) Estimate the total number of installed sites for the COTS component. Multiply this by the
appropriate value in Table B.5.
Example: A COTS product that will be deployed with the system is installed. It has never been used before on any past system, so there is no past history regarding actual fielded defect data. It is installed exactly as it will be deployed, and the number of kB in all applicable executable type files is counted. These files have a suffix of “.dll” or “.exe”. The total count is 1000 kB. As per step d), the estimated KSLOC in C code is therefore 1000 × 0.051 = 51 KSLOC and the normalized KSLOC is therefore 51 × 3 = 153 KSLOC. The COTS software has been mass deployed for several years, so as per step e) the normalized EKSLOC is multiplied by 0.1 to yield 15.3 EKSLOC. The COTS software is mass deployed, so as per step f) the final normalized EKSLOC = 15.3 × 0.01 = 0.153 EKSLOC.
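A direct transcription of steps c) through f) is sketched below (plain Python; names are illustrative). With the inputs from this example (1000 kB, deployed at least 3 years, and a mass-deployment factor of 0.01 from Table B.5) it reproduces the 0.153 normalized EKSLOC result.

    def cots_normalized_eksloc(kilobytes, deployed_3_years, install_base_factor):
        """Estimate normalized EKSLOC for a COTS LRU from its installed size.

        kilobytes           -- total kB of executables and DLLs as installed (step c)
        deployed_3_years    -- True if deployed for at least 3 years (step e)
        install_base_factor -- factor from Table B.5 for the number of installed sites (step f)
        """
        ksloc_c = kilobytes * 0.051          # step d: equivalent KSLOC of C code
        normalized = ksloc_c * 3.0           # step d: assembler-to-C expansion of about 3:1
        if deployed_3_years:
            normalized *= 0.1                # step e
        return normalized * install_base_factor   # step f

    print(cots_normalized_eksloc(1000, True, 0.01))   # about 0.153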
The easiest way to predict the effective size of the firmware is to predict the executable size first and then
convert that to EKSLOC as per B.1.2. Alternatively if one can predict how many rungs of ladder logic code
pertain to a line of assembler or C then the size can be predicted from that.
In 6.2 the three simplest methods for predicting defect density (which is necessary for predicting the
software failure rate) were presented. This subclause provides for additional methods for predicting defect
density.
In 6.2.1.1 a method of predicting defect density based on 22 parameters was presented. All of the 22 input
parameters are relatively easy to collect or observe. The 22 parameters provide for some level of sensitivity
analysis. The full-scale model (Neufelder [B67], SAE [B86]) has three different forms ranging from
94 questions to 377 questions for those who are interested in a more detailed prediction as well as a detailed
sensitivity analysis. As shown in Table B.6, the full-scale model form A has 94 questions that are usually
described in the software development plan or via interviews with the software development team. The
full-scale model B has 132 additional questions that require knowledge of the software plans and schedule.
The full-scale model C has 151 additional questions that pertain to the requirements, design, and test
strategy. This form is useful for those who need to improve the effectiveness of their development
deliverables.
Each of the forms covers the factors shown in Table B.7, which have been shown to correlate quantitatively
to fielded defects. Each of the columns shows how many questions are in that model form that pertain to
the categories shown to the left. For example, there are 13 questions related to avoidance of big blobs in
model form B.
Table B.7—Summary of the factors that have been correlated to fielded defects (continued)
Category of questions | Model form A | Model form B | Model form C
Process—Ability to repeat the practices for developing the software. | 5 | 23 | 25
Requirements—Ability to clearly define the requirements for what the software is required to do as well as what the software should not do. | 11 | 7 | 33
System testing practices—Ability to test the OP, design, requirements, failure modes, etc. | 18 | 19 | 63
Unit testing practices—Ability to test the design from the developer’s point of view. | 8 | 7 | 38
Visualization—Use of pictorial representations whenever possible. | 2 | 3 | 13
The metric-based software reliability prediction system (RePS), illustrated in Figure B.1, is an approach
that bridges the gap between a software metric and software reliability. RePS is a bottom-up approach
starting with a root, which is a user-selected software engineering Metric (M) and its associated measurement Primitives (P). The selection of the root metric is normally based on measurement data that
are available to an analyst. Software Defects (D) can then be predicted through a Metric-Defect Model (M-
D Model). The Defect-Reliability Model (D-R Model) further derives SR predictions based on software
defects and the operational profile. Detailed M-D models and D-R models are explained in the next
subclauses.
M-D Models
Software metrics can be directly or indirectly connected to defects information, e.g., the number of defects,
the exact locations of the defects, and types of defects. The connection can be built based on the
measurements of primitives through rigorous measurement rules. For instance, defect information could be
obtained through software quality assurance activities such as formal inspections and peer reviews or
through empirical models. There are three cases of defect information that can be derived from the M-D
models based on the current IEEE study of software metrics (IEEE Std 982.1™).
a) Only the number of defects can be estimated for the current version of software product;
b) The exact content (e.g., the number, location and type) of the defects are known for the current
version of software product;
c) The estimated number of defects in the current version of software product and the exact content of
defect found in an earlier version of the software product are known.
Thirteen such metrics have been investigated, and detailed measurement rules for obtaining the preceding defect information can be found in Smidts [B85]. The RePS proposes three different D-R models, one for each of the defect information cases, as follows.
B.2.2.1 D-R Model I: Reliability Prediction Model using only the number of defects
As shown in 5.3.2.3 Steps 1 through 5 and in 6.2.2, the General Exponential Model is a popular model for
predicting the failure rate based on an estimate of the number of defects remaining, which is given as Ne,MC.
See Equation (B.1).
Thus, the probability of success over the expected mission time t is obtained using Equation (B.2):
Since a priori knowledge of the defects’ location and their impact on failure probability is not known, the
average growth value given in 5.3.2.3 Step 3 can be used.
B.2.2.2 D-R Model II: Reliability Prediction Model using the exact defect content
When the exact content of the defects is known to the analyst, the failure mechanism can be explicitly
modeled using the propagation, infection, and execution (PIE) theory (Voas [B94]). Per the PIE theory, the
failure mechanism involves three steps: first, the defect needs to be executed (E); then the execution of this defect should infect (I) the state of the execution; and finally the abnormal state change
should propagate (P) to the output of the software and manifest itself as an abnormal output, i.e., a failure.
Failure probability can therefore be estimated using the PIE model:
P_f = \sum_{j=1}^{t/\tau} \sum_{i=1}^{N} \Pr\left( E_E(i,j) \cap I_E(i,j) \cap P_E(i,j) \right)   (B.3)
where
EE(i,j), IE(i,j), and PE(i,j) are the events “execution of the location occupied by the ith defect during the jth
iteration,” “infection of the state that immediately follows the location of the ith defect during the
jth iteration,” “propagation of the infection created by the ith defect to the program output during
the jth iteration,” respectively
τ is the expected mission time
N is the number of defects found
An extended finite state machine model (EFSM) has been proposed and validated (Shi, Smidts [B79]) to
solve Equation (B.3). The EFSM starts with modeling software system using states and transitions. Defects
and their effects on the software are then modeled explicitly as additional states of the EFSM. In addition,
EFSM can incorporate the operational profile (OP), which is a quantitative characterization of the way in
which a system will be used (Musa [B60]). It associates a set of probabilities to the program input space
and therefore describes the behavior of the system. In D-R model II, the software OP is mapped directly in
the EFSM as the probability of transitions.
B.2.2.3 D-R Model III: Reliability Prediction Model using estimate of number of defects in
current version and exact defect content found in earlier version
Model I alone overlooks the available defect content information found in previous versions of the
software. Both Model I and Model II can be used to make use of the information found in previous
versions. More specifically, since the defect location in previous versions of the software is known, the PIE
model can be used first to obtain a software-specific growth rate Q through the propagation of known
defects in an early version of the software system using the PIE theory and the inverse of the General
Exponential Model. That is:
This new calculated Q will be much more accurate than the average Q used in Model I. Once the new
growth rate is obtained, Model I is then used for reliability prediction knowing the number of defects
remaining in the software. This model is thus named the Combinational Model (Model III).
As part of the validation of the metric-based SR prediction system, twelve software engineering metrics were investigated and the associated detailed M-D models and D-R models were successfully developed. These root metrics are: defects per line of code, cause and effect graphing, software capability maturity model,
completeness, cyclomatic complexity, coverage factor, defect density, defect days number, function point
analysis, requirement specification change request, requirements traceability, and test coverage. Further
RePS details can be found in Smidts [B85] and Shi et al. [B78].
An organization can define their own defect density prediction model/averages by collecting historical data
from similar projects. The similar historical data should be derived from software releases for similar
products that were deployed in the last decade. The more similar the historical release is to the release
under analysis, the more accurate the estimate. Historical data from similar application types can be
grouped. The historical data should then be calibrated based on the development practices employed for
that historical project. Any of the survey based defect density prediction models can be used to calibrate the
historical data. The checklist for using historical data to predict defect density is shown in Figure B.4.
Example: As shown in Table B.8, there is historical data from three previously fielded systems that have
the same application and industry type. Release X and release Y were deployed more than 3 years ago.
However, release Z was not. It is removed from the historical data. The CMMI level was the same on all
historical projects—level 2.
The historical fielded defect density is 0.367 defects/EKSLOC, which is the average of the defect density
for release X and release Y since release Z has not been operational long enough to be included. It is known
that the current release will be operating at CMMI level 1. The historical defect density of 0.367 is
calibrated using the CMMI lookup tables in 6.2.1.3. Since the average CMMI defect density at level 1 is
0.548 and the average at level 2 is 0.182, one can use that to calibrate the current release to the historical
averages.
Predicted defect density for current release = calibrated historical average = (0.548/0.182) × 0.367 =
3.01 × 0.367 = 1.104 defects/EKSLOC
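The calibration arithmetic can be scripted in a few lines; the sketch below (plain Python; names illustrative) reproduces the figures above.

    def calibrated_defect_density(historical_density, lookup_current, lookup_historical):
        """Scale historical defect density by the ratio of the lookup-table averages."""
        return (lookup_current / lookup_historical) * historical_density

    # Releases X and Y averaged 0.367 defects/EKSLOC at CMMI level 2 (lookup 0.182);
    # the current release will be at CMMI level 1 (lookup 0.548).
    print(calibrated_defect_density(0.367, 0.548, 0.182))   # about 1.10 defects/EKSLOC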
It has been observed that the distribution of software defects over the life of a particular development version will resemble a bell-shaped (or Rayleigh) curve for virtually all software projects (Putnam [B71]), as illustrated in Figure B.5. Statistically, at the peak about 39% of the defects have been found. If one can
predict the peak one can predict the total defects by simply multiplying the peak by 2.5. The total defects
can be predicted in advance by using industry available databases and tools.
Figure B.5—Illustrative Rayleigh defect discovery profile over the development stages
This model was developed in 1992 (SAIC [B77]). Several components of the model are outdated due to the
fact that software engineering and software products have changed significantly since 1992. Interestingly,
the parts of the model that are not outdated are still relevant today, as shown in B.3.
The Neufelder Prediction Model is based on the Quanterion Solutions Incorporated 217Plus™:2015
Reliability Prediction Methodology, as implemented in the Quanterion 217Plus™:2015 Calculator
(Quanterion [B72]). 14 This model provides for a way to predict software defect density using the Process
Grade Factors defined in the 217Plus™:2015 Handbook and Calculator. Hence, the practitioner can predict
software reliability and hardware reliability using one method.
This subclause contains a summary of the factors that appear the most often in industry available SR
assessments such as the AMSAA Software Reliability Scorecard, the Rome Laboratories prediction model,
the Shortcut model, and the Full-scale model. Also shown is the percentage of organizations that had
successful, mediocre, and distressed SR when the software was deployed. More than 500 software
characteristics have been correlated to the success (from a reliability standpoint) of the software (Neufelder
[B68]). Table B.10 represents those that are referenced in the greatest number of predictive models and are the most sensitive to the outcome of the software reliability, which is shown in Table B.9. One can see that
many of the factors pertain to white box unit testing (5.3.9.2), planning ahead, tracking the progress, test
metrics and test suites (5.3.9, 5.4.6, 5.4.1), subcontractor management (5.3.2.5) and identifying and testing
the exceptions and exception handling (5.4.1.5).
14
217Plus is a trademark of Quanterion Solutions Incorporated. This information is given for the convenience of users of this
standard and does not constitute an endorsement by the IEEE of these products. Equivalent products may be used if they can be
shown to lead to the same results.
For a summary of the top factors referenced in most SR assessment models and are the most sensitive to
outcome of the release, reference Table B.10.
The software system test plan is formally reviewed | 4 | 1.00 | 0.71 | 0.00
Software development methods and tools are defined | 4 | 1.00 | 0.42 | 0.10
The best software LCM is executed with respect to the particular software project | 4 | 1.00 | 0.36 | 0.11
There is a coding standard | 4 | 0.88 | 0.70 | 0.00
There are regular status reviews between software system testers and SW management | 4 | 1.00 | 0.83 | 0.17
The requirements documents are kept up to date after development begins | 4 | 1.00 | 0.89 | 0.20
There is a formal means by which to filter, assign priority, and schedule customer requests | 4 | 0.89 | 0.58 | 0.10
There are software requirements for boundary conditions | 4 | 1.00 | 0.33 | 0.25
Annex C
(informative)
Prior to using Annex C the reader should select the best reliability growth models as per 5.4.5. Since the
SRG models are dependent on the demonstrated failures per time period and since there are four
possibilities for what the failure trend can be at any given time, there is no “one” best model that works for
all applications. The practitioner needs to organize and analyze the failure data during testing as per 5.4.5
and then select the model(s) that are applicable or provide a good practical solution. The following models
are shown in order of ease of use from easiest to most difficult with regard to the data that needs to be
collected and the calculations that need to be performed.
C.1 Models that can be used when the fault rate is peaking
See 6.3.1.
C.2 Models that can be used when the fault rate is decreasing
Table C.1—Estimations

For all models, the estimated current MTBF is the inverse of the estimated current failure rate, and the estimated current reliability over the required mission time is e^{−(λ(n) × mission time)} or e^{−(λ(t) × mission time)}.

Model | Estimated remaining defects | Estimated current failure rate | Parameter estimation reference
Musa Basic | N0 − n | λ(n) = λ0(1 − (n/N0)) | 6.3.2.3; confidence bounds: 6.3.2.4
Jelinski-Moranda | N0 − n | λ(n) = k(N0 − n) | 6.3.2.3; confidence bounds: 6.3.2.4
Goel-Okumoto | N0 − n | λ(t) = N0ke^{−kt} | 6.3.2.3; confidence bounds: 6.3.2.4
Shooman linearly decreasing defect removal | N0 − Kt(1 − t/(2τ0)) | λ(t) = k[N0 − Kt(1 − t/(2τ0))] | C.2.1.1
Duane | Infinite | λ(t) = bt^α | C.2.1.3
Geometric | See NOTE. | λ(n) = λ0p^n | C.2.1.3

Where: n = observed cumulative faults found so far; t = observed total test hours so far; mission time = desired or required mission time of the software in operation.
NOTE—The Geometric Model assumes infinite defects. However, the removal level is estimated by 1 − λ0n.
The first three parameters are estimated as per 6.3.2.3. The confidence of the estimates is estimated as per 6.3.2.4.
The following parameter is specific to the Geometric Model. Refer to C.2.1.3 for instructions on how to
estimate this parameter.
p—The Geometric Model takes its name from the fact that the term pi is a decreasing geometric sequence
when p is in the interval (0, 1). In the Geometric Model, the removal of the first few defects decreases the
failure rate more significantly, while the removal of later defects decreases the failure rate much less. These
trends often agree with software testing in practice because the defects discovered earlier are often those that reside on more commonly executed paths and are therefore more likely to occur. Defects discovered
later may be more difficult to detect because they are only exposed when relatively rare combinations of
commands are executed, thereby resulting in a fault. Since pi tends to zero only as i goes to infinity, the
Geometric Model assumes an infinite number of failures, and it is not meaningful to estimate the number of
defects remaining. Instead, the purification level is used to estimate the defect removal.
While each of the models has similar assumptions, Table C.2 shows that some of them have different
assumptions for the likelihood of each fault being observed:
Table C.2—Assumptions
Model Likelihood of each fault being observed
Musa Basic Equal
Jelinski-Moranda Equal
Goel-Okumoto No assumption
Shooman linearly decreasing Equal
defect removal a
Geometric Each defect causes failures with a rate that does not change over time. However,
this rate differs across defects, and the defects with a higher rate tend to be
detected earlier during testing.
a
See Shooman [B82], pp. 254–255.
There are two parameters to estimate in the defect removal model K and τ0. A simple way to estimate these
parameters is to compute the derivative of the defect removal model using Equation (C.1):
\frac{d\left(\text{Remaining defects}(\tau)\right)}{d\tau} = K\left(1 - \frac{\tau}{2\tau_0}\right)   (C.1)

Compute the derivatives at the midpoints of the first two intervals, τa/2 and τa + τb/2, using Equation (C.2) and Equation (C.3):

\frac{\text{Corrected defects}(\Delta\tau_a)}{\Delta\tau_a} = K\left(1 - \frac{\tau_a}{4\tau_0}\right)   (C.2)

\frac{\text{Corrected defects}(\Delta\tau_b)}{\Delta\tau_b} = K\left(1 - \frac{\tau_a + \tau_b/2}{2\tau_0}\right)   (C.3)

Solve for the parameters k and N0 using Equation (C.4) and Equation (C.5):

\text{MTTF}_a = \frac{H_a}{r_a} = \frac{1}{k\left[N_0 - K\tau_a\left(1 - \frac{\tau_a}{2\tau_0}\right)\right]}   (C.4)

\text{MTTF}_b = \frac{H_b}{r_b} = \frac{1}{k\left[N_0 - K\tau_b\left(1 - \frac{\tau_b}{2\tau_0}\right)\right]}   (C.5)
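Under the reconstruction of Equation (C.2) and Equation (C.3) above, K and τ0 follow from a small linear solve; the sketch below (plain Python; names illustrative) shows one way to do it, after which k and N0 follow from Equation (C.4) and Equation (C.5) in the same two-equations, two-unknowns fashion.

    def fit_defect_removal(da, db, tau_a, tau_b):
        """Solve Eq. (C.2) and Eq. (C.3) for K and tau_0.

        da, db       -- corrected defects per unit time in the first and second intervals
        tau_a, tau_b -- lengths of the first interval (0 to tau_a) and the second interval
        """
        # Write both equations as d = K - w * midpoint, with w = K / (2 * tau_0).
        w = 2.0 * (da - db) / (tau_a + tau_b)
        K = da + w * (tau_a / 2.0)
        tau_0 = K / (2.0 * w)
        return K, tau_0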
Plot the observed time between failures versus test hours on log-log paper. Draw the best straight line through the data. Using the formula for a straight line, Y = mX + c:
α = the slope m
ln(b) = c = the Y intercept of this plot
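A least-squares equivalent of drawing that straight line on log-log paper is sketched below (Python with numpy; names illustrative).

    import numpy as np

    def fit_duane(test_hours, time_between_failures):
        """Fit the log-log straight line ln(TBF) = m*ln(t) + c described above."""
        x = np.log(np.asarray(test_hours, dtype=float))
        y = np.log(np.asarray(time_between_failures, dtype=float))
        m, c = np.polyfit(x, y, 1)
        alpha = m            # the slope m
        b = np.exp(c)        # ln(b) = c, the Y intercept of the plot
        return alpha, b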
The parameters λ0 and p are unknown values that are estimated. Once inter-failure times data set such as
the one given in 5.4.5.5 have been collected, one can estimate p with Equation (C.6).
\sum_{i=1}^{n} \frac{(i-1)}{\hat{p}} = \frac{\hat{p}\, n}{\sum_{i=1}^{n} \hat{p}^{\,i} x_i} \sum_{i=1}^{n} (i-1)\, \hat{p}^{\,(i-2)} x_i   (C.6)
Where xi is the ith inter-failure time and n is the total number of faults observed. The parameter estimation
can be accomplished with an available SRG tool, a spreadsheet, graphical methods, or numerical
algorithms. Once p has been estimated, λ0 can be estimated by substituting the estimate for p into
Equation (C.7):
\hat{\lambda}_0 = \frac{\hat{p}\, n}{\sum_{i=1}^{n} \hat{p}^{\,i} x_i}   (C.7)
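Assuming the reconstruction of Equation (C.6) and Equation (C.7) above, the sketch below (plain Python; names illustrative) estimates p by bisection and then λ0. It assumes the score function changes sign on (0, 1), which is typically the case when the inter-failure times show growth.

    def fit_geometric(x):
        """Estimate p and lambda_0 for the Geometric model from inter-failure times x."""
        n = len(x)

        def lambda0(p):
            # Eq. (C.7): lambda_0 = p*n / sum(p^i * x_i)
            return p * n / sum(p ** i * xi for i, xi in enumerate(x, start=1))

        def score(p):
            # Difference between the two sides of Eq. (C.6).
            lhs = sum((i - 1) / p for i in range(1, n + 1))
            rhs = lambda0(p) * sum((i - 1) * p ** (i - 2) * xi
                                   for i, xi in enumerate(x, start=1))
            return lhs - rhs

        lo, hi = 1e-6, 1.0 - 1e-6
        for _ in range(200):                 # simple bisection on (0, 1)
            mid = 0.5 * (lo + hi)
            if score(lo) * score(mid) <= 0:
                hi = mid
            else:
                lo = mid
        p_hat = 0.5 * (lo + hi)
        return p_hat, lambda0(p_hat)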
The confidence bounds for this model can be derived similarly to the confidence bounds for the Musa Basic
Model. Instead of estimating the confidence of the inherent defects, one will estimate the confidence of the
initial failure rate parameter λ. Calculate an estimate of λ for each time interval in which a fault was observed. There should be as many estimates of λ as there are data points on the graph. For each estimate:
Establish the desired confidence, which is (1−α). If 95% confidence is desired then set α to 5%.
Using normal charts determine Z_(1−α)/2.
The estimates will have a higher confidence as the number of data points increases. So, the range on the
estimates used should become smaller during testing. Confidence intervals are recommended when
specifying or measuring reliability values.
Table C.3 shows the estimated remaining defects, current failure rate, current MTBF and current reliability
for each model.
Where
The observed values are: θ = the rate of change of the faults observed; n = observed cumulative faults found so far;
t = observed total test hours so far; mission time = desired or required mission time of the software in operation.
The estimated values are: N0 = the number of inherent defects within the software; λ0 = estimated initial failure rate
of the software.
For the Logarithmic Model refer to 6.3.3.3 for instructions on estimating the parameters. For the Shooman
Model, Duane Model, and Log Logistic Model, the instructions are as follows.
Assume that there are two sets of fault data collected in intervals 0 – τa and 0 – τb . The resulting
estimator formulas are shown in Equation (C.8) and Equation (C.9):
\alpha = \frac{\ln[n(\tau_a)] - \ln[n(\tau_b)]}{\tau_a - \tau_b}   (C.8)

\ln[N_0] = \frac{\ln[n(\tau_a)] + \ln[n(\tau_b)] + \alpha(\tau_a + \tau_b)}{2}   (C.9)
Once N0 and α are determined, one set of integration test data can be used to determine k, as shown in
Equation (C.10):
\text{MTTF}_a = \frac{1}{k N_0 e^{-\alpha \tau_a}}   (C.10)
That yields Equation (C.11):
k = \frac{1}{N_0 e^{-\alpha \tau_a}} \times \frac{1}{\text{MTTF}_a}   (C.11)
Solve the following system of two simultaneous equations [Equation (C.12) and Equation (C.13)] where:
\frac{n\left(\hat{\lambda} t_n\right)^{\hat{\kappa}}}{2\left(1 + \left(\hat{\lambda} t_n\right)^{\hat{\kappa}}\right)} = \sum_{i=1}^{n} \frac{\left(\hat{\lambda} t_i\right)^{\hat{\kappa}}}{1 + \left(\hat{\lambda} t_i\right)^{\hat{\kappa}}}   (C.12)

\frac{n}{\hat{\kappa}} = \frac{n \log\left(\hat{\lambda} t_n\right)}{1 + \left(\hat{\lambda} t_n\right)^{\hat{\kappa}}} - n \log \hat{\lambda} - \sum_{i=1}^{n} \log t_i + 2 \sum_{i=1}^{n} \frac{\left(\hat{\lambda} t_i\right)^{\hat{\kappa}} \log\left(\hat{\lambda} t_i\right)}{1 + \left(\hat{\lambda} t_i\right)^{\hat{\kappa}}}   (C.13)

N_0 = \frac{n\left(1 + \left(\hat{\lambda} t_n\right)^{\hat{\kappa}}\right)}{\left(\hat{\lambda} t_n\right)^{\hat{\kappa}}}   (C.14)
C.3 Models that can be used with increasing and then decreasing fault rate
The following model can be used when the fault rate is increasing and then decreasing. However, it should
be pointed out that the models in C.2 can be used if one filters the fault data for only the most recent
segment of data that has a decreasing fault rate.
The Yamada (Delayed) S-shaped model is one of the earliest and simplest models that can fit fault
detection processes that exhibit an S shape. It has only two parameters, making it easier to apply than some
other S-shaped models. Many data sets that cannot be characterized by an exponential model exhibit an S shape.
This model uses times of failure occurrences. This model was proposed by Yamada, Ohba, and Osaki
[B97]. The difference is that the mean value function of the Poisson process is S shaped. At the beginning
of the testing phase, the fault detection rate is relatively flat but then increases exponentially as the testers
become familiar with the program. Finally, it levels off near the end of testing as faults become more
difficult to uncover.
C.3.1.1 Assumptions
The estimated remaining defects = a − n, where n = actual cumulative number of defects found so far in testing.

The estimated failure rate = ab^2 t e^{-bt}, with both a, b > 0, where b is a shape parameter.

The estimated MTBF is a function of the parameter estimation. The practitioner should rely on an automated tool to compute this.

The estimated reliability at time t is R(t) = e^{-\hat{a} e^{-\hat{b} t}}.
The maximum likelihood estimate for the failure detection rate parameter is shown in Equation (C.15):
\hat{a}\, t_n^2 e^{-\hat{b} t_n} = \sum_{i=1}^{n} \frac{\left(y_i - y_{i-1}\right)\left(t_i^2 e^{-\hat{b} t_i} - t_{i-1}^2 e^{-\hat{b} t_{i-1}}\right)}{\left(1 + \hat{b} t_{i-1}\right) e^{-\hat{b} t_{i-1}} - \left(1 + \hat{b} t_i\right) e^{-\hat{b} t_i}}   (C.15)
where n is the number of observation intervals, t_i is the time at which the ith interval ended, k_i is the number of faults detected in the ith interval, and y_i = \sum_{j=1}^{i} k_j is the cumulative number of faults detected by the end of the ith interval. The only unknown in this equation is b. Thus, the estimate of b is the value that makes the two sides of the equation equal. The graphical method can be applied here.
Given the estimate of parameter b, the maximum likelihood estimate for the number of faults is shown in
Equation (C.16):
\hat{a} = \frac{\sum_{i=1}^{n} k_i}{1 - \left(1 + \hat{b} t_n\right) e^{-\hat{b} t_n}}   (C.16)
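A minimal numerical solution of Equation (C.15) and Equation (C.16) is sketched below (plain Python; the names, and the bisection bracket for b, are illustrative and assume the reconstruction of the equations above).

    import math

    def fit_delayed_s_shaped(t, k):
        """Yamada delayed S-shaped model from grouped data, per Eq. (C.15) and Eq. (C.16).

        t -- interval end times t_1..t_n (t_0 = 0 is implied)
        k -- faults detected in each interval k_1..k_n
        """
        n = len(t)
        y = [0.0]
        for ki in k:
            y.append(y[-1] + ki)            # cumulative faults y_0..y_n
        tt = [0.0] + list(t)                # prepend t_0 = 0
        total = sum(k)

        def a_of(b):
            return total / (1.0 - (1.0 + b * tt[n]) * math.exp(-b * tt[n]))

        def score(b):
            rhs = 0.0
            for i in range(1, n + 1):
                num = (y[i] - y[i - 1]) * (tt[i] ** 2 * math.exp(-b * tt[i])
                                           - tt[i - 1] ** 2 * math.exp(-b * tt[i - 1]))
                den = ((1.0 + b * tt[i - 1]) * math.exp(-b * tt[i - 1])
                       - (1.0 + b * tt[i]) * math.exp(-b * tt[i]))
                rhs += num / den
            return rhs - a_of(b) * tt[n] ** 2 * math.exp(-b * tt[n])

        lo, hi = 1e-6, 10.0                 # bracket chosen for illustration only
        for _ in range(200):                # simple bisection on the score function
            mid = 0.5 * (lo + hi)
            if score(lo) * score(mid) <= 0:
                hi = mid
            else:
                lo = mid
        b_hat = 0.5 * (lo + hi)
        return a_of(b_hat), b_hat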
C.3.1.4 Estimate confidence bounds
To estimate the confidence intervals for the model parameters, substitute the numerical parameter estimates
â and bˆ into the following 2×2 matrix of equations and compute the inverse:
\Sigma = \begin{bmatrix} x_{aa} & x_{ab} \\ x_{ab} & x_{bb} \end{bmatrix}^{-1}
where the entries of the matrix are shown in Equation (C.17), Equation (C.18), and Equation (C.19):
x_{aa} = \frac{1 - \left(1 + \hat{b} t_n\right) e^{-\hat{b} t_n}}{\hat{a}}   (C.17)

x_{ab} = \hat{b}\, t_n^2 e^{-\hat{b} t_n}   (C.18)

x_{bb} = \hat{a} \hat{b}^2 \sum_{i=1}^{n} \frac{\left(t_i^2 e^{-\hat{b} t_i} - t_{i-1}^2 e^{-\hat{b} t_{i-1}}\right)^2}{\left(1 + \hat{b} t_{i-1}\right) e^{-\hat{b} t_{i-1}} - \left(1 + \hat{b} t_i\right) e^{-\hat{b} t_i}}   (C.19)
The numerical value in \Sigma_{1,1} = \text{Var}[\hat{a}], while the value in \Sigma_{2,2} = \text{Var}[\hat{b}]. The upper and lower confidence intervals for \hat{a} and \hat{b} can then be calculated by substituting the estimates for the parameters and their variances into the following:

\hat{a} \pm Z_{1-\frac{\alpha}{2}} \sqrt{\text{Var}[\hat{a}]}

\hat{b} \pm Z_{1-\frac{\alpha}{2}} \sqrt{\text{Var}[\hat{b}]}

where Z_{1-\frac{\alpha}{2}} is the 1 - \frac{\alpha}{2} critical value of a standard normal distribution.
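The matrix inversion and interval calculation can be scripted directly; a sketch is shown below (Python with numpy; names illustrative, based on the reconstruction of Equation (C.17) through Equation (C.19) above).

    import math
    import numpy as np

    def yamada_confidence_bounds(a_hat, b_hat, t, z):
        """Approximate confidence bounds for a and b, per Eq. (C.17)-(C.19).

        t -- interval end times t_1..t_n (t_0 = 0 implied)
        z -- Z_(1 - alpha/2), e.g., 1.96 for 95% confidence
        """
        tt = [0.0] + list(t)
        n = len(t)
        x_aa = (1.0 - (1.0 + b_hat * tt[n]) * math.exp(-b_hat * tt[n])) / a_hat
        x_ab = b_hat * tt[n] ** 2 * math.exp(-b_hat * tt[n])
        s = 0.0
        for i in range(1, n + 1):
            num = (tt[i] ** 2 * math.exp(-b_hat * tt[i])
                   - tt[i - 1] ** 2 * math.exp(-b_hat * tt[i - 1])) ** 2
            den = ((1.0 + b_hat * tt[i - 1]) * math.exp(-b_hat * tt[i - 1])
                   - (1.0 + b_hat * tt[i]) * math.exp(-b_hat * tt[i]))
            s += num / den
        x_bb = a_hat * b_hat ** 2 * s

        cov = np.linalg.inv(np.array([[x_aa, x_ab], [x_ab, x_bb]]))
        var_a, var_b = cov[0, 0], cov[1, 1]
        return ((a_hat - z * math.sqrt(var_a), a_hat + z * math.sqrt(var_a)),
                (b_hat - z * math.sqrt(var_b), b_hat + z * math.sqrt(var_b)))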
C.4 Models that can be used regardless of the fault rate trend
a) The defects do not necessarily cause failures with the same rate (i.e., the detection rates of the
defects may differ).
b) Once a defect causes a failure, it is corrected without introducing any new faults.
c) The software is tested similarly to how it will be used in an operational environment.
d) Specifically, it is assumed that the failure rate function has the shape of a Weibull probability
density function. Thus, this model can accommodate cases where the failure rate increases at the
beginning of testing and then decreases.
Estimated failure rate = a b c\, e^{-b t^{c}} t^{c-1}, where b and c are shaping parameters to be estimated and t = the cumulative amount of operational testing so far.

The estimated mean time to failure is given by \text{MTTF} = \frac{\Gamma\left(\frac{1}{\hat{c}} + 1\right)}{\hat{b}^{1/\hat{c}}}, where \Gamma is the Gamma function.

Estimated reliability as a function of t hours is R(t) = e^{-\hat{a} e^{-\hat{b} t^{\hat{c}}}}.
For failure count data, the maximum likelihood estimates for the parameters b and c are obtained by
solving the following system of two simultaneous equations [Equation (C.20) and Equation (C.21)]:
\sum_{i=1}^{n} \frac{k_i\left(t_i^{\hat{c}} e^{-\hat{b} t_i^{\hat{c}}} - t_{i-1}^{\hat{c}} e^{-\hat{b} t_{i-1}^{\hat{c}}}\right)}{e^{-\hat{b} t_{i-1}^{\hat{c}}} - e^{-\hat{b} t_i^{\hat{c}}}} - \frac{t_n^{\hat{c}} e^{-\hat{b} t_n^{\hat{c}}} \sum_{i=1}^{n} k_i}{1 - e^{-\hat{b} t_n^{\hat{c}}}} = 0   (C.20)

\sum_{i=1}^{n} \frac{k_i \hat{b}\left(t_i^{\hat{c}} \ln t_i\, e^{-\hat{b} t_i^{\hat{c}}} - t_{i-1}^{\hat{c}} \ln t_{i-1}\, e^{-\hat{b} t_{i-1}^{\hat{c}}}\right)}{e^{-\hat{b} t_{i-1}^{\hat{c}}} - e^{-\hat{b} t_i^{\hat{c}}}} - \frac{\hat{b}\, t_n^{\hat{c}} \ln t_n\, e^{-\hat{b} t_n^{\hat{c}}} \sum_{i=1}^{n} k_i}{1 - e^{-\hat{b} t_n^{\hat{c}}}} = 0   (C.21)
Where n is the number of observation intervals, ti the time at which the ith interval ended, and ki the
number of faults detected in the ith interval.
Given the estimates of the parameters b and c, the maximum likelihood estimate for the initial number of defects is shown in Equation (C.22):
\hat{a} = \frac{\sum_{i=1}^{n} k_i}{1 - e^{-\hat{b} t_n^{\hat{c}}}}   (C.22)
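Assuming the reconstruction of Equation (C.20) through Equation (C.22) above, the sketch below (Python with numpy and scipy; the function names and the starting guesses for the root finder are illustrative) solves the two simultaneous equations numerically and then computes â.

    import numpy as np
    from scipy.optimize import fsolve

    def fit_weibull_type(t, k, b0=1e-3, c0=1.0):
        """Weibull-type model from grouped data, per Eq. (C.20)-(C.22).

        t      -- interval end times t_1..t_n (t_0 = 0 implied)
        k      -- faults detected in each interval
        b0, c0 -- starting guesses for the root finder (illustrative only)
        """
        tt = np.concatenate(([0.0], np.asarray(t, dtype=float)))
        k = np.asarray(k, dtype=float)
        total = k.sum()

        def scores(params):
            b, c = params
            e = np.exp(-b * tt ** c)          # exp(-b * t_i^c) for i = 0..n
            den = e[:-1] - e[1:]              # per-interval denominators
            g_b = tt ** c * e                 # t_i^c * exp(-b * t_i^c)
            # ln(t_0) is forced to 0; its term is multiplied by t_0^c = 0 anyway.
            lnt = np.log(np.where(tt > 0.0, tt, 1.0))
            g_c = b * tt ** c * lnt * e
            eq_b = np.sum(k * (g_b[1:] - g_b[:-1]) / den) - g_b[-1] * total / (1.0 - e[-1])
            eq_c = np.sum(k * (g_c[1:] - g_c[:-1]) / den) - g_c[-1] * total / (1.0 - e[-1])
            return [eq_b, eq_c]

        b_hat, c_hat = fsolve(scores, [b0, c0])
        a_hat = total / (1.0 - np.exp(-b_hat * tt[-1] ** c_hat))   # Eq. (C.22)
        return a_hat, b_hat, c_hat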
In the course of the software development, techniques commonly used to uncover defects before software
testing are peer review, walkthrough, and inspection (IEEE Std 1012™-2012 [B33], IEEE Std 1028™, and
IEEE Std 12207-2008). Although peer review, walkthrough, and inspection differentiate themselves in their
processes and formats (inspection has the most formal process, but peer review and walkthrough are less
formal), they are all intended to discover defects in software artifacts. Results from these activities are normally the defects discovered in the software artifacts, and they are often used to verify whether the entry criteria to the next development phase are met.
The number of remaining defects that escaped from the peer review, walkthrough, or inspection should be accounted for to better control the software development process. A statistical technique, capture/recapture (CR), has been exploited to estimate the number of remaining defects after the review and the inspection.
The CR model was introduced in biology to estimate the size of an animal population in one closed area. In
doing so, traps are in place, animals are captured (trapped), marked, and released. The captured animals
have chances to be recaptured. The number of captured and recaptured animals correlates to the animal population in such a way that one can argue the more animals recaptured, the smaller the population size, and vice versa. The CR model, however, was developed to statistically estimate the population size.
This CR concept can be transferred to the software engineering world in such a way that defects are analogous to animals and inspectors/reviewers are analogous to traps. Thus, defects discovered are analogous to animals trapped, and the total defects in a software artifact are analogous to the animal population in a closed area.
CR models assume inspectors (traps) are independent. Researchers in statistics developed multiple CR
models to reflect different assumptions on detection probability (animal trapped, defect detected) and
detection capability (trap captures animal, inspector/reviewer discovers defect). Some CR software, such as
CAPTURE, is available in public domain, to offer the estimates of the total number of defects. Multiple
estimators are provided by this software. Refer to Chao et al. [B9], Rexstad [B76], and White et al. [B96]
for more information.
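As a concrete illustration (not drawn from the cited CR tools), the simplest two-inspector estimator, the Lincoln-Petersen estimator, can be computed directly; the review counts below are hypothetical.

```python
# Illustrative sketch: two-reviewer capture/recapture (Lincoln-Petersen) estimate of the
# total number of defects in an artifact. n1 and n2 are the defects found by each reviewer
# and m is the number of defects found by both (the "recaptured" defects).
def lincoln_petersen(n1: int, n2: int, m: int) -> float:
    if m == 0:
        raise ValueError("No overlap between reviewers; the estimator is undefined.")
    return n1 * n2 / m

# Hypothetical inspection: reviewer A finds 14 defects, reviewer B finds 11, and 6 are common.
total_estimate = lincoln_petersen(14, 11, 6)            # about 25.7 defects estimated in total
unique_found = 14 + 11 - 6                              # 19 distinct defects already discovered
remaining_estimate = total_estimate - unique_found      # about 6.7 defects estimated remaining
print(round(total_estimate, 1), round(remaining_estimate, 1))
```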
The following survey was completed by 15 members of the IEEE 1633 Working Group (WG). The results
were used to identify the most popular SR growth models, which were selected in 5.4.5. While the results
from the IEEE 1633 WG are provided, the determination of which model to use should be based on two
key factors: 1) the software model's assumptions are met, and 2) the plot of the actual software failure data
(test or operational data) aligns with the model’s data plot and successfully passes a goodness-of-fit test.
1. What is your primary engineering discipline?
Reliability/RAM: 8
Software engineering: 8
Other: 2
Note the numbers do not sum to the number of responses (n = 15) because some individuals indicated both reliability/RAM and software.
2. Have you ever used software reliability growth models (SRGM) in your work?
Yes: 12
No: 3

3. Would you use a software reliability growth model that is not automated?
Yes: 6
Prefer automation/Maybe: 3
No: 3
On a scale of 1 to 5, with 1 being the best and 5 the lowest, please rank the following models according to your individual experience. For example, if a particular model has characterized a data set you analyzed very well, you should assign it a score of 1. For models that rarely or never achieved a good fit to your data, a score closer to 5 would be appropriate.
Not all WG members with SRGM experience answered this question (n = 12). Half said they would use a
model that is not automated. The percentage of users of this document that would use a model that is not
automated may be significantly lower.
Annex D
(informative)
Table D.1 provides an estimate of the typical relative effort and relative calendar time for each of the tasks in this document. Some of the tasks require a culture change more than a cost, in that they mean doing existing tasks differently as opposed to adding new tasks. Some require calendar time but not much effort. Some require automation but not much effort. Some tasks may require effort, automation, and calendar time, but those are typically not the essential tasks. Recall that the prediction and SR growth models presented range from simple to detailed. The practitioner can reduce expense and calendar time by choosing the simplest of the practices. As with any new engineering practice, it is most costly to execute the first time. After the practices have been deployed and people become more familiar with them, the relative cost generally diminishes.
Key:
L—Low
M—Medium
H—High
V—Varies depending on the scope selected
*The same tools used for hardware reliability can be used for SR.
B—Basic tools such as spreadsheets, etc.
Culture change—Many of the SRE tasks require more of a culture change and less of a cost. For example,
SFMEAs are typically difficult to implement because software engineers have not been trained to view the
failure space. Once they accept the failure space viewpoint the analyses are much easier to perform and
take less time.
Effort—This is in terms of work hours and not necessarily calendar time. Several tasks can be combined with already existing software tasks. For example, the SFMEA can be combined with an existing requirements, design, or code review. See the last two columns to identify which tasks can be merged with either software development tasks or reliability engineering tasks.
Calendar time—Some tasks do not require a significant amount of work time but do require calendar time.
For example, it takes time for people to get trained on something new.
Requires automation—Some of the tasks are difficult to do without some automated tool. Tasks that require
only basic tools are identified as well as those that can use the same tools that are used for hardware
reliability tasks.
Table D.1—Relative culture change, effort, calendar time, and automation required to implement the SRE tasks

Columns: SRE task; Culture change; Effort; Calendar time; Automation; Merge with existing SW practices?; Merge with existing reliability practices?
Planning for SR
5.1.1 Characterize the software system L L L Yes
5.1.2 Define failures and criticality L L L Yes
5.1.3 Perform a reliability risk assessment M L L Yes
5.1.4 Assess the data collection system L L L
5.1.5 SRP—Software Reliability Plan L-M L-M L-M Yes
Develop failure modes model
5.2.1 Perform software defect root cause analysis M M L B Yes
5.2.2 Perform SFMEA M-H V V B Yes
5.2.3 Include software in the system FTA M-H V V B Yes
Apply SR during development
5.3.1 Identify/obtain the initial system reliability objective L L L Yes
5.3.2 Perform a software reliability assessment and prediction M-H M-H M L-M
5.3.3 Sanity check the early prediction M L L
5.3.4 Merge the SR predictions into the overall system predictions L-M L-M L * Yes
5.3.5 Determine an appropriate overall SR objective L L L Yes
5.3.6 Plan the reliability growth M-H L L L Yes
5.3.7 Perform a sensitivity analysis M M L L
5.3.8 Allocate the required reliability to the software LRUs M-H L-M L
5.3.9 Employ SR metrics for transition to testing L-M L-M L B
Apply SR during testing
5.4.1 Develop a reliability test suite M-H M-H M Y Yes
5.4.2 Increase test effectiveness via software fault insertion M-H M-H M Y
5.4.3 Measure test coverage M-H M-H M-H Y Yes
5.4.4 Collect fault and failure data L L L B
5.4.5 Select reliability growth models M L L
5.4.6 Apply SR metrics L-M L L B
5.4.7 Determine the accuracy of the prediction and reliability growth models L-M L L B
5.4.8 Revisit the defect root cause analysis M L-M L B
Support release decision
5.5.1 Determine release stability M-H L L
5.5.2 Forecast additional test duration M-H L L L
5.5.3 Forecast remaining defects and effort required to correct them M-H L L L
5.5.4 Perform an RDT M M H L Y
Apply SR in operation
5.6.1 Employ SRE metrics to monitor field SR M-H M L
5.6.2 Compare operational reliability to predicted reliability M-H L L
5.6.3 Assess changes to previous characterizations or analyses M-H M L
5.6.4 Archive operational data M L L
Annex E
(informative)
The automation capabilities associated with each of the SRE tasks in this document are discussed in 5.1.5.
Table E.1 shows some of the tool sets from academia and industry that can be used for the SRE tasks. The
following information is given for the convenience of users of this standard and does not constitute an
endorsement by the IEEE of these products. Equivalent products may be used if they can be shown to lead
to the same results.
Annex F
(informative)
Examples
[Figure residue: an example software bill of materials (BOM) that groups items by SW class (SW assembly, company owned, COTS, and FOSS) and lists the item numbers in each class (210, 213, 221, 224, 231, and 232).]
The following attributes can be applied for each item on the BOM:
The example shown in Figure F.3 is a commercial multi-function printer system that has printing, scanning, and faxing capabilities. The software determines what will be printed, faxed, or scanned and how. For example, it determines the pages, orientation, quality, color, etc. The software is also responsible for detecting errors in the hardware. The printer is sold only in the US.
The customer profile is determined by analyzing similar recent printers and determining groups of
customers and the percentage of total customers that fall into each group. A similar previous printer was
sold to small businesses 70% of the time and copy shops 30% of the time. Based on the registration data for
the printers, 40% of the small business customers are professionals such as lawyers or accountants, while
60% are high-tech professionals such as engineers and computer programmers. The copy shops have two
users—the customer who walks in the store and wants to use the printer on a page fee basis and the copy
shop employee who is performing a service for a walk-in customer. Based on surveys conducted of the copy shops, the printer is used by walk-in customers 25% of the time and by copy shop employees the remainder.
[Figure F.3 residue: the operational profile for the multi-function printer. The figure shows, for each customer and end-user type, the system mode profile (the split among scanning, printing, and faxing), the functional profile (paper sizes 8.5x11, legal, and 11x17; auto feed versus manual scan; auto dial versus manual dial), and the resulting operational profile percentages.]
The functions are printing in legal size, 11x17, and 8.5x11. The auto feeder supports scanning of multiple-page documents, while the manual scan is used for one-page documents or pictures. The fax can be initiated via auto dial or by manually dialing. The auto dial is good for several fax jobs while the manual fax is good for a one-time fax to one recipient. Based on interviews with end users and the past service logs, the percentage of time that each of the end users at each of the customer sites is performing a particular function has been identified as per Figure F.3.
Finally, when the profile is complete, the OP is computed for each customer type, end-user type, function mode, and function by multiplying the probabilities. Not surprisingly, printing in 8.5x11 is a high-probability function.
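A minimal sketch of the multiplication, using the customer and end-user splits stated above; the mode and paper-size splits used here are illustrative assumptions rather than the figures from Figure F.3.

```python
# Illustrative sketch: an operational profile (OP) probability is the product of the
# probabilities along one path customer type -> end user -> system mode -> function.
p_customer = {"small business": 0.70, "copy shop": 0.30}             # from the example
p_user = {"small business": {"professional": 0.40, "high tech": 0.60},
          "copy shop": {"walk-in": 0.25, "employee": 0.75}}          # from the example

p_mode_printing = 0.50      # assumed share of this user's workload that is printing
p_function_85x11 = 0.70     # assumed share of printing done on 8.5x11 paper

op = (p_customer["small business"] * p_user["small business"]["high tech"]
      * p_mode_printing * p_function_85x11)
print(f"OP(small business, high tech, printing, 8.5x11) = {op:.3f}")  # 0.147
```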
Example 1:
It is required that a system have water flow at 8 gal per minute upon command, and that the water stop flowing upon command and/or when the water reaches a specified level. Water level is critical: water has to be supplied when the level is too low, and the tank cannot be allowed to overflow. The valve is located 100 m from the tank receiving the water. As this is a critical function, for software it is a has-to-work function, and the system (with the human operator) needs to be of the highest reliability.
Solution 1:
HW: Single setting valve (on/off) with electronic actuator (highly reliable); two highly reliable sensors: a water detector at the maximum height minus the allowable flow from the valve over the 100 m, and a water detector at the minimum height.
SW: Upon command from the operator (who monitors the water level), activates the valve to open with three consecutive open commands; upon detection of the high water mark or a command from the operator, shuts off the valve (again with three consecutive close commands).
Some improvements to this solution are:
Solution 2:
HW: Single setting valve (on/off) with electronic actuator (highly reliable); six highly reliable sensors: three water detectors at the maximum height minus the allowable flow from the valve over the 100 m, and three water detectors set 1 m to 2 m above the minimum height.
SW: Upon command from the operator (who monitors the water level), activates the valve to open with three consecutive open commands, OR upon detecting and reporting two or more low water level sensors, turns the valve on and alerts the operator; upon detection of two or more high water mark sensors, or at the command of the operator, reports that the high water level has been reached and shuts off the valve (again with three consecutive close commands).
While these solutions increase the reliability of sensing and responding to a low or high water situation, the complexity of the software has increased to a minor degree with the addition of the voting scheme and algorithm for the extra sensors and the closed-loop reporting. However, even the following third solution may not be optimal if the system is safety critical, such as would be the case if the cooling water were needed for the rods in a nuclear reactor.
Solution 3:
– Hardware:
Single setting valve (on/off) w. electronic actuator (highly reliable);
CHANGE: Three reliable float tank level detectors with switches at the minimum and
maximum levels to continuously monitor and report on water level;
Two separate water detectors at maximum height minus allowable flow from valve over the
100 m;
Two separate water detectors set 1 m to 2 m above minimum height.
ADD: A flow detector to assure the valve opened and water is flowing at the expected volume.
– Software:
Monitor the water level of the tank and report.
Upon command from the operator, or a signal from either the tank level detectors or the low water sensors, command the valve to open.
Monitor for appropriate flow from the flow sensor upon an open command from any source.
If flow is not detected within 10 s, then send an emergency alert to the operator and the plant.
If flow is insufficient (below a specified level) within 10 s to 20 s, then send an emergency alert to the operator.
Upon command, OR upon detecting approach of the tank high-level mark, OR upon two or more high water level sensors, turn the valve OFF.
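The two-or-more-sensors agreement used in Solutions 2 and 3 is a simple voting scheme; the following is a minimal illustrative sketch (not a normative design), with hypothetical names.

```python
# Illustrative sketch: 2-out-of-N voting on redundant water-level sensors before commanding
# the valve, as described in Solutions 2 and 3.
def vote(sensor_readings, threshold=2):
    """Return True when at least `threshold` sensors report the condition."""
    return sum(bool(reading) for reading in sensor_readings) >= threshold

high_water_sensors = [True, True, False]     # two of the three high-level sensors tripped
if vote(high_water_sensors):
    command = "CLOSE_VALVE"                  # Solution 3 would also report to the operator
else:
    command = "NO_ACTION"
print(command)
```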
From a reliability aspect, the simpler the software is, the more reliable it is likely to be. However, when software is chosen as a means to detect, identify, and respond (recover, reduce functionality, report, restart, etc.) to hardware malfunctions, the hardware impacts the software design, and the software may need to impose additional requirements on the hardware, such as additional and smarter instrumentation (sensors and actuators).

When the software itself may fail, additional hardware may also be needed to improve the reliability of the system. While redundant software (multiple implementations of the same requirements) is usually considered to add so much complexity that it lowers the desired reliability, there are times and places where back-up software of reduced functionality may still be warranted, perhaps a failover to an FPGA or ASIC upon detection of a software failure. Watchdog timers are frequently used to monitor critical software and to restart the system, or put it into a hold state, if the software does not keep resetting the timer with its heartbeat. Hardware and software engineers should work together to find the best system solution.
This example is from a software application that estimates the reliability of software using a variety of
software reliability growth models.
The only available artifacts for the software are an overview and a set of high-level requirements.
F.2.1.1.1 Overview
The application will automate several software reliability growth models. Previously the CASRE tool had been in use, but it no longer works on modern operating systems. CASRE was also prone to crashing whenever the input data was not what the tool expected. For example, if the failure rate was increasing, or was increasing and then decreasing, CASRE often crashed without any notification to the user.
The software reliability tool runs on a Linux virtual machine hosted by the University of Massachusetts (UMass) Dartmouth Computing Information Technology Services (CITS). When ready, the tool will be accessible online at http://srt.umassd.edu/. However, at this time, it is restricted to individuals with access to the University's virtual private network (VPN) and to beta testers who have shared the MAC address of their machine.
The application functionality has been developed in the R programming language and the graphical user
interface (GUI) exposes this functionality through Shiny, a web application framework for R. R is a
procedural programming language that enables modularization. These modules include sets of functions to:
1) manage various input failure data formats, 2) perform trend test calculations, 3) identify the maximum
likelihood estimates of software reliability models, 4) determine testing time requirements for fault
detection and reliability targets, and 5) compute goodness of fit measures.
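As an illustration of the trend test module mentioned above, the following is a minimal sketch of the Laplace trend test in Python (the tool itself is implemented in R); the failure times and observation window are hypothetical, and the time-truncated form of the statistic is used.

```python
# Illustrative sketch: Laplace trend test for cumulative failure times observed over a
# window (0, observation_end]. Markedly negative values suggest reliability growth;
# positive values suggest reliability decay.
import math

def laplace_statistic(failure_times, observation_end):
    n = len(failure_times)
    mean_time = sum(failure_times) / n
    return (mean_time - observation_end / 2.0) / (observation_end * math.sqrt(1.0 / (12.0 * n)))

times = [5, 11, 24, 40, 63, 92, 130, 180]        # hypothetical cumulative failure times (h)
print(f"Laplace statistic: {laplace_statistic(times, observation_end=200.0):.2f}")
```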
The Shiny application framework defines GUI components such as layouts, including tabs, frames, text
boxes, combo boxes, radio buttons, input text boxes, sliders, and buttons, as well as graph objects. These
layouts are defined with scripts and the underlying R functionality is included in a manner similar to
general purpose programming language header file inclusion preprocessor directives. This enables the R
functions to be invoked by events triggered within the Shiny GUI.
Specification
a) The user will be presented with a GUI consisting of four tabs to 1) select, analyze, and filter data, 2) set-up and apply models, 3) query model results, and 4) evaluate models.
b) The first tab (select, analyze, and filter data) allows the user to:
1) Specify an input file with inter-failure, failure time, or failure count data in Excel® or CSV format.
2) Plot the data as time between failures, failure rate, or cumulative failures by selecting one of
these options from a combo box.
3) Execute tests such as the Laplace trend test and running arithmetic average to assess whether the data set exhibits reliability growth.
4) Save plots in various image file formats by clicking a button and specifying a name within a
file dialog box.
c) The second tab (set-up and apply models) allows the user to:
1) Select a subset of the data to which models will be applied.
2) Indicate the prefix of the data that will be used to estimate parameters.
3) Select one or more models (Jelinski-Moranda, geometric, exponential, Yamada delayed
S-shaped, and Weibull) from a list and estimate their parameters by clicking a button.
4) Plot the data and model fits as time between failure, failure rate, or cumulative failures by
selecting one of these options from a combo box.
d) The third tab (query model results) allows the user to:
1) Estimate the time required to observe k additional failures.
2) Estimate the number of failures that would be observed given an additional amount of testing
time.
3) Estimate the additional testing time required to achieve a desired reliability given a fixed
mission time.
e) The fourth tab (evaluate models) allows the user to:
1) Apply goodness of fit measures such as the Akaike information criterion (AIC) and predictive sum of squares error (PSSE); see the sketch following this list.
2) Rank models in a table according to their performance on goodness of fit measures, while also
reporting raw numerical values of these measures.
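The goodness-of-fit measures named in the fourth tab can be illustrated with a small sketch; this is not the tool's R code, and the model values below are hypothetical. AIC is computed from a model's parameter count and maximized log-likelihood, and PSSE is computed here as the sum of squared errors between predicted and observed cumulative failures on a held-out portion of the data.

```python
# Illustrative sketch of the two goodness-of-fit measures used to rank fitted models.
def aic(num_parameters: int, max_log_likelihood: float) -> float:
    """Akaike information criterion: lower is better."""
    return 2 * num_parameters - 2 * max_log_likelihood

def psse(predicted, observed) -> float:
    """Predictive sum of squares error over a held-out tail of the data: lower is better."""
    return sum((p - o) ** 2 for p, o in zip(predicted, observed))

# Hypothetical comparison of two fitted models on the last four observation intervals.
observed_tail = [140, 144, 147, 149]
model_a_tail = [138.2, 143.0, 146.5, 149.1]
model_b_tail = [131.0, 137.5, 142.0, 146.0]
print(psse(model_a_tail, observed_tail), psse(model_b_tail, observed_tail))
print(aic(2, -54.3), aic(3, -53.9))          # hypothetical maximized log-likelihoods
```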
The available personnel for this SFMEA are two software reliability subject matter experts. Both are
experts with SRG models and one is an expert in SFMEA. Since there is no design documentation and no
design engineers available for the analysis, the interface and detailed SFMEA viewpoints can be
eliminated. Since this is a new software system, the maintenance SFMEA can be eliminated from scope.
The serviceability is not applicable at this time since no installation scripts exist. Since there are no defined
use cases the usability viewpoint is eliminated. Since this is a brand new software version, there is no past
history for which to perform the process SFMEA. The only applicable viewpoint is therefore the functional
viewpoint.
There are no safety-related or safety-critical features. Since there is no existing code there is no assessment
of which parts of the software are high risk from a development point of view. The project manager is most
concerned with the stability of the numerical routines to fit models and the GUI logic that could lead to
mishandling of data. So, the risk assessment is based on this concern and on the features that are most
critical for the functionality required by the user.
The failure modes from CASRE are identified, and the failure definition and scoring criteria (FDSC) is developed from them. The FDSC defines three levels of criticality:
a) The results are not accurate; the results cause an overflow; no results are generated when the selected model should generate results; the software crashes prior to generating a result; results are generated when the model should not be used; the results of the wrong model are displayed.
b) The software takes too long to generate a result or the user has to perform too much manual labor
to use the software.
c) Any other defects.
The resulting RPN values are prioritized for mitigation as follows:
1 or 2—Will mitigate
3 or 4—Should mitigate
5 or 6—Will mitigate if time allows
The specifications are analyzed to determine which are more critical than the others. The analysts determine that the flow of data that leads to the results starts with tab 1 and ends at tab 4. Hence any serious defects in tab 1 will affect the results of every other tab. Since the results are the most critical output, and since past history on the CASRE tool indicates that most of the problems were due to a lack of filtering on the input data, it is determined that the following two functional requirements are, at this point in development, the most critical.
Specify an input file with either inter-failure, failure time, or failure count data in Excel or CSV
format.
Execute tests such as the Laplace trend test and running arithmetic average to assess whether the data set exhibits reliability growth.
The functional SFMEA template from Table A.5 is copied into the SFMEA worksheet. The two specification statements are copied into the first column. The applicable functional failure modes are also copied into the worksheet, as shown in Table F.2 (Step 1).
The five failure modes are analyzed with respect to this requirement and it is determined that “Faulty data”
and “Faulty error handling” are the most applicable. Three root causes of faulty error handling are
identified while brainstorming. The file may be valid but not have failure data in it. The file may also be in
use or the file may have more than one data format in it. Five possible root causes for faulty data are also identified. Regardless of the format, the software expects the cumulative failure count to be increasing over time. If it is decreasing, that would indicate faulty data. Similarly, the test time cannot be decreasing, and the time between failures cannot be non-positive. There also cannot be zero data points. At the other end of the spectrum, there have to be enough data points for the software to analyze. At this point it is assumed that 2 or 3 data points are needed as a minimum, but this will be revisited in the mitigation subclause. The results of this step are shown in Table F.3 (SFMEA Step 2).
Next the faulty sequencing failure mode is analyzed. The overall results will be erroneous if the code that performs the Laplace test happens to be executed before the code that checks the data format. Since there is no design and there are no data flow diagrams, it is assumed that the code can be developed with the wrong sequence of operations. Next the faulty data failure mode is analyzed. From past history on CASRE, it is known by the experts that if the input data has decreasing and then increasing reliability growth, that is, it is U-shaped, the requirement can fail. It can generate both a false positive and a false negative. Similarly, the data may be N-shaped in that it may have increasing and then decreasing reliability growth. It is also known that the reliability growth trend may be S-shaped, which is both U- and N-shaped; that shape can be either increasing or decreasing overall.
The analysts proceed to identify the consequences of requirement 2a. The local effect is the effect on the software itself. The effects on the software itself are either a software crash or the Laplace test not working. At the system level the effects are either that the user needs to select another file or that the results will be unpredictable. As defined in the FDSC, unpredictable results are a severity 1 defect while an inconvenience is a severity 3. The likelihood is analyzed for each, based on expert knowledge. The RPN is then computed. There are four rows that would require mitigation and two rows that should be mitigated. The results of this step are shown in Table F.5.
Potential failure mode | Potential root cause | Local effect | System effect | Preventive measures | Severity | Likelihood | RPN
Faulty error handling | The user selects a file that does not have any valid format. It is a CSV or Excel file but it does not have failure data in it. | Software crashes | The user has an opportunity to select a valid file. | Inconvenience | 3 | 2 | 6
Faulty error handling | The input file is already in use. | Software crashes | The user may not know that the file is in use or why the software crashed. | Inconvenience | 3 | 2 | 6
Faulty error handling | The end user has more than one format in the same file. | Laplace might not work | Unpredictable outcome on Laplace test. | Confidence level on data | 1 | 3 | 3
Faulty data | Failure count is not increasing. | Laplace might not work | Unpredictable outcome on Laplace test. | Confidence level on data | 1 | 2 | 2
Faulty data | Time is not increasing. | Laplace might not work | Unpredictable outcome on Laplace test. | Confidence level on data | 1 | 2 | 2
Faulty data | Interfailure time is not positive. | Laplace might not work | Unpredictable outcome on Laplace test. | Confidence level on data | 1 | 2 | 2
Faulty data | There are 0 data points. | Laplace might not work | Unpredictable outcome on Laplace test. | Confidence level on data | 1 | 3 | 3
Faulty data | There are fewer than (minimum required) data points. | Laplace might not work | Unpredictable outcome on Laplace test. | Confidence level on data | 1 | 2 | 2
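The RPN column in the table above is the product of the severity and likelihood columns, and the mitigation decision follows the FDSC disposition listed earlier. A minimal sketch, using one row of the table as an example:

```python
# Illustrative sketch: RPN = severity x likelihood, with the disposition thresholds from the
# FDSC (1 or 2 will mitigate, 3 or 4 should mitigate, 5 or 6 mitigate if time allows).
def rpn(severity: int, likelihood: int) -> int:
    return severity * likelihood

def disposition(rpn_value: int) -> str:
    if rpn_value <= 2:
        return "Will mitigate"
    if rpn_value <= 4:
        return "Should mitigate"
    return "Will mitigate if time allows"

# Example row: severity 3 (inconvenience) and likelihood 2 -> RPN 6.
value = rpn(3, 2)
print(value, disposition(value))
```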
Requirement 2c is analyzed for consequences. For each failure mode and root cause the effects on the
software are either a wrong or missing result. The effect on the user is that they are either allowed to use a
model that they should not use or they are not allowed to use a model that they can use. The likelihood is
assessed based on expert knowledge of the input data. There is one failure mode and root cause that has an
RPN of 1 and all are required for mitigation.
Potential failure mode | Potential root cause | Local effect | System effect | Preventive measures | Severity | Likelihood | RPN
Faulty error handling | False positive result | Result is wrong | The user will be allowed to use the models when they should not | Confidence level on data | 1 | 2 | 2
Faulty error handling | False negative result | Result is wrong | The user is not allowed to use the model and should be | Confidence level on data | 1 | 2 | 2
Faulty error handling | Reliability growth is neither positive nor negative | Result is wrong | The user will be allowed to use the models when they should not | Confidence level on data | 1 | 2 | 2
Faulty data | No result is generated at all | No result | The user will be allowed to use the models when they should not | Confidence level on data | 1 | 2 | 2
Faulty timing | It takes too long to generate a result (too many data points) | No result | The user will be allowed to use the models when they should not | Confidence level on data | 1 | 2 | 2
Faulty sequencing | Software runs Laplace test before data format is checked | No result | The user will be allowed to use the models when they should not | Confidence level on data | 1 | 2 | 2
Faulty data | U-shaped data has both + and –, which generates false positive | Result is wrong | The user will be allowed to use the models when they should not | Confidence level on data | 1 | 2 | 2
Faulty data | U-shaped data has both + and –, which generates false negative | Result is wrong | The user is not allowed to use the model and should be | Confidence level on data | 1 | 2 | 2
Faulty data | N-shaped data causes false positive | Result is wrong | The user will be allowed to use the models when they should not | Confidence level on data | 1 | 1 | 1
Faulty data | N-shaped data causes false negative | Result is wrong | The user is not allowed to use the model and should be | Confidence level on data | 2 | 1 | 2
Faulty data | S shaped (U and N) | Result is wrong | Unpredictable outcome on Laplace test | None | 1 | 2 | 2
Faulty data | Decreasing S shaped | Result is wrong | Unpredictable outcome on Laplace test | None | 1 | 2 | 2
Faulty data | Increasing S shaped | Result is wrong | Unpredictable outcome on Laplace test | None | 1 | 2 | 2
The corrective action for each of the failure modes and root causes is analyzed one at a time. The RPN is
adjusted for any item that can be mitigated. Several columns have been removed to fit page width. Most of
the failure modes and root causes can be fixed by either a change to the specification or by testing that
scenario and then modifying the specification, design and code appropriately. One of the root causes cannot
be corrected so it will be clearly identified in the user’s manual to reduce the risk of it happening. There are
several corrective actions that are similar. In the next step these will be consolidated. The results for this
step are shown in Table F.7.
Potential root cause | System effect | Corrective actions | Compensating provisions | Severity | Likelihood | RPN
The end user has more than one format in the same file | Unpredictable outcome on Laplace test | It is difficult for the software to detect if the user mixed formats so make it clear in the user manual not to do this. | The user can fix the file but only if they know about it. | 1 | 3 | 3
Failure count is not increasing | Unpredictable outcome on Laplace test | Modify the spec to define what the software should do if the fault count is irregular. | The user can fix the file but only if they know about it. | 1 | 4 | 4
Time is not increasing | Unpredictable outcome on Laplace test | Modify the spec to define what the software should do if the time count is irregular. | The user can fix the file but only if they know about it. | 1 | 4 | 4
Interfailure time is not positive | Unpredictable outcome on Laplace test | Modify the spec to identify what the software should do in this case. | The user can fix the file but only if they know about it. | 1 | 4 | 4
There are 0 data points | Unpredictable outcome on Laplace test | Modify the spec to identify what the software should do in this case. | The user can fix the file but only if they know about it. | 1 | 4 | 4
There are fewer than (minimum required) data points | Unpredictable outcome on Laplace test | Identify the fewest number of data points that can be used for a trend and then write code to advise user that they do not have enough data points. | The user can fix the file but only if they know about it. | 1 | 4 | 4
False positive result | The user will be allowed to use the models when they should not | Run many sets of different data and verify the output of the Laplace independently of the other results. | None | 1 | 4 | 4
False negative result | The user is not allowed to use the model and should be | Run many sets of different data and verify the output of the Laplace independently of the other results. | None | 1 | 4 | 4
Reliability growth is neither positive nor negative | The user will be allowed to use the models when they should not | Modify the code to handle this case. | None | 1 | 4 | 4
No result is generated at all | The user will be allowed to use the models when they should not | Test many data sets to see if this ever happens | None | 1 | 4 | 4
It takes too long to generate a result (too many data points) | The user will be allowed to use the models when they should not | Test very large data sets to see how long it takes to get an answer | None | 1 | 4 | 4
The SFMEA is now sorted in order of RPN. The corrective actions that are similar are consolidated. There
are now a total of 12 corrective actions of which 10 will be mitigated. The final SFMEA is shown in
Table F.8.
Potential root cause | System effect | Corrective actions | Compensating provisions | Severity | Likelihood | RPN
The end user has more than one format in the same file | Unpredictable outcome on Laplace test | It is difficult for the software to detect if the user mixed formats so make it clear in the user manual not to do this. | The user can fix the file but only if they know about it. | 1 | 3 | 3
Failure count is not increasing; Time is not increasing | Unpredictable outcome on Laplace test | Modify the spec to define what the software should do if the fault count is irregular. | The user can fix the file but only if the user knows about it. | 1 | 4 | 4
Problem: A system is being conceived that will be a successor to an existing system. The existing system
was first deployed 7 years ago. The average MTBF was 300 h for the entire system. Software failures were
40% of the total failures for the existing system. The goal is to derive a system MTBF objective for the new
system.
Predicted result: On the existing system one can compute the software and hardware MTBF by applying the 40% and 60% splits to the known 300 h system MTBF. Therefore the existing system's software MTBF averages 750 h while the hardware MTBF averages 500 h. Since the software was developed 10 years ago, one will first compute its relative size when compared to the existing system. If the software size grows 10% a year, it will be about two times larger after 7 years. MTBF is inversely proportional to size. Hence the new software MTBF will likely be 375 h if the development practices and all other parameters other than the size remain the same. If one assumes that the hardware MTBF will be approximately the same as the existing system, then the new specification is approximately equal to 1/((1/375) + (1/500)) = 214 h.
These examples are for 5.3.2.3, Steps 1 through 5, and for 6.2.
Step 1. Predict the defect density. The Shortcut model was selected to predict defect density because it
has more parameters than the lookup tables, which allows for sensitivity analysis. This example shows the
reliability figures of merit for software that is the very first version for a particular product. Both the
hardware and software are brand new. There is no reused code. There are 12 software engineers who all
report to one software lead engineer. Some of the software engineers are in California and some are in
Texas. All 12 software engineers have been with the organization for several years and no turnover is
expected. There are not any short-term contractors or subcontractors. There is one COTS vendor. The
software is a military system that will be operated by persons who typically have a high school degree or
Bachelor’s degree. The system will be used almost anywhere in the world, and it can take days to reach the
equipment to perform a software upgrade. The time between the start of requirements and deployment is
expected to be 2 years. According to the software development plan, there will be a waterfall type software
life cycle. Software engineers are required to test their own code prior to delivery to software and systems
testing and the software testing starts when all code is complete. Table F.9 shows an example set of
answers from filling out the Shortcut Model Survey.
All of the code is new and is predicted to be 102 KSLOC of object oriented code. The EKSLOC is
therefore 102 since all code is new. The normalized EKSLOC = KSLOC × language type factor =
102 × 6 = 612. There is one COTS component, which is 13 529 kB installed. It has been mass deployed for
2 years. As per B.1.2, the effective normalized EKSLOC is therefore = kB (13 529) × mass produced factor
(0.01) × 0.051 (conversion from kB to KSLOC) × 3 (language factor) = 20.699 EKSLOC. The total
normalized EKSLOC is therefore 632.699. The COTS vendor has an unusually good relationship with the
development team and has also answered the shortcut survey and had the same result of
0.239 defects/KSLOC.
The total predicted defects = 632.699 × 0.239 = 151.011 defects. Obviously the number of defects cannot be fractional, but the fractional value is retained so that the defects per month can be accurately estimated, as shown next.
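A worked check of the Step 1 sizing arithmetic, using the values stated above (the example carries more precision in the defect density than the rounded 0.239 shown, so its total differs slightly):

```python
# Worked check of Step 1: normalized EKSLOC and total predicted defects.
new_ksloc = 102.0
language_factor_new = 6                      # object-oriented language factor from the example
new_eksloc = new_ksloc * language_factor_new                 # 612 normalized EKSLOC

cots_kb = 13529.0
mass_produced_factor = 0.01
kb_to_ksloc = 0.051
language_factor_cots = 3
cots_eksloc = cots_kb * mass_produced_factor * kb_to_ksloc * language_factor_cots  # ~20.699

total_eksloc = new_eksloc + cots_eksloc                      # ~632.699
defect_density = 0.239                                       # defects per normalized EKSLOC
total_defects = total_eksloc * defect_density                # ~151 (the example reports 151.011)
print(round(total_eksloc, 3), round(total_defects, 1))
```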
Step 2. The fault profile is predicted. The exponential model discussed in 6.2.2.1 is employed because typical growth rates for the Duane Model or the AMSAA Crow Model are unknown. The growth rate (Q) is estimated to be 6 and the growth period is estimated to be 48 months since there will be several installed sites but the software will not be mass deployed. The next feature drop is scheduled one year after the deployment of this version. Hence, the reliability growth is limited to only one year.

The predicted fault profile over the 12 month growth period for each month i = 151.011 × (e^(–6(i–1)/48) – e^(–6i/48))
Table F.10 shows the total faults predicted for each month of growth.
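A minimal sketch of the Step 2 fault profile computation, using the total defects, growth rate, and growth period stated above:

```python
# Illustrative sketch: exponential fault profile with growth rate Q = 6 over a 48-month
# growth period, truncated to the 12 months before the next feature drop.
import math

N, Q, period = 151.011, 6.0, 48.0
faults_per_month = [N * (math.exp(-Q * (i - 1) / period) - math.exp(-Q * i / period))
                    for i in range(1, 13)]
print([round(f, 1) for f in faults_per_month])   # months 1 through 12
print(round(sum(faults_per_month), 1))           # faults expected in the first year
```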
Step 3. The failure rate and MTBF, MTBCF are predicted. The duty cycle for the first 12 months of this software release is expected to be continuous. Based on past history, 10% of all operational faults were of a critical severity. There are no estimates of the percentage of faults that historically resulted in system aborts or essential function failures. The predicted faults to be observed for each month are divided by the predicted duty cycle for each month to yield the predicted failure rate, as shown in Table F.11.
Example: Continuing from the previous example, the system will have a mission time of 8 h. When solving
for reliability (8 h mission) for the array of predicted failure rates the results are shown in Table F.12.
Example: Continuing from the previous example, the MTSWR is predicted to be 1 h as calculated in
Table F.13.
When solving for availability for the array of predicted MTBF values the results are shown in Table F.14.
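A minimal sketch of the reliability and availability calculations used for Table F.12 and Table F.14, assuming the usual exponential mission reliability and steady-state availability forms with the 8 h mission time and 1 h MTSWR from the example; the MTBF values below are hypothetical.

```python
# Illustrative sketch: mission reliability and availability for a set of MTBF values.
import math

mission_time = 8.0          # h, from the example
mtswr = 1.0                 # h, mean time to software restore, from Table F.13
for mtbf in (50.0, 200.0, 400.0):                     # hypothetical monthly MTBF values (h)
    reliability = math.exp(-mission_time / mtbf)      # R = e^(-mission_time / MTBF)
    availability = mtbf / (mtbf + mtswr)              # A = MTBF / (MTBF + MTSWR)
    print(f"MTBF {mtbf:>5.0f} h  R(8 h) = {reliability:.4f}  A = {availability:.5f}")
```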
Example: Assume that there are three incremental releases of the same effective size. This is a very
realistic scenario if an organization has the same number of people on each increment and the calendar time
is the same for each increment and the difficulty of the code to be implemented is similar. In each
increment it is predicted that there will be 50 EKSLOC of code that is a hybrid of object oriented and
second generation language. The defect density is predicted to be 0.09 defects/EKSLOC of normalized
code. The system is predicted to be deployed to several sites but not mass distributed so the growth rate is
predicted to be 6 and the growth period is predicted to be 48 months. The software is expected to operate
continually once deployed, which means that the duty cycle per month will be 730 h. The first increment is
scheduled for completion on January 1 of 2016, the second increment on July 1 of 2016, and the third and
final increment on January 1 of 2017. On January 1 of 2018 the next major release is scheduled. Hence the
predictions will only extend to January 1 of 2018.
Scenario #1: The requirements are defined up front and each increment is a design/implementation
increment. As per the preceding steps the defects predicted in each increment are summed and equal 60.75.
The operational MTBF is then computed based on the total defects from all increments as shown in
Table F.15.
The predicted MTBF and fault profile for the 12 months of operational deployment are shown in
Table F.16. The predicted MTBF prior to the next major release is about 404 h.
Scenario #2: In the following scenario, Table F.17, the requirements are developed incrementally as well
as the design and code. Hence, the fault profile is computed for each increment independently starting from
the deployment of the first increment on January 1, 2016.
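A minimal sketch of the Scenario #2 layering, in which each increment's fault profile starts at its own deployment date and the profiles are summed; the per-increment normalization factor of 4.5 is implied by the stated totals (3 × 50 × 4.5 × 0.09 = 60.75).

```python
# Illustrative sketch: layered fault profiles for three increments deployed at different dates.
import math

Q, period = 6.0, 48.0
defects_per_increment = 50 * 4.5 * 0.09          # 20.25 defects per increment (60.75 in total)

def monthly_faults(N, month):                    # month counted from that increment's deployment
    return N * (math.exp(-Q * (month - 1) / period) - math.exp(-Q * month / period))

offsets = [0, 6, 12]        # deployment offsets in months: Jan 2016, Jul 2016, Jan 2017
horizon = 24                # months until the next major release (Jan 2018)
layered = [sum(monthly_faults(defects_per_increment, m - off) for off in offsets if m > off)
           for m in range(1, horizon + 1)]
print([round(x, 2) for x in layered[-3:]])       # the last three months before the next release
```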
Table F.18 shows the predicted MTBF and fault profile for each of the increments layered. Note that the
predicted MTBF for December 1 of 2017 is higher than the predicted MTBF for scenario 1. That is because
the increments are based on independent software requirements, which can be tested prior to transitioning
to the next increment.
Continuing from F.3.1. The system reliability objective is 214 h. That system requirement needs to be
allocated to each of the software and hardware LRUs. There are 5 software CSCIs and 5 hardware
configuration items. The first step is to perform a bottom-up analysis. The MTBF and failure rate for each
of the 10 configuration items is calculated using the best available hardware and software models. The
predicted results are shown in Table F.19:
Table F.19—Predicted MTBF and failure rate for each configuration item using bottom-up analysis

Configuration item | Predicted MTBF (h) | Predicted failure rate | Allocation for each configuration item
SW 1 | 1940 | 0.000515 | 0.000474
SW 2 | 1650 | 0.000606 | 0.000558
SW 3 | 1595 | 0.000627 | 0.000577
SW 4 | 1723 | 0.00058 | 0.000534
SW 5 | 1489 | 0.000672 | 0.000618
HW 1 | 2700 | 0.00037 | 0.000341
HW 2 | 2500 | 0.0004 | 0.000368
HW 3 | 2300 | 0.000435 | 0.0004
HW 4 | 2540 | 0.000394 | 0.000362
HW 5 | 2120 | 0.000472 | 0.000434
Total | 197.1994 | 0.00507101 | 0.004666667
Requirement | 214.2857143 | 0.004666667 | 0.004666667
Difference between predicted and required: 8%
The allocations down to each of the LRUs are computed as a function of the top-level requirement of 214 h. Each prediction is offset by 8% to yield the allocation for each configuration item. This method allocates to each LRU proportionally to that LRU's contribution to the system.
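A minimal sketch of this proportional allocation, using the predicted failure rates and system requirement from Table F.19:

```python
# Illustrative sketch: allocate the required system failure rate to each LRU in proportion to
# that LRU's predicted contribution (an ~8% offset applied to every prediction).
predicted_rates = {
    "SW 1": 0.000515, "SW 2": 0.000606, "SW 3": 0.000627, "SW 4": 0.000580, "SW 5": 0.000672,
    "HW 1": 0.000370, "HW 2": 0.000400, "HW 3": 0.000435, "HW 4": 0.000394, "HW 5": 0.000472,
}
required_system_rate = 1.0 / 214.2857143                      # 0.004666667 failures per hour
total_predicted_rate = sum(predicted_rates.values())          # ~0.005071
scale = required_system_rate / total_predicted_rate           # ~0.92
allocations = {lru: rate * scale for lru, rate in predicted_rates.items()}
print({lru: round(alloc, 6) for lru, alloc in allocations.items()})
```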
The second method of allocation is to allocate from the hardware allocation down to the hardware LRUs and from the software allocation down to the software LRUs. This is shown in Table F.20 and Table F.21. The major problem with this allocation is that, if the top-level allocations were not based on an achievable prediction, the resulting allocations for the software or hardware components may be out of proportion with what is relatively achievable. In other words, some components have more cushion, relatively speaking, than others.
Table F.20—Predicted MTBF and failure rate for each SW configuration item—second method

Configuration item | Predicted MTBF (h) | Predicted failure rate | Allocation for each configuration item
SW 1 | 1940 | 0.000515 | 0.000458
SW 2 | 1650 | 0.000606 | 0.000539
SW 3 | 1595 | 0.000627 | 0.000557
SW 4 | 1723 | 0.00058 | 0.000516
SW 5 | 1489 | 0.000672 | 0.000597
Total | 333.3 | 0.003 | 0.002667
Requirement | 375 | 0.002667 | 0.002667
Difference between predicted and required: 11.111%
Table F.21—Predicted MTBF and failure rate for each HW configuration item—second method

Configuration item | Predicted MTBF (h) | Predicted failure rate | Allocation for each configuration item
HW 1 | 2700 | 0.00037 | 0.000357
HW 2 | 2500 | 0.0004 | 0.000386
HW 3 | 2300 | 0.000435 | 0.00042
HW 4 | 2540 | 0.000394 | 0.00038
HW 5 | 2120 | 0.000472 | 0.000456
Total | 482.9 | 0.002071 | 0.002
Requirement | 500 | 0.002 | 0.002
Difference between predicted and required: 3.4283%
This example continues from F.3.2. The predicted MTBF is determined to be too low to meet the system allocation. The software group first revisits the defect density prediction model to see if there are any trade-offs that are applicable. As per the model, a net score of 4 is needed for the prediction to be upgraded from medium to low risk. Currently the number of strengths = 4 while the number of risks = 3.25. So, the net score, which is now predicted to be 4 – 3.25 = 0.75, needs to be increased by 3.25 to reach the low risk designation, which is needed to reduce the defect density.
Some of the items cannot be changed such as #3, #9, #10, #14, and #15 in the strengths section. In the risks
section numbers #1, #2, #4 and #6 cannot be changed. The items shaded green are optimized. That leaves
items #2, #7, #11 in the strength section and #5 in the risk section.
The group realized immediately that mitigating number #5 from the risk section does not require additional
people or calendar time—it requires only a change to the “throw over the wall” culture. They also realize
that they can mitigate #11. Their software lead would like to promote one of the senior developers so that
each has five direct reports. This would allow for smaller group sizes, which has been shown to positively
correlate to fewer defects due to less complex communication paths.
They need only one more mitigation to reduce the predicted risk from medium to low. They know that
there is nothing that they can do in the immediate timeframe to resolve the fact that they have developers in
two different parts of the country. So, they focus on #2 in the strengths section. They realize that by
employing an incremental development model instead of a waterfall model they can complete the required
features in a shorter period of time and reduce the overall risk. They decide to mitigate three development
practices.
The adjusted score is now = 4 assuming that these mitigations are made as planned. The new predicted
defect density = 0.1108 defects/normalized EKSLOC. This is less than half of the original prediction of
0.239 defects/normalized EKSLOC.
They now plan on the predicted MTBCF = 820 h at initial delivery. Next they review the system reliability block diagram (RBD). They notice that their one CSCI is supporting three hardware interfaces. As shown in Table F.23, the reliability prediction for one of the subsystems is below the required 90%.

They revise the system prediction to assume that the one software CSCI is split into three independent CSCIs. The updated RBD shows that with the cohesive design, the reliability for each component is now predicted to be at least 90%. The cohesive architecture also supports their software team, which is distributed across two time zones. Now that they are not writing code for the same CSCI, they are able to work independently with less risk. The updated results are shown in Table F.24.
Revisit the printer example in F.1.2. As a minimum the OP can be used to increase the testing focus in the
areas that are most likely to be exercised by the end users. The faxing, for example, is about 20% of the
profile while almost half of the profile is printing. Therefore, the amount of test focus should be relatively
in line with that percentage. The same applies to the customer modes and user modes. The test suite should
exercise the major functions with roughly the following percentages, which are all derived and computed
from the OP shown in 5.1.1.3. Keep in mind that some modes may take longer to execute or be a higher
development risk so the word “roughly” is emphasized.
Printing 48.58%
Scanning 30.93%
Faxing 20.50%
When printing is tested the paper used should be roughly as follows. When scanning is tested, the test effort
should be approximately 55% for auto feed and 45% for manual. When the fax mode is tested the test effort
should be roughly 7% for autodial and 93% for manual dial.
As far as what the end users are actually printing, scanning and faxing, this should be thoroughly
investigated. The actual media tested should be roughly as shown as follows. For example, what it is that
high-tech small business people are printing, scanning, and faxing should be approximately 42% of the
media tested. How big are the documents? How many pages? Are there any images or are the documents
exclusively text? Similarly the high-tech professionals can be surveyed to determine what kinds of
documents they are printing in 11×17. The faxing accounted for 19% of the OP. The professionals and
copy shop employees can be queried to determine the typical length of the fax as well as how many faxes
they send per day. It is almost certainly not sufficient to simply test a 1-page document either in the
printing mode, scanning mode, or faxing mode.
Professionals 28.0%
High-tech small businesses 42.0%
Walk-in copy shop 7.5%
Copy shop employee 22.5%
Example: System Under Test is an elevator simulator developed in LabVIEW. See Lakey, Neufelder [B45], [B46], Lakey [B47].
Output Interface
• Floor Position indicator
Identify Outputs
Floor Position
Arrive at Floor 0
Arrive at Floor 1 Up
Arrive at Floor 1 Down
Arrive at Floor 2 Up
Arrive at Floor 2 Down
Arrive at Floor 3
Stop at Floor 0
Stop at Floor 1
Stop at Floor 2
Stop at Floor 3
Application Responses
Application Started
Application Stopped
Identify Operating Variables
Current Position [0, 0-1, 1, 1-2, 2, 2-3, 3]
Current Direction [Stationary, Up, Down]
Motion Status [not moving, moving]
Travel Timer [inactive, active]
Floor Timer [inactive, active]
High Floor Selected [0, 1, 2, 3, null]
Low Floor Selected [0, 1, 2, 3, null]
Cabin Floor 0 [not selected, selected]
Cabin Floor 1 [not selected, selected]
Cabin Floor 2 [not selected, selected]
Cabin Floor 3 [not selected, selected]
Floor 0 Up [not selected, selected]
Floor 1 Up [not selected, selected]
Floor 1 Down [not selected, selected]
Floor 2 Up [not selected, selected]
Floor 2 Down [not selected, selected]
Floor 3 Down [not selected, selected]
Specify Behaviors
Input: Press Cabin Floor 0
Operating Variable Constraints:
Cabin Floor 0 = “not selected”
Current Position <> “0”
Behavior:
IF (Cabin Floor 0 Pressed) System WILL Set Cabin Floor 0 = Selected
AND System WILL set Low Floor Selected = 0
IF (High Floor Selected = Null) System WILL Set High Floor Selected = 0
AND Set Current Direction = Down AND Set Motion Status = Moving
AND Set Travel Timer = active
Input: Press Cabin Floor 1
Operating Variable Constraints:
Cabin Floor 1 = “not selected”
Current Position <> “1”
Behavior:
IF (Cabin Floor 1 Pressed) System WILL Set Cabin Floor 1 = Selected
IF (Low Floor Selected <> 0) System WILL Set Low Floor Selected = 1
IF (High Floor Selected <> 2 or 3) System WILL Set High Floor Selected = 1
IF (Current Direction = Stationary) THEN
[IF (Current Position < 1) System WILL Set Current Direction = Up
ELSE System WILL Set Current Direction = Down]
AND Set Motion Status = Moving and Travel Timer = Active
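Purely as an illustrative sketch (it is not part of the specification, and the class layout, attribute names, and numeric encoding of the between-floor positions are assumptions), the two cabin-button behaviors specified above might map to controller code such as the following.

    # Illustrative sketch of the two cabin-button behaviors specified above.
    # The class layout and attribute names are assumptions, not part of IEEE 1633.

    class ElevatorController:
        def __init__(self):
            self.current_position = 0            # 0.5 stands in for the "0-1" position, etc.
            self.current_direction = "Stationary"
            self.motion_status = "not moving"
            self.travel_timer = "inactive"
            self.cabin_floor = {0: False, 1: False, 2: False, 3: False}
            self.low_floor_selected = None
            self.high_floor_selected = None

        def press_cabin_floor_0(self):
            # Constraint: button not already selected and cabin not already at floor 0.
            if self.cabin_floor[0] or self.current_position == 0:
                return
            self.cabin_floor[0] = True
            self.low_floor_selected = 0
            if self.high_floor_selected is None:
                self.high_floor_selected = 0
                self.current_direction = "Down"
                self.motion_status = "moving"
                self.travel_timer = "active"

        def press_cabin_floor_1(self):
            # Constraint: button not already selected and cabin not already at floor 1.
            if self.cabin_floor[1] or self.current_position == 1:
                return
            self.cabin_floor[1] = True
            if self.low_floor_selected != 0:
                self.low_floor_selected = 1
            if self.high_floor_selected not in (2, 3):
                self.high_floor_selected = 1
            if self.current_direction == "Stationary":
                self.current_direction = "Up" if self.current_position < 1 else "Down"
                self.motion_status = "moving"
                self.travel_timer = "active"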
This concludes the specification of the elevator simulator. The resulting specification is a state machine. It
specifies all possible state combinations, all possible transitions between states, and the required response
to all inputs (transitions). It also represents the structure of the OP. To complete the OP specification for
reliability estimation purposes, relative likelihood values should be assigned to each of the inputs in the
specification. This is addressed later in the subclause.
The OP testing process is illustrated as follows. The two main elements of the process are test generation
and test execution. In Figure F.5, tests are generated from a behavior model. These tests are converted to
executable scripts that exercise the system under test in its target environment.
Figure F.5—OP testing process: 1) system requirements and a technical specification feed a test generator that produces test cases, requirements traceability, and model coverage; 2) test scripts are created through an adapter and executed on a test execution platform (application test bench) against the system under test; test results analysis yields reliability, mean time to failure, and failure intensity estimates.
The next step in the reliability testing process is to convert the software specification into the form of a
behavior model that can be operated on to produce test cases. This is best accomplished with a software
tool. The practitioner may select a commercial tool or develop their own. In either case, the objective is to
represent the state machine specification as a model to automatically generate test cases.
Continuing with the elevator example, a graphical model has been constructed using a commercial tool. 17 Figure F.6 shows the main operating modes of the elevator simulator. The elevator may be stationary, going up, or going down. The second figure, Figure F.7, shows state transitions while the
elevator is in stationary mode. Using the modeling tool, the entire specification defined previously is
precisely replicated so that the Markov Chain Usage Model (MCUM) contains all of the conditions and
stimuli that the elevator control system may be subjected to during operation.
17
The tool illustrated here is for example purposes only; this recommended practice does not promote or sponsor a specific tool.
The next step in the OP testing process is to generate test cases from the MCUM. An illustration of this step
using a commercial tool is provided in Figure F.8. A number of tests are selected.
The results of random test generation are illustrated in Figure F.9. Notice the coverage numbers. With a tool that automatically generates random tests based on the OP, one can experiment with varying numbers of test cases to determine the level of structural model coverage the sample would achieve. A sample of 100 tests was generated from the elevator model, and the statistical results shown in Figure F.10 were obtained.
Figure F.10—Sample of 100 tests generated for elevator model with statistical results
Observe that nearly 97% of states and more than 90% of transitions are covered by the 100 tests. If all of
these tests are executed, then one could be confident that the failure rate obtained from executing those
tests would be fairly representative of the population. A larger sample would increase confidence.
The simplified elevator simulator is a relatively basic application. In more complex applications the state space in a behavioral model will be much larger than this example. Some applications consist of hundreds of thousands or millions of states. For those systems, achieving 90% structural coverage may require thousands or tens of thousands of test cases. Automation is a necessity for complex systems.
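A minimal sketch of this kind of automation follows, assuming the usage model is held as a simple transition table; the model content shown is a toy fragment, not the full elevator MCUM, and the state and action names are assumptions.

    import random

    # Toy Markov chain usage model: state -> list of (action, next_state, probability).
    # This is a tiny illustrative fragment, not the full elevator MCUM.
    MCUM = {
        "Stationary@0": [("press_cabin_1", "MovingUp", 0.5), ("press_cabin_0", "Stationary@0", 0.5)],
        "MovingUp":     [("travel_timer_expires_1_up", "Stationary@1", 1.0)],
        "Stationary@1": [("press_cabin_0", "MovingDown", 1.0)],
        "MovingDown":   [("travel_timer_expires_0", "Stationary@0", 1.0)],
    }

    def generate_test(start="Stationary@0", max_steps=10):
        """Random walk over the usage model; returns a sequence of abstract actions."""
        state, steps = start, []
        for _ in range(max_steps):
            r, cum = random.random(), 0.0
            for action, nxt, p in MCUM[state]:
                cum += p
                if r <= cum:
                    steps.append((state, action, nxt))
                    state = nxt
                    break
        return steps

    def coverage(tests):
        """Fraction of model states and transitions exercised by a sample of tests."""
        all_states = set(MCUM)
        all_transitions = {(s, a) for s, acts in MCUM.items() for a, _, _ in acts}
        hit_states = {s for t in tests for s, _, _ in t} | {n for t in tests for _, _, n in t}
        hit_transitions = {(s, a) for t in tests for s, a, _ in t}
        return (len(hit_states & all_states) / len(all_states),
                len(hit_transitions) / len(all_transitions))

    sample = [generate_test() for _ in range(100)]
    state_cov, transition_cov = coverage(sample)
    print(f"state coverage {state_cov:.0%}, transition coverage {transition_cov:.0%}")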
Test cases generated from a modeling tool are sequences of abstract actions. The test cases are not
executable. The abstract actions for the elevator example follow. They include Input and Response actions.
Many tools and methods exist for creating executable test scripts for this purpose.
A tool for test adapter creation is not illustrated here; refer to the literature for test execution tools.
A recommended methodology for constructing test scripts associated with OP model actions is described in
this subclause. The approach is simply this: all discrete events in an auto-generated test sequence should
have an associated adapter that is a self-contained executable test function. This concept is very basic, but it
is also powerful and enables a project to achieve software reliability testing.
As stated previously, finite state machines for complex systems can be very large. Test cases generated from those models can be long and highly variable; the sequence of events for almost every test will be different. As a result, it is not practical to develop and deploy test functions that have dependencies on other test functions. Such dependencies would mean that the order and sequence of events affect the logic in a given test function; in other words, each test action would need to know the history of previous events, and maintenance of the test cases could become unwieldy. This is not a recommended practice.
Test function independence means that a test function can execute correctly every time based upon the
current state of usage of the system when the event occurs. When a test function is called, it first searches
the global variable space of the SUT and determines current usage, then invokes the action. It also verifies
that the system responds correctly to the event given the current (known) usage state.
To illustrate the idea of self-contained test functions, refer to the elevator example again. Consider the input "Press Cabin Floor 1." This may occur anywhere in a randomly generated test case. When called, the associated test function should first check the current values of certain operating variables, specifically Current Position, Current Direction, High Floor Selected, and Low Floor Selected. Then it should trigger the action "Press Cabin Floor 1" and implement the logic specified for this action, as follows.
Behavior:
IF (Cabin Floor 1 Pressed) System WILL Set Cabin Floor 1 = Selected
IF (Low Floor Selected <> 0) System WILL Set Low Floor Selected = 1
IF (High Floor Selected <> 2 or 3) System WILL Set High Floor Selected = 1
IF (Current Direction = Stationary) THEN
[IF (Current Position < 1) System WILL Set Current Direction = Up
ELSE System WILL Set Current Direction = Down]
AND Set Motion Status = Moving and Travel Timer = Active
No matter what elevator actions have been previously executed in a test sequence, whenever the Press
Cabin Floor 1 Event occurs the preceding logic is executed and the system response is confirmed.
Let us take one more action, "Travel Timer Expires—Floor 2 Up." When this test function is called, it will first query the global variable space for the values of the Floor 2 Up, Cabin Floor 2, Floor 2 Down, and High Floor Selected variables. Once this information is obtained, it triggers the abstract event (a timer expiring is tracked internally; there is no concrete trigger from an external interface to the SUT). Then the
test function implements the specified behavior as follows.
Behavior:
IF (Travel Timer Expires at Floor 2 Moving UP) System MUST:
IF ((Floor 2 Up = selected OR Cabin Floor 2 = selected)
OR (Floor 2 Down = selected AND High Floor Selected = 2)) System MUST
Set Current Position = 2
Set Motion Status = not moving
Set Travel Timer = not active
Set Floor Timer = active
Set Cabin Floor 2 = not selected
Set Floor 2 Up = not selected
Set Low Floor Selected based on status of other buttons
Set High Floor Selected based on status of other buttons
ELSE System WILL continue traveling Up and Set Current Position = 2-3
With this self-contained concept, each test function implements precisely the specification of the action associated with that function. All of the functions perform three generic tasks: 1) obtain the current global state, 2) trigger the test action, and 3) verify the correct response based on the behavior specification.
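A skeleton of such a self-contained test function is sketched below for the "Press Cabin Floor 1" action. The SUT interface calls (read_operating_variables, send_input) and the dictionary keys are assumptions standing in for whatever interface a particular test bench actually provides.

    # Skeleton of a self-contained test function implementing the three generic tasks.
    # read_operating_variables() and send_input() are hypothetical test-bench calls;
    # the generated test sequence is assumed to satisfy the action's constraints.

    def test_press_cabin_floor_1(sut):
        # 1) Obtain the current global usage state from the SUT.
        state = sut.read_operating_variables()

        # 2) Trigger the test action.
        sut.send_input("Press Cabin Floor 1")

        # 3) Verify the response against the specified behavior for this action.
        after = sut.read_operating_variables()
        expected_low = state["low_floor_selected"] if state["low_floor_selected"] == 0 else 1
        expected_high = (state["high_floor_selected"]
                         if state["high_floor_selected"] in (2, 3) else 1)
        ok = (after["cabin_floor_1"] == "selected"
              and after["low_floor_selected"] == expected_low
              and after["high_floor_selected"] == expected_high)
        if state["current_direction"] == "Stationary":
            expected_dir = "Up" if state["current_position"] < 1 else "Down"
            ok = (ok and after["current_direction"] == expected_dir
                  and after["motion_status"] == "moving"
                  and after["travel_timer"] == "active")
        return ok   # pass/fail for this event, independent of any earlier events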
The test functions are implemented in this manner regardless of the test tool used, the test language, or the
communication channels implemented to interface with the system under test. The test adapter for each
function implements the specified behavior requirements for the action. Each test function is executable
against the SUT.
Before any auto-generated test case can be executed, it should be converted to an executable test script. By
employing appropriate tools, this can be achieved fairly easily. A general method that is supported by some
test automation tools involves assigning a unique test function to each transition in the MCUM associated
with an input to the SUT. The test function development environment is natively supported by the
automatic test generation tool. In this way, an auto-generated test is a sequence of test function calls that is
directly executable in the test function execution environment. Auto-generated tests simply need to be
saved in the format that is compatible with the test execution environment. The literature contains case
studies of projects that have successfully utilized commercial test tools for this purpose.
Illustrated as follows, a test model integrated with a set of test function adapters can be utilized to
automatically generate executable test scripts. The basic test building blocks are constructed separately.
The test model contains all possible test sequences. A test execution tool contains a library of all test
functions for a SUT. With these basic elements, an unlimited number of executable tests can be created and
executed, enabling software system reliability to be estimated with a high degree of confidence. Just to
emphasize, the Markov Chain Usage Model (test model) and the self-contained test functions represent the
essential elements needed to perform OP testing. This formula is generic, robust, and repeatable. Projects
that follow this formula can produce high quality reliability estimates.
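A hedged sketch of that pairing follows: the generated abstract test is simply a list of action names, and the adapter library maps each name to a self-contained test function such as the one sketched earlier. The names and structure here are assumptions, not a prescribed interface.

    # Sketch of executing an auto-generated abstract test as a sequence of
    # test-function calls; each adapter queries state, triggers, and verifies.

    def execute_test(sut, abstract_test, adapters):
        """Run one auto-generated test (a list of abstract action names)."""
        results = []
        for action in abstract_test:
            passed = adapters[action](sut)   # dispatch to the self-contained adapter
            results.append((action, passed))
        return all(p for _, p in results), results

    # Usage sketch (names hypothetical): the adapter library maps abstract action
    # names emitted by the model-based test generator to test functions.
    # adapters = {"Press Cabin Floor 1": test_press_cabin_floor_1, ...}
    # verdict, detail = execute_test(sut, ["Press Cabin Floor 1", ...], adapters)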
The final step in the OP testing process is to execute the auto-generated, executable test sequences. For a
given sample, every test in the sample should be executed in the selected test execution environment.
Again, there are many tools to support test execution and test case management. Refer to the literature for
available tools and their applicability.
To support effective and efficient OP testing, the test execution environment should have the capability to
report pass/fail status on every event in a test sequence, and every test in a sample. Offline analysis of test
case results should be avoided to the extent possible so that the maximum information on test result success
can be obtained with minimum effort.
The outcome of executing a sample of tests generated from the test model is quantitative information on
test case successes versus test case failures. This data can be evaluated and manipulated in a number of
ways in order to estimate software reliability. This topic is covered in 6.3 and Annex C.
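As one simple illustration of such an evaluation (hypothetical numbers; 6.3 and Annex C describe the recommended models), a sample of n generated tests with f failures, executed over a total usage time T, gives the point estimates

    \hat{R} = \frac{n - f}{n}, \qquad \hat{\lambda} = \frac{f}{T}, \qquad \widehat{\mathrm{MTBF}} = \frac{T}{f}.

For example, n = 100, f = 4, and T = 50 h would give an estimated per-test reliability of 0.96, an observed failure intensity of 0.08 failures/h, and an observed MTBF of 12.5 h.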
This subclause has purposely deferred a discussion on variations in an OP. The approach to OP testing is
the same regardless of the assignment of probability values to transitions in the MCUM. Getting the test
model structure correct is paramount to successful OP testing. The distribution of likelihood values across
the model is secondary, though still necessary for software reliability estimation.
In the elevator example the usage model was created with no special assignment of probability values;
every event in the model was assumed to be equally likely. The software tool that was used for model
development assigns a default value of “normal” to every transition in the Markov chain. See the following.
Refer to the SFMEA example in F.2.1. The results of the SFMEA show some potential root causes for
failure modes. When testing for failure modes, one needs to instigate the particular failure mode and record
the results. Some potential root causes and how to test them are shown in Table F.25.
Increment 1 is developed and tested. The non-cumulative defects found per testing day are plotted. Then
increment 2 is developed and tested. Its non-cumulative defects are plotted per day as well. Based on
Figure F.14 there are two possible options for estimating the software reliability growth. The first is to
combine the defects from both increments and estimate reliability growth. The second is to apply the SRG
models to each increment independently and then merge the estimated defects.
The exponential model is applied to increment 1, and the estimated inherent defects are computed as 133, of which 66 have been found so far. For increment 2, the defect rate has only recently started to decrease, so the only model that can estimate the inherent defects is the Rayleigh model. The peak occurred at week 27, when a total of 105 defects had been found: 105 × 2.5 ≈ 263 inherent defects, of which 171 have been discovered so far.
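For the second option described above (fit each increment independently, then merge the estimated defects), the arithmetic from these two fits is:

    \text{Increment 1 (exponential):}\; 133 - 66 = 67 \text{ estimated defects remaining}
    \text{Increment 2 (Rayleigh):}\; 105 \times 2.5 \approx 263; \quad 263 - 171 = 92 \text{ estimated defects remaining}
    \text{Merged estimate:}\; 67 + 92 = 159 \text{ estimated defects remaining across both increments}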
Refer to example F.3.3. Following is the prediction of defects for each increment. The predicted defect density is 0.09 defects/normalized EKSLOC and the predicted size for each increment is 50 normalized EKSLOC. There are two variables that should be monitored: first, the prediction for size, and second, the prediction for defect density. See Table F.26.
The predicted defect density, however, is for operational defects and not testing defects. Hence, to verify the accuracy of the operational defect density during the testing process, the analyst needs to determine the typical ratio between testing and operational defects. The average system testing defect density, in terms of defects per normalized EKSLOC, is between 0.056 and 3.062 for software systems that are predicted to have between 0.0269 and 0.111 defects/normalized EKSLOC. The example system is predicted to have a defect density of 0.09 defects/normalized EKSLOC; hence, if the prediction is accurate, the testing defect density should be in the range shown in Table F.27.
The first increment of software system testing is complete and the actual values measured are shown in Table F.28.
The actual size is 30% higher than predicted. The predictions for future increments should be revised, since each increment was assumed to have the same amount of effective KSLOC. The actual size is used to compute the actual testing defect density = 941/292.5 = 3.217 defects/normalized EKSLOC. The testing defect density in this increment is outside of the expected range. This means that either the testing organization was exceptionally aggressive at discovering faults in the code, or there are more defects in the code than predicted. If fewer than about 17 defects (0.056 × 292.5 ≈ 16.4) had been found during testing, that would have been an indication that the testing effort is possibly insufficient, as it would be below the typical expected testing defect density.
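A small sketch of this range check, using the numbers above (the function and constant names are illustrative assumptions):

    # Range check on testing defect density for one increment, using the numbers
    # from the example above. Names and structure are illustrative only.

    TEST_DD_LOW, TEST_DD_HIGH = 0.056, 3.062   # expected testing defect density range

    def check_increment(defects_found, actual_size_eksloc):
        dd = defects_found / actual_size_eksloc
        if dd > TEST_DD_HIGH:
            return dd, "above range: very aggressive testing or more defects than predicted"
        if dd < TEST_DD_LOW:
            return dd, "below range: testing effort is possibly insufficient"
        return dd, "within the expected testing defect density range"

    print(check_increment(941, 292.5))   # approximately (3.217, 'above range: ...')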
Refer to the example in 6.3.2.5. The following estimations have been made. The question is: which model is trending the closest?
In the last week of testing there were 142 h of usage time and 5 failures; hence the most recent actual MTBF ≈ 28 h. At this point in time the defect-based model is trending the closest to the actual MTBF. The relative error = (28 – 25.4)/28 = 9.3%.
A software version is being considered for deployment. The software is the same software illustrated in 6.3.2.5. The known or estimated criteria for acceptance are shown in Table F.30:
Table F.30—Criteria for acceptance
Criteria | Determination
Adequate defect removal
  The current fault rate of the software is not increasing. | Rate is decreasing as per 6.3.2.5.
  If a selected task, the results of any SFMEA indicate that there are no unresolved critical items. | Not selected
  The estimated remaining defects do not preclude meeting the reliability goal and/or do not require excessive resources for maintenance. | 71% estimated removal
  The estimated remaining escaped defects are not going to result in defect pileup. | Not predicted
Reliability estimation confidence
  The relative accuracy of the estimations from 5.4.7 indicates confidence in the software reliability growth measurements. | Yes, model is tracking
  Release stability—Reliability goal has been met or exceeded. | No specific requirement; however, the estimated MTBF of 25 h is concerning.
  If a selected task, the RDT indicates "accept." | Not selected
Adequate code coverage
  Recommended: 100% branch/decision coverage with minimum and maximum termination of loops. | Not selected
Adequate black box coverage
  An OP is developed and validated. | Yes
  Requirements are covered with 100% coverage. | Yes
  Every modeled state and transition has been executed at least once. | Yes
Adequate stress case coverage | Not selected
The risks are therefore as shown in Table F.31. The risk of acceptance is high because of inadequate defect
removal, no measured code coverage and no stress case coverage.
An organization has deployed a defense-related software release, Version 1.0. Over the next 4 years, they collect field trouble reports related to the software. They release versions 2.0, 3.0, and 4.0 after 12, 24, and 36 months, respectively. When trouble reports are received from the field, the software engineers investigate them and determine the release in which the defect was originally introduced. They also record the dates of every trouble report. Using the graphical techniques in 5.4.4, they can estimate the defect removal to date for each version. They also have static code analysis tools that can determine the actual effective size of each version deployed. The data that they have collected is shown in Table F.32:
Version 1 has been deployed for several years and has not experienced a fault in a year. Since its estimated defect removal is very high, one can use this data as historical data for predicting future software releases. This was a medium risk project, so the average defect density of 0.2 is now recorded as a historical data
point for a medium risk project. It is noted that the Shortcut model prediction for medium risk defect density is 0.239.
Version 2 has been deployed for 3 years, has not experienced a fault in several months, and has a very high defect removal percentage, so it is also applicable for use as historical data for a low risk project. Now there are historical data points for both low and medium risk projects. It is noted that the Shortcut model prediction for low risk defect density is 0.1109, compared to 0.15 for the historical data. It is decided to continue to use the Shortcut model to predict the low, medium, and high risk defect densities, but the historical data is used for the actual predicted defect density.
Versions 3 and 4 have not been deployed long enough to be used for historical data. Next year the data will be revisited.
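A sketch of how such historical data points might be preferred over the generic model values when making the next prediction (the structure and function name are assumptions; the densities are those quoted above):

    # Sketch: prefer an organization's own field-derived defect densities over the
    # generic Shortcut model values once a release is mature enough to trust.
    # The structure and fallback rule are assumptions; the densities come from the example.

    SHORTCUT_DD = {"low": 0.1109, "medium": 0.239}   # defects/normalized EKSLOC (model)
    HISTORICAL_DD = {"low": 0.15, "medium": 0.2}     # from versions 2 and 1 field data

    def predicted_defect_density(risk_level):
        """Use historical data where it exists; otherwise fall back to the model."""
        return HISTORICAL_DD.get(risk_level, SHORTCUT_DD.get(risk_level))

    print(predicted_defect_density("medium"))   # 0.2 (historical data point)
    print(predicted_defect_density("low"))      # 0.15 (historical data point)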
Annex G
(informative)
Bibliography
Bibliographical references are resources that provide additional or helpful material but do not need to be
understood or used to implement this standard. Reference to these resources is made for informational use
only.
[B1] “A Comparative Study of Test Coverage-Based Software Reliability Growth Models,” Proceedings
of the 2014 11th International Conference on Information Technology: New Generations (ITNG ‘14),
IEEE Computer Society, Washington, DC.
[B2] Ambler, Scott, and Mark Lines, Disciplined Agile Delivery: A Practitioner’s Guide to Agile
Software Delivery in the Enterprise. IBM Press, 2012.
[B3] AMSAA Technical Report No. TR-652, AMSAA Reliability Growth Guide, US Army Materiel Systems Analysis Activity, Aberdeen Proving Ground, MD, 2000.
[B4] Beizer, Boris, Software Testing Techniques. Van Nostrand Reinhold, 2nd Edition, June, 1990.
[B5] Binder, Robert V., Beware of Greeks bearing data. Copyright Robert V. Binder, 2014.
[B6] Binder, Robert V., Testing Object Oriented Systems—Models, Patterns, and Tools. Addison-Wesley,
1999.
[B7] Boehm, Barry, et al., Software Cost Estimation with COCOMO II (with CD-ROM). Englewood
Cliffs: Prentice-Hall, 2000.
[B8] Buglione, Luigi, and Christof Ebert, “Estimation Tools and Techniques,” IEEE Software, May/June
2011.
[B9] Chao, A., S. M. Lee, and S. L. Jeng, "Estimating Population Size for Capture-Recapture Data When Capture Probabilities Vary by Time and Individual Animal," Biometrics, vol. 48, pp. 201–216, 1992.
Available at http://warnercnr.colostate.edu/~gwhite/software.html.
[B10] Common Weakness Enumeration, A community-developed dictionary of software weakness types. CWE Version 2.6, edited by Steven Christey, Ryan P. Coley, Janis F. Glenn, Kenderdine, and Mazella; Project Lead: Robert A. Martin. Copyright Mitre Corporation, 2014. Available at http://cwe.mitre.org/.
[B11] Cutting, Thomas, “Estimating Lessons Learned in Project Management—Traditional,” January 9,
2009.
[B12] DeMarco, Anthony, White Paper: The PRICE TruePlanning Estimating Suite, 2007.
[B13] Department of the Air Force, Software Technology Support Center, Guidelines for Successful
Acquisition and Management of Software-Intensive Systems: Weapon Systems Command and Control
Systems Management Information Systems, Version 3.0, May 2000.
[B14] Duane, J. T., “Learning curve approach to reliability monitoring,” IEEE Transactions on Aerospace,
vol. 2, no. 2, pp. 563–566, April 1964.
[B15] Engineering Design Handbook: Design for Reliability, AMCP 706-196, ADA 027370, 1976.
[B16] Erickson, Ken, "Asynchronous FPGA risks," California Institute of Technology, Jet Propulsion Laboratory, Pasadena, CA 91109, 2000 MAPLD International Conference, September 26–28, 2000.
[B17] Farr, Dr. William, “A Survey of Software Reliability Modeling and Estimation,” NSWC TR 82-171,
Naval Surface Weapons Center, Dahlgren, VA, Sept. 1983.
[B18] Fischman, Lee, Karen McRitchie, and Daniel D. Golorath, “Inside SEER-SEM,” CrossTalk, The
Journal of Defense Software Engineering, April 2005.
[B19] Goel, A. L., and Okumoto, K., "Time-dependent error-detection rate model for software reliability and other performance measures," IEEE Transactions on Reliability, vol. R-28, no. 3, pp. 206–211, 1979.
[B20] Gokhale, S., and K. Trivedi, “Log-logistic software reliability growth model,” Proceedings IEEE
High-Assurance Systems Engineering Symposium, pp. 34–41, 1998.
[B21] Grottke, Michael, Allen Nikora, and Kishor Trivedi, “An empirical investigation of fault types in
space mission system software,” Proceedings 40th Annual IEEE/IFIP International Conference on
Dependable Systems and Networks, pp. 447–456, 2010.
[B22] Grottke, Michael, and Benjamin Schleich, “How does testing affect the availability of aging software
systems?” Performance Evaluation 70(3):179–196, 2013.
[B23] Grottke, Michael, and Kishor Trivedi, “Fighting bugs: Remove, retry, replicate, and rejuvenate,”
IEEE Computer 40(2):107–109, 2007.
[B24] Grottke, Michael, et al., “Recovery from software failures caused by Mandelbugs,” IEEE
Transactions on Reliability, 2016 (in press)
[B25] Grottke, Michael, Rivalino Matias Jr., and Kishor Trivedi. “The fundamentals of software aging,”
Proceedings First International Workshop on Software Aging and Rejuvenation/19th IEEE International
Symposium on Software Reliability Engineering, 2008.
[B26] Hatton, Les, “Estimating source lines of code from object code: Windows and embedded control
systems,” CISM University of Kingston, August 3, 2005. Available at http://www.leshatton.org/
Documents/LOC2005.pdf.
[B27] Hayhurst, Kelly J., et al., A Practical Tutorial on Modified Condition/Decision Coverage. TM-2001-
210876, National Aeronautics and Space Administration, Langley Research Center, Hampton, Virginia.
2001. Available at http://shemesh.larc.nasa.gov/fm/papers/Hayhurst-2001-tm210876-MCDC.pdf.
[B28] Huang, Yennun, et al., “Software rejuvenation: Analysis, module and applications,” Proceedings
25th International Symposium on Fault-Tolerant Computing, 1995, pp. 381–390.
[B29] IEC 61014:2003 (2nd Edition), Programmes for Reliability Growth. 18
[B30] IEEE P24748-5 (D3 July 2015), IEEE Draft International Standard—Systems and Software
Engineering—Life Cycle Management—Part 5: Software Development Planning. 19
[B31] IEEE Std 610™-1990, IEEE Standard Computer Dictionary: A Compilation of IEEE Standard
Computer Glossaries (withdrawn). 20
[B32] IEEE Std 730™-2014, IEEE Standard for Software Quality Assurance. 21, 22
[B33] IEEE Std 1012™-2012, IEEE Standard for System and Software Verification and Validation.
[B34] IEEE Std 15026-3™-2013, IEEE Standard Adoption of ISO/IEC 15026-3—Systems and Software
Engineering—Systems and Software Assurance—Part 3: System Integrity Levels.
[B35] ISO/IEC 19761:2011, Software engineering—COSMIC: A functional size measurement method. 23
[B36] Jacobsen, Ivar, Grady Booch, and James Rumbaugh, The Unified Software Development Process.
Addison-Wesley, 1999.
18
IEC publications are available from the International Electrotechnical Commission (http://www.iec.ch/). IEC publications are also
available in the United States from the American National Standards Institute (http://www.ansi.org/).
19
Numbers preceded by P are IEEE authorized standards projects that were not approved by the IEEE-SA Standards Board at the time
this publication went to press. For information about obtaining drafts, contact the IEEE.
20
IEEE Std 610-1990 has been withdrawn; however, copies can be obtained from The Institute of Electrical and Electronics Engineers
(http://standards.ieee.org/).
21
The IEEE standards or products referred to in this clause are trademarks of The Institute of Electrical and Electronics Engineers,
Inc.
22
IEEE publications are available from The Institute of Electrical and Electronics Engineers (http://standards.ieee.org/).
23
ISO/IEC publications are available from the ISO Central Secretariat (http://www.iso.org/). ISO publications are also available in the
United States from the American National Standards Institute (http://www.ansi.org/).
[B37] Jelinski, Z., and Moranda, P., “Software Reliability Research,” Statistical Computer Performance
Evaluation, Freiberger, W., ed., New York: Academic Press, 1972, pp. 465–484.
[B38] Joint Capabilities Integration and Development System (JCIDS) 12 February 2015.
[B39] Jones, Capers, Applied Software Measurement: Assuring Productivity and Quality. McGraw-Hill,
June 1996.
[B40] Jones, Capers, “Methods Needed to Achieve >99% Defect Removal Efficiency (DRE) for
Software,” Draft 2.0, August 10, 2015, Namcook Analytics LLC, Copyright © 2015 by Capers Jones.
[B41] Jones, Capers, “Software Industry Blindfolds: Invalid Metrics and Inaccurate Metrics”; Namcook
Analytics, November 2005.
[B42] Jones, Capers, “Software Risk Master (SRM) Sizing and Estimating Examples,” Namcook Analytics
LLC, Version 10.0, April 29, 2015.
[B43] Keene, S. J., “Modeling software R&M characteristics,” Parts I and II, Reliability Review, June and
September 1997. [The Keene-Cole model was developed in 1987 based on 14 data sets. It has not been
updated since that time.]
[B44] Kenny, G., "Estimating defects in commercial software during operational use," IEEE Transactions on Reliability, vol. 42, no. 1, pp. 107–115, March 1993.
[B45] Lakey, Peter, and A. M. Neufelder, System Software Reliability Assurance Guidebook, Table 7-9,
1995, produced for Rome Laboratories.
[B46] Lakey, Peter, and A. M. Neufelder, System and Software Reliability Assurance Notebook, Rome
Laboratory, 1997.
[B47] Lakey, Peter, “Operational Profile Development,” 2015. Available at https://www.scribd.com/
doc/279880170/Operational-Profile-Development
[B48] Lakey, Peter, “Operational Profile Testing,” 2015. Available at https://www.scribd.com/
doc/279880252/Operational-Profile-Testing.
[B49] Lakey, Peter, “Software Reliability Assurance through Automated Operational Profile Testing,”
November 6, 2013.
[B50] Laplante, Phillip B., “Real Time Systems Design and Analysis—An Engineer’s Handbook,”
pp. 208–209, IEEE Press, Piscataway, NJ, 1992.
[B51] Larman, Craig, Agile and Iterative Development. Addison-Wesley Professional, 2004.
[B52] Mars rover; see http://www.cs.berkeley.edu/~brewer/cs262/PriorityInversion.html.
[B53] McCabe, Thomas, Structured System Testing, 12th edition, McCabe & Associates, Columbia, MD,
1985.
[B54] MIL-HDBK-338B, Military Handbook: Electronic Reliability Design Handbook, October 1, 1998. 24
[B55] MIL-HDBK-781A, Military Handbook: Reliability Test Methods, Plans, and Environments for
Engineering, Development Qualification, and Production (01 APR 1996).
[B56] MIL-STD 1629A, Procedures for Performing a Failure Mode, Effects and Criticality Analysis,
November 24, 1980.
[B57] Moranda, P., “Event-altered rate models for general reliability analysis,” IEEE Transactions on
Reliability, vol. 28, no. 5, pp. 376–381, Dec. 1979.
[B58] Musa, J. D., and Okumoto, K., “A logarithmic Poisson execution time model for software reliability
measurement,” Proceedings of the Seventh International Conference on Software Engineering, Orlando,
FL, pp. 230–238, Mar. 1984.
24
MIL publications are available from the U.S. Department of Defense (http://quicksearch.dla.mil/).
[B59] Musa, J. D., A. Iannino, and K. Okumoto, Software Reliability: Measurement, Prediction, Application. New York: McGraw-Hill, pp. 156–158, 1987.
[B60] Musa, J. D., “Operational Profiles in Software Reliability Engineering,” AT&T Bell Laboratories.
IEEE Software, March 1993, and “The operational profile in software reliability engineering: an overview,”
in Third International Symposium on Software Reliability Engineering, 1992.
[B61] NASA-GB-8719.13, NASA Software Safety Guidebook, 6.6.5 and 7.5.14, March 31, 2004. 25
[B62] NASA/SP-2007-6105 Rev1, NASA Systems Engineering Handbook, Section 4.0.
[B63] Neufelder, A. M., “A Practical Toolkit for Predicting Software Reliability,” ARS Symposium, June
14, 2006, Orlando, Florida. Copyright Softrel, LLC 2006.
[B64] Neufelder, A. M., “Effective Application of Software Failure Modes Effects Analysis,” A CSIAC
State-of-the-Art Report, CSIAC Report Number 519193, Contract FA8075-12-D-0001, Prepared for the
Defense Technical Information Center, 2014.
[B65] Neufelder, A. M., “Four things that are almost guaranteed to reduce the reliability of a software
intensive system,” Huntsville Society of Reliability Engineers RAMS VII Conference, November 4, 2014.
Copyright 2014.
[B66] Neufelder, A. M., “Software Reliability for Practitioners,” Technical Report by Softrel, LLC,
November, 2015.
[B67] Neufelder, A. M., "Software Reliability Toolkit for Predicting and Managing Software Defects," November 2010.
[B68] Neufelder, A. M., “The Cold Hard Truth about Reliable Software,” edition 6e, originally published
in 1993 and updated to version 6e in 2015. [This document describes the lookup tables for defect density as
well as the data that was used to compute the average defect densities in the table.]
[B69] Neumann, Peter G., and Donn B. Parker; “A summary of computer misuse techniques,” Proceedings
of the 12th National Computer Security Conference, pp. 396–407.
[B70] Pohland, Timothy, and David Bernreuther, Scorecard Reviews for Improved Software Reliability,
Defense AT&L, Jan-Feb 2014.
[B71] Putnam, Lawrence H., Measures for Excellence. Yourdon Press, 1992.
[B72] Quanterion Solutions Inc., “Handbook of 217Plus™:2015 Reliability Prediction Models” (Dec. 15,
2014) and “217Plus™:2015 Calculator.”
[B73] Radio Technical Commission for Aeronautics (RTCA), Software Considerations in Airborne
Systems and Equipment Certification, DO-178C, 12/13/11.
[B74] Rational Unified Process available at http://en.wikipedia.org/wiki/Unified_Process.
[B75] Bril, Reinder J., "Real-Time Architectures 2006/2007, Scheduling policies—III Resource access
protocols” (courtesy of Johan J. Lukkien). Available at http://www.win.tue.nl/~rbril/education/
2IN20/RTA.B4-Policies-3.pdf.
[B76] Rexstad, F., and K. P. Burnham, User’s Guide for Interactive Program CAPTURE, Colorado
Cooperative Fish & Wildlife Research Unit, Colorado State University, Fort Collins, Colorado, 1991.
[B77] Science Applications International Corporation & Research Triangle Institute, Software Reliability
Measurement and Testing Guidebook, Final Technical Report, Rome Air Development Center, Griffiss Air
Force Base, New York, January 1992. Available at http://www.softrel.com/publications/RL.
[B78] Shi, Y., et al., “Metric-based Software Reliability Prediction Approach and its Application,”
Empirical Software Engineering Journal, 2015.
25
NASA publications are available from the National Aeronautics and Space Administration (http://www.nasa.gov/).
[B79] Shi, Y., M. Li, and C. Smidts, “On the Use of Extended Finite State Machine Models for Software
Fault Propagation and Software Reliability Estimation,” 6th American Nuclear Society International
Topical Meeting on Nuclear Plant Instrumentation, Controls, and Human Machine Interface Technology,
Knoxville, Tennessee, 2009.
[B80] Shooman, M. L., and G. Richeson, “Reliability of Shuttle Mission Control Center Software,”
Proceedings of the Annual Reliability and Maintainability Symposium, 1983, pp. 125–135 [best conference
paper].
[B81] Shooman, M. L., Probabilistic Reliability: An Engineering Approach. New York: McGraw-Hill
Book Co., 1968 (2nd edition, Melbourne, FL: Krieger, 1990).
[B82] Shooman, M. L., Reliability of Computer Systems and Networks, Fault Tolerance, Analysis, and
Design. New York: McGraw-Hill, 2002. p. 234. [Dr. Shooman denotes the inherent defects as ET, which is
equivalent to N0.]
[B83] Shooman, M. L., “Software Reliability Growth Model Based on Bohr and Mandel Bugs,”
Proceedings of International Symposium on Software Reliability Engineering, Washington DC, Nov. 2–5,
2015.
[B84] Smidts, C., and M. Li, “Software Engineering Measures for Predicting Software Reliability in Safety
Critical Digital Systems,” NRC, Office of Nuclear Regulatory Research, Washington DC, NUREG/GR-
0019, 2000.
[B85] Smidts, C., et al., “A Large Scale Validation of a Methodology for Assessing Software Quality,”
NUREG report for the US Nuclear Regulatory Commission, NUREG/CR-7042, July 2011.
[B86] Society of Automotive Engineers, Recommended Practice, Software Reliability Program
Implementation Guide, Standard by SAE International, 05/07/2012. [The full-scale model is presented in
this document.] 26
[B87] Society of Automotive Engineers, SAE ARP 5580 Recommended Failure Modes and Effects
Analysis (FMEA) Practices for Non-Automobile Applications, July 2001.
[B88] Musa, John D., Software Reliability Engineering: More Reliable Software Faster and Cheaper, Chapter 6: Guiding Test, 2nd edition, AuthorHouse, 2004.
[B89] The Handbook of Software Reliability Engineering, edited by Michael R. Lyu, published by IEEE
Computer Society Press and McGraw-Hill Book Company, ISBN 9-07-039400-8. Available at
http://www.cse.cuhk.edu.hk/~lyu/book/reliability/.
[B90] Tian, J., “Integrating time domain and input domain analyses of software reliability using tree-based
models,” IEEE Transactions on Software Engineering, vol. 21, issue 12, pp. 945–958.
[B91] Tian, J., Software Quality Engineering: Testing, QA, and Quantifiable Improvement. Hoboken: John
Wiley & Sons, Inc., 2005, ISBN 0-471-71345-7. Purification is discussed briefly on p. 384. Available at
http://ff.tu-sofia.bg/~bogi/France/SoftEng/books/software_quality_engineering_testing_quality_assurance_
and_quantifiable_improvement_wiley.pdf.
[B92] US General Accounting Office, GAO-10-706T, “Defense acquisitions: observations on weapon
program performance and acquisition reforms,” May 19, 2010.
[B93] Vesely, W. E., et al., “Fault Tree Handbook NUREG 0492,” US Nuclear Regulatory Commission,
1981.
[B94] Voas, J. M., “PIE: A Dynamic Failure-Based Technique,” IEEE Transactions on Software
Engineering, vol. 18, pp. 717–727, 1992.
[B95] Von Alven, W. H., Reliability Engineering. Englewood Cliffs: Prentice Hall, 1964.
26
SAE publications are available from the Society of Automotive Engineers (http://www.sae.org/).
[B96] White, G. C., et al., User’s Manual for Program CAPTURE. Logan: Utah State University Press,
1978.
[B97] Yamada, S., M. Ohba, and S. Osaki, “S-shaped reliability growth modeling for software error
detection,” IEEE Transactions on Reliability, vol. R-32, no. 5, pp. 475–478, Dec. 1983.