Test Coverage Criteria for RESTful Web APIs

Alberto Martin-Lopez, amarlop@[Link], University of Seville, Seville, Spain
Sergio Segura, sergiosegura@[Link], University of Seville, Seville, Spain
Antonio Ruiz-Cortés, aruiz@[Link], University of Seville, Seville, Spain

ABSTRACT
Web APIs following the REST architectural style (so-called RESTful web APIs) have become the de-facto standard for software integration. As RESTful APIs gain momentum, so does the testing of them. However, there is a lack of mechanisms to assess the adequacy of testing approaches in this context, which makes it difficult to automatically measure and compare their effectiveness. In this paper, we first present a set of ten coverage criteria that make it possible to determine the degree to which a test suite exercises the different inputs (i.e. requests) and outputs (i.e. responses) of a RESTful API. We then arrange the proposed criteria into eight Test Coverage Levels (TCLs), where TCL0 represents the weakest coverage level and TCL7 represents the strongest one. This enables the automated assessment and comparison of testing techniques according to the overall coverage and TCL achieved by their generated test suites. Our evaluation results on two open-source APIs with real bugs show that the proposed coverage levels nicely correlate with code coverage and fault detection measurements.

CCS CONCEPTS
• Information systems → RESTful web services; • Software and its engineering → Software testing and debugging; Functionality.

KEYWORDS
REST, testing, web services, coverage criteria

ACM Reference Format:
Alberto Martin-Lopez, Sergio Segura, and Antonio Ruiz-Cortés. 2019. Test Coverage Criteria for RESTful Web APIs. In Proceedings of the 10th ACM SIGSOFT International Workshop on Automating TEST Case Design, Selection, and Evaluation (A-TEST ’19), August 26–27, 2019, Tallinn, Estonia. ACM, New York, NY, USA, 7 pages. [Link]

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@[Link].
A-TEST ’19, August 26–27, 2019, Tallinn, Estonia
© 2019 Association for Computing Machinery.
ACM ISBN 978-1-4503-6850-6/19/08. . . $15.00
[Link]

1 INTRODUCTION
Web Application Programming Interfaces (APIs) are key in the development of distributed architectures, as they enable the seamless integration of heterogeneous systems. This has, in turn, fostered the emergence of new consumption models such as mobile, social and cloud applications. The increasing use of web APIs is reflected in the size of popular API directories such as ProgrammableWeb [11] and RapidAPI [12], which currently index over 21K and 8K web APIs, respectively. Contemporary web APIs usually follow the REpresentational State Transfer (REST) architectural style [7], being referred to as RESTful web APIs. RESTful web APIs provide uniform interfaces to interact with resources (e.g. a song in the Spotify API) via create, read, update and delete (CRUD) operations, generally through HTTP interactions. RESTful APIs are commonly described using languages such as the OpenAPI Specification (OAS), which provides a structured way to describe a RESTful API that is both human- and machine-readable, making it possible to automatically generate, for example, documentation, source code (clients and servers) and tests. In what follows, we will use the terms RESTful web API, web API, or simply API interchangeably.

RESTful APIs can be tested using black-box [2, 6, 14] and white-box [1] approaches. The former are usually based on the API specification and try to cover all elements and features defined in it, while the latter typically focus on source code or mutation coverage measures. While white-box approaches can be easily compared in terms of source code coverage, no standardized coverage criteria exist for black-box approaches. This lack of criteria impedes the comparison of testing techniques and hinders the development of new ones, since there is no automated or easy way to evaluate their effectiveness.

In this paper, we present a catalogue of ten test coverage criteria for RESTful web APIs. Each coverage criterion measures how many elements of an API are covered by a test suite, both in terms of test inputs (i.e. API requests) and outputs (i.e. API responses). To this end, we took inspiration from the OAS language, which allows describing the functionality of an API in a straightforward manner. We propose to arrange the coverage criteria into eight Test Coverage Levels (TCLs), where TCL0 represents the weakest coverage level and TCL7 represents the strongest one. These levels constitute a common framework for the assessment and comparison of testing techniques for RESTful APIs, called the Test Coverage Model (TCM). This framework aims at fully automating the evaluation of testing approaches in this context, based on the overall coverage and TCL achieved by their generated test suites. For the evaluation of our approach, we performed experiments on two open-source APIs with real bugs. The results show that the proposed coverage levels nicely correlate with code coverage and fault detection measurements, i.e. the higher the TCL that a test suite complies with, the greater the chances of finding faults and the more code will be covered.

The remainder of the paper is organized as follows: Section 2 introduces the basic concepts regarding RESTful web APIs. Section 3 presents our proposal of test coverage criteria in the context of REST. In Section 4, a set of coverage levels is laid out. Section 5 reports the results of the case study performed to validate our proposal. The related work is discussed in Section 6. Finally, Section 7 draws the conclusions and presents future lines of research.


2 RESTFUL WEB APIS


RESTful web APIs are usually decomposed into multiple RESTful web services [9, 13], each of which implements one or more create, read, update or delete (CRUD) operations over a specific resource. These operations are usually mapped to the HTTP methods POST, GET, PUT and DELETE, respectively. Figure 1 depicts an excerpt of an OpenAPI specification from a sample API called MyMusic, containing seven operations. As illustrated, an OpenAPI Specification document describes an API in terms of paths, operations, resources, request parameters and responses. A resource is any type of information that can be exposed to the Web (e.g. a photo, an HTML document, information about a book) and it is addressable by a unique Uniform Resource Identifier (URI). The API described in Figure 1 provides operations for handling two resources: Songs and Playlists. A path (also called route or endpoint) represents a resource over which operations can be performed (e.g. /songs). The term operation refers to the use of an HTTP method over a specific path. Below are some examples of operations on the MyMusic API.

GET /songs?q=rhapsody&type=cover: Retrieve all covers that contain the keyword ‘rhapsody’ in their name.
POST /playlists: Create a new playlist.
PUT /playlists/{playlistId}: Update an existing playlist, identified by the playlistId field.
DELETE /playlists/{playlistId}: Remove a playlist.
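
Because an operation is an HTTP method applied to a path, any tool that measures coverage first has to map a concrete request such as PUT /playlists/1 back to the operation it exercises. The following is a minimal Python sketch of that mapping, not something the paper prescribes: the function names are hypothetical, and the path list is the set of four MyMusic paths named in Section 3.1.

```python
import re

# The four MyMusic paths named in the paper.
PATHS = ["/songs", "/songs/{songId}", "/playlists", "/playlists/{playlistId}"]


def template_to_regex(template):
    """Compile '/playlists/{playlistId}' into a regex that matches '/playlists/1'."""
    parts = re.split(r"(\{[^/{}]+\})", template)
    pattern = "".join("[^/]+" if part.startswith("{") else re.escape(part)
                      for part in parts)
    return re.compile("^" + pattern + "$")


def resolve_operation(method, url, templates=PATHS):
    """Map a concrete request to its (HTTP method, path template) operation."""
    path = url.split("?", 1)[0]  # drop the query string
    for template in templates:
        if template_to_regex(template).match(path):
            return method.upper(), template
    return None  # the request does not address a documented path
```

With this helper, `resolve_operation("GET", "/songs?q=rhapsody&type=cover")` resolves to the GET operation on `/songs`, and a PUT to `/playlists/1` resolves to the PUT operation on `/playlists/{playlistId}`.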
Operations accept parameters. A parameter is a piece of information that can be passed together with the request for several purposes, such as filtering and sorting results (e.g. q and type in the GET operation shown above). For every operation, several expected responses can be specified. A response is identified by the returned status code and can optionally include a body. The status code determines the result of the operation (e.g. successful or not) and the response body includes additional information (e.g. a requested resource). For instance, GET /songs in the previous example could return a 400 status code if the required parameter q was not included in the request, or a 200 together with a set of results in the response body if there were no errors in the API call.
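
To make the relation between parameters, status codes and bodies concrete, here is a toy Python stand-in for GET /songs. It is only a sketch: the song catalogue is made up, treating `type=all` as "no filter" is an assumption, and the 404 branch follows what Table 1 (TC #3) expects when nothing matches rather than anything stated in this section.

```python
from typing import NamedTuple, Optional


class Response(NamedTuple):
    status_code: int
    body: Optional[object] = None  # a response body is optional


# Made-up catalogue, only there to make the sketch runnable.
SONGS = [
    {"id": 1, "name": "Bohemian Rhapsody", "type": "original"},
    {"id": 2, "name": "Bohemian Rhapsody", "type": "cover"},
]


def get_songs(params):
    """Toy stand-in for GET /songs; not the real MyMusic implementation."""
    if "q" not in params:  # required parameter missing
        return Response(400, {"error": "parameter 'q' is required"})
    wanted = params.get("type", "all")  # assumption: 'all' disables the filter
    hits = [song for song in SONGS
            if params["q"].lower() in song["name"].lower()
            and wanted in ("all", song["type"])]
    if not hits:  # Table 1 (TC #3) expects 404 when nothing matches
        return Response(404, {"error": "no songs found"})
    return Response(200, hits)
```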

3 TEST COVERAGE CRITERIA


In this section, we present a catalogue of coverage criteria for RESTful web APIs. These criteria are divided into two types: input criteria (those related to the API requests) and output criteria (those related to the API responses). It is worth noting that the proposed criteria can be applied at different levels. Hence, for example, we could measure the coverage achieved by a test suite in a whole API, a certain path, or a specific operation. In what follows, we present each coverage criterion. For the sake of understandability, we will make reference to the test suite shown in Table 1. It is composed of seven test cases (TCs) for the MyMusic API (Figure 1). Each TC is composed of some inputs (i.e. one or more API calls) and one expected output (i.e. the last API response obtained).

Figure 1: Excerpt of an OpenAPI specification in YAML.

Table 1: Test suite for the API MyMusic.

TC | Request | Expected response
#1 | GET /songs?q=rhapsody&type=original&year=1975; Accept: application/json | Status code: 200; Response body: array of Song objects; Content-type: JSON
#2 | GET /songs?q=happier&type=all; Accept: application/xml | Status code: 200; Response body: array of Song objects; Content-type: XML
#3 | GET /songs?q=pwbglauypw&type=cover&year=2020; Accept: application/json | Status code: 404; Response body: Error object; Content-type: JSON
#4 | GET /songs?type=remix; Accept: application/json | Status code: 400; Response body: Error object; Content-type: JSON
#5 | GET /songs/1; Accept: application/json | Status code: 200; Response body: Song object; Content-type: JSON
#6 | GET /songs/99999999; Accept: application/xml | Status code: 404; Response body: Error object; Content-type: XML
#7 | 1. POST /playlists (body containing JSON object); Content-type: application/json. 2. GET /playlists/1; Accept: application/json | Status code: 200; Response body: Playlist object; Content-type: JSON

3.1 Input Coverage Criteria
Criteria of this type measure the degree to which test cases cover elements related to API requests.

Path coverage. This criterion measures the coverage of a test suite according to the paths it exercises. The coverage is computed as the number of paths executed divided by the total number of paths of the API. To achieve 100% path coverage, at least one request must address each path of the API. For instance, the MyMusic API exposes four paths (/songs, /songs/{songId}, /playlists and /playlists/{playlistId}), so four HTTP requests are needed, one per path, to reach 100% path coverage. There are multiple ways to meet this criterion in the MyMusic API, for instance, with TCs #1, #5 and #7. Likewise, TCs #4, #6 and #7 cover all the paths as well.

Operation coverage. This criterion measures the coverage of a test suite according to the operations executed. The coverage is computed as the number of operations executed divided by the total number of operations of the API. To achieve 100% operation coverage, every path must receive one request per allowed HTTP verb (GET, POST, PUT or DELETE). Notice that this criterion can be applied to the entire API or to a specific path. For example, TCs #1, #5 and #7 reach 57% operation coverage in the MyMusic API, since they execute 4 out of the 7 operations of the API. At the same time, TC #1 on its own achieves 100% operation coverage on the /songs path, since only one operation can be performed on it.

Parameter coverage. This criterion measures the coverage of a test suite according to the operation parameters it uses. The coverage is computed as the number of parameters used divided by the total number of parameters in the API. To achieve 100% parameter coverage, all input parameters of every operation must be used at least once. Exercising different combinations of parameters is desirable, but not strictly necessary to achieve 100% coverage under this criterion. The reason for excluding combinatorial coverage criteria (e.g. t-wise [4]) is to ease the development of coverage analysis tools. This criterion can also be considered for specific subsets of the API. As an example, TC #1 achieves 100% parameter coverage for the /songs path, since all parameters are used once. Overall, however, the test suite reaches 60% parameter coverage on the entire API, given that 4 out of 10 parameters are not used, because these parameters belong to the three operations not executed by the test suite (the query parameter in the operation GET /playlists, the body and path parameter in PUT /playlists/{playlistId} and the path parameter in DELETE /playlists/{playlistId}).

Parameter value coverage. This criterion measures the coverage of a test suite according to the parameter values exercised. This criterion applies only to parameters with a finite number of possible values, namely, booleans and enums. The coverage is computed as the number of different values that parameters are given divided by the total number of possible values that all parameters can take. To achieve 100% parameter value coverage, every boolean and enum parameter must take all possible values. Nevertheless, it is suggested to test multiple values with other types of parameters such as strings and integers. There is only one enum parameter in the MyMusic API: the parameter type in the GET /songs operation. TCs #1, #2, #3 and #4 cover the four values that it accepts (‘original’, ‘all’, ‘cover’ and ‘remix’), and therefore they achieve 100% coverage under this criterion in the whole API.

Content-type coverage. This criterion measures the coverage of a test suite according to the input content-types used in API requests. This criterion applies only to operations that accept data in the request body (i.e. POST, PUT and DELETE). The coverage is computed as the number of input content-types used divided by the total number of input content-types across all API operations. To achieve 100% input content-type coverage, for every operation that accepts a request body, all data formats (e.g. JSON and XHTML) must be tested. This criterion can be applied to each operation individually. As an example, the MyMusic API has only two operations that accept a body, namely POST /playlists and PUT /playlists/{playlistId}; each of these can process JSON and XML, therefore at least four requests are needed to achieve 100% input content-type coverage on the entire API. TC #7 reaches 50% coverage of this criterion for the POST /playlists operation, since the JSON content-type is covered but XML is not.

Operation flow coverage. This criterion measures the coverage of a test suite according to the sequences of operations it executes.
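
The ratio-based input criteria defined so far (path, operation and parameter coverage) can be checked mechanically against Table 1. The sketch below is one possible Python encoding, not the authors' tooling. The operation list is inferred from the figures quoted in the text (seven operations, ten parameters); `<query>` and `<body>` are placeholder names, since the text does not name those parameters, and the assumption that POST /playlists takes a single body parameter is derived from the 6-out-of-10 count.

```python
# MyMusic operations and their parameters, inferred from the text:
# seven operations, ten parameters. "<query>" and "<body>" are placeholders.
API = {
    ("GET", "/songs"): {"q", "type", "year"},
    ("GET", "/songs/{songId}"): {"songId"},
    ("GET", "/playlists"): {"<query>"},
    ("POST", "/playlists"): {"<body>"},
    ("GET", "/playlists/{playlistId}"): {"playlistId"},
    ("PUT", "/playlists/{playlistId}"): {"playlistId", "<body>"},
    ("DELETE", "/playlists/{playlistId}"): {"playlistId"},
}

# Table 1 flattened into (method, path template, parameters used) calls.
TABLE_1 = [
    ("GET", "/songs", {"q", "type", "year"}),            # TC #1
    ("GET", "/songs", {"q", "type"}),                    # TC #2
    ("GET", "/songs", {"q", "type", "year"}),            # TC #3
    ("GET", "/songs", {"type"}),                         # TC #4
    ("GET", "/songs/{songId}", {"songId"}),              # TC #5
    ("GET", "/songs/{songId}", {"songId"}),              # TC #6
    ("POST", "/playlists", {"<body>"}),                  # TC #7, call 1
    ("GET", "/playlists/{playlistId}", {"playlistId"}),  # TC #7, call 2
]


def input_coverage(calls, api):
    """Return path, operation and parameter coverage as fractions in [0, 1]."""
    executed = {(m, p) for m, p, _ in calls if (m, p) in api}
    used = {(m, p, name) for m, p, names in calls if (m, p) in api
            for name in names if name in api[(m, p)]}
    all_paths = {p for _, p in api}
    return {
        "path": len({p for _, p in executed}) / len(all_paths),
        "operation": len(executed) / len(api),
        "parameter": len(used) / sum(len(v) for v in api.values()),
    }


# Whole API, then TC #1 alone restricted to the /songs path.
whole_api = input_coverage(TABLE_1, API)
songs_only = {op: params for op, params in API.items() if op[1] == "/songs"}
tc1_on_songs = input_coverage(TABLE_1[:1], songs_only)
```

Here `whole_api` reproduces the figures quoted above (4 of 4 paths, 4 of 7 operations, 6 of 10 parameters), and `tc1_on_songs` shows how the same criteria reach 100% when the scope is narrowed to a single path.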

The definition of full coverage of the operation flow criterion highly depends on the API under test. Several proposals exist in the literature about the operation flows that should be tested [1, 2, 8, 16], but none of them is widely accepted and used in industry. For this reason, and for the sake of simplicity, we propose to use a simplified version of the flows defined by Arcuri in [1]: for every resource that can be created, at most four operation flows must be tested, namely those related to its reading (one or several), updating and deletion after its creation. If the resource is a sub-resource of another one, the creation of the parent resource must be included in the operation flow. In the MyMusic API, four operation flows can be executed: i) create playlist → read one playlist; ii) create playlist → read several playlists; iii) create playlist → update playlist; iv) create playlist → delete playlist. TC #7 executes the first flow (individual read), so this criterion is 25% covered.

3.2 Output Coverage Criteria
Criteria of this type measure the degree to which test cases cover elements related to API responses.

Status code class coverage. This criterion measures the coverage of a test suite according to its ability to produce both correct and erroneous responses in the API under test. These responses are typically identified by 2XX and 4XX status codes respectively. However, this may vary depending on the API, so the tester must define the meaning of correct and erroneous for their particular case. It is assumed that every operation should return at least one successful response. Therefore, to achieve 100% coverage of this criterion, at least one test case per API operation is needed; if every operation can return both correct and erroneous status codes, two test cases per operation are needed to reach 100% status code class coverage. The coverage is computed as the number of classes of status codes obtained in API responses (maximum two per operation) divided by the total number of classes of status codes in the whole API. In the MyMusic API, the seven operations can return correct and erroneous responses, therefore fourteen test cases are needed to fully cover this criterion. Overall, the test suite achieves 36% status code class coverage (5 out of 14 classes covered). At the same time, TCs #1 and #3 suffice to fulfill this criterion for the GET /songs operation.

Status code coverage. This criterion extends the previous one by considering status codes instead of simply classes of status codes. Therefore, to achieve 100% coverage, all status codes of all operations must be obtained. TCs #1, #3 and #4 achieve 100% coverage for the GET /songs operation of the MyMusic API, and also for the /songs path, since it has no more operations.

Response body properties coverage. This criterion measures the coverage of a test suite according to its ability to produce responses containing all properties of resources. Figure 2 shows two API responses containing a Song resource, which is composed of multiple properties such as the name of the song. The coverage of this criterion is computed as the number of properties obtained divided by the total number of all properties from all objects that can be obtained in API responses. To achieve 100% coverage of this criterion, all properties from all response objects must be obtained. As an example, take the two responses to the request GET /songs/{songId} shown in Figure 2. Retrieving a single¹ (left-hand side) does not achieve full coverage, since the property album is not present in the response. Retrieving a song that is part of an album (right-hand side), by contrast, meets this criterion for this specific response body, since it includes all properties of the Song object. Intuitively, this criterion can be applied to specific responses individually, like in the MyMusic API, where TC #1 covers this criterion for the successful response to the GET /songs operation, since the response will surely include the same object depicted in the right-hand side of the previous figure.

¹ A single refers to a song that is released on its own, not as part of an album.

Figure 2: API responses including optional and required properties. (a) GET /songs/0001 (b) GET /songs/0002

Content-type coverage. This criterion has the same meaning as the input content-type criterion, but in this case the coverage is measured on the output data formats obtained in API responses. TCs #5 and #6 achieve 100% coverage of this criterion for the GET /songs/{songId} operation, since all content-types are obtained in the responses, i.e. JSON and XML.

4 TEST COVERAGE MODEL
Inspired by the REST Maturity Model of Richardson [13], we propose to arrange the previous coverage criteria into eight different Test Coverage Levels (TCLs), constituting a common framework for the assessment and comparison of test suites addressing RESTful APIs, called the Test Coverage Model (TCM). The goal is to rank test suites based on the TCL they can reach, where TCL0 represents the weakest coverage level and TCL7 the strongest one. In order to reach a specific TCL, all criteria belonging to previous levels must have been met. This does not mean that criteria from higher levels subsume those from lower levels. Figure 3 illustrates the aforementioned levels and the test coverage criteria they are comprised of (as described in the previous section). Note that a distinction is made between input and output coverage criteria.

Level 0 represents a test suite where no coverage criterion is fully met. Any test suite, therefore, has TCL0 by default. Level 1 is the easiest to achieve and the most naive, as it only requires paths to be covered. A test suite reaching only level 1 can be thought of as very weak. In Level 2, all operations must be covered. Level 3 requires that all content-types for every operation are tested, both input and output. The next three levels mainly focus on parameters and output criteria. Level 4 requires parameters and status code classes to be covered. In Level 5, the criteria that must be fulfilled are parameter values and status codes. In Level 6, it is required that the response body properties criterion be covered. It is suggested to cover different combinations of parameters in levels 4, 5 and 6, as well as new parameter values in levels 5 and 6. Level 7 constitutes the last stage of the TCM and the hardest to reach. It focuses on the coverage of operation flows.
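
The level assignment described above is cumulative, so it can be sketched as a short loop over the levels. The Python below is an illustrative reading of the text, not the authors' implementation: the criterion names used as dictionary keys are chosen for this sketch, and it assumes that a criterion that does not apply to the scope under analysis (e.g. input content-type for a GET operation) is reported as fully covered (1.0) by the caller.

```python
# Criteria that must be fully covered at each level, following the text.
# The key names are chosen for this sketch.
LEVEL_CRITERIA = {
    1: ("path",),
    2: ("operation",),
    3: ("input_content_type", "output_content_type"),
    4: ("parameter", "status_code_class"),
    5: ("parameter_value", "status_code"),
    6: ("response_body_properties",),
    7: ("operation_flow",),
}


def compute_tcl(coverage):
    """Return the highest TCL whose criteria, and those of all lower levels, are 100% covered.

    `coverage` maps criterion names to fractions in [0, 1]; missing
    criteria count as uncovered.
    """
    reached = 0  # every test suite has TCL0 by default
    for level in sorted(LEVEL_CRITERIA):
        if all(coverage.get(name, 0.0) >= 1.0 for name in LEVEL_CRITERIA[level]):
            reached = level
        else:
            break  # levels are cumulative: stop at the first unmet one
    return reached
```

For instance, a suite with 100% path coverage but 4 out of 7 operations covered, as computed for Table 1 in Section 3.1, stays at TCL1 under this sketch no matter how well it covers the criteria of higher levels.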
As an example, consider the test suite from Table 1 and the TCL it achieves depending on the elements of the API considered:

Entire API: TCL1. All the four paths are exercised by the test cases but three out of the seven operations are not executed (GET /playlists, PUT /playlists/{playlistId} and DELETE /playlists/{playlistId}).
Path /songs: TCL6. All operations are covered by the test cases and, for every operation, all their sub-elements are covered as well (input and output content-types, parameters, parameter values, status codes and response body properties).
Operation GET /songs: TCL6. All elements of the operation are covered: parameters (q, type and year), parameter values (‘all’, ‘original’, ‘cover’ and ‘remix’ for the type parameter), input and output content-types (JSON and XML), status codes (200, 400 and 404) and response body properties (id, name, artist, album, type and releaseYear).
Operation GET /playlists: TCL1. This operation is not executed by the test cases, so the test suite reaches only level 1, since the path is actually covered by TC #7.
Operation GET /songs/{songId}: TCL4. Both content-types (JSON and XML) are covered, all parameters are used at least once (songId) and both classes of status codes are covered (one test case gets a 200 status code in the response and the other obtains a 404).

Figure 3: Test Coverage Model: Levels & Criteria.

5 CASE STUDY
We performed a case study with two open-source APIs to validate our proposal of test coverage criteria for RESTful APIs. To this end, we designed seven test suites for each API, one per TCL from TCL1 to TCL7, each fully covering the criteria included in its TCL. In doing so, we aim to validate the following hypothesis: The TCL of a test suite is correlated to the number of bugs it is able to detect and the code coverage it achieves.

5.1 Subject APIs
We decided to use the same APIs as Arcuri [1] for the empirical study, so that we could compare our work with a recent, novel technique for testing RESTful APIs. We selected two of the five APIs analysed in his work, named scout-api² and features-service³. The scout-api API allows searching for suitable activities for boy and girl scouts, and the features-service API allows managing product feature models. We selected these APIs because they have medium sizes and they differ sufficiently from each other, e.g. the first uses a wide range of path, query and JSON object parameters whereas the second mostly uses a small range of path parameters.

² [Link]
³ [Link]

The information about the APIs is summarized in Table 2. The last column shows the number of real errors identified by Arcuri [1]. An error is considered as an operation returning at least one 5XX status code as a result of a test case.

Table 2: Subject APIs.

Name | Classes | LoC | Operations | Errors
scout-api | 75 | 7479 | 49 | 33
features-service | 23 | 1247 | 18 | 14

5.2 Setup
Before going in depth into the results obtained, it is necessary to bear in mind a number of considerations. For the evaluation of TCL4, and as recommended in most guidelines for the design of RESTful APIs [9, 13], we defined correct responses as those returning 2XX status codes, and erroneous as those returning 4XX or 5XX status codes. In addition, we designed the test suites such that they contained the minimum number of test cases possible to meet the criteria of the corresponding TCL. To this end, we did not test special combinations of parameters in TCLs 4, 5 and 6, and we did not add extra test cases to test more than one value per parameter in TCLs 5 and 6, even though the guidelines of the TCM encourage the opposite. By doing this, we intentionally assessed the validity of the TCM in a worst case scenario.

Regarding the experiments performed, the test cases were manually written in Java, using the JUnit framework and the REST Assured library. The tests were run in the IDE IntelliJ IDEA, which allowed us to automatically measure the failures detected and the code coverage achieved by the test suites.

5.3 Results
Table 3 shows the results of our evaluation and a comparison with Arcuri’s and the test suite provided with each API, for which we only considered test cases making HTTP calls to the API, as Arcuri did [1]. Values in bold represent the highest values achieved among our approach, Arcuri’s and the API’s existing test suites.

Table 3: Evaluation results and comparison between test suites of the TCM, Arcuri and the API developers.

API | Test suite | Test cases | Code coverage | Errors
scout-api | TCM, TCL1 | 21 | 18% | 3
scout-api | TCM, TCL2 | 49 | 20% | 21
scout-api | TCM, TCL3 | 49 | 20% | 21
scout-api | TCM, TCL4 | 98 | 34% | 23
scout-api | TCM, TCL5 | 224 | 35% | 29
scout-api | TCM, TCL6 | 232 | 40% | 33
scout-api | TCM, TCL7 | 232 | 40% | 33
scout-api | Arcuri | 177 | 38% | 33
scout-api | Existing test suite | 19 | 41% | 0
features-service | TCM, TCL1 | 11 | 35% | 7
features-service | TCM, TCL2 | 18 | 39% | 13
features-service | TCM, TCL3 | 18 | 39% | 13
features-service | TCM, TCL4 | 36 | 75% | 13
features-service | TCM, TCL5 | 37 | 76% | 13
features-service | TCM, TCL6 | 37 | 76% | 13
features-service | TCM, TCL7 | 37 | 74% | 13
features-service | Arcuri | 50 | 64% | 14
features-service | Existing test suite | 22 | 78% | 2

It can be clearly appreciated how the TCL of a test suite does have an effect on the bugs found and the code covered. Whereas fault finding gradually increases in one of the APIs, code coverage increases in both of them for most levels. There is one exception: the decrement in the code covered by TCL7 in the features-service API, although it is only 2 percentage points less. This is because the operation flows defined in Section 3.1 do not suffice to put the system in the same state as when populating the database as desired, which shows that the operation flows that should be tested are API-dependent. We tested more complex sequences of operations and were able to cover the same code as with the previous level, but these were not considered for the case study, in order to conform to the minimum requirements to fulfill the operation flows criterion.

In terms of code coverage, TCL7 achieves better results than those of Arcuri and similar results to the existing test suites in both APIs. TCL4 is clearly a turning point, since it achieves the greatest improvement in comparison with the previous level. This highlights the importance of testing all operation parameters and obtaining both classes of status codes, correct and erroneous.

In terms of fault finding, TCL7 reveals many more bugs than the existing test suites of the APIs (33 vs 0 in scout-api, and 13 vs 2 in features-service), and almost the same as Arcuri (33 vs 33, and 13 vs 14). There is only one bug that our approach was not able to find, which may be explained by the fact that we did not test many parameter values, aiming to assess the validity of the TCM with test suites complying with the minimum requirements for each level. Especially significant is the case of the TCL2-compliant test suite of the features-service API, which detects the same number of faults as the TCL7 one. This highlights the importance of testing all API operations, despite not using all operation parameters.

Overall, TCL7 achieves more than twice the code coverage of TCL1 in both APIs, and uncovers between roughly 2 and 11 times as many bugs. This supports the idea that, by systematically covering all criteria defined in our catalogue, it is possible to obtain sound coverage and fault detection results.

5.4 Discussion
The results of this case study reflect the potential of the TCM using the minimum number of test cases needed to comply with every TCL. We also conducted an exploratory experiment to evaluate the TCM following a stronger approach regarding the use of parameter values, namely, testing between 1 and 3 values for each parameter in TCL6, and obtained higher code coverage for both APIs (41% in scout-api, and 79% in features-service), which suggests that parameter values are key in the efficacy of the TCM.

In order to obtain significant results when following the TCM approach, it is required that the system database is properly populated, containing all kinds of resources. We noticed that the results obtained (especially in terms of code coverage) were poorer when it was empty, since some operations would not work as expected (e.g. retrieving a resource that does not exist). To this end, it may be necessary to manually populate the database according to the needs of each test case (as we did). However, there may also be cases where this is not possible, for example when performing black-box testing on an industrial web API. In this scenario, operation flows play a key role in the design of valid test cases, since they make it possible to put the system in the desired state prior to a specific API call.

It is worth mentioning that the tests from the API’s existing test suite are more complex than those of our work and Arcuri’s. For example, we only check the status codes and test a specific CRUD operation per test case, while the API’s existing test cases may check multiple aspects such as response body properties and perform multiple CRUD operations in a single test case. This is why the existing test suites contain notably fewer test cases than ours and Arcuri’s.

As a final remark, the TCM approach does not guarantee good fault finding statistics on its own (similarly to what happens with other traditional coverage criteria such as statement coverage), since this depends on the assertions (test oracles) used in the test cases. High TCLs do guarantee high code coverage, but these need to be accompanied by meaningful assertions and oracles in order to increase the chances of finding bugs. For our experiments, we only checked the response status codes and yet were able to find a
Test Coverage Criteria for RESTful Web APIs A-TEST ’19, August 26–27, 2019, Tallinn, Estonia

significant number of bugs. Had we used smarter approaches like metamorphic testing [14], we might have been able to discover more and more complex faults.

6 RELATED WORK

Our work is related to black-box testing approaches for web services. Some of these works make an effort to measure, up to a certain degree, the elements of the web services under test covered by the test cases generated by their techniques. Ed-douibi et al. [6] automatically generated 951 test cases for 91 RESTful APIs based on their OAS specifications, and measured the coverage achieved in terms of inputs, namely: paths, operations, parameters and OAS object definitions. Other authors have proposed test coverage criteria for web services based on their specifications such as the Web Services Description Language (WSDL) [5]. In this regard, Bartolini et al. [3] enunciated three coverage criteria, namely operation coverage (as defined in the WSDL file), message coverage (input messages declared in the WSDL specification) and schema coverage (the parts composing each message), used for measuring the thoroughness with which their tool could test a service. Bai et al. [16] also used WSDL to analyse the test coverage achieved by test cases according to four types of elements: parameters, messages (input and output), operations and operation flows. Lastly, Jokhio et al. [10] used the Web Service Modelling Ontology (WSMO) framework to generate test cases and measure the coverage with two specific criteria: boundary coverage, which refers to boundary conditions such as minimum and maximum values for parameters, and transition rules path coverage, which refers to the different execution paths that the program may follow when receiving a given request. In comparison with these papers, our work constitutes the first and most complete framework to measure black-box coverage in RESTful web services. Furthermore, the validity of this framework has been demonstrated with two real RESTful web APIs and several testing techniques.

Regarding the tools available in the market, ReadyAPI [15] is the only one that provides meaningful information about the coverage achieved by a test suite in terms of the functionality covered. Given a Web Application Description Language (WADL) or OAS document and a test suite created in ReadyAPI, the program is able to run the test suite and compute coverage statistics on the elements covered in the API. However, it supports only 6 out of the 10 criteria proposed in this paper: on the one hand, the coverage is computed in terms of the sub-elements of each operation, and therefore does not provide information regarding path or operation coverage; on the other hand, it lacks support for the criteria of parameter values and operation flows. Lastly, the tool does not offer any general overview of the thoroughness of the test suite, as opposed to the TCM approach presented in this paper.

7 CONCLUSIONS AND FUTURE WORK

In this paper, we presented a catalogue of test coverage criteria for RESTful APIs and proposed a framework for the evaluation of testing approaches in this context. To the best of our knowledge, this is the first attempt to establish a common framework for the comparison of testing techniques, by providing an easy and automatic way of measuring the thoroughness of test suites addressing RESTful web APIs, based on the criteria they cover and the coverage level they reach. Furthermore, we evaluated the validity of our proposal with two real-world APIs and found that the TCLs nicely represent the potential of test suites, where the lowest TCLs usually get low code coverage and find few faults and the highest TCLs cover as much code and find as many faults as traditional and modern testing techniques. We trust that the results of our work pave the way for the automated assessment and comparison of testing approaches for RESTful APIs.

Several challenges remain for future work. It is desirable to perform further evaluations of the proposed criteria and the TCM with other APIs and test generation techniques, so as to ascertain the validity of the results obtained. We aim to provide tool support that allows test suites to be analysed and their maximum TCL to be determined. We also plan to elaborate on the definition of the operation flows criterion, since it is not fully formalised and there is no agreement in the literature about the operation flows that should be tested [1, 2, 8, 16]. Lastly, our approach opens new promising research opportunities in terms of test automation. For example, search-based techniques could be used to generate test suites that maximise API coverage.

ACKNOWLEDGEMENTS

This work has been partially supported by the European Commission (FEDER) and Spanish Government under projects BELI (TIN2015-70560-R) and HORATIO (RTI2018-101204-B-C21), and the FPU scholarship program, granted by the Spanish Ministry of Education and Vocational Training (FPU17/04077).

REFERENCES

[1] A. Arcuri. 2019. RESTful API Automated Test Case Generation with EvoMaster. ACM Trans. on Software Engineering and Methodology 28, 1 (2019), 3.
[2] V. Atlidakis, P. Godefroid, and M. Polishchuk. 2018. REST-ler: Automatic Intelligent REST API Fuzzing. Technical Report.
[3] C. Bartolini, A. Bertolino, E. Marchetti, and A. Polini. 2009. WS-TAXI: A WSDL-based Testing Tool for Web Services. In Intern. Conference on Software Testing Verification and Validation. 326–335.
[4] D. M. Cohen, S. R. Dalal, M. L. Fredman, and G. C. Patton. 1997. The AETG System: An Approach to Testing Based on Combinatorial Design. IEEE Trans. on Software Engineering 23, 7 (1997).
[5] WWW Consortium. 2007. Web Services Description Language (WSDL) Version 2.0. Retrieved May 2019 from [Link]
[6] H. Ed-douibi, J.L.C. Izquierdo, and J. Cabot. 2018. Automatic Generation of Test Cases for REST APIs: A Specification-Based Approach. In IEEE 22nd Intern. Enterprise Distributed Object Computing Conference. 181–190.
[7] R. T. Fielding. 2000. Architectural Styles and the Design of Network-based Software Architectures. Ph.D. Dissertation.
[8] A. Ivanchikj, C. Pautasso, and S. Schreier. 2018. Visual modeling of RESTful conversations with RESTalk. Journal of Software & Systems Modeling 17, 3 (2018), 1031–1051.
[9] D. Jacobson, G. Brail, and D. Woods. 2011. APIs: A Strategy Guide. O’Reilly Media, Inc.
[10] M. S. Jokhio, G. Dobbie, and J. Sun. 2009. Towards Specification Based Testing for Semantic Web Services. In Australian Software Engineering Conference. 54–63.
[11] ProgrammableWeb. 2019. RapidAPI API Directory. Retrieved March 2019 from [Link]
[12] RapidAPI. 2019. RapidAPI API Directory. Retrieved March 2019 from https://[Link]
[13] L. Richardson, M. Amundsen, and S. Ruby. 2013. RESTful Web APIs. O’Reilly Media, Inc.
[14] S. Segura, J.A. Parejo, J. Troya, and A. Ruiz-Cortés. 2018. Metamorphic Testing of RESTful Web APIs. IEEE Trans. on Software Engineering 44, 11 (2018), 1083–1099.
[15] SmartBear. 2019. ReadyAPI. Retrieved March 2019 from [Link]product/ready-api/overview/
[16] X. Bai, W. Dong, W.-T. Tsai, and Y. Chen. 2005. WSDL-based Automatic Test Case Generation for Web Services Testing. In IEEE Intern. Workshop on Service-Oriented System Engineering. 207–212.
