Test Coverage Criteria for RESTful Web APIs
A-TEST '19, August 26–27, 2019, Tallinn, Estonia
Alberto Martin-Lopez, Sergio Segura, and Antonio Ruiz-Cortés
this criterion in the MyMusic API, for instance, with TCs #1, #5 and #7. Likewise, TCs #4, #6 and #7 cover all the paths as well.

Operation coverage. This criterion measures the coverage of a test suite according to the operations executed. The coverage is computed as the number of operations executed divided by the total number of operations of the API. To achieve 100% operation coverage, every path must be sent one request per allowed HTTP verb (GET, POST, PUT or DELETE). Notice that this criterion can be applied to the entire API or to a specific path. For example, TCs #1, #5 and #7 reach 57% operation coverage in the MyMusic API, since they execute 4 out of the 7 operations of the API. At the same time, TC #1 on its own achieves 100% operation coverage on the /songs path, since only one operation can be performed on it.

Parameter value coverage. This criterion measures the coverage of a test suite according to the parameter values exercised. This criterion applies only to parameters with a finite number of possible values, namely, booleans and enums. The coverage is computed as the number of different values that parameters are given divided by the total number of possible values that all parameters can take. To achieve 100% parameter value coverage, every boolean and enum parameter must take all possible values. Nevertheless, it is also advisable to test multiple values of other types of parameters, such as strings and integers. There is only one enum parameter in the MyMusic API: the parameter type in the GET /songs operation. TCs #1, #2, #3 and #4 cover the four values that it accepts ('original', 'all', 'cover' and 'remix'), and therefore they achieve 100% coverage under this criterion in the whole API.
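All of the input criteria above reduce to a ratio of exercised elements over elements declared in the API specification. As an illustrative sketch, the 57% operation coverage reached by TCs #1, #5 and #7 can be reproduced as follows; the concrete operation list and the set of executed operations are our own assumptions for illustration, not data taken from the MyMusic specification itself:

```python
# Hypothetical reconstruction of the 7 operations of the MyMusic API,
# each identified by an (HTTP verb, path) pair.
API_OPERATIONS = {
    ("GET", "/songs"),
    ("GET", "/songs/{songId}"),
    ("GET", "/playlists"),
    ("POST", "/playlists"),
    ("GET", "/playlists/{playlistId}"),
    ("PUT", "/playlists/{playlistId}"),
    ("DELETE", "/playlists/{playlistId}"),
}

def operation_coverage(executed, operations=API_OPERATIONS):
    """Operations exercised at least once, divided by all declared operations."""
    return len(operations & set(executed)) / len(operations)

# Assumed set of the 4 operations jointly executed by TCs #1, #5 and #7:
executed = {
    ("GET", "/songs"),
    ("GET", "/playlists"),
    ("POST", "/playlists"),
    ("GET", "/playlists/{playlistId}"),
}
print(round(operation_coverage(executed) * 100))  # 57
```

The same ratio applies at path granularity: restricting `operations` to the single operation of the /songs path yields 100% for TC #1 alone.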
Parameter coverage. This criterion measures the coverage of a test suite according to the operation parameters it uses. The coverage is computed as the number of parameters used divided by the total number of parameters in the API. To achieve 100% parameter coverage, all input parameters of every operation must be used at least once. Exercising different combinations of parameters is desirable, but not strictly necessary to achieve 100% coverage under this criterion. The reason for excluding combinatorial coverage criteria (e.g. t-wise [4]) is to ease the development of coverage analysis tools. This criterion can also be considered for specific subsets of the API. As an example, TC #1 achieves 100% parameter coverage for the /songs path, since all its parameters are used at least once. Overall, however, the test suite reaches 60% parameter coverage on the entire API, given that 4 out of 10 parameters are not used, because these parameters belong to the three operations not executed by the test suite (the query parameter in the operation GET /playlists, the body and path parameters in PUT /playlists/{playlistId}, and the path parameter in DELETE /playlists/{playlistId}).

Content-type coverage. This criterion measures the coverage of a test suite according to the input content-types used in API requests. This criterion applies only to operations that accept data in the request body (i.e. POST, PUT and DELETE). The coverage is computed as the number of input content-types used divided by the total number of input content-types across all API operations. To achieve 100% input content-type coverage, for every operation that accepts a request body, all supported data formats (e.g. JSON and XML) must be tested. This criterion can be applied to each operation individually. As an example, the MyMusic API has only two operations that accept a body, namely POST /playlists and PUT /playlists/{playlistId}; each of these can process JSON and XML, therefore at least four requests are needed to achieve 100% input content-type coverage on the entire API. TC #7 reaches 50% coverage under this criterion for the POST /playlists operation, since the JSON content-type is covered but XML is not.

Operation flow coverage. This criterion measures the coverage of a test suite according to the sequences of operations it executes.
The definition of full coverage of this criterion highly depends on the API under test. Several proposals exist in the literature about the operation flows that should be tested [1, 2, 8, 16], but none of them is widely accepted and used in industry. For this reason, and for the sake of simplicity, we propose to use a simplified version of the flows defined by Arcuri in [1]: for every resource that can be created, at most four operation flows must be tested, namely those related to its reading (one or several), updating and deletion after its creation. If the resource is a sub-resource of another one, the creation of the parent resource must be included in the operation flow. In the MyMusic API, four operation flows can be executed: i) create playlist → read one playlist; ii) create playlist → read several playlists; iii) create playlist → update playlist; iv) create playlist → delete playlist. TC #7 executes the first flow (individual read), so this criterion is 25% covered.

3.2 Output Coverage Criteria

This type of criteria measures the degree to which test cases cover elements related to API responses.

Status code class coverage. This criterion measures the coverage of a test suite according to its ability to produce both correct and erroneous responses in the API under test. These responses are typically identified by 2XX and 4XX status codes, respectively; however, this may vary depending on the API, so the tester must define the meaning of correct and erroneous for their particular case. It is assumed that every operation should return at least one successful response; therefore, to achieve 100% coverage of this criterion, at least one test case per API operation is needed. If every operation can return both correct and erroneous status codes, two test cases per operation are needed to reach 100% status code class coverage. The coverage is computed as the number of classes of status codes obtained in API responses (maximum two per operation) divided by the total number of classes of status codes in the whole API. In the MyMusic API, the seven operations can return correct and erroneous responses, therefore fourteen test cases are needed to fully cover this criterion. Overall, the test suite achieves 36% status code class coverage (5 out of 14 classes covered). At the same time, TCs #1 and #3 suffice to fulfill this criterion for the GET /songs operation.

Status code coverage. This criterion extends the previous one by considering status codes instead of simply classes of status codes. Therefore, to achieve 100% coverage, all status codes of all operations must be obtained. TCs #1, #3 and #4 achieve 100% coverage for the GET /songs operation of the MyMusic API, and also for the /songs path, since it has no more operations.

Response body properties coverage. This criterion measures the coverage of a test suite according to its ability to produce responses containing all properties of resources. Figure 2 shows two API responses containing a Song resource, which is composed of multiple properties such as the name of the song. The coverage of this criterion is computed as the number of properties obtained divided by the total number of properties from all objects that can be obtained in API responses. To achieve 100% coverage of this criterion, all properties from all response objects must be obtained. As an example, take the two responses to the request GET /songs/{songId} shown in Figure 2. Retrieving a single¹ (left-hand side) does not achieve full coverage, since the property album is not present in the response. Retrieving a song that is part of an album (right-hand side), by contrast, meets this criterion for this specific response body, since it includes all properties of the Song object. Intuitively, this criterion can be applied to specific responses individually, as in the MyMusic API, where TC #1 covers this criterion for the successful response of the GET /songs operation, since the response will surely include the same object depicted in the right-hand side of the previous figure.

¹ A single refers to a song that is released on its own, not as part of an album.

Content-type coverage. This criterion has the same meaning as the input content-type criterion, but in this case the coverage is measured on the output data formats obtained in API responses. TCs #5 and #6 achieve 100% coverage of this criterion for the GET /songs/{songId} operation, since all content-types are obtained in the responses, i.e. JSON and XML.

4 TEST COVERAGE MODEL

Inspired by the REST Maturity Model of Richardson [13], we propose to arrange the previous coverage criteria into eight different Test Coverage Levels (TCLs), constituting a common framework for the assessment and comparison of test suites addressing RESTful APIs, called the Test Coverage Model (TCM). The goal is to rank test suites based on the TCL they can reach, where TCL0 represents the weakest coverage level and TCL7 the strongest one. In order to reach a specific TCL, all criteria belonging to previous levels must have been met. This does not mean that criteria from higher levels subsume those from lower levels. Figure 3 illustrates the aforementioned levels and the test coverage criteria they are comprised of (as described in the previous section). Note that a distinction is made between input and output coverage criteria.

Level 0 represents a test suite where no coverage criterion is fully met. Any test suite, therefore, has TCL0 by default. Level 1 is the easiest to achieve and the most naive, as it only requires paths
Table 3: Evaluation results and comparison between test suites of the TCM, Arcuri and the API developers.

                         Test Coverage Model (TCM)            Arcuri                             Existing test suite
API               TCL    Test cases  Code coverage  Errors    Test cases  Code coverage  Errors  Test cases  Code coverage  Errors
scout-api           1            21            18%       3
                    2            49            20%      21
                    3            49            20%      21
                    4            98            34%      23           177            38%      33          19            41%       0
                    5           224            35%      29
                    6           232            40%      33
                    7           232            40%      33
features-service    1            11            35%       7
                    2            18            39%      13
                    3            18            39%      13
                    4            36            75%      13            50            64%      14          22            78%       2
                    5            37            76%      13
                    6            37            76%      13
                    7            37            74%      13
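Table 3 reports one row per Test Coverage Level. The cumulative ranking rule of the TCM (Section 4) can be sketched as follows; only TCL0 (no criterion required) and TCL1 (path coverage) are grounded in the text, so the remaining criterion-to-level assignments below are placeholders for illustration, not the actual mapping of Figure 3:

```python
# Sketch of TCL ranking: a suite reaches level n only if it fully meets the
# criteria of level n AND of every lower level. Entries for TCL2+ are
# PLACEHOLDER assignments, not the paper's Figure 3.
LEVEL_CRITERIA = [
    set(),                                        # TCL0: met by any test suite
    {"path"},                                     # TCL1: 100% path coverage
    {"operation"},                                # TCL2 (placeholder)
    {"parameter"},                                # TCL3 (placeholder)
    {"status code class"},                        # TCL4 (placeholder)
    {"content-type"},                             # TCL5 (placeholder)
    {"parameter value", "status code"},           # TCL6 (placeholder)
    {"operation flow", "response body properties"},  # TCL7 (placeholder)
]

def tcl(satisfied):
    """Highest TCL whose required criteria, cumulatively, are all satisfied."""
    level = 0
    for n in range(1, len(LEVEL_CRITERIA)):
        if LEVEL_CRITERIA[n] <= satisfied:  # subset check: all criteria met
            level = n
        else:
            break
    return level

print(tcl(set()))                   # 0 — any suite has TCL0 by default
print(tcl({"path", "operation"}))   # 2 under this placeholder mapping
```

Because levels are cumulative, a suite that meets a high-level criterion but misses a lower-level one stays at the lower level, matching the rule that all criteria of previous levels must be met.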
only considered test cases making HTTP calls to the API, as Arcuri did [1]. Values in bold represent the highest values achieved among our approach, Arcuri's and the APIs' existing test suites.

It can be clearly appreciated that the TCL of a test suite does have an effect on the bugs found and the code covered. Whereas fault finding gradually increases in one of the APIs, code coverage increases in both of them for most levels. There is one exception: the decrease in the code covered by TCL7 in the features-service API, although it is only 2% less. This is because the operation flows defined in Section 3.1 do not suffice to put the system in the same state as when populating the database as desired, which proves that the operation flows that should be tested are API-dependent. We tested more complex sequences of operations and were able to cover the same code as with the previous level, but these were not considered for the case study, in order to conform to the minimum requirements to fulfill the operation flows criterion.

In terms of code coverage, TCL7 achieves better results than those of Arcuri and similar results to the existing test suites in both APIs. TCL4 is clearly a turning point, since it achieves the greatest improvement in comparison with the previous level. This highlights the importance of testing all operation parameters and obtaining both classes of status codes, correct and erroneous.

In terms of fault finding, TCL7 reveals many more bugs than the existing test suites of the APIs (33 vs 0 in scout-api, and 13 vs 2 in features-service), and almost the same as Arcuri (33 vs 33, and 13 vs 14). There is only one bug that our approach was not able to find, which may be explained by the fact that we did not test many parameter values, aiming to assess the validity of the TCM with test suites complying with the minimum requirements for each level. Especially significant is the case of the TCL2-compliant test suite of the features-service API, which detects the same number of faults as the TCL7-compliant one. This highlights the importance of testing all API operations, even when not all operation parameters are used.

Overall, TCL7 achieves more than twice the code coverage of TCL1 in both APIs, and uncovers between 2 and 11 times more bugs. This supports the idea that, by systematically covering all criteria defined in our catalogue, it is possible to obtain sound coverage and fault detection results.

5.4 Discussion

The results of this case study reflect the potential of the TCM using the minimum number of test cases needed to comply with every TCL. We also conducted an exploratory experiment to evaluate the TCM following a stronger approach regarding the use of parameter values, namely, testing between 1 and 3 values for each parameter in TCL6, and obtained higher code coverage for both APIs (41% in scout-api, and 79% in features-service), which suggests that parameter values are key to the efficacy of the TCM.

In order to obtain significant results when following the TCM approach, the system database must be properly populated, containing all kinds of resources. We noticed that the results obtained (especially in terms of code coverage) were poorer when it was empty, since some operations would not work as expected (e.g. retrieving a resource that does not exist). To this end, it may be necessary to manually populate the database according to the needs of each test case (as we did). However, there may also be cases where this is not possible, for example when performing black-box testing on an industrial web API. In this scenario, operation flows play a key role in the design of valid test cases, since they make it possible to put the system in the desired state prior to a specific API call.

It is worth mentioning that the tests from the APIs' existing test suites are more complex than those of our work and Arcuri's. For example, we only check the status codes and test a specific CRUD operation per test case, while the APIs' existing test cases may check multiple aspects such as response body properties and perform multiple CRUD operations in a single test case. This is why the existing test suites contain notably fewer test cases than ours and Arcuri's.

As a final remark, the TCM approach does not guarantee good fault finding statistics on its own (similarly to what happens with other traditional coverage criteria such as statement coverage), since this depends on the assertions (test oracles) used in the test cases. High TCLs do guarantee high code coverage, but they need to be accompanied by meaningful assertions and oracles in order to increase the chances of finding bugs. For our experiments, we only checked the response status codes and yet were able to find a
significant amount of bugs. Had we used smarter approaches like metamorphic testing [14], we might have been able to discover more, and more complex, faults.

6 RELATED WORK

Our work is related to black-box testing approaches for web services. Some of these works make an effort to measure, up to a certain degree, the elements of the web services under test covered by the test cases generated by their techniques. Ed-douibi et al. [6] automatically generated 951 test cases for 91 RESTful APIs based on their OAS specifications, and measured the coverage achieved in terms of inputs, namely: paths, operations, parameters and OAS object definitions. Other authors have proposed test coverage criteria for web services based on their specifications, such as the Web Services Description Language (WSDL) [5]. In this regard, Bartolini et al. [3] enunciated three coverage criteria, namely operation coverage (as defined in the WSDL file), message coverage (input messages declared in the WSDL specification) and schema coverage (the parts composing each message), used for measuring the thoroughness with which their tool could test a service. Bai et al. [16] also used WSDL to analyse the test coverage achieved by test cases according to four types of elements: parameters, messages (input and output), operations and operation flows. Lastly, Jokhio et al. [10] used the Web Service Modelling Ontology (WSMO) framework to generate test cases and measure the coverage with two specific criteria: boundary coverage, referring to boundary conditions such as minimum and maximum values for parameters, and transition rules path coverage, referring to the different execution paths that the program may follow when receiving a given request. In comparison with these papers, our work constitutes the first and most complete framework to measure black-box coverage in RESTful web services. Furthermore, the validity of this framework has been demonstrated with two real RESTful web APIs and several testing techniques.

Regarding the tools available in the market, ReadyAPI [15] is the only one that provides meaningful information about the coverage achieved by a test suite in terms of the functionality covered. Given a Web Application Description Language (WADL) or OAS document and a test suite created in ReadyAPI, the program is able to run the test suite and compute coverage statistics on the elements covered in the API. However, it supports only 6 out of the 10 criteria proposed in this paper: on the one hand, the coverage is computed in terms of the sub-elements of each operation, and therefore it does not provide information regarding path or operation coverage; on the other hand, it lacks support for the parameter value and operation flow criteria. Lastly, the tool does not offer any general overview of the thoroughness of the test suite, as opposed to the TCM approach presented in this paper.

7 CONCLUSIONS AND FUTURE WORK

In this paper, we presented a catalogue of test coverage criteria for RESTful APIs and proposed a framework for the evaluation of testing approaches in this context. To the best of our knowledge, this is the first attempt to establish a common framework for the comparison of testing techniques, by providing an easy and automatic way of measuring the thoroughness of test suites addressing RESTful web APIs, based on the criteria they cover and the coverage level they reach. Furthermore, we evaluated the validity of our proposal with two real-world APIs and found that the TCLs nicely represent the potential of test suites: the lowest TCLs usually achieve low code coverage and find few faults, whereas the highest TCLs cover as much code and find as many faults as traditional and modern testing techniques. We trust that the results of our work pave the way for the automated assessment and comparison of testing approaches for RESTful APIs.

Several challenges remain for future work. It is desirable to perform further evaluations of the proposed criteria and the TCM with other APIs and test generation techniques, so as to ascertain the validity of the results obtained. We aim to provide tool support that makes it possible to analyse test suites and find out their maximum TCL. We also plan to elaborate on the definition of the operation flows criterion, since it is not fully formalised and there is no agreement in the literature about the operation flows that should be tested [1, 2, 8, 16]. Lastly, our approach opens new promising research opportunities in terms of test automation. For example, search-based techniques could be used to generate test suites that maximise API coverage.

ACKNOWLEDGEMENTS

This work has been partially supported by the European Commission (FEDER) and the Spanish Government under projects BELI (TIN2015-70560-R) and HORATIO (RTI2018-101204-B-C21), and by the FPU scholarship program, granted by the Spanish Ministry of Education and Vocational Training (FPU17/04077).

REFERENCES

[1] A. Arcuri. 2019. RESTful API Automated Test Case Generation with EvoMaster. ACM Trans. on Software Engineering and Methodology 28, 1 (2019), 3.
[2] V. Atlidakis, P. Godefroid, and M. Polishchuk. 2018. REST-ler: Automatic Intelligent REST API Fuzzing. Technical Report.
[3] C. Bartolini, A. Bertolino, E. Marchetti, and A. Polini. 2009. WS-TAXI: A WSDL-based Testing Tool for Web Services. In Intern. Conference on Software Testing Verification and Validation. 326–335.
[4] D. M. Cohen, S. R. Dalal, M. L. Fredman, and G. C. Patton. 1997. The AETG System: An Approach to Testing Based on Combinatorial Design. IEEE Trans. on Software Engineering 23, 7 (1997).
[5] WWW Consortium. 2007. Web Services Description Language (WSDL) Version 2.0. Retrieved May 2019 from [Link]
[6] H. Ed-douibi, J. L. C. Izquierdo, and J. Cabot. 2018. Automatic Generation of Test Cases for REST APIs: A Specification-Based Approach. In IEEE 22nd Intern. Enterprise Distributed Object Computing Conference. 181–190.
[7] R. T. Fielding. 2000. Architectural Styles and the Design of Network-based Software Architectures. Ph.D. Dissertation.
[8] A. Ivanchikj, C. Pautasso, and S. Schreier. 2018. Visual modeling of RESTful conversations with RESTalk. Journal of Software & Systems Modeling 17, 3 (2018), 1031–1051.
[9] D. Jacobson, G. Brail, and D. Woods. 2011. APIs: A Strategy Guide. O'Reilly Media, Inc.
[10] M. S. Jokhio, G. Dobbie, and J. Sun. 2009. Towards Specification Based Testing for Semantic Web Services. In Australian Software Engineering Conference. 54–63.
[11] ProgrammableWeb. 2019. ProgrammableWeb API Directory. Retrieved March 2019 from [Link]
[12] RapidAPI. 2019. RapidAPI API Directory. Retrieved March 2019 from [Link]
[13] L. Richardson, M. Amundsen, and S. Ruby. 2013. RESTful Web APIs. O'Reilly Media, Inc.
[14] S. Segura, J. A. Parejo, J. Troya, and A. Ruiz-Cortés. 2018. Metamorphic Testing of RESTful Web APIs. IEEE Trans. on Software Engineering 44, 11 (2018), 1083–1099.
[15] SmartBear. 2019. ReadyAPI. Retrieved March 2019 from [Link]/product/ready-api/overview/
[16] X. Bai, W. Dong, W.-T. Tsai, and Y. Chen. 2005. WSDL-based Automatic Test Case Generation for Web Services Testing. In IEEE Intern. Workshop on Service-Oriented System Engineering. 207–212.