Code Coverage Criteria for Asynchronous Programs

Mohammad Ganji (Simon Fraser University, Canada)
Saba Alimadadi (Simon Fraser University, Canada)
Frank Tip (Northeastern University, USA)

ABSTRACT

Asynchronous software often exhibits complex and error-prone behaviors that should be tested thoroughly. Code coverage has been the most popular metric to assess test suite quality. However, traditional code coverage criteria do not adequately reflect completion, interactions, and error handling of asynchronous operations.

This paper proposes novel test adequacy criteria for measuring: (i) completion of asynchronous operations in terms of both successful and exceptional execution, (ii) registration of reactions for handling both possible outcomes, and (iii) execution of said reactions through tests. We implement JScope, a tool for automatically measuring coverage according to these criteria in JavaScript applications, as an interactive plug-in for Visual Studio Code.

An evaluation of JScope on 20 JavaScript applications shows that the proposed criteria can help improve assessment of test adequacy, complementing traditional criteria. According to our investigation of 15 real GitHub issues concerned with asynchrony, the new criteria can help reveal faulty asynchronous behaviors that are untested yet are deemed covered by traditional coverage criteria. We also report on a controlled experiment with 12 participants to investigate the usefulness of JScope in realistic settings, demonstrating its effectiveness in improving programmers' ability to assess test adequacy and detect untested behavior of asynchronous code.

CCS CONCEPTS

• Software and its engineering → Software testing and debugging.

KEYWORDS

Code coverage, Dynamic analysis, Asynchronous JavaScript

ACM Reference Format:
Mohammad Ganji, Saba Alimadadi, and Frank Tip. 2023. Code Coverage Criteria for Asynchronous Programs. In Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE '23), December 3–9, 2023, San Francisco, CA, USA. ACM, New York, NY, USA, 13 pages. https://doi.org/10.1145/3611643.3616292

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
ESEC/FSE '23, December 3–9, 2023, San Francisco, CA, USA
© 2023 Copyright held by the owner/author(s). Publication rights licensed to ACM.
ACM ISBN 979-8-4007-0327-0/23/12. $15.00
https://doi.org/10.1145/3611643.3616292

1 INTRODUCTION

Asynchronous programming is extensively used for web development and is crucial for providing benefits such as non-blocking I/O, seamless and real-time user interactions, and efficient client-server communications. JavaScript is single-threaded, and asynchronous execution of potentially long-running tasks is what enables applications to remain responsive while processing events. In recent years, JavaScript's Promises [1, Section 27.2] and async/await [1, Section 15.6] have rapidly become the most popular mechanisms for supporting asynchrony, supplanting the previous error-prone approach based on event-based programming and callbacks. However, understanding the flow of asynchronous execution and identifying and fixing faults remain challenging for developers [15, 47, 72, 77].

Developers typically rely on an application's tests to identify faults and verify the application's behavior. They often use code coverage criteria such as statement and branch coverage to assess the adequacy of their tests throughout the process, and to identify and address the shortcomings of existing tests in order to improve their quality [40, 81]. However, traditional coverage criteria are unable to examine various scenarios of exercising asynchronous code in terms of eventual completion of asynchronous operations, their interactions, and their error handling. Despite the importance of testing asynchronous programs and the severity of the issues that occur in such programs, there are currently no code coverage criteria that target the adequacy of tests with regard to exploring scenarios that occur in asynchronous code.

This paper presents new coverage criteria for assessing the adequacy of tests in exercising the asynchronous behavior of JavaScript applications. These criteria quantify the adequacy of tests in covering eventual successful or exceptional completion of asynchronous operations, associating reactions with the outcomes of asynchronous operations, and execution of (chains of) reactions by the application's tests. These criteria target the semantics of JavaScript's promises and async/await features, and are meant to complement existing coverage metrics such as statement and branch coverage. We implement our approach in a plugin for Visual Studio Code named JScope, which presents coverage results as a textual report, and through an interactive visualization. JScope automatically instruments an application's code to calculate and report coverage according to three criteria, namely settlement coverage, reaction registration coverage, and reaction execution coverage.

An evaluation of JScope on 20 JavaScript applications shows that the proposed criteria can help improve assessment of test adequacy, complementing traditional criteria. Furthermore, an investigation of 15 real GitHub issues concerned with asynchrony demonstrates that the new criteria can help reveal faulty asynchronous behaviors that are untested yet are deemed covered by traditional coverage criteria. We also report on a controlled experiment with 12 participants to investigate the usefulness of JScope in realistic settings,

demonstrating that it is effective in improving programmers' ability to assess test adequacy and detect untested and buggy behavior.

In summary, this paper makes the following contributions:

• New coverage criteria that quantify the degree to which key scenarios are exercised in asynchronous code,
• An instrumentation-based technique for measuring coverage according to these criteria,
• Implementation of the technique in an interactive VS Code extension named JScope that computes a coverage report and provides an interactive visualization [42], and
• An empirical evaluation, demonstrating the ability of the proposed criteria to identify test inadequacies in asynchronous code. We also report on a user study showing that JScope improves the effectiveness of programmers when testing and debugging asynchronous code.

2 BACKGROUND

In recent years, many programming languages have been extended with support for asynchrony. For example, Java and Dart now support Futures [5, 6], C# and Python support async/await [2, 3], and JavaScript first added promises, and then defined an async/await feature in terms of promises. These new features in JavaScript are used pervasively and pose significant new challenges for testing.

In this section, we provide an overview of promises and async/await, two features that have supplanted event-driven asynchronous programming in JavaScript. While our techniques do not apply directly to the latter, any event-driven API can be "promisified" into an equivalent promise-based one using standard library functions.

Creating promises. A promise represents the value of an asynchronous computation, and is in one of three states: pending, fulfilled, or rejected. The state of a promise can change at most once: from pending to fulfilled, or from pending to rejected. We will say that a promise is settled if its state is fulfilled or rejected. Promises are created by invoking the Promise constructor, and are initially in the pending state. Promises come equipped with two functions, resolve and reject, for fulfilling or rejecting the promise with a particular value, respectively. For example, the following code assigns a promise to a variable p1 that is either fulfilled with the value hello or rejected with an Error object.

1 const p1 = new Promise( (resolve, reject) => {
2   if (Math.random() > 0.5) { resolve("hello"); }
3   else { reject(new Error('oops')); }
4 });

Promises can also be constructed using the functions Promise.resolve and Promise.reject. Each of these functions takes a single argument, i.e., the value that the promise should be fulfilled or rejected with. The following example creates a promise that is fulfilled with the value 3:

5 const p2 = Promise.resolve(3);

Synchronization functions such as Promise.all and Promise.race are other ways to create promises. They wait on a set of promises to be settled in any order, returning a single promise.

Registering reactions on promises. The then and catch methods enable programmers to register reactions on promises, i.e., functions that are executed asynchronously when a promise is fulfilled or rejected. The value returned by a reaction is wrapped in another promise, thus enabling programmers to chain asynchronous computations and propagate errors. For example, the following code fragment shows the creation of a promise chain that starts with p1:

6 p1.then( function f1(v) { console.log(v + " world"); } )
7   .catch( function f2(err) { console.log("error occurred: " + err); } );

If p1 was fulfilled with the value hello, the reaction that is registered by calling then on p1 on line 6 concatenates that value with another string world and prints it to the console. Line 7 registers a reject reaction on the promise that is created by calling then on line 6. It prints an error message if any of the previous promises in the chain is rejected. Therefore, the above code snippet will either print hello world or error occurred: oops.

Linking promises. Invoking the Promise constructor and the then and catch methods creates a new promise p. However, if the resolve function associated with the Promise constructor is invoked with an argument that evaluates to a promise p′, or when a reaction that is registered by calling then or catch returns a promise p′, the promise p′ becomes linked with p. As such, if p′ is resolved with a value v, then p is also resolved with v; if p′ is rejected with a value e, then so is p; and if p′ remains pending, so does p. This example:

8 const p3 = Promise.resolve("hello")
9 const p4 = Promise.resolve("there")
10 p3.then( () => p4 )               // establish link with p4
11   .then( (v) => console.log(v) ) // prints "there"

creates promises and assigns them to variables p3 and p4. Given that p3 is fulfilled, its reaction is executed and returns p4, so p4 and the promise returned by p3.then() on line 10 become linked. Since p4 resolves to there, the promise returned by p3.then() on line 10 resolves to there as well, causing the reaction registered on line 11 to execute and print this value.

async/await. JavaScript's async/await feature provides a syntactic enhancement on top of promises. A function declared as async returns a promise that is fulfilled with the function's return value. In an async function, await-expressions may be used to wait for a promise to settle. If an expression e evaluates to a promise p, then an expression await e evaluates to the value v that p is fulfilled with; if p is rejected with a value err, err is thrown as an exception that can be caught using try/catch.

12 async function f() {
13   try {
14     let v = await e;
15     /* 1 */
16   } catch(e) { /* 2 */ } }

In the above example, e is an expression that evaluates to a promise p. The execution of the code fragment /* 1 */ depends on fulfillment of p. So one may think of /* 1 */ as a fulfill reaction associated with p, and similarly the fragment /* 2 */ as a reject reaction of p.

3 MOTIVATION AND CHALLENGES

This section elaborates on some challenges in identifying parts of asynchronous code that, despite being covered by tests, are not tested "sufficiently" and thus may include bugs. We use real bug reports from Figures 1–2 to illustrate the challenging nature of locating bugs in asynchronous code. These challenges are intensified by developers' confidence in correctness of the code, when their tests exercise that code.
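As a preview of the kind of bug discussed in this section, the following self-contained sketch shows how an async rejection escapes a try/catch when an await is missing. The names removeWebhooks and removeRepo are hypothetical stand-ins, not code from the projects studied in this paper. Note that every statement of removeRepo can be covered by tests while the escaping rejection goes undetected.

```javascript
// A rejection escaping try/catch when an await is missing.
// removeWebhooks/removeRepo are hypothetical stand-ins, not real project code.
process.on("unhandledRejection", (err) => {
  // Node's default reaction here is to terminate the process, which is
  // how a missed await can hard-crash a service.
  console.log("unhandled rejection: " + err.message);
});

async function removeWebhooks() {
  throw new Error("webhook removal failed"); // rejects the returned promise
}

async function removeRepo({ awaited }) {
  try {
    if (awaited) {
      await removeWebhooks(); // rejection is caught below
    } else {
      removeWebhooks(); // fire-and-forget: rejection escapes the try/catch
    }
  } catch (err) {
    return "caught: " + err.message;
  }
  return "done";
}

removeRepo({ awaited: true }).then(console.log);  // caught: webhook removal failed
removeRepo({ awaited: false }).then(console.log); // done
```

In the unawaited variant, removeRepo returns "done" normally; the rejection surfaces later as a process-level unhandled rejection, outside the scope of any catch clause in the function.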
17 remove: async (req) => {
18   const dbRepo = await [Link]([Link])
19   if (dbRepo && [Link]) {
20     try {
21 –     [Link](req)
22 +     await [Link](req)
23     } catch (error) { /* handle the error */ } }
24   return dbRepo
25 }

Figure 1: Implementation of RepoService.remove.

26 async function visibility(preview, widgetValue, params) {
27 –  await new Promise(resolve => {
28 +  await new Promise((resolve, reject) => {
29     this.trigger_up('action_demand', {
30       onSuccess: () => resolve(),
31 +     onFailure: () => reject(), // ADDED IN FIX.
32     }); });
33   this.trigger_up('option_visibility_update', {show});
34 }

Figure 2: Implementation of async function visibility.

While existing coverage metrics may show full coverage of these code segments, these metrics are unable to examine the execution of scenarios specific to asynchronous code.

3.1 Unhandled Exceptions

An asynchronous operation can eventually terminate successfully, or it may fail. While a successful completion is usually the desired outcome, the failures or exceptional cases should be tested thoroughly to assess the applications' robustness and error recovery. Exceptional scenarios are often not thoroughly tested by many applications, which can lead to bugs and unexpected behaviors during execution should an exception occur [15]. For instance, await expressions may be surrounded by try/catch for handling a failed completion of the async function. However, many applications do not have adequate exception handling in place and do not sufficiently test exceptional and failure cases in their asynchronous code. In the following example, we discuss how failure to properly handle the rejection of an asynchronous operation results in the whole system crashing. The bug occurs despite code coverage reports showing that the related part of the code was in fact covered.

3.1.1 Example 1. CLA Assistant is a web service that streamlines the process of signing Contributor License Agreements (CLAs). This project is built by SAP SE developers and has more than 1000 stars. The code in Figure 1 shows the async function RepoService.remove, which is responsible for removing a repository from CLA Assistant (line 18) and removing all of its webhooks (line 21).

To handle unexpected errors, the webhook-removal call is placed inside a try/catch (lines 20–23), which assures programmers of the robustness of this code segment. Programmer confidence in this code segment is reinforced by covering and exercising all its statements through the tests. Despite this, a bug was reported where an unhandled rejection in this method resulted in the hard shutdown of the service. Further investigation showed that while there is a try/catch in place to handle errors in removing webhooks, the developers failed to await the asynchronous webhook-removal method. Without an await statement, the program does not wait for the async function to complete its execution. The execution of RepoService.remove could end before the webhook-removal promise is rejected with an error asynchronously. The exception was thrown outside the scope of RepoService.remove and thus the catch clause could not have caught it, causing an unhandled rejection. The fix adds an await before the webhook-removal call to make RepoService.remove wait until its completion (line 22).

3.2 Pending Asynchronous Operations

An asynchronous operation remains pending until it is "settled" successfully or through a failure, i.e., fulfilled or rejected. It is common to chain asynchronous operations to impose an ordering on their execution. In such cases, successful and exceptional completion of an asynchronous operation each trigger respective reactions, and the execution of the program continues. It is typically expected for all asynchronous operations to "settle." In cases where this does not happen, the appropriate reactions are not invoked, and the chain of execution is interrupted. The following example demonstrates a real bug where a pending asynchronous operation causes the program to freeze in a loading state, preventing the users from further interactions with the system.

3.2.1 Example 2. Figure 2 shows changes related to a bug fix from Odoo, a suite of web-based open source business apps, including Marketing, eCommerce, and Website Builder apps. It has nearly 25K stars on GitHub and is forked over 16K times. The async function visibility is responsible for updating the visibility of a field inside a widget in the sidebar menu of the website builder. The execution of this method depends on the completion of a promise that notifies the parent widget to toggle its visibility (lines 27–32). The notification occurs through trigger_up on lines 29–32. A reaction is assigned to this operation that is invoked upon its successful completion, fulfilling the promise (line 30). The visibility method then makes the field on the widget visible, allowing the user to interact with the editor (line 33).

The bug report indicates a scenario where a widget is frozen, with a spinner spinning forever. The issue occurs when the event fired by trigger_up ends with an exception. Hence, the onSuccess callback is not called to fulfill the promise. As there is no reject reaction devised for unsuccessful completion of the promise, it never settles. As the execution of the remaining part of the visibility method depends on the settlement of the promise, the pending promise prevents the execution of line 33. This causes the widget to get stuck in a loading state, making the application dysfunctional. The fix rejects the promise upon failure of trigger_up (line 31), which settles the promise and allows the execution to continue.

4 ASYNCHRONOUS COVERAGE CRITERIA

Our goal is to define coverage criteria that reflect to what extent the possible asynchronous behaviors of an application are exercised, focusing on promise-based asynchrony. Figure 3 illustrates the life cycle of a promise: Upon creation, a promise is in the pending state, from whence it may transition to the settled state when it is
Figure 3: Illustration of the life cycle of a promise. (Diagram: new Promise(...) creates a promise in the Pending state; fulfill or reject transitions it to Settled; .then/.catch register reactions in either state; registered reactions execute upon settlement.)

fulfilled or rejected. Reactions may be registered on a promise at any time in the pending or settled state. Such reactions will execute when the promise is settled. Our coverage criteria reflect the key steps of promise settlement, promise registration, and promise execution. It is noteworthy that none of these steps subsumes the others because: (i) settlement of a promise does not imply that reactions are registered on it, (ii) registration of a reaction of a promise does not imply that the promise will be settled (and hence that the reaction will execute), and (iii) execution of a reaction of a promise requires both settlement of the promise and registration of the reaction. Further, reactions may be registered on promises after they have settled. By proposing distinct criteria for each step, issues that result in failure to fulfill a promise and failure to register a reaction will manifest themselves through lack of coverage.

We define our criteria in terms of events in execution traces that pertain to the use of asynchronous features. We define three coverage criteria that target the completion of all asynchronous operations (successful and exceptional), registration of reactions for both outcomes of the operations, and the execution of said reactions, respectively. We begin by defining coverage notions for JavaScript applications that use promises, and will then explain informally how these notions extend to async/await. Finally, we will discuss the feasibility of these criteria.

4.1 Events and Traces

Table 1 defines the promise-related events that may occur during execution. Here, we assume that each promise that is created at run time has a unique promise identifier (pid). Further, let S define the set of source locations where promises are created, including: (i) calls to the Promise constructor, (ii) calls to Promise.resolve() and Promise.reject(), (iii) calls to then, catch, and finally on promise objects, (iv) calls to Promise.all, Promise.race, Promise.any, and Promise.allSettled, and (v) the end of execution of an async function (either normal or exceptional exit).

Create events occur when any of situations (i)–(v) occurs. Link events occur when the resolve function associated with a call to the Promise constructor or Promise.resolve is invoked with an argument that is a promise. A Link event is always immediately preceded by a Create event.

Fulfilled events occur when the resolve function associated with a Promise is invoked with an argument that is not a promise, and when a reaction returns a value that is not a promise. Likewise, Rejected events occur when the reject function associated with a Promise is invoked, and when a reaction throws an exception. Note that the trace only records Fulfilled and Rejected events for promises that are explicitly fulfilled or rejected (and not for linked promises). Regfulfill events happen when then is used to register a fulfill-reaction on a promise, and Regreject events happen when catch or the second argument of then is used to register a reject-reaction. Lastly, Execfulfill and Execreject events happen when a previously registered fulfill-reaction or reject-reaction starts executing.

4.2 Coverage Criteria for Promise-Based Code

In the definitions that follow, pid, pid′, · · · represent promise identifiers, f, f′, · · · denote functions, and loc, loc′, · · · denote source locations. Definition 1 defines a trace as a sequence of trace events (see Table 1). We will use τ, τ′, · · · to refer to execution traces.

Definition 1 (Trace). A trace is an ordered sequence of trace events as specified in Table 1.

For each promise pid that occurs in a trace τ, there is a unique trace element Create(pid, loc) corresponding to its creation. We define loc(pid) as the location loc that is referenced in this trace element. The first coverage criterion we define is settlement coverage. This measures the fraction of promises defined by an application that are settled (i.e., fulfilled or rejected). Here, we consider a promise pid originating from location loc to be fully covered if the trace contains both Fulfilled and Rejected events for pid, which requires location loc to be executed at least twice. Moreover, when a Fulfilled or Rejected event is observed for a promise pid, all promises directly or indirectly linked with pid are settled as well. To capture this, we first define L(pid, τ) to denote the set of promises linked to pid in trace τ.

Definition 2 (Linked Promises). Let pid be the promise identifier for a promise. Then, the set of promises linked to pid in a trace τ, denoted by L(pid, τ), is defined as:

    L(pid, τ) = { pid′ | pid′ = pid, or
                  ∃ loc, pid″ : Link(pid″, pid′, loc) ∈ τ and pid″ ∈ L(pid, τ) }

Note that pid itself is also an element of L(pid, τ).

Using Definition 2, we now define the notion of settlement coverage as stated in Definition 3. Informally, the definition computes the number of locations loc′ of promises pid′ that are linked to a promise pid for which a Fulfilled or a Rejected event occurs in the trace τ. It then divides the sum of these by 2 × |S|, where |S| is the number of locations where a promise is created.

Definition 3 (Settlement Coverage). Let program P create promises at locations in S, and let τ be the trace for an execution of P. We define the settlement coverage of τ as:

    ( |{ loc′ | Fulfilled(pid, loc) ∈ τ, pid′ ∈ L(pid, τ), loc′ = loc(pid′) }|
    + |{ loc′ | Rejected(pid, loc) ∈ τ, pid′ ∈ L(pid, τ), loc′ = loc(pid′) }| )
    / (2 × |S|)

Our next goal is to measure the percentage of promises on which reactions are registered. Here, we consider a promise fully covered if both a fulfill reaction and a reject reaction are registered on it. However, we need to consider that the rejection of a promise p may be handled by a reject reaction that is not registered directly on p itself, but at the end of a promise chain that starts with p. To capture this, we define the set of dependent promises pid′ that occur at the end of a chain of fulfill-reactions that starts at pid. In such cases, we will write pid ↝ pid′, as defined below in Definition 4.
    Create(pid, loc)                 creation of promise pid at location loc
    Fulfilled(pid, loc)              promise pid is fulfilled at location loc
    Rejected(pid, loc)               promise pid is rejected at location loc
    Link(pid, pid′, loc)             promise pid becomes linked to promise pid′ at location loc
    Regfulfill(pid, f, loc, [pid′])  register fulfill reaction f on promise pid at location loc, which may chain it to promise pid′
    Regreject(pid, f, loc, [pid′])   register reject reaction f on promise pid at location loc, which may chain it to promise pid′
    Execfulfill(pid, f, loc)         execute fulfill reaction f on promise pid at location loc
    Execreject(pid, f, loc)          execute reject reaction f on promise pid at location loc

Table 1: Trace events for asynchronous operations.
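The events of Table 1 can be recorded by instrumenting promise creation. The following is a much-simplified sketch of the idea; makeTracedPromise and record are our own illustrative names, not JScope's implementation, and only Create, Fulfilled, and Rejected events are handled.

```javascript
// Simplified trace recorder for a subset of the events in Table 1.
// Illustrative only: a full implementation would also emit Link, Reg,
// and Exec events, and would be injected by source instrumentation.
const trace = [];
const record = (type, pid, loc) => trace.push({ type, pid, loc });

let nextPid = 0;
function makeTracedPromise(executor, loc) {
  const pid = "pid" + nextPid++;
  record("Create", pid, loc); // Create event at the creation site
  return new Promise((resolve, reject) => {
    executor(
      (v) => { record("Fulfilled", pid, loc); resolve(v); },
      (e) => { record("Rejected", pid, loc); reject(e); }
    );
  });
}

// The promise from Section 2's first example, created at location "L1:L4",
// with the coin flip fixed to the fulfill branch:
makeTracedPromise((resolve, reject) => resolve("hello"), "L1:L4");
console.log(trace.map((e) => e.type).join(",")); // Create,Fulfilled
```

Note that this sketch records a Fulfilled event even when resolve receives a promise, whereas Table 1 requires a Link event in that case.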

D��������� 4 (��������� ��������). Let program P create i.e., a Regreject event. If ? is ful�lled, then an Execful�ll event is emit-
promises at locations in S, and let g be the trace for an execution of ted. Otherwise, the catch statement executes and an Execreject is
P. Then: recorded in the trace. Assuming these trace elements, the same

pid pid0 if
pid ⌘ pid0 or coverage de�nitions apply.
pid pid00 and Regful�ll (pid00 , loc, 5 , pid0 )
Using De�nition 4, De�nition 5 below computes reaction registra- 4.4 Example
tion coverage through the following steps: (i) compute the number
of locations loc0 where a Regful�ll event occurs on a promise pid for Consider the following code displaying function fun and its tests.
which a Create event occurs in the trace, (ii) compute the number
of locations loc0 where a Regreject event occurs on a promise pid0 , 35 function fun( inputStr ) {
where pid pid0 , and where a Create event for pid occurs in the 36 const p1 = new Promise((resolve ) => {
37 resolve ([Link](inputStr ) ) ;
trace, and (iii) compute the sum of these, and divide it by 2 ⇤ |S|. 38 }) . then(function f1(data) {
D��������� 5 (�������� ������������ ��������). Let program 39 console . log (data. foo . bar)
P create promises at locations in S, and let g be the trace for an 40 }) ; }
execution of P. We de�ne the reaction registration coverage of g as: 41 // Tests :
42 test ( "T1: � inputStr � is � valid �JSON", () => {
| { loc0 | Create ( pid, loc ) 2 g, Regful�ll ( pid, 5 , loc0 , pid0 ) 2 g } | +
43 fun( ' {" foo ": � {" bar ": � "Hello ."}} ' ) ; })
| { loc0 | Create ( pid, loc ) 2 g, pid pid0 , Regreject ( pid0 , 5 , loc0 , pid00 ) 2 g } |
44 test ( "T2: � inputStr � is � not�a� valid � JSON", () => {
2 ⇤ |S| 45 fun( ' Hello . ' ) ; })

Lastly, we de�ne the notion of reaction execution coverage, mea-


suring the percentage of promises with executed reactions. This is In order to measure fun’s async coverage criteria, we �rst obtain
expressed by De�nition 6 below, which is similar to De�nition 5, the following trace.
except that it checks for the presence of Execful�ll and Execreject
events in the trace instead of Regful�ll and Regreject events. Achiev- 46 Create(?83?1 , L36:L38) // Start of T1
ing full reaction execution coverage for a promise created at loc 47 Ful�lled(?83?1 , L37:L37)
requires that loc is executed at least twice. 48 Create(?83C⌘4= , L38:L40) // Promise . then () returns a promise
49 Reg_fulfill(pid_p1, f1, L38:L38, pid_then)
50 Fulfilled(pid_then, L38:L40)
51 Exec_fulfill(pid_p1, f1, L38:L40)
52 Create(pid_p1', L36:L38)       // Start of T2
53 Rejected(pid_p1', L37:L37)     // Error thrown by …() rejects p1.
54 Create(pid_then', L38:L40)
55 Reg_fulfill(pid_p1', f1, L38:L38, pid_then')

Definition 6 (Reaction Execution Coverage). Let program P create promises at the locations in S, and let τ be the trace for an execution of P. We define the reaction execution coverage of τ as:

( |{loc | Create(pid, loc) ∈ τ, Exec_fulfill(pid, f, loc′) ∈ τ}| + |{loc | Create(pid, loc) ∈ τ, pid ↝ pid′, Exec_reject(pid′, f, loc′) ∈ τ}| ) / (2 · |S|)

4.3 async/await
The semantics of JavaScript's async/await is defined in terms of promises, and provides a more convenient syntax that is highly similar to that of sequential code. An async function always returns a promise, so upon a call to an async function a Create event is included in the trace. When an async function returns a value that is not a promise, a Fulfilled event is included in the trace to reflect its fulfillment. A Rejected event is emitted if an async function throws an exception that is not caught within its body. The code fragment following an await statement is considered a fulfill reaction for the promise p returned by the async function, and thus a Reg_fulfill event is added to the trace. If the await-expression is in a try/catch, the catch block is considered the reject reaction, and a Reg_reject event is added to the trace.

We then identify two unique promises from the traces obtained from T1 and T2. The promise created at L36:L38 achieves full (2/2) settlement coverage, with a Fulfilled event in T1 and a Rejected event in T2. However, the promise created at L38:L40 achieves only partial (1/2) settlement coverage, with a single Fulfilled event in T1. Based on the observed Reg_fulfill and Reg_reject events, the two promises achieve partial (1/2) and minimal (0/2) reaction registration coverage, respectively. Reaction execution coverage can be measured in a similar manner. Overall, we calculate a total of 75% settlement coverage, 25% reaction registration coverage, and 25% reaction execution coverage for function fun. To achieve full coverage, a reject reaction needs to be registered on both promises (e.g., by adding a catch at the end of the chain). The reaction then needs to be executed through a newly written test that rejects the promise at L38:L40.
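As a concrete illustration, a function shaped like the running example, together with the additional rejection-handling test suggested above, could look as follows. This is a hypothetical reconstruction; the names `fun` and `withHandler` are ours, and the paper's actual figure is not reproduced here.

```javascript
// Hypothetical sketch shaped like the running example: a promise p1
// (the paper's L36:L38) and the then-promise registered on it (L38:L40).
function fun(input) {
  const p1 = new Promise((resolve, reject) => {
    if (input instanceof Error) reject(input); // T2: the executor rejects p1
    else resolve(input);                       // T1: the executor fulfills p1
  });
  return p1.then((v) => `processed:${v}`);     // then-promise, fulfill reaction f1
}

// The "newly written test" path: a trailing catch registers a reject
// reaction that also covers the then-promise when p1 rejects.
const withHandler = (input) =>
  fun(input).catch((e) => `handled:${e.message}`);
```

Running `withHandler` once with a normal value and once with an error settles both promises both ways, which is what full settlement coverage requires.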
ESEC/FSE ’23, December 3–9, 2023, San Francisco, CA, USA Mohammad Ganji, Saba Alimadadi, and Frank Tip

Figure 4: JScope coverage results for CLA Assistant. The open editor shows [Link] in [Link].

4.5 Feasibility of Asynchronous Coverage Criteria
The proposed coverage criteria for asynchronous programs are similar to traditional coverage criteria in the sense that 100% coverage, while desirable, is not always attainable. For example, in a conditional statement if E then S1 else S2, if the condition E always evaluates to true, then the else-branch and all the statements in S2 are unreachable, and branch coverage and statement coverage will be less than 100%.
Analogously, in a code fragment e.then(...), where e is an expression that evaluates to a promise p, the promise created by the call to then will remain pending if p is never fulfilled, causing settlement coverage to remain below 100%; reaction registration coverage and reaction execution coverage may remain below 100% for similar reasons. Similar scenarios arise for async functions.

5 APPROACH
In this section, we describe our approach and our tool, JScope, for automatically measuring and visualizing the asynchronous coverage criteria defined in Section 4.2. We use the term "async coverage" to refer to the results of settlement, reaction registration, and reaction execution coverage combined, as JScope calculates and reports them collectively. Our approach relies on on-the-fly instrumentation of the asynchronous behaviors of a JavaScript application. JScope executes the instrumented code through the application's test suite to collect execution traces. Next, it utilizes the traces to locate promises, their reactions, and the relations between them, such as chains, as a means to calculate async coverage. Finally, JScope presents the results and relevant warnings as a textual report and an interactive visualization embedded within the Visual Studio Code development environment.

5.1 Instrumentation and Trace Collection
To automatically collect the trace events described in Table 1 for a program, we instrument the behavior of JavaScript promises and async functions on the fly. By executing the instrumented code through the program's test suite, we obtain a trace of events as discussed in Section 4.1.

5.2 Measuring Asynchronous Coverage
As promises can only be settled once, at least two tests are required to achieve full async coverage for a promise. As such, we uniquely identify a promise based on its static creation location in the code. Multiple Create events with the same location across several test executions in a test suite are considered to denote the same promise. In such cases, coverage reported by JScope should be interpreted accordingly. In particular, if full settlement coverage is reported for a promise created at location L, this means that at least one promise created at L was fulfilled and at least one promise created at L was rejected, i.e., both possible outcomes were observed.
We then integrate the different execution paths corresponding to the same promise to locate its various settlements, registered reactions, and executions of those reactions. Our analysis may miss promises in unexercised parts of the code due to the incomplete nature of dynamic analysis. However, the low traditional coverage of these parts will warn developers first. As such, async coverage is most effective when used as a complement to existing coverage criteria.
Next, we detect relations between promises, such as promise chains and linked promises. By definition, a reject reaction at the end of a chain is capable of catching all exceptions caused by any promise in that chain. To represent sufficient error handling more precisely, our algorithm propagates a reject reaction in a chain to all of its ancestor promises. Additionally, for promises returned by catch, we only require a Fulfilled event, and the rest are considered covered. This implies that registering reactions for catch is optional, as ending chains with a catch is a generally accepted way of using promises. Similarly, to avoid unresolvable missing-coverage warnings, Reg_fulfill events are optional for then. Without these heuristics, achieving 100% async coverage would be impossible, as there would always be one promise without any handlers at the end of every chain. Our algorithm also detects promise links by locating where a promise p1 is fulfilled with a promise p2, and applies all Fulfilled and Rejected events of p2 to p1 as well.
Finally, we calculate and visualize the overall async coverage by combining the async coverage of all promises, and report a list of warnings for all promises' missing reactions.
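The chain heuristic described above, where a single trailing catch is credited as the reject reaction for every ancestor promise in the chain, can be illustrated with a plain sketch. The code below is our own illustration, not JScope's implementation.

```javascript
// Hypothetical illustration of the chain heuristic: the reject reaction at
// the end of a promise chain catches an exception thrown by any ancestor
// promise, so the analysis propagates it to the whole chain.
function run(shouldFail) {
  return Promise.resolve('start')
    .then(() => {
      if (shouldFail) throw new Error('ancestor failure'); // rejects mid-chain
      return 'ok';
    })
    .then((v) => v.toUpperCase())            // skipped when an ancestor rejects
    .catch((e) => `handled: ${e.message}`);  // single trailing reject reaction
}
```

A rejection anywhere in the chain skips the remaining fulfill reactions and lands in the trailing catch, which is why ending chains with a catch is the generally accepted idiom the heuristic relies on.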

Application Overview | Traditional Coverage | Asynchronous Coverage
Name  LOC  #Tests  #Promises | Statement(%)  Function(%)  Branch(%) | Settlement(%)  Registration(%)  Execution(%)
1. Node Fetch 2475 392 12 97 100 94 74 68 59
2. CLA Assistant 20406 315 225 94 94 84 59 76 56
3. Minipass Fetch 1523 57 20 100 100 100 69 53 53
4. Cacache 1878 95 99 100 100 100 66 66 55
5. Github Action Merge Dependabot 485 42 10 100 100 100 100 100 100
6. Co 470 43 10 99 100 98 84 94 94
7. Delete Empty 272 20 8 91 100 80 47 77 46
8. JSON Schema Ref Parser 3070 256 34 88 88 78 80 92 78
9. Async Cache Dedupe 1476 120 13 100 100 100 56 83 57
10. Environment 4374 328 64 81 76 72 51 70 51
11. Socket Cluster Server 2044 72 52 82 70 70 62 50 41
12. Socket Cluster Client 10648 37 13 73 54 53 68 45 36
13. Minipass 840 131 10 100 100 100 87 50 25
14. Grant 2756 495 29 98 97 89 58 70 56
15. Express HTTP Proxy 798 106 57 96 97 87 70 100 80
16. Install 556 31 7 98 98 95 46 100 78
17. Cachegoose 224 27 8 91 92 79 43 80 30
18. Enquirer 10491 179 88 68 63 61 51 49 43
19. Avvio 5460 180 13 94 95 91 50 56 37
20. Matched 274 30 9 96 100 78 60 100 64
AVERAGE 3385 144 39 92 91 85 64 74 57

Table 2: Summary of the different coverage metrics reported by JScope and traditional coverage.

5.3 Visualizing the Asynchronous Coverage
We designed an interactive visualization integrated in VS Code, a widely used development environment, based on data gathered from a preliminary user study we conducted. Users can invoke JScope on demand (Figure 4, A) to present the results as a textual report (Figure 4, B&C) and visual cues overlaid on the code (Figure 4, D–F). JScope summarizes async coverage results in the Coverage Overview panel to help with an overall understanding of async coverage (Figure 4, B&C). The overview includes clickable warnings, linked to the locations of their respective promises. JScope also overlays relevant visual cues on the code in the editor. It highlights promises using a red-yellow-green "color spectrum" to indicate their level of async coverage (Figure 4, D). As such, the promise on line 82 is marked red, indicating minimal async coverage. Similarly, the green and yellow highlights on lines 92 and 87 indicate fully and partially covered promises, respectively. Users can obtain more details on the warnings on demand by hovering the mouse over warning cues (Figure 4, E&F). By leveraging the integration of focus within context [25], we help maintain programmers' mental model of the overall program while working with individual promises.

5.4 Implementation
We used [Link] [71] for instrumentation, and used JavaScript Proxies to intercept the execution of built-in features for settling promises and registering their reactions [10]. We utilized the programmatic APIs of the Mocha [8] and Tap [9] testing frameworks for automatic execution of the applications, and VS Code's extension development API to integrate JScope into its editor. In our implementation of the coverage criteria of Section 4, functions f that create and return a new promise object (similar to [Link]) are treated specially: when a call to f is encountered, a Create event is generated for that call, and the promise creation inside f is ignored. This custom notion of context-sensitivity [43, 79] in identifying promise-creation sites generally results in lower coverage. However, the results are more actionable, as they enable detecting lack of coverage when promises are created using helper functions.

6 EVALUATION
In order for our new coverage criteria to be useful, they should be able to reveal untested asynchronous behaviors that are not detected by traditional coverage criteria. To this end, we first measure coverage according to the new criteria for 20 JavaScript applications and study correlations with traditional coverage criteria. Next, we report on experiments that aim to determine (i) whether the new coverage criteria identify uncovered code that contains bugs, and (ii) whether using JScope can improve developers' performance when performing tasks related to assessing test adequacy and debugging. Our evaluation targets the following research questions:
RQ1. Does having high traditional coverage imply adequate testing of asynchronous code?
RQ2. How can asynchronous coverage criteria facilitate identifying test inadequacies regarding faulty asynchronous code?
RQ3. How does using JScope help improve developers' performance in assessing test adequacy and debugging?
RQ4. What is the performance overhead of JScope?

6.1 Asynchronous Coverage
To answer RQ1, we ran JScope on 20 web applications, measured the three types of asynchronous coverage criteria, and studied their correlations with traditional coverage metrics.

6.1.1 Experimental Design and Procedure. We adopted a similar approach to Zhou et al. [82] and Davis et al. [27] in selecting 20 open-source JavaScript applications from GitHub. These projects used promises and/or async/await considerably, were accompanied by reasonable test suites, and were compatible with [Link] [7]. They represented various sizes, domains, and architectures, and the average statement coverage of the benchmark applications was 92%. We ran JScope on the subjects by automatically exercising them through their tests. We measured the results of the three asynchronous coverage metrics, and calculated statement, function, and branch coverage using Istanbul, a popular JavaScript coverage tool. We then examined the possible correlations of our proposed asynchronous coverage criteria with these traditional criteria.
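Returning to the implementation notes in Section 5.4, the Proxy-based interception of reaction registration could be sketched roughly as follows. This is a simplified illustration under our own naming; JScope's actual instrumentation is more elaborate.

```javascript
// Simplified sketch of Proxy-based interception of reaction registration.
// The `trace` array and the event names are our own illustrative choices,
// not JScope's real event format.
const trace = [];
const originalThen = Promise.prototype.then;
Promise.prototype.then = new Proxy(originalThen, {
  apply(target, promiseObj, args) {
    const [onFulfilled, onRejected] = args;
    if (typeof onFulfilled === 'function') trace.push('RegFulfill');
    if (typeof onRejected === 'function') trace.push('RegReject');
    // Delegate to the built-in then so promise semantics are unchanged.
    return Reflect.apply(target, promiseObj, args);
  },
});
```

Because registration happens synchronously when `.then` is called, a call such as `Promise.resolve(1).then(v => v, e => e)` records one RegFulfill and one RegReject event before either reaction ever runs.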

6.1.2 Results and Discussion. The results are displayed in Table 2. The first four columns show an application's name, LOC, number of tests, and number of promise objects observed in the analysis, respectively. The next three columns depict the results of the traditional coverage criteria, i.e., statement, function, and branch coverage. Overall, the benchmarks had relatively high traditional coverage scores, with an average of 92%, 91%, and 85% statement, function, and branch coverage, respectively. However, the settlement, reaction registration, and reaction execution coverage scores were much lower, with averages of 64%, 74%, and 57%, respectively. This means that, on average, the test suite of a typical JavaScript application exercises 92% of the statements but only about 65% of the expected outcomes of its promises and async functions. Over 25% of the necessary reactions for async operations may not even be registered. Even fewer reactions are actually exercised through tests.
Next, we examined the potential correlations between asynchronous and traditional coverage. We used the Kendall rank correlation coefficient, which does not assume a normal distribution. The results, depicted in Table 3, show no strong correlations between traditional and asynchronous coverage metrics. This indicates that traditional coverage metrics are not necessarily equipped to identify the sufficient execution of asynchronous scenarios through tests. In other words, covering more lines or functions does not imply covering more of the asynchronous behavior of an application. Overall, while high traditional coverage scores raise confidence in sufficient testing of the code, they are not equipped to identify shortcomings of the tests in asynchronous scenarios. For instance, while 92% of the statements are exercised on average, only 57% of the expected reactions of asynchronous operations are invoked.

              Statement  Function  Branch  Settlement  Registration  Execution
Settlement      0.20       0.10     0.26      1           0.11         0.48
Registration    0.49       0.56     0.35      0.11        1            0.79
Execution       0.31       0.33     0.29      0.48        0.79         1

Table 3: Correlation coefficients for asynchronous and traditional coverage criteria.

6.2 Asynchronous Coverage and Test Effectiveness
To address RQ2, we used JScope and Istanbul to examine both types of coverage for code snippets related to previously resolved issues on GitHub. A main application of coverage criteria is identifying code segments that may contain bugs due to insufficient coverage, which can be helpful during debugging. As such, given a set of known bugs, we investigated (1) whether traditional coverage criteria raise warnings about inadequate testing of faulty asynchronous code, and (2) whether JScope could have helped discover these bugs.

6.2.1 Experimental Design and Procedure. We searched the repositories of the projects in Table 2 for issues that 1) involved promises and/or async/await, 2) were closed with the fixes linked to the relevant commits, and 3) had complete statement coverage in the version before the fix. We found seven bugs in six of the repositories. We expanded our search to real bugs from other projects on GitHub that met our requirements. We selected a total of 15 bugs. We then ran JScope on two versions of each project, one immediately before and one immediately after each bug fix. We used JScope's output to investigate the inadequacies of the tests in exercising the asynchronous behavior in the code segments related to each bug.

Commit        Application             Category            Settlement(%)  Registration(%)  Execution(%)  Statement(%)
1.  #f56491a  express-http-proxy      Unhandled Exp.      63   96   74   95
2.  #d902776  cla-assistant           Unhandled Exp.      58   75   55   94
3.  #8ff7de7  streamroller            Unhandled Exp.      60   81   67   100
4.  #8e94a60  eslint_d.js             Unhandled Exp.      70   65   65   89
5.  #6bcf8ca  checkfire               Unhandled Exp.      40   55   40   -
6.  #fff6640  postgres                Unhandled Exp.      71   83   60   91
7.  #2fc9693  haraka                  Unhandled Exp.      25   33   33   -
8.  #e5615da  ioredis                 Unhandled Exp.      76   69   55   95
9.  #146bb3b  install                 Unhandled Exp.      50   100  62   98
10. #0dfff52  json-schema-ref-parser  Unhandled Exp.      80   91   81   94
11. #cbcdfc6  socketcluster-server    Unhandled Exp.      63   50   43   79
12. #dfbafbf  clamscan                Pending Op.         58   89   62   40
13. #48a2ddf  cla-assistant           Broken Chain        58   75   55   94
14. #b0a86d4  avvio                   Broken Chain        38   58   38   93
15. #68342f8  libnpmteam              Unnecessary Async.  40   83   61   100

Table 4: Asynchrony-related JavaScript issues from GitHub.

6.2.2 Results and Discussion. Table 4 displays the results. Columns 1–3 show the commit pertaining to the bug fix, the application name, and the bug category, respectively. The next three columns display the async coverage numbers before the fix. The last column shows statement coverage before the fix, reported by Istanbul. Overall, JScope reported insufficient coverage and relevant warnings for all bugs; addressing these could have helped detect and fix the bugs before deployment. Statement coverage, however, showed no sign of warning or insufficient testing for any of the bugs or their relevant code segments. Next, we discuss the main categories of the studied bugs and describe, through two examples, how JScope's reports and warnings could have benefited the bug-finding process.

Unhandled Exceptions. Developers often neglect to test exceptional executions of asynchronous operations [15]. While current coverage criteria can indicate insufficient testing of conditions and branches, they are unable to detect insufficient testing of alternative scenarios for asynchronous operations, such as missing reactions for rejected asynchronous operations or missing error handling.
(Example A) Eslint_d.js is an application that daemonizes ESLint [4] for higher performance and has >30k weekly downloads on the NPM registry (Table 4, row 4). It caches a single linter object to reduce overhead. Line 272 of the left code snippet in Figure 5-A shows how the async function getCache is invoked to asynchronously retrieve a cached ESLint linter object from a given path. The program, using await, waits until this promise fulfills. A bug was reported in this method despite the full coverage of this code segment by the tests, as depicted by the green markings by the line numbers. It stated that the application crashes with an unhandled promise exception if the path given to getCache cannot be resolved. The proposed fix added a try/catch around the call to getCache to allow handling exceptions caused by the rejected promise and prevent further crashes (Figure 5-A, right snippet, lines 273–278).
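The failure pattern in Example A, an awaited call whose rejection has no handler, can be reduced to a short sketch. The code below is a hypothetical reduction with our own names, not eslint_d.js's actual source.

```javascript
// Hypothetical reduction of Example A: an awaited call that may reject
// crashes the process unless the await is wrapped in try/catch, which
// registers (and lets tests exercise) a reject reaction.
async function getCache(path) {          // stand-in for eslint_d's getCache
  if (!path) throw new Error('path cannot be resolved');
  return { linter: 'cached-linter' };
}

async function lint(path) {
  try {
    const cache = await getCache(path);  // the fix: try/catch around the call
    return cache.linter;
  } catch (e) {
    return `recovered: ${e.message}`;    // reject reaction, now executed
  }
}
```

Before the fix, a test suite could cover every statement on the happy path while the rejection branch of the awaited promise remained entirely unexercised, which is exactly the gap the settlement and reaction metrics expose.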

Figure 5: JScope results (highlights and warnings overlaid on code) vs. Istanbul results (markings by the line numbers).

A corresponding test was also added to the test suite that simulates the exception and exercises the catch block (lines 275–278).
This bug had remained undetected in production for four months. However, running JScope on the faulty version of the code reported insufficient coverage in terms of a missing reject reaction for the promise returned by getCache, shown as the highlighted code on line 272 and the "Missing error handler" warning message box (Figure 5-A). Having had access to JScope during testing could have helped reveal this bug before production.
Our results in Table 4 show multiple instances of unhandled exceptions, similarly missed by the applications' tests. Row 3 is an example where the developers managed to achieve 100% statement coverage while still failing to detect a missing reject reaction that caused a crash. Consider our first motivating example from Section 3.1. Ambiguous reports mention the same issue two years before the fix. The issue persisted to the point where it had damaged users' trust, with one user calling CLA Assistant a phishing tool.

Broken Promise Chains. JavaScript programs will not wait for the completion of asynchronous operations unless explicitly specified. In other words, the execution of operations that depend on the completion of a promise relies on properly chaining them through promise reactions or await statements. Developers can mistakenly break the chain of asynchronous operations by not awaiting their completion [47]. This may alter the flow of execution, leading to undesired outcomes. Moreover, the outcome of the promise will not be used, and potential exceptions will not be caught, which can lead to a myriad of issues in programs. Our first motivating example displayed a case where this mistake led to the CLA Assistant application crashing, caused by an unhandled exception thrown by an un-awaited promise (Section 3.1).
(Example B) Row 13 of Table 4 shows another issue in CLA Assistant. Repositories that use CLA Assistant may require contributors to sign a Contributor License Agreement (CLA) through CLA Assistant's web interface. When a user signs a CLA through CLA Assistant's web interface, handleWebhook is invoked (partially shown in Figure 5-B). Upon invocation of the async function updateForClaNotRequired (line 146), a promise is returned that asynchronously communicates the status update on the signature to GitHub servers. It then sends a confirmation to the user (line 153). Users had reported issues where the web interface shows an updated status for a pull request, whereas on GitHub, the repository is still pending CLA Assistant's update. Two other preceding issues vaguely report the same bug but were unable to reproduce it. JScope reported low async coverage for the promise on line 146 before the fix (Figure 5-B). The warning states that the promise has not settled and has no reactions, suggesting a fix through adding a then or await statement. This matches the fix provided by the developers for the original issue, which added an await before the call to updateForClaNotRequired to wait for the function's completion before sending a response to the user (line 146).

Pending Operations. If not explicitly settled, asynchronous operations remain pending, causing nontermination or memory leaks. Such problems often happen as a result of developers treating asynchronous code like synchronous code, for example by incorrectly calling return inside the promise executor function to denote its completion instead of calling resolve, as is the case in Table 4, row 12. For these cases, JScope reports missing fulfillment and low settlement coverage for the pending promise.

Unnecessary Asynchrony. Developers may complicate code by using promises where asynchrony is not required. They may also nest promises, causing unanticipated broken promise chains. While generally less severe, JScope warns about their missing rejections.

Overall, async coverage criteria can effectively expose test inadequacies related to asynchrony that are not detected by traditional coverage metrics. As such, JScope can help identify parts of code that contain asynchrony-related bugs in practice despite being covered by traditional coverage.

6.3 Usefulness of Asynchronous Coverage to Developers
To address RQ3, we conducted a controlled user experiment to investigate the effectiveness of JScope in helping programmers identify and debug (un)covered JavaScript code.

6.3.1 Experimental Design and Procedure. Our experiment had a "between-subject" design to avoid the carryover effect. We divided our participants into two groups: a control group and an experimental group. The experimental group had access to a simplified and web-based

version of JScope results. Both groups had access to the code, as well as statement coverage results from Istanbul, loaded on our web-based user interface with a style similar to JScope for consistency.
Variables. Our Independent Variable is the type of tool used, referred to as Tool from hereon, which is a nominal variable with two levels: JScope and Istanbul. We have two continuous Dependent Variables that represent the developers' performance in completing the tasks: task completion duration (seconds) and accuracy (%).
Participants. We sent out recruitment emails to graduate students' mailing lists. From the replies, we selected the ones who met our knowledge requirements of JavaScript development and testing. The majority of our participants had medium-level expertise in JavaScript programming and familiarity with testing. We recruited six male and six female participants, aged 21–35, consisting of 10 graduate students and two software engineers, with 1–5 years of experience in software development. We assigned them randomly to the experimental and control groups. We balanced the expertise based on our participants' responses to a pre-questionnaire (Section 6.3.1).
Experimental Object. We used a simplified version of a source file from Node Fetch, a library implementing the browsers' Fetch API in Node.js. For the debugging task, we chose a fixed bug from Docusaurus, a website-building application. The unhandled reject reaction bug, covered by the tests, led to silent failure of the whole application.
Tasks. We designed three tasks that pertained to test adequacy and quality assessment (Table 5). T1 and T2 were designed to assess the effectiveness of the Tool in helping programmers identify well-tested and insufficiently tested functions and promises. T3 was designed to investigate the usefulness of the Tool in helping participants identify the underlying causes of the bug (T3.A) and propose a fix (T3.B).

Task  Description
T1.A  Identifying sufficiently tested functions
T1.B  Identifying less robust (i.e., not sufficiently tested) functions
T2.A  Locating all promises created during testing
T2.B  Identifying promises that are not properly tested
T3.A  Identifying the underlying causes of a failure
T3.B  Finding the fix to the failure

Table 5: Tasks used in the user study.

Pre-study. All participants filled in a pre-questionnaire form prior to their session, indicating their demographic information, their experience in programming, JavaScript development, and testing, and their self-assessed proficiency levels. We used this data to fairly balance the participants between groups. All participants signed a consent form prior to starting the study.
Training. The participants were given refresher tutorials on the main concepts of asynchronous JavaScript, coverage, and Istanbul, to ensure consistency in the knowledge required for completing the tasks. The experimental group also received a tutorial on using JScope. Both groups were given some time to familiarize themselves with the tools and the setup of the experiment.
Task Completion. Next, the participants started performing the tasks (Table 5). The participants were allowed to interact with the code and the tools, and to write their answers in a Google Doc shared with the examiner. We measured the duration during the session by providing each task to the participants individually, which they returned after completing the task. To measure accuracy, we used pre-defined rubrics to mark the responses later.
Post-study. After the session, the participants responded to a post-questionnaire form with qualitative data on the usefulness of the Tool used and its limitations.

6.3.2 Results and Discussion. We ran the Shapiro-Wilk normality test on the data, and since the distributions were not normal, we used Mann-Whitney U tests to analyze the results. The results showed a statistically significant difference (28% on average) in the total accuracy of responses for the experimental group using JScope (Mean=95%, STDDev=9%), compared to the control group (Mean=74%, STDDev=12%).
The results also showed that the control group spent slightly less time in total (Mean=33:56, STDDev=4:35) compared to the experimental group (Mean=36:29, STDDev=5:01), although the difference was not statistically significant. The experimental group spent an average of 12:43, 7:58, and 7:54 minutes completing T1, T2, and T3, respectively. The control group spent 6:42, 11:58, and 9:12 minutes performing the same tasks, on average. The results of individual tasks showed that although the experimental group spent more time completing T1 than the control group, they performed all other tasks faster (14%–33% on average). It was expected for the experimental group to spend more time on T1 due to the additional learning curve incurred by their unfamiliarity with JScope, and they still achieved an average of 33% higher accuracy for T1. For the remaining tasks, the experimental group performed consistently faster than the control group, while achieving higher accuracy.
More Accurate Assessment of Test Effectiveness. The tasks involved performing various activities, from assessing general function coverage to more specific promise coverage, for all of which JScope was shown to improve the accuracy of the participants. We had hypothesized that JScope would be most useful for tasks directly involving asynchronous interactions. For instance, T2 involved examining promises and async/await statements, where we expected JScope to be helpful. Using JScope helped the experimental group perform significantly better for T2. They completed this task 33% faster (p=0.02) and 30% more accurately (p=0.04) on average.
Debugging. The effectiveness of tests is directly dependent on their bug-finding capability. Coverage metrics do not directly contribute to identifying and fixing bugs. However, they can facilitate the process by guiding programmers towards the less tested portions of the code that may contain bugs. Using JScope helped the experimental group achieve more accurate answers in debugging while spending less time locating the underlying causes of a failure (T3.A) and finding a fix (T3.B). The results were statistically significant for the accuracy of the proposed fix (T3.B), where the experimental group achieved an average of 37% higher accuracy (p=0.03).
Participants' Feedback. Overall, the experimental group found JScope useful. In particular, they liked the overview of the coverage report, the interactions with the overlaid visual cues, and the warning messages that guided them towards missing functionality or tests.

Overall, participants using JScope performed 28% more accurately in testing and debugging asynchronous code.

6.4 Performance
We measured the performance of JScope in terms of its instrumentation overhead and test suite execution time by averaging five executions of each test suite, with and without JScope. Our analysis for the applications in Table 2 indicates a median instrumentation time of 31 seconds (range: 23–97 seconds). The slowdown factor for execution of the instrumented code generally ranges from 2x to 100x (median: 15.5x). This slowdown is similar to that of other instrumentation-based dynamic analyses for JavaScript [15, 37, 72].

6.5 Threats to Validity
There are threats pertaining to the representativeness of our participants, benchmark projects, or issues. We addressed these by randomly selecting participants who met the minimum experience requirements, and projects of different sizes from different domains that met the prerequisites for using JScope. To mitigate the examiner's bias in our user study, we delegated the timekeeping to the participants, allowing them to decide the start and end time of each task by handing them the tasks separately and asking them to return them afterwards. We defined detailed rubrics for grading the accuracy of the results prior to the study to address a similar bias in measuring participants' accuracy. We tried to alleviate the impact of expertise level in our study by balancing the participants' expertise levels based on their responses to our pre-questionnaire. We made JScope and our experimental data available to allow reproducibility.

7 RELATED WORK
While being the most prominent test quality assessment technique [83], code coverage criteria have always been under scrutiny regarding their effectiveness [31, 38–40]. The generic nature of traditional coverage criteria has led to the emergence of various domain-specific coverage criteria [16, 44, 51, 68, 69, 74]. Several coverage metrics have been introduced using data-flow to target concurrency in actor-based [75], concurrent [67, 80], and distributed programs [62]. Researchers have proposed novel criteria for dynamic web applications [49, 58, 84, 85], the loosely typed nature of JavaScript [22], or DOM elements [56]. None of these techniques, however, addresses asynchronous execution and its respective challenges.
Event-dependent and asynchronous callbacks form a majority of untested code in JavaScript [31]. Prior work has used static analysis for the detection and remediation of event races [11, 12, 29, 61] and concurrency bugs [78], as well as schedule fuzzers for event-driven programs [26]. The extensive research on bug detection and comprehension of asynchrony confirms our argument for the necessity of test adequacy criteria that take into account the asynchrony in JavaScript and other languages.
Visualization has been effectively used for comprehending and modeling event-driven and asynchronous programs [13–15, 76, 77]. Similar to Seifert et al. [64], we leveraged editor integration to facilitate the comprehension of asynchronous coverage through an interactive interface.
Code coverage is crucial in evaluating the effectiveness of test generation techniques such as feedback-directed random testing [19, 59, 65], dynamic symbolic execution [23, 35, 66], and search-based and evolutionary techniques [33, 34]. Nessie [19] is a feedback-directed test generation tool for JavaScript that targets event-driven asynchrony. Event-driven asynchrony is rapidly being supplanted by promises and async/await, because these features lead to more readable and less error-prone code. However, Nessie does not provide special support for promises and async/await.
Mutation testing is also used as an alternative approach for measuring test quality [41, 50]. Despite its effectiveness, mutation testing for JavaScript is typically very costly, and has yet to gain the popularity of code coverage [18, 54, 55, 63].

8 CONCLUDING REMARKS
In this paper, we proposed a set of coverage criteria for assessing the adequacy of tests with respect to asynchronous program behavior. We designed an interactive visualization and implemented a tool to allow programmers to view async coverage results in a typical development environment. The results of our evaluation showed that async coverage metrics are complementary to traditional metrics and can help programmers detect insufficiencies of tests and related bugs in asynchronous code where traditional metrics cannot. Our user experiment also demonstrated that our tool helps improve developers' performance in tasks related to assessing test quality and debugging of asynchronous code.
The coverage criteria presented in this paper are designed for JavaScript. As was pointed out in Section 2, similar features have been added to various programming languages [2, 3, 5, 6], and adapting the coverage criteria to these languages is an interesting
sis to model event-driven JavaScript [47, 48, 70]. Other work has fo- future direction. Another avenue for future work is the development
cused on constructing promise graphs that express the relationships of test generation techniques that aim to improve asynchronous
between promises and relevant code [47] and detecting promise coverage. For example, one could imagine extending Nessie [19]
anti-patterns based on promise graphs [15]. to identify performance- to register reactions on promises returned by function calls in
related anti-patterns involving promises [77] . Arteca et al. [20] previously generated tests.
present a refactoring for enabling additional concurrency by split-
ting and moving await expressions, and Gokhale et al. [36] present a
refactoring for migrating applications from the use of synchronous
9 DATA AVAILABILITY
APIs to equivalent asynchronous APIs. Moreover, dynamic anal-
ysis has been popularly used in JavaScript [13, 14, 45, 60, 76] to JS���� and our experimental data are publicly available [42].
address the imprecision of static analysis in analyzing JavaScript’s
inherent dynamism [17]. Much research in this area targets un-
derstanding, debugging, and testing techniques for programs in ACKNOWLEDGMENTS
general[15, 24, 30, 37, 57, 64, 72] [21, 28, 32, 46, 52, 53, 73], and more This work was supported in part by an NSERC Discovery Grant and
recently for asynchronous JavaScript in particular [15] [72][64]. A National Science Foundation grant CCF-1907727. We are grateful
long line of research projects has focused on the detection and to the participants of our controlled experiments.
ESEC/FSE ’23, December 3–9, 2023, San Francisco, CA, USA Mohammad Ganji, Saba Alimadadi, and Frank Tip
REFERENCES
[1] 2021. ECMAScript 2021 Language Specification. [Link]org/ecma-262/.
[2] 2022. Asynchronous programming with Async and Await. [Link]com/en-us/dotnet/visual-basic/programming-guide/concepts/async/ Accessed Aug-2022.
[3] 2022. Awaitables, Python documentation. [Link][Link]#awaitables Accessed Jan-2023.
[4] 2022. Eslint: Pluggable JavaScript Linter. [Link] Accessed Jan-2023.
[5] 2022. Future (Java Platform SE 8). [Link]java/util/concurrent/[Link] Accessed Aug-2022.
[6] 2022. Future<T> class. [Link][Link] Accessed Jan-2023.
[7] 2022. GraalVM Node.js Runtime. [Link] Accessed Aug-2023.
[8] 2022. Mocha, the fun, simple, flexible JavaScript test framework. [Link]org Accessed Sep-2022.
[9] 2022. Node Tap. [Link] Accessed Sep-2022.
[10] 2022. Proxy - JavaScript. [Link]Reference/Global_Objects/Proxy Accessed Sep-2022.
[11] Christoffer Quist Adamsen, Anders Møller, Saba Alimadadi, and Frank Tip. 2018. Practical AJAX race detection for JavaScript web applications. In Proceedings of the 2018 ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/SIGSOFT FSE 2018, Lake Buena Vista, FL, USA, November 04-09, 2018, Gary T. Leavens, Alessandro Garcia, and Corina S. Pasareanu (Eds.). ACM, 38–48.
[12] Christoffer Quist Adamsen, Anders Møller, Rezwana Karim, Manu Sridharan, Frank Tip, and Koushik Sen. 2017. Repairing event race errors by controlling nondeterminism. In Proceedings of the 39th International Conference on Software Engineering, ICSE 2017, Buenos Aires, Argentina, May 20-28, 2017, Sebastián Uchitel, Alessandro Orso, and Martin P. Robillard (Eds.). IEEE/ACM, 289–299.
[13] Saba Alimadadi, Ali Mesbah, and Karthik Pattabiraman. 2016. Understanding Asynchronous Interactions in Full-Stack JavaScript. In 2016 IEEE/ACM 38th International Conference on Software Engineering (ICSE). 1169–1180.
[14] Saba Alimadadi, Sheldon Sequeira, Ali Mesbah, and Karthik Pattabiraman. 2014. Understanding JavaScript Event-Based Interactions. In Proceedings of the 36th International Conference on Software Engineering (Hyderabad, India) (ICSE 2014). ACM, New York, NY, USA, 367–377.
[15] Saba Alimadadi, Di Zhong, Magnus Madsen, and Frank Tip. 2018. Finding Broken Promises in Asynchronous JavaScript Programs. Proc. ACM Program. Lang. 2, OOPSLA, Article 162 (Oct 2018), 26 pages.
[16] P. Ammann, J. Offutt, and Hong Huang. 2003. Coverage criteria for logical expressions. In 14th International Symposium on Software Reliability Engineering (ISSRE 2003). 99–107.
[17] Esben Andreasen, Liang Gong, Anders Møller, Michael Pradel, Marija Selakovic, Koushik Sen, and Cristian-Alexandru Staicu. 2017. A Survey of Dynamic Analysis and Test Generation for JavaScript. ACM Comput. Surv. 50, 5, Article 66 (Sep 2017), 36 pages.
[18] J. H. Andrews, L. C. Briand, and Y. Labiche. 2005. Is Mutation an Appropriate Tool for Testing Experiments?. In Proceedings of the 27th International Conference on Software Engineering (St. Louis, MO, USA) (ICSE '05). ACM, New York, NY, USA, 402–411.
[19] Ellen Arteca, Sebastian Harner, Michael Pradel, and Frank Tip. 2022. Nessie: Automatically Testing JavaScript APIs with Asynchronous Callbacks. In 44th IEEE/ACM International Conference on Software Engineering, ICSE 2022, Pittsburgh, PA, USA, May 25-27, 2022. ACM, 1494–1505.
[20] Ellen Arteca, Frank Tip, and Max Schäfer. 2021. Enabling Additional Parallelism in Asynchronous JavaScript Applications (Artifact). Dagstuhl Artifacts Series 7, 2 (2021), 5:1–5:6.
[21] Shay Artzi, Julian Dolby, Simon Holm Jensen, Anders Møller, and Frank Tip. 2011. A Framework for Automated Testing of JavaScript Web Applications. In Proceedings of the 33rd International Conference on Software Engineering (Waikiki, Honolulu, HI, USA) (ICSE '11). ACM, New York, NY, USA, 571–580.
[22] Sora Bae, Joonyoung Park, and Sukyoung Ryu. 2017. Partition-Based Coverage Metrics and Type-Guided Search in Concolic Testing for JavaScript Applications. In Proceedings of the 5th International FME Workshop on Formal Methods in Software Engineering (Buenos Aires, Argentina) (FormaliSE '17). IEEE Press, 72–78.
[23] Cristian Cadar, Vijay Ganesh, Peter M. Pawlowski, David L. Dill, and Dawson R. Engler. 2006. EXE: automatically generating inputs of death. In Proceedings of the 13th ACM Conference on Computer and Communications Security, CCS 2006, Alexandria, VA, USA, October 30 - November 3, 2006, Ari Juels, Rebecca N. Wright, and Sabrina De Capitani di Vimercati (Eds.). ACM, 322–335.
[24] Xiaoning Chang, Wensheng Dou, Jun Wei, Tao Huang, Jinhui Xie, Yuetang Deng, Jianbo Yang, and Jiaheng Yang. 2021. Race Detection for Event-Driven Node.js Applications. In 2021 36th IEEE/ACM International Conference on Automated Software Engineering (ASE). 480–491.
[25] Andy Cockburn, Amy Karlson, and Benjamin B. Bederson. 2009. A review of overview+detail, zooming, and focus+context interfaces. Comput. Surveys 41, 1, Article 2 (2009), 31 pages.
[26] James C. Davis, Arun Thekumparampil, and Dongyoon Lee. 2017. Node.fz: Fuzzing the Server-Side Event-Driven Architecture. In Proceedings of the Twelfth European Conference on Computer Systems, EuroSys 2017, Belgrade, Serbia, April 23-26, 2017, Gustavo Alonso, Ricardo Bianchini, and Marko Vukolic (Eds.). ACM, 145–160.
[27] James C. Davis, Eric R. Williamson, and Dongyoon Lee. 2018. A Sense of Time for JavaScript and Node.js: First-Class Timeouts as a Cure for Event Handler Poisoning. In 27th USENIX Security Symposium (USENIX Security 18). USENIX Association, Baltimore, MD, 343–359.
[28] Monika Dhok, Murali Krishna Ramanathan, and Nishant Sinha. 2016. Type-Aware Concolic Testing of JavaScript Programs. In Proceedings of the 38th International Conference on Software Engineering (Austin, Texas) (ICSE '16). ACM, New York, NY, USA, 168–179.
[29] André Takeshi Endo and Anders Møller. 2020. NodeRacer: Event Race Detection for Node.js Applications. In 13th IEEE International Conference on Software Testing, Validation and Verification, ICST 2020, Porto, Portugal, October 24-28, 2020. IEEE, 120–130.
[30] Amin Milani Fard and Ali Mesbah. 2013. JSNOSE: Detecting JavaScript Code Smells. In 2013 IEEE 13th International Working Conference on Source Code Analysis and Manipulation (SCAM). 116–125.
[31] Amin Milani Fard and Ali Mesbah. 2017. JavaScript: The (Un)Covered Parts. In 2017 IEEE International Conference on Software Testing, Verification and Validation (ICST). 230–240.
[32] Amin Milani Fard, Ali Mesbah, and Eric Wohlstadter. 2015. Generating Fixtures for JavaScript Unit Testing. In Proceedings of the 30th IEEE/ACM International Conference on Automated Software Engineering (Lincoln, Nebraska) (ASE '15). IEEE Press, 190–200.
[33] Gordon Fraser and Andrea Arcuri. 2011. Evolutionary Generation of Whole Test Suites. In Proceedings of the 11th International Conference on Quality Software, QSIC 2011, Madrid, Spain, July 13-14, 2011, Manuel Núñez, Robert M. Hierons, and Mercedes G. Merayo (Eds.). IEEE Computer Society, 31–40.
[34] Gordon Fraser and Andrea Arcuri. 2011. EvoSuite: automatic test suite generation for object-oriented software. In SIGSOFT/FSE'11 19th ACM SIGSOFT Symposium on the Foundations of Software Engineering (FSE-19) and ESEC'11: 13th European Software Engineering Conference (ESEC-13), Szeged, Hungary, September 5-9, 2011, Tibor Gyimóthy and Andreas Zeller (Eds.). ACM, 416–419.
[35] Patrice Godefroid, Nils Klarlund, and Koushik Sen. 2005. DART: directed automated random testing. In Proceedings of the ACM SIGPLAN 2005 Conference on Programming Language Design and Implementation, Chicago, IL, USA, June 12-15, 2005, Vivek Sarkar and Mary W. Hall (Eds.). ACM, 213–223.
[36] Satyajit Gokhale, Alexi Turcotte, and Frank Tip. 2021. Automatic migration from synchronous to asynchronous JavaScript APIs. Proc. ACM Program. Lang. 5, OOPSLA (2021), 1–27.
[37] Liang Gong, Michael Pradel, Manu Sridharan, and Koushik Sen. 2015. DLint: Dynamically Checking Bad Coding Practices in JavaScript. In Proceedings of the 2015 International Symposium on Software Testing and Analysis (Baltimore, MD, USA) (ISSTA 2015). ACM, New York, NY, USA, 94–105.
[38] Hadi Hemmati. 2015. How Effective Are Code Coverage Criteria?. In 2015 IEEE International Conference on Software Quality, Reliability and Security. 151–156.
[39] Michael Hilton, Jonathan Bell, and Darko Marinov. 2018. A Large-Scale Study of Test Coverage Evolution. ACM, New York, NY, USA, 53–63.
[40] Laura Inozemtseva and Reid Holmes. 2014. Coverage is Not Strongly Correlated with Test Suite Effectiveness. In Proceedings of the 36th International Conference on Software Engineering (Hyderabad, India) (ICSE 2014). ACM, New York, NY, USA, 435–445.
[41] Yue Jia and Mark Harman. 2011. An Analysis and Survey of the Development of Mutation Testing. IEEE Transactions on Software Engineering 37, 5 (2011), 649–678.
[42] 2023. JScope. [Link]
[43] Vineeth Kashyap, Kyle Dewey, Ethan A. Kuefner, John Wagner, Kevin Gibbons, John Sarracino, Ben Wiedermann, and Ben Hardekopf. 2014. JSAI: A Static Analysis Platform for JavaScript. In Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering (Hong Kong, China) (FSE 2014). ACM, New York, NY, USA, 121–132.
[44] Kenneth Koster and David Kao. 2007. State coverage: A structural test adequacy criterion for behavior checking. 541–544.
[45] Ding Li, James Mickens, Suman Nath, and Lenin Ravindranath. 2015. Domino: Understanding Wide-Area, Asynchronous Event Causality in Web Applications. In Proceedings of the Sixth ACM Symposium on Cloud Computing (Kohala Coast, Hawaii) (SoCC '15). ACM, New York, NY, USA, 182–188.
[46] Guodong Li, Esben Andreasen, and Indradeep Ghosh. 2014. SymJS: Automatic Symbolic Testing of JavaScript Web Applications. In Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering (Hong Kong, China) (FSE 2014). ACM, New York, NY, USA, 449–459.
[47] Magnus Madsen, Ondřej Lhoták, and Frank Tip. 2017. A Model for Reasoning about JavaScript Promises. Proc. ACM Program. Lang. 1, OOPSLA, Article 86 (Oct 2017), 24 pages.
[48] Magnus Madsen, Frank Tip, and Ondřej Lhoták. 2015. Static Analysis of Event-Driven Node.js JavaScript Applications. In Proceedings of the 2015 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications (Pittsburgh, PA, USA) (OOPSLA 2015). ACM, New York, NY, USA, 505–519.
[49] Alberto Martin-Lopez, Sergio Segura, and Antonio Ruiz-Cortés. 2019. Test Coverage Criteria for RESTful Web APIs. ACM, New York, NY, USA, 15–21.
[50] Atif Memon (Ed.). 2019. Mutation Testing Advances: An Analysis and Survey. Advances in Computers, Vol. 112. Academic Press, Cambridge, MA.
[51] Atif M. Memon, Mary Lou Soffa, and Martha E. Pollack. 2001. Coverage Criteria for GUI Testing. SIGSOFT Softw. Eng. Notes 26, 5 (Sep 2001), 256–267.
[52] Amin Milani Fard, Mehdi Mirzaaghaei, and Ali Mesbah. 2014. Leveraging Existing Tests in Automated Test Generation for Web Applications. In Proceedings of the 29th ACM/IEEE International Conference on Automated Software Engineering (Vasteras, Sweden) (ASE '14). ACM, New York, NY, USA, 67–78.
[53] Shabnam Mirshokraie and Ali Mesbah. 2012. JSART: JavaScript Assertion-Based Regression Testing. In Web Engineering, Marco Brambilla, Takehiro Tokuda, and Robert Tolksdorf (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 238–252.
[54] Shabnam Mirshokraie, Ali Mesbah, and Karthik Pattabiraman. 2013. PYTHIA: Generating test cases with oracles for JavaScript applications. In 2013 28th IEEE/ACM International Conference on Automated Software Engineering (ASE). 610–615.
[55] Shabnam Mirshokraie, Ali Mesbah, and Karthik Pattabiraman. 2014. Guided mutation testing for JavaScript web applications. IEEE Transactions on Software Engineering 41, 5 (2014), 429–444.
[56] Mehdi Mirzaaghaei and Ali Mesbah. 2014. DOM-Based Test Adequacy Criteria for Web Applications. In Proceedings of the 2014 International Symposium on Software Testing and Analysis (San Jose, CA, USA) (ISSTA 2014). ACM, New York, NY, USA, 71–81.
[57] Erdal Mutlu, Serdar Tasiran, and Benjamin Livshits. 2015. Detecting JavaScript Races That Matter. In Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering (Bergamo, Italy) (ESEC/FSE 2015). ACM, New York, NY, USA, 381–392.
[58] Hung Nguyen, Hung Phan, Christian Kästner, and Nguyen Tien. 2019. Exploring output-based coverage for testing PHP web applications. Automated Software Engineering 26 (03 2019).
[59] Carlos Pacheco, Shuvendu K. Lahiri, Michael D. Ernst, and Thomas Ball. 2007. Feedback-Directed Random Test Generation. In 29th International Conference on Software Engineering (ICSE 2007), Minneapolis, MN, USA, May 20-26, 2007. IEEE Computer Society, 75–84.
[60] Ohad Rau, Caleb Voss, and Vivek Sarkar. 2021. Linear Promises: Towards Safer Concurrent Programming. In 35th European Conference on Object-Oriented Programming (ECOOP 2021) (LIPIcs, Vol. 194), Anders Møller and Manu Sridharan (Eds.). Schloss Dagstuhl – Leibniz-Zentrum für Informatik, Dagstuhl, Germany, 13:1–13:27.
[61] Veselin Raychev, Martin T. Vechev, and Manu Sridharan. 2013. Effective race detection for event-driven programs. In Proceedings of the 2013 ACM SIGPLAN International Conference on Object Oriented Programming Systems Languages & Applications, OOPSLA 2013, part of SPLASH 2013, Indianapolis, IN, USA, October 26-31, 2013, Antony L. Hosking, Patrick Th. Eugster, and Cristina V. Lopes (Eds.). ACM, 151–166.
[62] Christopher Robinson-Mallett, Robert M. Hierons, and Peter Liggesmeyer. 2006. Achieving Communication Coverage in Testing. SIGSOFT Softw. Eng. Notes 31, 6 (Nov 2006), 1–10.
[63] Diego Rodríguez-Baquero and Mario Linares-Vásquez. 2018. Mutode: Generic JavaScript and Node.js Mutation Testing Tool. In Proceedings of the 27th ACM SIGSOFT International Symposium on Software Testing and Analysis (Amsterdam, Netherlands) (ISSTA 2018). ACM, New York, NY, USA, 372–375.
[64] Dominik Seifert, Michael Wan, Jane Hsu, and Benson Yeh. 2022. An Asynchronous Call Graph for JavaScript. In 2022 IEEE/ACM 44th International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP). 29–30.
[65] Marija Selakovic, Michael Pradel, Rezwana Karim, and Frank Tip. 2018. Test generation for higher-order functions in dynamic languages. Proc. ACM Program. Lang. 2, OOPSLA (2018), 161:1–161:27.
[66] Koushik Sen, Darko Marinov, and Gul Agha. 2005. CUTE: a concolic unit testing engine for C. In Proceedings of the 10th European Software Engineering Conference held jointly with 13th ACM SIGSOFT International Symposium on Foundations of Software Engineering, 2005, Lisbon, Portugal, September 5-9, 2005, Michel Wermelinger and Harald C. Gall (Eds.). ACM, 263–272.
[67] Elena Sherman, Matthew B. Dwyer, and Sebastian Elbaum. 2009. Saturation-Based Testing of Concurrent Programs. In Proceedings of the 7th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on The Foundations of Software Engineering (Amsterdam, The Netherlands) (ESEC/FSE '09). ACM, New York, NY, USA, 53–62.
[68] S. Sinha and M. J. Harrold. 1999. Criteria for testing exception-handling constructs in Java programs. In Proceedings of the IEEE International Conference on Software Maintenance (ICSM '99). 265–274.
[69] Khashayar Etemadi Someoliayi, Sajad Jalali, Mostafa Mahdieh, and Seyed-Hassan Mirian-Hosseinabadi. 2019. Program State Coverage: A Test Coverage Metric Based on Executed Program States. In 2019 IEEE 26th International Conference on Software Analysis, Evolution and Reengineering (SANER). 584–588.
[70] Thodoris Sotiropoulos and Benjamin Livshits. 2019. Static Analysis for Asynchronous JavaScript Programs. In 33rd European Conference on Object-Oriented Programming, ECOOP 2019, July 15-19, 2019, London, United Kingdom (LIPIcs, Vol. 134), Alastair F. Donaldson (Ed.). Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 8:1–8:30.
[71] Haiyang Sun, Daniele Bonetta, Christian Humer, and Walter Binder. 2018. Efficient Dynamic Analysis for Node.js. In Proceedings of the 27th International Conference on Compiler Construction (Vienna, Austria) (CC 2018). ACM, New York, NY, USA, 196–206.
[72] Haiyang Sun, Daniele Bonetta, Filippo Schiavio, and Walter Binder. 2019. Reasoning about the Node.js Event Loop Using Async Graphs. In Proceedings of the 2019 IEEE/ACM International Symposium on Code Generation and Optimization (Washington, DC, USA) (CGO 2019). IEEE Press, 61–72.
[73] Haiyang Sun, Andrea Rosà, Daniele Bonetta, and Walter Binder. 2021. Automatically Assessing and Extending Code Coverage for NPM Packages. In 2021 IEEE/ACM International Conference on Automation of Software Test (AST). 40–49.
[74] Youcheng Sun, Xiaowei Huang, Daniel Kroening, James Sharp, Matthew Hill, and Rob Ashmore. 2019. Structural Test Coverage Criteria for Deep Neural Networks. ACM Trans. Embed. Comput. Syst. 18, 5s, Article 94 (Oct 2019), 23 pages.
[75] Samira Tasharofi, Michael Pradel, Yu Lin, and Ralph Johnson. 2013. Bita: Coverage-guided, automatic testing of actor programs. In 2013 28th IEEE/ACM International Conference on Automated Software Engineering (ASE). 114–124.
[76] Ena Tominaga, Yoshitaka Arahori, and Katsuhiko Gondow. 2019. AwaitViz: A Visualizer of JavaScript's Async/Await Execution Order. In Proceedings of the 34th ACM/SIGAPP Symposium on Applied Computing (Limassol, Cyprus) (SAC '19). ACM, New York, NY, USA, 2515–2524.
[77] Alexi Turcotte, Michael D. Shah, Mark W. Aldrich, and Frank Tip. 2022. DrAsync: Identifying and Visualizing Anti-Patterns in Asynchronous JavaScript. In Proceedings of the 44th International Conference on Software Engineering (Pittsburgh, Pennsylvania) (ICSE '22). ACM, New York, NY, USA, 774–785.
[78] Jie Wang, Wensheng Dou, Yu Gao, Chushu Gao, Feng Qin, Kang Yin, and Jun Wei. 2017. A comprehensive study on real world concurrency bugs in Node.js. In Proceedings of the 32nd IEEE/ACM International Conference on Automated Software Engineering, ASE 2017, Urbana, IL, USA, October 30 - November 03, 2017, Grigore Rosu, Massimiliano Di Penta, and Tien N. Nguyen (Eds.). IEEE Computer Society, 520–531.
[79] Shiyi Wei and Barbara G. Ryder. 2015. Adaptive Context-sensitive Analysis for JavaScript. In 29th European Conference on Object-Oriented Programming (ECOOP 2015) (LIPIcs, Vol. 37), John Tang Boyland (Ed.). Schloss Dagstuhl – Leibniz-Zentrum für Informatik, Dagstuhl, Germany, 712–734.
[80] Cheer-Sun D. Yang, Amie L. Souter, and Lori L. Pollock. 1998. All-Du-Path Coverage for Parallel Programs. In Proceedings of the 1998 ACM SIGSOFT International Symposium on Software Testing and Analysis (Clearwater Beach, Florida, USA) (ISSTA '98). ACM, New York, NY, USA, 153–162.
[81] Yucheng Zhang and Ali Mesbah. 2015. Assertions Are Strongly Correlated with Test Suite Effectiveness. In Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering (Bergamo, Italy) (ESEC/FSE 2015). ACM, New York, NY, USA, 214–224.
[82] Jingyao Zhou, Lei Xu, Gongzheng Lu, Weifeng Zhang, and Xiangyu Zhang. 2023. NodeRT: Detecting Races in Node.js Applications Practically. In Proceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis (Seattle, WA, USA) (ISSTA 2023). ACM, New York, NY, USA, 1332–1344.
[83] Hong Zhu, Patrick A. V. Hall, and John H. R. May. 1997. Software Unit Test Coverage and Adequacy. ACM Comput. Surv. 29, 4 (Dec 1997), 366–427.
[84] Yunxiao Zou, Zhenyu Chen, Yunhui Zheng, Xiangyu Zhang, and Zebao Gao. 2014. Virtual DOM Coverage for Effective Testing of Dynamic Web Applications. In Proceedings of the 2014 International Symposium on Software Testing and Analysis (San Jose, CA, USA) (ISSTA 2014). ACM, New York, NY, USA, 60–70.
[85] Yunxiao Zou, Chunrong Fang, Zhenyu Chen, Xiaofang Zhang, and Zhihong Zhao. 2013. A Hybrid Coverage Criterion for Dynamic Web Testing (S). In SEKE.