Web API Search
1 Introduction
A Web API is an application programming interface exposed via the Web, commonly used as representational state transfer (RESTful) services through HyperText Transfer Protocol (HTTP). As the Internet industry progresses, Web APIs become more concrete with emerging best practices and more popular for modern application development [1]. Web APIs provide an interface for easy software
development through abstracting a variety of complex data and web services,
which can greatly accelerate application development.

© Springer Nature Switzerland AG 2020. W.-S. Ku et al. (Eds.): ICWS 2020, LNCS 12406, pp. 96–113, 2020. https://doi.org/10.1007/978-3-030-59618-7_7

Fig. 1. Examples of the ProgrammableWeb page of the Dropbox API (left) and documentation for the Dropbox API (right).

Web APIs have also been
widely used by technical companies due to their inherent flexibility. For example,
Twitter offers public APIs to enable third parties to access and analyze historical
tweets. Amazon provides free advertising APIs to developers as a way to promote
their products. On the other hand, developers also benefit from the burgeoning
API economy [2]. Developers can access various datasets and services via Web
APIs and incorporate these resources into their development [3].
Due to these advantages, Web APIs have been widely developed in recent years. According to ProgrammableWeb, there are more than 22,000 public Web APIs available today, and this number is rapidly increasing. Moreover, an API has a number of endpoints, which specify the location of the resources that developers need to access to carry out their functions. As shown in the example in Fig. 1, the Dropbox API has 136 endpoints, and each endpoint has its own concrete function. In order to invoke a given function, an HTTP request has to be sent to the corresponding endpoint using a given HTTP method, as shown in Fig. 1.
The proliferation of Web APIs, however, makes it difficult for developers to search for and discover a desired API and its endpoint. As aforementioned, the developer needs to know the endpoint in order to call an API; therefore, API level search is insufficient. In light of this, in this paper we focus on building a Web API search system that can provide endpoint level search results based on a natural language query describing developers' needs. With the proposed dataset collection and generation methods and the two-step transfer learning model, the API search system can achieve high accuracy in searching for a Web API and its endpoint to satisfy developers' requirements.
2 Available at: https://rapidapi.com.
3 Available at: https://apiharmony-open.mybluemix.net.
4 Available at: https://stackoverflow.com.
– Currently, the key bottleneck for building a machine learning-based Web API search system is the lack of publicly available training datasets. Although services like API Harmony provide structured API specifications, from which we can collect information such as API and endpoint descriptions to build a training dataset, the majority of existing Web APIs lack such structured specifications. For example, API Harmony only supports 1,179 APIs, which is just a small percentage of all Web APIs. In this paper, we propose a method to collect useful information directly from API documentation and then build a training dataset, which can support more than 9,000 Web APIs for search purposes.
– The information we collect, in particular the endpoint descriptions extracted from API documentation, may contain a lot of noise. We propose deep learning methods to predict correct API endpoint description sentences. The evaluation results show that decent accuracy can be achieved.
– We propose a two-step transfer learning method to support endpoint level
Web API search, whereas all the previous works only support API level search.
The evaluation results show that our proposed model can achieve high search
accuracy.
each HTML page, where i ∈ [1, ..., l] and l represents the total number of endpoints in one HTML page. We extract information from semi-structured pages by processing HTML table tags and the table headers. To this end, we define a placeholder Pi that contains both Ei and Ei,D. Pi represents a section of the HTML page that appears between two HTML headers of the same level ([h1, h2, ..., h6]), and Ei is located in that section. A raw endpoint description is therefore the text around the API endpoint, Ei,D = [SM, SN], which denotes the M sentences before and the N sentences after the appearance of Ei inside Pi. Algorithm 1 details how the endpoint description is extracted for a given endpoint list. By using the proposed method and setting M = 6 and N = 6, we collected 2,822,997 web pages with a total size of 208.6 GB for more than 20,000 public APIs. Such huge raw data contains a lot of noise. Therefore, in Sect. 3.3, we propose a deep learning method to predict and filter the correct endpoint descriptions from the raw descriptions.
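The windowing step above can be sketched as follows; function and variable names (and the toy section) are illustrative, not the paper's actual Algorithm 1:

```python
# Hedged sketch of the raw-description extraction: given the sentences of one
# header-delimited section P_i and the index where endpoint E_i appears,
# collect up to M sentences before and N sentences after the mention.

def raw_endpoint_description(sentences, endpoint_idx, m=6, n=6):
    """Return the window of candidate sentences around the endpoint mention."""
    before = sentences[max(0, endpoint_idx - m):endpoint_idx]
    after = sentences[endpoint_idx + 1:endpoint_idx + 1 + n]
    return before + after

# A toy section: index 1 is the endpoint mention E_i itself.
section = [
    "Authentication is required.",
    "GET /albums",
    "Get several albums.",
    "Returns a list of album objects.",
]
candidates = raw_endpoint_description(section, endpoint_idx=1, m=6, n=6)
```

The window is clipped at the section boundaries, so short sections simply yield fewer than M + N candidates.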
the ground-truth endpoint description in API Harmony using spaCy5, which calculates the similarity score by comparing word vectors. The sentence with the highest similarity is considered the correct endpoint description (i.e., the ground truth selected by API Harmony). The remaining N + M − 1 sentences, which are not selected by API Harmony as endpoint descriptions, are treated as incorrect endpoint descriptions.
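The labeling step can be sketched as below. The paper computes similarity with spaCy word vectors; a simple token-overlap (Jaccard) score stands in here so the sketch stays self-contained, and all names are illustrative:

```python
# Stand-in for spaCy's vector similarity: in practice one would call
# nlp(a).similarity(nlp(b)) with a spaCy model loaded.
def sim(a, b):
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def label_candidates(candidates, ground_truth):
    """Label the candidate most similar to the ground truth as correct (1),
    all remaining candidates as incorrect (0)."""
    best = max(range(len(candidates)),
               key=lambda i: sim(candidates[i], ground_truth))
    return [1 if i == best else 0 for i in range(len(candidates))]

labels = label_candidates(
    ["Authentication is required.", "Get several albums.", "Returns album objects."],
    ground_truth="get several albums",
)
```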
If the ground-truth endpoint description in API Harmony for a given endpoint contains K sentences where K > 1, "K-grams" of the M sentences before and N sentences after need to be generated. For example, if K = 2, which means the ground-truth endpoint description contains two sentences (GTS1, GTS2), we need to collect "2-gram" sentence pairs (Ti, Tj) from the API documentation, such as (before 3rd, before 2nd), (before 2nd, before 1st), (after 1st, after 2nd), (after 2nd, after 3rd), where "before 3rd" means the 3rd sentence before the endpoint name. After that, the average similarity score is computed according to the following equation:

AvgSim = (Sim(GTS1, Ti) + Sim(GTS2, Tj)) / 2

where Sim represents the similarity score between two given inputs. Similarly, the "K-gram" with the highest average similarity is considered the correct endpoint description (i.e., selected by API Harmony). The remaining "K-grams", which are not selected by API Harmony, are treated as incorrect endpoint descriptions.
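A minimal sketch of the K-gram selection for K = 2, again with a token-overlap stand-in for spaCy similarity and made-up sentences:

```python
# Stand-in similarity (the paper uses spaCy word vectors).
def sim(a, b):
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def avg_sim(gt_sents, gram):
    """Mean of Sim(GTS_k, T_k) over the K aligned sentences."""
    return sum(sim(g, t) for g, t in zip(gt_sents, gram)) / len(gt_sents)

def k_grams(sentences, k):
    """All runs of k consecutive candidate sentences,
    e.g. (before 2nd, before 1st), (after 1st, after 2nd), ..."""
    return [tuple(sentences[i:i + k]) for i in range(len(sentences) - k + 1)]

gt = ["lists the drafts", "in the user's mailbox"]
cands = ["auth is required", "lists the drafts",
         "in the user's mailbox", "see also labels"]
best = max(k_grams(cands, 2), key=lambda g: avg_sim(gt, g))
```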
For each correct or incorrect endpoint description (with label 1 or 0), we compute the following features to be used in the deep learning models:
5 Available at: https://spacy.io/.
Note that the features for the description classification model are selected by observing ground-truth examples. For example, we observed that in many cases the endpoint name and its description are similar, so we use spaCy similarity as one of the features. Many convincing examples can be found: in the Gmail API, the endpoint "/userId/drafts" has the description "lists the drafts in the user's mailbox"; in the Spotify API, the endpoint "/albums" has the description "get several albums"; etc. The other features are also based on such observations.
Fig. 5. Proposed deep learning models to predict correct endpoint descriptions: (a)
CNN+ANN; (b) LSTM+ANN.
Table 1. Collected training dataset.

Training Dataset                           # of Records
Correct endpoint description sentences     5,464
Incorrect endpoint description sentences   33,757

Table 2. Testing results for the deep learning models in Fig. 5 and traditional machine learning models.

Models                   Testing accuracy
Decision Tree [19]       76.64%
Random Forest [20]       79.92%
CNN+ANN (Fig. 5(a))      90.31%
LSTM+ANN (Fig. 5(b))     98.13%
Since the training dataset of correct and incorrect endpoint description sentences is imbalanced, we first randomly select 5,588 sentences out of the 33,757 incorrect endpoint description sentences, and together with the 5,464 correct endpoint description sentences, we train the deep learning models depicted in Fig. 5. We use 65%, 20%, and 15% of the dataset for training, validation, and testing, respectively. The testing results are shown in Table 2: both the CNN+ANN and LSTM+ANN models achieve more than 90% testing accuracy, and the LSTM+ANN model outperforms the CNN+ANN model. For comparison purposes, we also evaluate the performance of two traditional learning models: Decision Tree and Random Forest. A Decision Tree is a flowchart-like graph or diagram that helps explore all decision alternatives and their possible outcomes [19]. Random Forest is an ensemble learning method for classification, regression, and other tasks that operates by constructing a multitude of decision trees at training time and outputting the class based on voting [20]. The
testing result in Table 2 shows that the proposed deep learning models greatly
outperform the traditional learning models such as Decision Tree and Random
Forest.
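The balancing and splitting described above might be sketched as follows, with toy data sizes and illustrative names; the real pipeline operates on the 5,464/33,757 sentence sets:

```python
import random

def balance_and_split(pos, neg, seed=0):
    """Randomly under-sample the incorrect (negative) sentences to match the
    correct (positive) ones, then split 65/20/15 for train/val/test."""
    rng = random.Random(seed)
    neg_sampled = rng.sample(neg, len(pos))          # random under-sampling
    data = [(x, 1) for x in pos] + [(x, 0) for x in neg_sampled]
    rng.shuffle(data)
    n = len(data)
    train = data[:int(0.65 * n)]
    val = data[int(0.65 * n):int(0.85 * n)]
    test = data[int(0.85 * n):]
    return train, val, test

pos = [f"correct-{i}" for i in range(100)]
neg = [f"incorrect-{i}" for i in range(600)]
train, val, test = balance_and_split(pos, neg)
```

As the blind-testing discussion below notes, discarding most of the negatives this way can lose information, which motivates the ensemble of differently sampled models.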
Blind Testing and Model Improvement: In the above testing, the training and testing datasets are all retrieved from API documentation related to the APIs included in API Harmony. However, API Harmony only covers a small percentage of Web APIs, and most of these APIs come from big providers, which are likely to have high-quality documentation. As we target a wider coverage of Web APIs in the recommendation system, it is essential to evaluate the model performance over a large API documentation corpus, in particular for APIs not covered by API Harmony.
To conduct this blind testing, we manually label 632 sentences in the documentation of APIs that are not covered by API Harmony. We compute all the features of these 632 sentences and feed them to the aforementioned trained LSTM+ANN model. The results are summarized in Table 3. From the results, we can see that with only one trained model, the blind testing performance is poor, as the model cannot distinguish the incorrect endpoint descriptions well. The reason is that when we train the model, we use random under-sampling in order to obtain a balanced training dataset of correct and incorrect description sentences. However, this method may discard potentially useful information that could be important for training the model. The samples chosen by random under-sampling may be biased, and thus may not accurately represent or provide sufficient coverage of incorrect descriptions, thereby causing inaccurate results. To improve the model to cover a wider range of APIs, we applied an ensemble approach, as shown in Fig. 4.
In Fig. 4, each model Mi is one trained model as described above. Here, the LSTM+ANN model is used, as it outperforms CNN+ANN. Each Mi is trained with the correct endpoint description sentences and a different subset of the incorrect endpoint description sentences. This is achievable because we have many more incorrect endpoint description sentences than correct ones. In this way, each Mi makes different decisions based on the learned features. The models predict independently and vote to jointly decide whether an input sentence is a correct endpoint description or not.
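A minimal sketch of the voting scheme; the models are stubbed as callables here, whereas each Mi in the paper is a trained LSTM+ANN classifier:

```python
# Majority voting over an ensemble of binary classifiers. Each "model" maps a
# sentence to a 0/1 label; an odd ensemble size (e.g. 7 in the paper) avoids
# ties.

def majority_vote(models, sentence):
    votes = [m(sentence) for m in models]
    return 1 if sum(votes) > len(votes) / 2 else 0

# Three stub "models" that disagree on purpose.
models = [lambda s: 1, lambda s: 1, lambda s: 0]
label = majority_vote(models, "lists the drafts in the user's mailbox")
```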
Table 3 shows the performance of the ensemble approach. It can be seen that the ensemble approach improves the overall performance in terms of accuracy and precision compared with a single model. Moreover, the ensemble approach with 7 models outperforms the others and will be used in the rest of this paper. The only issue is that some incorrect endpoint descriptions are wrongly predicted as correct endpoint descriptions, which results in more false-positive predictions and will introduce some noise into the training dataset of the API search model.
In the previous steps, we have collected data regarding API titles, API keywords, API descriptions, and correct endpoint descriptions. The API descriptions and correct endpoint descriptions may contain many sentences. Therefore, we first conduct sentence tokenization and then, for each tokenized sentence, carry out text normalization, including word stemming and lemmatization, and removing stop words, symbols, special characters, HTML tags, unnecessary spaces, and very short description sentences with only one word. After that, these processed sentences are used to build the training dataset.
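The normalization pass might look roughly like this; the stop-word list and regex rules are illustrative stand-ins for a real stemming/lemmatization toolkit such as spaCy or NLTK:

```python
import re

# Toy stop-word list for illustration only.
STOP_WORDS = {"the", "a", "an", "this", "to", "of", "in"}

def normalize(sentence):
    text = re.sub(r"<[^>]+>", " ", sentence)          # drop HTML tags
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())  # drop symbols/specials
    tokens = [t for t in text.split() if t not in STOP_WORDS]
    return " ".join(tokens)                           # collapses extra spaces

def keep(sentence):
    """Discard descriptions that normalize to a single word."""
    return len(normalize(sentence).split()) > 1

clean = normalize("<p>This endpoint gets  a music playlist!</p>")
```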
We consider four major types of queries when a developer wants to search for a Web API:
– Question type queries: developers may enter a question to search a Web API,
for example, a question type query might be “which API can get glucose?”
– Command type queries: instead of asking a question, developers may directly enter a command type query to search an API, such as "get weather information."
– Keyword type queries: in many cases, developers may just input a couple of
keywords to search an API. One example query is “fitness, health, wearable.”
– API title-based queries: in some cases, developers may already have an idea
regarding what API to use. Developers may just need to search an endpoint
for this given API. One example of such a query is “post photos to Instagram.”
In this case, the search engine should return the endpoint of the Instagram
API, rather than the endpoint of other similar APIs.
Several NLP toolkits support POS tagging and dependency parsing, such as spaCy, NLTK, and CoreNLP. In this work, we use spaCy, and the annotations for POS tagging6 and dependencies7 can be found in the spaCy documentation.
Considering that most sentences in API descriptions and endpoint descriptions are long, whereas in practice developers are unlikely to enter a very long query into a search engine, we use POS tagging and dependency parsing to synthesize simplified question-type and command-type queries. We defined several rules; if a description sentence satisfies a rule, simplified question-type and command-type queries are generated. Such rules are feasible because the syntactic relations form a tree: every word has exactly one head, so we can iterate over the arcs in the dependency tree by iterating over the words in the sentence. If the original endpoint description sentence is "this endpoint gets a music playlist according to an artist ID," by applying a rule we can generate the simplified question-type query "which endpoint get a music playlist?" and the simplified command-type query "get a music playlist". The training dataset includes the API and endpoint description sentences, as well as the simplified question-type and command-type queries. If an API or endpoint description sentence does not match any of the pre-defined rules, no simplified question-type or command-type query can be generated; in this case, only the API or endpoint description sentence is included in the training dataset.
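One plausible rule of this kind, keeping the root verb and its direct-object subtree, can be sketched as below. A real implementation would take heads and dependency labels from spaCy's parser; here a pre-parsed sentence is hard-coded so the rule itself stays visible, and the rule is illustrative rather than the paper's exact definition:

```python
# Each token is (text, index_of_head, dependency_label); the parse mirrors
# what spaCy would produce for the example sentence.
parsed = [
    ("this", 1, "det"), ("endpoint", 2, "nsubj"), ("gets", 2, "ROOT"),
    ("a", 5, "det"), ("music", 5, "compound"), ("playlist", 2, "dobj"),
    ("according", 2, "prep"), ("to", 6, "pcomp"), ("an", 10, "det"),
    ("artist", 10, "compound"), ("ID", 7, "pobj"),
]

def subtree(parsed, idx):
    """Indices of idx plus every token whose head chain reaches idx."""
    out, changed = {idx}, True
    while changed:
        changed = False
        for i, (_, head, _) in enumerate(parsed):
            if head in out and i not in out and i != head:
                out.add(i)
                changed = True
    return sorted(out)

root = next(i for i, t in enumerate(parsed) if t[2] == "ROOT")
dobj = next(i for i, t in enumerate(parsed) if t[2] == "dobj" and t[1] == root)
phrase = " ".join(parsed[i][0] for i in [root] + subtree(parsed, dobj))
command_query = phrase
question_query = f"which endpoint {phrase}?"
```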
The keyword-based queries are generated from the API keywords that we collected from ProgrammableWeb. For example, the Spotify API has two category keywords, "music" and "data mining", on ProgrammableWeb, so the keyword query can be "music, data mining". A keyword-based query can also be generated by concatenating the noun phrases of an API or endpoint description sentence. Given the same example, "this endpoint gets a music playlist according to an artist ID," the corresponding keyword-based query is "this endpoint, a music playlist, an artist ID."
The API title-based queries can be generated using the API titles collected from ProgrammableWeb. In addition, to emulate an API title-based query, we also attach the API title to the end of the short question-type and command-type queries. For the same example, the API title-based queries are "which endpoint get a music playlist with Spotify?" and "get a music playlist with Spotify."
6 https://spacy.io/api/annotation#pos-tagging.
7 https://spacy.io/api/annotation#dependency-parsing.
By using the proposed methods, we can build a training dataset for API/endpoint search. The dataset has 3 columns: the query, its corresponding API, and its endpoint, respectively. Note that we cannot judge which endpoint should be used for synthesized queries related to the API description, API title, and ProgrammableWeb keywords. In this case, the endpoint field of the training dataset for these queries is marked as "N/A".
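The resulting rows might be assembled as follows; the helper and the endpoint path shown are illustrative, not from the paper:

```python
# Queries synthesized from an endpoint description carry their endpoint;
# queries synthesized from the API title, API description, or ProgrammableWeb
# keywords cannot be tied to one endpoint, so their endpoint field is "N/A".

def make_row(query, api, endpoint=None):
    return {"query": query, "api": api, "endpoint": endpoint or "N/A"}

rows = [
    make_row("get a music playlist", "Spotify", "/playlists"),  # hypothetical path
    make_row("music, data mining", "Spotify"),                  # keyword query
]
```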
To evaluate the performance of the API search, we test the model performance on the following datasets:
– Popular API list: APIs are ranked based on their popularity. The popularity rank is computed from the following extracted features, where each one contains a number for a given API:
(1) number of SDKs, (2) number of articles, (3) number of changelogs, (4) number of sample source codes, (5) number of "how to" articles, (6) number of libraries, (7) number of developers, (8) number of followers, and (9) number of GitHub projects using this API. Items (1)–(8) are collected directly from ProgrammableWeb. Item (9) is collected by searching each API's host address and base path via the GitHub APIs. The numbers collected in (1)–(9) are normalized and weighted equally for ranking API popularity. Based on the final ranking, we select the top 1,000 APIs for this dataset. If the providers of the top 1,000 APIs have other APIs that are not ranked in the top 1,000, we also add those APIs to this dataset. By doing so, the popular API list covers 1,603 APIs, which can be considered the most popular Web APIs.
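The equal-weight ranking can be sketched as follows, with made-up counts for three hypothetical APIs; min-max normalization is one plausible reading of "normalized" here:

```python
# Each API maps to its nine raw counts (SDKs, articles, ..., GitHub projects).
# Every count is min-max normalized across APIs, and the nine normalized
# values are averaged with equal weight to give a popularity score.

def rank_by_popularity(apis):
    n_feats = len(next(iter(apis.values())))
    lo = [min(v[i] for v in apis.values()) for i in range(n_feats)]
    hi = [max(v[i] for v in apis.values()) for i in range(n_feats)]

    def score(v):
        norm = [(v[i] - lo[i]) / (hi[i] - lo[i]) if hi[i] > lo[i] else 0.0
                for i in range(n_feats)]
        return sum(norm) / n_feats                    # equal weights

    return sorted(apis, key=lambda a: score(apis[a]), reverse=True)

apis = {
    "twitter": [40, 120, 30, 25, 60, 15, 900, 5000, 400],
    "smallco": [1, 2, 0, 1, 3, 0, 10, 40, 2],
    "spotify": [25, 80, 20, 18, 45, 10, 600, 3000, 250],
}
ranking = rank_by_popularity(apis)
```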
Performance Evaluation: A summary of the training datasets for API Harmony, the popular API list, and the full API list is shown in Table 4. The training dataset is split into 80% for training and 20% for testing. The testing accuracy of the API search model is shown in Table 5. In this table, the top 1 accuracy is the probability that the correct API/endpoint is ranked as the first search result. Similarly, the top 10 accuracy is the probability that the correct API/endpoint is ranked among the first 10 search results. All APIs/endpoints in the search results are ranked by the probability score given by the softmax function. This evaluation shows that the proposed method achieves very good accuracy for endpoint level search.
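The top-k metric can be sketched as follows, with made-up softmax outputs over illustrative API:endpoint classes:

```python
# For each test query the model emits a softmax probability per API/endpoint
# class; the prediction counts as a hit at k if the true class appears among
# the k highest-probability classes.

def top_k_accuracy(probs_per_query, true_labels, k):
    hits = 0
    for probs, truth in zip(probs_per_query, true_labels):
        ranked = sorted(probs, key=probs.get, reverse=True)[:k]
        hits += truth in ranked
    return hits / len(true_labels)

probs = [
    {"spotify:/albums": 0.7, "deezer:/tracks": 0.2, "gmail:/drafts": 0.1},
    {"spotify:/albums": 0.4, "deezer:/tracks": 0.5, "gmail:/drafts": 0.1},
]
truth = ["spotify:/albums", "spotify:/albums"]
top1 = top_k_accuracy(probs, truth, k=1)
top2 = top_k_accuracy(probs, truth, k=2)
```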
We compare the performance of the proposed two-step transfer learning with models that use a traditional LSTM [21] or bi-LSTM [22] to recommend both the API and its endpoint using the API Harmony dataset. The results are shown in Table 6, which validates that the proposed two-step transfer learning model outperforms the LSTM and bi-LSTM models in terms of endpoint search accuracy.
5 Conclusions
In this paper, we propose novel approaches to support an end-to-end procedure for building a Web API search system over a large number of public APIs. To the best of our knowledge, this is the first work that provides API endpoint level search with a large API coverage (over 9,000 APIs) and high search accuracy. Our future work is to open the system to the public and collect users' queries and feedback. It is worth noting that the problem of Web API search is very practical for both academia and industry. Considering that state-of-the-art works only have a small API coverage (e.g., 1,179 APIs in API Harmony), constructing an API search system with 9,040 APIs and 49,083 endpoints is a significant improvement for this application. As Web APIs are rapidly growing and becoming more and more important for future software engineering, we hope the proposed application and its associated methods will be beneficial for the whole community.
References
1. Richardson, L., Ruby, S.: RESTful web services. O’Reilly Media Inc., Reading
(2008)
2. Tan, W., Fan, Y., Ghoneim, A., et al.: From the service-oriented architecture to
the Web API economy. IEEE Internet Comput. 20(4), 64–68 (2016)
3. Verborgh, R., Dumontier, M.: A Web API ecosystem through feature-based reuse.
IEEE Internet Comput. 22(3), 29–37 (2018)
4. Rahman, M.M., Roy, C., Lo, D.: Rack: automatic API recommendation using
crowdsourced knowledge. In: IEEE 23rd International Conference on Software
Analysis, Evolution, and Reengineering (SANER), pp. 349–359. IEEE (2016)
5. Bajracharya, S., Ossher, J., Lopes, C.: Searching API usage examples in code repositories with sourcerer API search. In: ICSE Workshop on Search-driven Development: Users, Infrastructure, Tools and Evaluation, pp. 5–8. ACM (2010)
6. Duala-Ekoko, E., Robillard, M.: Asking and answering questions about unfamiliar APIs: an exploratory study. In: 34th International Conference on Software Engineering (ICSE), pp. 266–276. IEEE (2012)
7. Stylos, J., et al.: MICA: a web-search tool for finding API components and examples. In: Visual Languages and Human-Centric Computing, pp. 195–202. IEEE (2006)
8. Bahrami, M., et al.: API learning: applying machine learning to manage the rise of
API economy. In: Proceedings of the Web Conference, pp. 151–154. ACM (2018)
9. Gu, X., et al.: Deep API learning. In: Proceedings of the International Symposium
on Foundations of Software Engineering, pp. 631–642. ACM (2016)
10. Bianchini, D., De Antonellis, V., Melchiori, M.: A multi-perspective framework for web API search in enterprise mashup design. In: Salinesi, C., Norrie, M.C., Pastor, Ó. (eds.) CAiSE 2013. LNCS, vol. 7908, pp. 353–368. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-38709-8_23
11. Su, Y., et al.: Building natural language interfaces to web APIs. In: Proceedings
of Conference on Information and Knowledge Management, pp. 177–186. ACM
(2017)
12. Lin, C., Kalia, A., Xiao, J., et al.: NL2API: a framework for bootstrapping service
recommendation using natural language queries. In: IEEE International Conference
on Web Services (ICWS), pp. 235–242. IEEE (2018)
13. Torres, R., Tapia, B.: Improving web API discovery by leveraging social information. In: IEEE International Conference on Web Services, pp. 744–745. IEEE (2011)
14. Cao, B., et al.: Mashup service recommendation based on user interest and social
network. In: International Conference on Web Services, pp. 99–106. IEEE (2013)
15. Li, C., et al.: A novel approach for API recommendation in mashup development.
In: International Conference on Web Services, pp. 289–296. IEEE (2014)
16. Gao, W., et al.: Manifold-learning based API recommendation for mashup creation.
In: International Conference on Web Services, pp. 432–439. IEEE (2015)
17. Yang, Y., Liu, P., Ding, L., et al.: ServeNet: a deep neural network for web service
classification. arXiv preprint arXiv:1806.05437 (2018)
18. Pennington, J., Socher, R., Manning, C.: GloVe: global vectors for word representation. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1532–1543. ACM (2014)
19. Apté, C., Weiss, S.: Data mining with decision trees and decision rules. Fut. Gen.
Comput. Syst. 13(2–3), 197–210 (1997)
20. Shi, T., Horvath, S.: Unsupervised learning with random forest predictors. J. Comput. Graph. Stat. 15(1), 118–138 (2006)
21. Jozefowicz, R., et al.: An empirical exploration of recurrent network architectures.
In: International Conference on Machine Learning, pp. 2342–2350 (2015)
22. Schuster, M., Paliwal, K.: Bidirectional recurrent neural networks. IEEE Trans.
Signal Process. 45(11), 2673–2681 (1997)