Abstract—Enrolment Advising (EA) for Vietnamese universities takes place at many sites in Vietnam every year, from high schools to universities. Many sessions are organized, both online and offline, and they cost a lot of human resources and money for inviting experts and printing leaflets, banners, posters, etc. In this context, we try to reduce the cost of EA through an application of machine learning: we propose an automatic university enrolment advising system based on an ensemble classifier.

… cost and human resources, and help students and their parents easily find answers to their worries about Vietnamese university enrolment every year.

The remainder of this paper is organized as follows. Section II presents the related works. Section III presents the proposed approach and Section IV presents the experimental results. Finally, the conclusions and future works are presented in Section V.
… lots of M satisfy the above conditions. For a dataset of points (x1, y1), (x2, y2), …, (xn, yn), xi is a d-dimensional vector and yi is the label of xi. Assume there are two labels, positive (yi = 1) and negative (yi = -1), and let M0 be the optimal M. According to SVM, M0 must satisfy two conditions: the two nearest points of the two respective classes lie at the same distance from M0, and this distance is the largest compared to any other M (Fig. 1: M1 and M2 contain these points, and we need to find w, b that give the largest margin).

Fig. 1. Support Vector Machine

The distance from any point (xi, yi) to M0 is calculated as follows:

\frac{|w^{T}x_i + b|}{\|w\|_2} = \frac{y_i(w^{T}x_i + b)}{\|w\|_2}    (6)

Because (wTxi + b) and yi have the same sign, the two sides of (6) are equal. The goal of SVM is to find w and b that maximize this distance at the closest points (the margin). For the points (xi, yi) lying on M1 and M2, we can rescale w and b so that yi(wTxi + b) = 1. The problem then becomes:

(w, b) = \arg\max_{w, b} \frac{1}{\|w\|_2} \quad \text{subject to} \quad y_i(w^{T}x_i + b) \ge 1    (7)

Lagrange multipliers are used to solve this constrained problem. Afterwards, the category of a new data point is determined by the function f(x) = sign(wTx + b). For a nearly linearly separable dataset, the soft-margin SVM is used.
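The Lagrange-multiplier step is not written out in the paper; for reference, the standard hard-margin dual obtained from (7) (a textbook result, not specific to this work) is:

\max_{\alpha \ge 0} \; \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n}\sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j x_i^{T} x_j \quad \text{subject to} \quad \sum_{i=1}^{n} \alpha_i y_i = 0

with w = \sum_i \alpha_i y_i x_i, and b recovered from any support vector (a point with \alpha_i > 0).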
The results of three classifiers (KNN, Naïve Bayes and SVM) are combined by voting to select the answer. Fig. 2 shows the architecture of our ViEA system.

Firstly, the dataset is built by pairing questions with answers, where the question is a student's worry and the answer is the response of the human experts. All of them are tokenized with the pyvi library. The system accepts a question in Vietnamese natural language in the enrolment field; it is tokenized into Vietnamese words and some stop words are removed. Next, FTS is applied to find the relevant questions in the dataset and thus reduce the searching space. The output of this stage is the set of questions whose answers may be the expected answer. The system then vectorizes the user's question and the relevant questions above. Each dimension is a term wi weighted by tf×idf as follows:

(tf \times idf)_{w_i} = freq(w_i) \cdot \log\frac{N}{1 + df}    (9)

where N is the number of documents in the dataset, df is the number of documents in the dataset containing wi, and freq(wi) is the frequency of wi in a document.
(Fig. 2 shows the pipeline components: Question, Question Processing, Tokenized question, FTS, Vectors, Majority Voting, Answer Selection, Dataset, ViEA System and Enrolment Website.)

Fig. 2. ViEA System Architecture
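A direct implementation of the weighting in (9) over tokenized questions could look like the following minimal Python sketch (our own illustration; library vectorizers such as scikit-learn's TfidfVectorizer use a slightly different smoothing):

import math
from collections import Counter

def tfidf_vector(doc_tokens, corpus_tokens, vocabulary):
    # doc_tokens: tokens of one question; corpus_tokens: list of token lists (the dataset).
    N = len(corpus_tokens)
    freq = Counter(doc_tokens)
    vec = []
    for w in vocabulary:
        df = sum(1 for d in corpus_tokens if w in d)
        # Equation (9): freq(w) * log(N / (1 + df)); the "+ 1" guards against df = 0.
        vec.append(freq[w] * math.log(N / (1 + df)))
    return vec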
The final answer is the answer of the relevant question chosen by the majority vote of these algorithms; the remaining ones are the secondary answers.

TABLE 1. THE PARAMETERS APPLIED TO THE ALGORITHMS

Algorithm                  Parameters
KNN                        K = 1
Multinomial Naïve Bayes    alpha = 0.01
SVM                        kernel = linear, C = 1e5
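For concreteness, a minimal scikit-learn sketch of the three classifiers configured with the parameters of Table 1 is given below. It assumes X_train holds the tf-idf vectors of the dataset questions and y_train the identifiers of their predefined answers; this labelling scheme is our assumption, as the paper does not spell it out.

from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import SVC

# Parameters taken from Table 1.
classifiers = {
    "knn": KNeighborsClassifier(n_neighbors=1),
    "naive_bayes": MultinomialNB(alpha=0.01),
    "svm": SVC(kernel="linear", C=1e5),
}

def fit_all(X_train, y_train):
    for clf in classifiers.values():
        clf.fit(X_train, y_train)

def votes_for(x):
    # x: a 1-by-d tf-idf row vector; each classifier proposes one answer identifier.
    return {name: clf.predict(x)[0] for name, clf in classifiers.items()}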
For example, the question is "Cho em hỏi quy định của Bộ giáo dục và đào tạo năm nay yêu cầu thí sinh phải có học lực giỏi lớp 12 mới được đăng ký xét tuyển các ngành sư phạm đúng không ạ?" (Does this year's regulation of the Ministry of Education and Training require a student to have a distinction academic rank in grade 12 in order to register for the pedagogy majors?).

It is tokenized as "Cho em hỏi quy_định của Bộ giáo_dục và đào_tạo năm nay yêu_cầu thí_sinh phải có học_lực giỏi lớp 12 mới được đăng_ký xét tuyển các ngành sư_phạm đúng không ạ?", and stop words such as "cho", "em", "hỏi", "của", "ạ", "đúng", "không", "các", "và", "mới", "được" are removed. The question becomes "quy_định Bộ giáo_dục đào_tạo năm nay yêu_cầu thí_sinh phải có học_lực giỏi lớp 12 đăng_ký xét tuyển ngành sư_phạm".
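This pre-processing step can be reproduced with the pyvi tokenizer mentioned above; the sketch below is our own illustration, with the stop-word list taken from the example:

from pyvi import ViTokenizer

# Stop words removed in the example above.
STOP_WORDS = {"cho", "em", "hỏi", "của", "ạ", "đúng", "không", "các", "và", "mới", "được"}

def preprocess(question):
    # pyvi joins the syllables of a compound word with "_", e.g. "quy định" -> "quy_định".
    tokens = ViTokenizer.tokenize(question).split()
    kept = [t for t in tokens if t.lower() not in STOP_WORDS and t not in {"?", ".", ",", "!"}]
    return " ".join(kept)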
In order to reduce the searching space, the system then uses FTS to find the questions in the dataset that are relevant to the user's question, and these questions together with the user's question are vectorized as feature representations.
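The paper does not name a particular full-text search engine; as one possible realization, the sketch below uses SQLite's FTS5 module (our assumption) to index the tokenized question/answer pairs and retrieve candidates:

import sqlite3

conn = sqlite3.connect("viea.db")
conn.execute("CREATE VIRTUAL TABLE IF NOT EXISTS qa USING fts5(question, answer)")
# The qa table is assumed to be filled with the tokenized question/answer pairs.

def relevant_pairs(tokenized_query, limit=20):
    # An OR-match over the query terms keeps every pair sharing at least one term,
    # which shrinks the search space before vectorization and classification.
    match = " OR ".join('"{}"'.format(t) for t in tokenized_query.split())
    return conn.execute(
        "SELECT question, answer FROM qa WHERE qa MATCH ? ORDER BY rank LIMIT ?",
        (match, limit),
    ).fetchall()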
Finally, KNN, Naïve Bayes and SVM are applied to these vectors, and the results are as follows.

KNN and SVM give the same answer: "với hình_thức xét tuyển dựa vào kết_quả học_tập trung_học phổ_thông, ngưỡng đảm_bảo chất_lượng đầu_vào ngành thuộc nhóm ngành đào_tạo giáo_viên trình_độ đại_học là học_sinh xếp loại học_lực lớp 12 từ giỏi trở lên. Với trình_độ cao_đẳng, trung_cấp xét tuyển học_sinh xếp loại học_lực lớp 12 từ khá trở lên." (For admission based on high-school academic results, the input quality threshold for the teacher-training majors is a grade-12 academic rank of distinction or higher at the university level, and of good or higher at the college and intermediate levels.)

Naïve Bayes answers: "a) chuyển ngành sinh_viên hệ được phép xin chuyển sang ngành đào_tạo khác của trường nếu thỏa các điều_kiện sau: đạt số tín_chỉ tích_lũy tối thiểu theo qui_định, có điểm trung_bình_tích_lũy từ 6.5 trở lên. điểm xét tuyển của sinh_viên bằng hoặc cao hơn điểm_chuẩn của ngành/chương_trình mà sinh_viên muốn chuyển đến hoặc sinh_viên thuộc diện tuyển thẳng, ưu_tiên xét tuyển. Sinh_viên không bị xử_lý kỷ_luật, được sự đồng_ý của trưởng khoa nơi chuyển đi và trưởng khoa tiếp_nhận. Chi_tiết qui_định sinh_viên liên_hệ khoa và phòng đào_tạo để được hướng_dẫn, sinh_viên các chương_trình tiên_tiến, chất_lượng cao, tài_năng liên_hệ văn_phòng các chương_trình đặc_biệt để tư_vấn theo qui_trình và qui_định riêng. b) học song bằng sinh_viên được phép học cùng lúc hai chương_trình đào_tạo để khi tốt_nghiệp nhận hai văn_bằng. c) học tín_chỉ tự chọn của ngành học khác trong quá_trình học tại trường, sinh_viên có_thể đăng_ký học các môn chuyên_ngành của các ngành học khác như là môn_học tự chọn của mình, nếu sinh_viên đảm_bảo tiến_độ học_tập theo qui_định. sinh_viên có_thể chọn các môn_học tự chọn phù_hợp với sở_thích và năng_lực của bản_thân. các môn_học tự chọn được thể_hiện trong bảng điểm tốt_nghiệp là lợi_thế cho sinh_viên khi đi xin việc. Nếu môn_học tự chọn là môn_học lần đầu, sinh_viên không phải đóng thêm học_phí cho các môn_học này vì đã được tính vào học_phí trọn_gói của sinh_viên.". In general, this answer is about the regulations for switching between majors, studying two parallel majors, and registering for elective credit courses, so it is not the expected answer.

The system then selects the answer by majority voting. For this question, KNN and SVM vote for the same answer, so that answer is selected as the main answer of the system, and the answer from Naïve Bayes becomes the secondary answer.
IV. THE EXPERIMENTS

One thousand question/answer pairs are extracted from reputable websites about enrolment to build the dataset. Due to limited time, we have prepared about another one hundred questions and their answers for verifying the accuracy manually. The F1 score obtained is 76%; most of the wrong answers correspond to questions whose answers are not in the dataset yet. We consider this a promising result for EA systems.

To illustrate the evaluation, in addition to the example in Section III, we give more examples as follows.

Question 1: "Trong tài liệu hướng dẫn ghi rõ các chứng chỉ có giá trị sử dụng đến ngày 23/06/2018. Tuy nhiên, hiện tại một số học sinh mới đang đăng ký thi, nếu chưa lấy được chứng chỉ trước ngày nộp hồ sơ thì có được đăng ký mục miễn thi ngoại ngữ hay không?" (The guideline states that the certificates are valid until June 23, 2018. However, some students are only registering for the exam now; if they have not obtained the certificates before the submission deadline, can they still register for the exemption from the foreign language exam?)

After ViEA pre-processes this question, the three algorithms predict the same result, which becomes the final answer of the ViEA system: "học_viên phải nộp đủ hồ_sơ trong thời_gian quy_định, và có_thể gia_hạn (linh_động không quá 1 tuần). Trường_hợp thời_gian bổ_sung vượt quá thời_gian thi_tuyển thì vẫn không được xét miễn thi anh văn.". Comparing this answer with the predefined answer of this question (in its question/answer pair), we mark it as a correct answer.

Question 2: "Các nguyện vọng có bình đẳng trong xét tuyển không?" (Are the aspirations treated equally in admission?). For this question, the three algorithms return three different results, and none of them is similar to the predefined answer, so we mark this as a wrong result.

TABLE 2. THE EXPERIMENT OF KNN, NAIVE BAYES, SVM AND VOTING APPROACH

                          KNN    Naïve Bayes    SVM    Voting
No. of right answers       64         66         63        76
No. of wrong answers       36         34         37        24
In the experiment, the three classifiers (KNN, Naïve Bayes, SVM) are applied to find the appropriate answer by voting. The experimental results show that KNN and SVM usually give the same results, and that the accuracy of Naïve Bayes is better than that of the others. So, when the three classifiers give three different results, the answer from Naïve Bayes is selected as a priority.

Table 2 gives the statistics (numbers of right and wrong answers) for KNN, Naïve Bayes and SVM independently and for the proposed voting approach. It shows that the voting approach is better than the individual classifiers, with the largest number of right answers.
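A minimal sketch of this selection rule (majority vote, falling back to the Naïve Bayes answer when all three classifiers disagree):

from collections import Counter

def select_answer(knn_ans, nb_ans, svm_ans):
    majority, count = Counter([knn_ans, nb_ans, svm_ans]).most_common(1)[0]
    if count >= 2:
        return majority  # at least two classifiers agree
    return nb_ans        # three different answers: prefer Naive Bayes by priority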
V. CONCLUSIONS AND FUTURE WORKS

In this paper, we proposed an approach based on multiple classifiers for university enrolment advising. The dataset is constructed by pairing questions with answers. When a new question is raised, Vietnamese word segmentation is applied and stop words are removed. Then, FTS is performed to find the questions relevant to the user's question. Finally, the system applies three classifiers (KNN, Naïve Bayes and SVM) and votes to select the appropriate answer. The proposed approach can answer complex questions, as shown in the experimental results. In fact, this method can be applied in other domains; the main drawback is the need for a "big enough" dataset.

This work is now being extended to improve the answer selection step: in order to improve the accuracy of the ViEA system for subsequent queries, users can rate each answer based on their satisfaction after it is given.