Kaur and Singh 2017 - AI Based Healthcare Platform for Real Time Predictive and Prescriptive Analytics Using Reactive Programming
Conference Series
Abstract. AI in healthcare is needed to deliver real, actionable, and individualized insights in real time to patients and doctors to support treatment decisions. We need a patient-centred platform for integrating EHR data, patient data, prescriptions, monitoring, and clinical research data. This paper proposes a generic architecture for an AI-based healthcare analytics platform built on open-source technologies: Apache Beam, Apache Flink, Apache Spark, Apache NiFi, Kafka, Tachyon, GlusterFS, and the NoSQL stores Elasticsearch and Cassandra. The paper shows the importance of applying AI-based predictive and prescriptive analytics techniques in the health sector. The system is able to extract useful knowledge that supports decision making and medical monitoring in real time through intelligent process analysis and big data processing.
1. Introduction
With the changing times and advancements in technology, there is a need to make systematic changes to health systems to improve the quality, efficiency, and effectiveness of patient care. Chronic diseases such as heart disease, stroke, cancer, and diabetes are among the most common, expensive, and preventable health problems, but because of poor health care systems, patients are unable to manage these problems well.
The strategic aim of value-based health care is to ensure that everyone can use the health services they need for their good health and well-being. The focus on value-based care corresponds to an increased concentration on patient-centric care. By focusing technologies and healthcare processes on patient outcomes, doctors, hospitals, and health insurers need to work with each other to personalize care that is effective, transparent in its delivery and billing, and measured by patient satisfaction.
Suppose a patient suffering from an ache or pain needs to visit a physician. After listening to the patient's symptoms, the physician enters them into a computer, which surfaces the latest research on how to diagnose and treat the problem. The patient has an MRI or an X-ray, and a computer helps detect a problem that could be too small for a human to see. Finally, a computer examines the patient's medical records and family history and compares both with the most recent research to suggest a treatment specifically personalized to the patient's needs.
Nowadays, patients need affordable, high-quality healthcare. According to the health data, most information is not in a structured, relational format; about 80% of it is unstructured. Because only limited structured data about a patient's health condition is available, patients have very limited opportunities to be actively involved in the process. It is very difficult for doctors to use this massive amount of unstructured data from different sources to make the right decision for the right patient at the right time. This slows down personalized
Content from this work may be used under the terms of the Creative Commons Attribution 3.0 licence. Any further distribution
of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.
Published under licence by IOP Publishing Ltd 1
10th International Conference on Computer and Electrical Engineering IOP Publishing
IOP Conf. Series: Journal of Physics: Conf. Series 933 (2018) 012010
doi:10.1088/1742-6596/933/1/012010
care to the patient. So, it is necessary to develop a new strategy or system for patient care that improves patients' health while decreasing the cost of care [1].
These services range from clinical care for individual patients to public services that benefit the health of whole populations. There is a need to improve healthcare quality and coordination so that outcomes are consistent with current professional knowledge. The cost of treatment should also be reduced so that every patient can get personalized treatment at a lower cost.
Adoption of Electronic Health Records and systematic collection of data by health care providers
were predicted to improve the efficiency and quality of patient care.
Machine learning is improving diagnostics and outcome prediction and is just beginning to scratch the surface of personalized care. Human surgeons will not be replaced by machines in the near future, but AI can help surgeons make better clinical decisions and can even replace human judgment in certain areas of healthcare (e.g., radiology). The growing availability of healthcare data and the rapid development of big data analytic methods have made possible the current successful applications of AI in healthcare [2].
Data analytics has become increasingly important in almost every sector of the economy. Health care involves a diverse set of public and private data collection systems with different data sources. The volume of data has been increasing exponentially over the past decade as health care providers have turned to EHRs, digitized laboratory slides, and high-resolution radiology videos and images. Petabytes of data are stored in the databases of health insurance companies, and trillions of data points stream from sensors such as activity trackers and other continuous monitoring devices.
As shown in fig. 1, text or natural-language data resides in many fields, including output from medical devices, physician notes, nursing notes, surgical notes, radiology notes, pathology reports, admission notes, clinical data, genomic data, behavioural data, etc. These fields may hold valuable information about the patient, including diagnosis, personal and family history, complaints, statistics, opinions, demographics, medical history, allergies, laboratory test results, etc.
To transform the current healthcare system into a preventive, active, and value-based system, the interoperability, exchange, and sharing of health data are needed.
One of the major challenges is data integration, i.e. integrating the data obtained for each patient into one system, since that allows fast data analysis and gives physicians all the information they need to treat their patients well [3]. However, most of the data is encrypted, access is restricted for patient privacy, and many medical devices are not interoperable [4]. Once a single database can be established, machine learning can address the major problems. Machine learning provides a way to find reasons and patterns in data, which enables physicians to move to personalized care known as precision medicine. There are many ways machine learning algorithms can be used in healthcare, but all of them depend on having a sufficient amount of data and permission to use it.
Previously, alerts and advice for medical treatment were developed based on physicians' studies and data coded into their software. However, that can restrict the accuracy of the data, because it might come from different populations. Machine learning, on the other hand, can be refined using data available in that particular environment, i.e. anonymized patient information from a hospital and the
area it serves. Physicians can use machine learning to predict hospital readmission for chronically ill patients. Patients at risk of being readmitted are identified, which makes it possible for providers to offer improved post-discharge support. Lowering the readmission rate improves the lives of those most at risk.
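A readmission-risk model of this kind can be sketched as a simple logistic score. The feature names, weights, and threshold below are purely illustrative assumptions for this sketch, not clinically derived values; a real model would be trained on historical EHR data.

```python
import math

def readmission_risk(age, prior_admissions, chronic_conditions):
    """Toy logistic score for readmission risk.

    The weights are illustrative assumptions, not clinical parameters.
    """
    z = -4.0 + 0.02 * age + 0.6 * prior_admissions + 0.4 * chronic_conditions
    return 1.0 / (1.0 + math.exp(-z))

def flag_for_followup(patients, threshold=0.5):
    """Return IDs of patients whose risk score exceeds the threshold."""
    return [p["id"] for p in patients
            if readmission_risk(p["age"], p["prior"], p["chronic"]) > threshold]
```

Patients returned by `flag_for_followup` would then be routed to the post-discharge support described above.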
The rest of this paper is organized as follows: Section II gives an overview of the interdependency of cloud computing and big data technologies. Section III describes big data in the context of healthcare. Section IV reviews related work. Section V describes the platform design, including the steps needed for analytics. Section VI describes the architectural design. Section VII gives an overview of artificial intelligence, which helps the system learn from its experience, and of reactive machine learning.
discover treatments for personalized care while managing large patient populations. Predictive analyses can be done by feeding EHR data into a data model, processing it with various analytic techniques, and analysing the results. Rich data sources combined with analytics capabilities have the potential to increase our understanding of disease mechanisms and improve health care.
Generating New Knowledge: The first uses of big data were to generate new insights through predictive analytics. In addition to clinical and administrative data, integrating patient data and data about patients' environments helps in making better predictions and giving the right treatments to the right patients. The predictions may also help identify areas that need improvement, such as treatments, early identification of worsening health states, readmissions, etc.
Transforming Knowledge into Practice: Although standardized data collection and fresh analytical approaches are important to the big data revolution in healthcare, it is their practical application that will get it across the line. The insights obtained from big data have the potential to reform multiple areas of healthcare, such as outcome comparisons, the effectiveness of various treatments, and predictive models that can be used for diagnosing, treating, and delivering care.
4. Related Works
Nowadays, healthcare analytics is a new trend in the field of analytics. Due to technological advancements in the health care sector, there has been a major breakthrough in data collection, and many techniques have been proposed to efficiently process large volumes of medical records. Akshay Raul, Atharva Patil, Prem Raheja, and Rupali Sawant proposed "Knowledge Discovery, Analysis and Prediction in Healthcare using Data Mining and Analytics" [16]. The authors proposed a healthcare system to create public awareness about alternative drugs for a specific medicine and the availability of those alternatives in an area; the system helps patients find an alternative to a medicine prescribed by the doctor. Aditi Bansal and Priyanka Ghare proposed "Healthcare Data Analysis using Dynamic Slot Allocation in Hadoop", in which a healthcare system is analysed on Hadoop using the Dynamic Hadoop Slot Allocation (DHSA) method [17]. Their framework focuses on improving the performance of MapReduce workloads while maintaining the system; DHSA maximizes slot utilization by allocating map (or reduce) slots to map and reduce tasks dynamically. Van-Dai Ta, Chuan-Ming Liu, and Goodwill Wandile Nkabinde proposed "Big Data Stream Computing in Healthcare Real-Time Analytics" [18], a generic architecture for big data healthcare analytics built on open-source components including Hadoop, Apache Storm, Kafka, and the NoSQL store Cassandra. Wullianallur Raghupathi and Viju Raghupathi proposed "Big data analytics in healthcare: promise and potential" [19], which discusses the potential of big data analytics in healthcare and provides an overview for healthcare practitioners and researchers. Big data analytics in healthcare is growing into a promising field for deriving insight from very large data sets and improving outcomes while reducing costs.
However, all these works either consider techniques to extract features from specific healthcare data sources or focus only on batch-oriented computation, which has higher latency. In our proposed framework, data sources are assumed to arrive constantly, at a high rate, in a variety of formats, and at high volume. Stream computing over real data and the use of cloud computing in the serving layer will enhance the results of healthcare analytics.
5. Platform Design
This section describes the major components of the healthcare analytics platform and the types of services needed to build a model.
Data Ingest: The first problem everyone runs into is data ingest. Device data arrives from anywhere and in any form, and it must be represented in a standard manner before it can be analysed. This part of the application addresses the following problems:
How to ingest data from medical devices?
How to transform the device data into a format that can be analysed in a Streams application?
What are the common data schema types when analysing medical device data?
Data Preparation: Next, the data must be prepared for analysis. Common problems that need to be addressed include deduplication, resampling, normalization, cleaning, noise reduction, etc. [20].
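The preparation steps above can be sketched as a small pipeline. The tuple layout and the plausible-value range are assumptions for this sketch; real pipelines would be device-specific.

```python
def prepare_readings(readings, low, high):
    """Deduplicate, drop out-of-range outliers, and min-max normalize.

    `readings` is a list of (timestamp, value) pairs; the [low, high]
    plausible range is an assumed, device-specific parameter.
    """
    # Deduplication: keep the first value seen for each timestamp.
    seen, unique = set(), []
    for ts, v in readings:
        if ts not in seen:
            seen.add(ts)
            unique.append((ts, v))
    # Cleaning: discard values outside the plausible range.
    cleaned = [(ts, v) for ts, v in unique if low <= v <= high]
    if not cleaned:
        return []
    # Normalization: rescale surviving values into [0, 1].
    vals = [v for _, v in cleaned]
    lo, hi = min(vals), max(vals)
    span = (hi - lo) or 1.0
    return [(ts, (v - lo) / span) for ts, v in cleaned]
```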
Data Processing: Data processing can be of two types: base analytics and aggregated analytics.
Base Analytics: The platform should provide a set of basic analytics on the data. These analytics can be used as building blocks for more complex analytics and prediction models. Examples include simple vital-sign analytics (calculating rolling averages, raising alerts when vitals go beyond the normal range) and analytics of ECG, EEG, and ICP waveforms.
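As a minimal sketch of such a base analytic, the following computes a rolling average over a heart-rate stream and flags out-of-range samples. The window size and the 60-100 bpm normal range are illustrative assumptions.

```python
from collections import deque

def vitals_monitor(stream, window=5, lo=60, hi=100):
    """Yield (rolling_average, alert) for each heart-rate sample.

    `window`, `lo`, and `hi` are assumed, illustrative parameters.
    """
    buf = deque(maxlen=window)  # keeps only the most recent samples
    for sample in stream:
        buf.append(sample)
        avg = sum(buf) / len(buf)
        # Alert when the raw sample leaves the assumed normal range.
        yield avg, not (lo <= sample <= hi)
```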
Aggregated Analytics: This is where results from the base analytics are combined and aggregated to form more sophisticated analytic rules. For example, for septic shock detection, the user should be able to describe a rule like this:
if temperature is > 37 degrees Celsius or < 33 degrees Celsius
and heart rate is > 90
and respiratory rate is > 19 or PaCO2 < 31 mm Hg
and WBC > 13,000/mm3, or < 5,000/mm3, or > 9% bands
then raise an alert for early septic shock detection.
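The rule above translates directly into code. The dictionary field names are assumptions for this sketch, and the thresholds follow the illustrative rule in the text rather than any clinical guideline.

```python
def early_septic_shock_alert(v):
    """Evaluate the aggregated septic-shock rule on a dict of vitals.

    Field names are assumed for this sketch; thresholds follow the
    illustrative rule in the text, not a clinical guideline.
    """
    temp_abn = v["temp_c"] > 37 or v["temp_c"] < 33
    hr_abn = v["heart_rate"] > 90
    resp_abn = v["resp_rate"] > 19 or v["paco2_mmhg"] < 31
    wbc_abn = (v["wbc_per_mm3"] > 13000 or v["wbc_per_mm3"] < 5000
               or v["band_pct"] > 9)
    # All four conditions must hold before the alert is raised.
    return temp_abn and hr_abn and resp_abn and wbc_abn
```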
Patient Data Correlation / EMR Integration: More complex analytics may need EMR data or doctors' notes, for example to retrieve a patient's medical history, or doctors' notes that include their observations [21].
These types of data can be ingested from existing hospital infrastructure using the HL7 / FHIR protocols.
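HL7 v2 messages are pipe-delimited segments, so a minimal parser can be sketched in a few lines. This is a deliberately simplified illustration: real messages need escape-sequence handling, repeating fields, and component parsing, for which a dedicated HL7 library is preferable.

```python
def parse_hl7_segments(message):
    """Split a pipe-delimited HL7 v2 message into {segment_id: fields}.

    A minimal sketch only; it ignores escapes, repetitions, and
    components that production HL7 parsing must handle.
    """
    segments = {}
    for line in message.strip().split("\r"):  # segments end with CR
        fields = line.split("|")
        segments[fields[0]] = fields
    return segments
```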
Central Monitoring Dashboard: The platform needs a simple dashboard that helps users visualize their data and validate their analytic results. The dashboard can be web-based or mobile.
Alert and Notification Framework: When an important event occurs, the right people must be notified and alerted so they can help the patient. This part of the framework delivers notifications to the right people based on alert type and patient information; alerts can be delivered via email, text messaging, etc., or displayed on a dashboard.
Data Sources: Diverse sources of health data are available nowadays, such as data from medical devices, social media, EMRs, back-office systems, pharmacies, medical images, and HL7 events.
Data Ingestion: The first step in developing the system is to integrate the data, i.e. collect it from the various data sources. Since ingestion can come from multiple sources, a unified platform is needed to manage them all. Data is generated from two types of sources: real-time data and batch data. To ingest data from real-time sources, Apache NiFi and the Kafka API can be used.
• Apache NiFi: Apache NiFi is a distributed, scalable, fault-tolerant workflow automation platform. It provides a unified view in its web UI from which all the data sources being ingested can be monitored [22].
• Apache Kafka: Apache Kafka is a distributed messaging system and disk-based cache. It can store the data collected by Apache NiFi, and it provides a unified, high-throughput, low-latency publish-subscribe model.
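The publish-subscribe model Kafka provides can be illustrated with a tiny in-memory broker. This is a conceptual sketch, not the Kafka API: Kafka adds partitioning, replication, consumer groups, and durable disk-based logs on top of this idea.

```python
from collections import defaultdict

class MiniBroker:
    """In-memory sketch of the publish-subscribe log model.

    Producers append records to a named topic's log; consumers read
    from an offset, so the same data can feed many downstream jobs.
    """
    def __init__(self):
        self.topics = defaultdict(list)

    def publish(self, topic, record):
        # Append-only: records are never modified in place.
        self.topics[topic].append(record)

    def consume(self, topic, offset=0):
        # Each consumer tracks its own offset into the log.
        return self.topics[topic][offset:]
```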
To ingest data in batches, NFS (Network File System) can be used. Data from batch files can also be ingested using Apache NiFi and the Kafka API, but older data stored on distributed servers can only be accessed through NFS. The NFS protocol provides remote access to shared disks across networks. An NFS-enabled server can share directories and files with clients, allowing users and programs to access files on remote systems as if they were stored locally. An NFS-mounted cluster allows easy ingestion of data sources such as files and images from other machines using standard Linux commands, utilities, and applications. GlusterFS is free and open-source software that can be used for distributed storage of images and videos and for data analysis, whereas Tachyon provides in-memory storage with fast data processing, handling data at memory speed.
Data Processing: After ingesting data from the various sources, the next step is to process it: cleaning, normalization, outlier removal, data mining, etc. For data processing, Apache Spark and Apache Apex will be used [23].
Apache Spark is a next-generation distributed parallel-processing framework with a rich set of APIs, including APIs for machine learning and GraphX for graph processing [24]. Spark is much faster than MapReduce for iterative algorithms because Spark tries to keep data in memory, whereas MapReduce involves more reading and writing from disk [25].
Apache Gearpump is a lightweight real-time big data streaming engine (its name refers to a simple pump consisting of two gears). Stream data is difficult to process because of low-latency requirements, infinite data, out-of-order data, correctness requirements, and the need for cheap updates to user logic, but Apache Gearpump addresses these problems with a good user interface, flow control, and out-of-order processing.
Apache Apex is an engine for processing streaming data. Other engines that can also fulfil the requirements of stream processing are Apache Storm and Apache Flink [26], but Apache Apex has built-in support for fault tolerance and scalability and a focus on operability.
Apache Flink is an open-source platform for distributed stream and batch data processing. Flink has an API that performs operations on streams of data; when a job is submitted, these operations are turned into a dataflow graph by Flink. It also supports an API that is compatible with Storm.
Data Storage: As already discussed, health care data consists of a variety of data, including images, doctors' notes, lab reports, insurance data, etc. Correspondingly, different types of storage systems are needed. To store the data, GlusterFS, Tachyon, Cassandra, TitanDB, and Elasticsearch can be used. GlusterFS is free and open-source software for distributed storage of images and videos and for data analysis. Tachyon provides in-memory storage and processes data at memory speed. Apache Cassandra is a free, open-source distributed database management system, originally developed at Facebook, used to handle large amounts of data such as sensor data and claims data. TitanDB is a database used to store graphs for link analysis. Elasticsearch is a search engine used for indexing.
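The indexing that makes a search engine like Elasticsearch fast rests on an inverted index mapping terms to documents. The sketch below is a greatly simplified illustration of that structure (no tokenization rules, stemming, or relevance scoring), with made-up note IDs.

```python
def build_index(docs):
    """Build a term -> doc-id inverted index from {doc_id: text}.

    A toy version of the structure behind full-text search engines.
    """
    index = {}
    for doc_id, text in docs.items():
        for term in set(text.lower().split()):
            index.setdefault(term, set()).add(doc_id)
    return index

def search(index, term):
    """Return the sorted IDs of documents containing the term."""
    return sorted(index.get(term.lower(), set()))
```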
7. Artificial Intelligence
Artificial intelligence is the study of ideas that allow computers to do the things that make people seem intelligent. The central goals of artificial intelligence are to make computers more practical and to understand the principles that make intelligence possible.
The system needs to be AI-enabled so that the machine can automatically predict and prescribe results from its own experience. To make the system AI-enabled, natural language processing, knowledge representation, automated reasoning, and machine learning should be present in the system, as shown in fig. 3. NLP helps make computers/machines as intelligent as human beings in understanding language.
7.2. Integrating Big Data, Analytics, Artificial Intelligence, and Machine Learning in Healthcare
With enormous amounts of computational power, machines can analyse large sets of data points and apply correlation modelling predictively and in real time. Big data technology has the potential to strengthen machine learning capabilities, and accurate, real-time decision making helps improve overall operating efficiency and reduce unnecessary costs.
In healthcare, machine learning plays a big role by understanding different parameters and correlating them with diseases. Machine learning is defined as a process in which computers use algorithms to analyse large sets of data, possibly represented in non-linear ways, identify patterns, and make predictions that can be tested and confirmed.
At a high level, there is supervised machine learning, in which outcomes are predicted from available data and information about previous outcomes, and unsupervised machine learning, in which unknown patterns are discovered in the data.
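A minimal example of supervised learning is the 1-nearest-neighbour rule: predict the label of the closest labelled training point. The blood-pressure features and labels below are illustrative assumptions, not clinical data.

```python
def nearest_neighbor_predict(train, query):
    """1-nearest-neighbour classification over (features, label) pairs.

    Training data and labels here are assumed, illustrative values.
    """
    def sq_dist(a, b):
        # Squared Euclidean distance; ordering is the same as Euclidean.
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # Pick the training pair whose features lie closest to the query.
    features, label = min(train, key=lambda fl: sq_dist(fl[0], query))
    return label
```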
Machine learning can be used in healthcare to increase efficiency, save money, and save lives. CT scan data can be analysed and cross-applied to patient records to see who is most at risk
for disease. Physicians can predict post-discharge outcomes to reduce readmissions and improve patient flow. They can make curative diagnostics faster, more accurate, and more accessible.
Machine learning, deep learning, and cognitive computing are various steps towards a high degree of artificial intelligence, but they are different things. Deep learning is a subset of machine learning. It uses artificial neural networks that simulate human brain connections and that become "trained" over time to answer questions with high accuracy.
AI is, broadly, the endeavour of making machines intelligent, whereas machine learning is the process of implementing the computational methods that support it. AI algorithms can predict post-discharge results, reducing readmissions and optimizing patient flow; in other words, physicians can use these solutions to make medical diagnostics faster, more precise, and more manageable. For a caregiver, identifying a pattern is the first step, which in turn simplifies the complicated process of treatment, and artificial intelligence and machine learning do the same kind of thing. Companion diagnostics will help find the gaps in our current data resources and help us move into truly personalized medicine.
8. Conclusions
This paper provides a generalized framework for personalized healthcare that leverages the advantages of remote monitoring, cloud computing, big data, and reactive machine learning. It provides a systematic approach to support the fast-growing data of people with severe diseases, and it simplifies the task of physicians by not overwhelming them with false alerts. The proposed architecture can support artificial-intelligence-based healthcare analytics by providing batch and stream computing, an extensible storage solution, and query management. To achieve more efficient results from healthcare, there is still a need to handle health care data that keeps growing at a high rate and larger scale, with many inconsistent data sources. A distributed system should be arranged to interchange data among labs, hospital systems, and clinical centres. Earlier analytics tools were transparent and friendly, but when they are merged with open-source development tools and platforms, systems become very complex, require complex programming, and need the
application of a variety of tools. Tools such as Refine and KNIME, together with data mining, can be used to improve the efficiency of data analytics.
9. References
[1] A. R. Weil, "Big Data in Health: A New Era for Research and Patient Care", Health Affairs, vol. 33, no. 7, p. 1110, 2014.
[2] A. Alexandru and D. Coardos, "Big Data in Healthcare and Medical Applications in Romania", IEEE International Conference on Automation, Quality and Testing, Robotics, THETA 20th edition, 2016.
[3] T. Huang and L. Lan, "Promises and Challenges of Big Data Computing in Health Sciences", Big Data Research, vol. 2, pp. 2-11, 2015.
[4] K. Roney, "If Interoperability is the Future of Healthcare, What's the Delay?", Becker's Hospital Review, 2012.
[5] Y. Wang, L. Kung, and T. A. Byrd, "Big Data Analytics: Understanding its Capabilities and Potential Benefits for Healthcare Organizations", Technological Forecasting & Social Change, Elsevier, 2016.
[6] J. Northover, B. McVeigh, and S. Krishnagiri, "Healthcare in the Cloud: The Opportunity and the Challenge", MLD.
[7] CSCC, "Impact of Cloud Computing on Healthcare", pp. 1-18, 2012.
[8] I. A. T. Hashem, I. Yaqoob, N. B. Anuar, S. Mokhtar, A. Gani, and S. U. Khan, "The Rise of 'Big Data' on Cloud Computing: Review and Open Research Issues", Information Systems, vol. 47, pp. 98-115, 2015.
[9] P. Piletic, "How is Cloud and Big Data Changing Healthcare", 2016.
[10] P. Jain and S. Ojha, "Significance of Big Data Analytics", International Journal of Software and Web Sciences (IJSWS), 2015.
[11] B. Boukenze, H. Mousannif, and A. Haqiq, "A Conception of a Predictive Analytics Platform in Healthcare Sector by Using Data Mining Techniques and Hadoop", International Journal of Advanced Research in Computer Science and Software Engineering, vol. 6, issue 8, August 2016.
[12] A. Raul, A. Patil, P. Raheja, and R. Sawant, "Knowledge Discovery, Analysis and Prediction in Healthcare using Data Mining and Analytics", in Proc. 2nd International Conference on Next Generation Computing Technologies (NGCT-2016), Dehradun, India, 14-16 October 2016.
[13] Frost & Sullivan, "Drowning in Big Data? Reducing Information Technology Complexities and Costs for Healthcare Organizations", 2012.
[14] B. Feldman, E. M. Martin, and T. Skotnes, "Big Data in Healthcare: Hype and Hope", Dr. Bonnie 360, October 2012.
[15] A. Raul, A. Patil, P. Raheja, and R. Sawant, "Knowledge Discovery, Analysis and Prediction in Healthcare using Data Mining and Analytics", in Proc. 2nd International Conference on Next Generation Computing Technologies (NGCT), IEEE, 2016.
[16] M. Roopa and S. Manju Priya, "A Review of Big Data Analytics in Healthcare", International Journal for Scientific Research & Development, Sp. Issue on Data Mining, 2015.
[17] A. Bansal and P. Ghare, "Healthcare Data Analysis using Dynamic Slot Allocation in Hadoop", International Journal of Recent Technology and Engineering (IJRTE).