Quantum Computing: A Tool in Big Data Analytics
AKSHAT GAURAV1, KWOK TAI CHUI2, FRANCESCO COLACE3
1 Ronin Institute, Montclair, USA. Email: akshat.gaurav@ieee.org
2 Hong Kong Metropolitan University (HKMU), Hong Kong. Email: jktchui@hkmu.edu.hk
3 University of Salerno, Italy. Email: fcolace@unisa.it
ABSTRACT Massive volumes of data, referred to as "big data," hold enormous potential, and big data has therefore been a prominent research topic for the past two decades. Public and private sector organisations use big data analytics to improve the services they provide. Extracting relevant information from such vast amounts of data requires careful data management and analysis; otherwise, finding answers in it becomes like searching for a needle in a haystack. Quantum computing comes to the rescue with its many promises for information processing systems, notably in the area of Big Data Analytics. Its power is built on the principles of quantum physics, and because these phenomena have no classical counterpart, conventional computers cannot deliver the same results. In this article, we review the existing literature on Big Data Analytics using Quantum Computing. As a comparatively new field, quantum computing presents a number of open issues; we also highlight the problems, potential, future directions, and methodologies of quantum computing in Big Data analytics.
Variety, Validity, Viability, Volatility and Vulnerability can all be represented by the 10 Vs. The following is an explanation of each 'V' in health care big data.
– Value: in big data analytics, value indicates the significance of the data analysed. Processing medical data with an inaccurate processing approach diminishes the value of the processed data.
– Volume: this represents the amount of data created. Health care information is complicated, and at times it includes a significant amount of noise. Patient records, biometric sensor readings, and x-ray pictures are among the most common types of healthcare data. Global healthcare data was estimated to be 500 petabytes in 2012 and was predicted to grow to 2500 petabytes by the end of 2020; 1.2 to 2.4 exabytes of healthcare big data are created each year.
– Velocity: this is the pace at which healthcare data is processed, i.e., the rate at which data is generated, stored, and then transported. Healthcare data is created at many sites, including laboratories and doctors' offices, and real-time analysis of large healthcare datasets requires quick data interchange between these sites.
– Veracity: a measure of the data's dependability. Accurate data improves the chance that a patient's life will be saved, so data should be gathered from many sources and processed using error-free technologies in order to boost its authenticity.
– Variety: structured, semi-structured, and unstructured data are all types of healthcare data obtained from a variety of sources. Variety thus depicts the wide range of variations that exist in healthcare information, and it poses a significant problem for healthcare data processing.
– Validity: the accuracy of the medical data. Validity is critical for healthcare data, since if the data is not legitimate the processed results are useless. A variety of methods and instruments exist for verifying the accuracy of healthcare data.
– Viability: locating the essential information in a big ocean of data becomes much more difficult because the amount of healthcare information is so enormous. In order to utilise healthcare data effectively, we must first pick out the relevant data.
– Volatility: an enormous quantity of healthcare data is created every second because of the digital revolution, and this new data is added to the previously stored data. The next step is to determine the data's life expectancy: healthcare data loses relevance with time, making the appropriate lifespan of the data a research topic in itself.
– An important goal in healthcare is to create an efficient…
A complete measurement of an n-qubit state |ψ⟩ = Σ_x a_x|x⟩ yields an outcome x with probability |a_x|^2, destroying the state in the process. As a result, even though the amount of information required to describe the quantum state of n qubits grows exponentially as n increases, a measurement can only retrieve n bits of information. Finding a means to exploit quantum computers' exponential state space despite these and other limitations is the primary issue of quantum algorithm creation.
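To make this limitation concrete, the following minimal Python sketch (our own illustration, using NumPy; the state and random seed are arbitrary) simulates a complete measurement of a 3-qubit register: 2^3 = 8 complex amplitudes a_x are needed to describe the state, yet each measurement returns only 3 classical bits.

```python
import numpy as np

n = 3                      # number of qubits
dim = 2 ** n               # 2^n complex amplitudes describe the state

rng = np.random.default_rng(0)

# A random normalised state |psi> = sum_x a_x |x>
amps = rng.normal(size=dim) + 1j * rng.normal(size=dim)
amps /= np.linalg.norm(amps)

# Born rule: outcome x is observed with probability |a_x|^2,
# and the superposition is destroyed by the measurement.
probs = np.abs(amps) ** 2
outcome = int(rng.choice(dim, p=probs))

print(f"measured |{outcome:03b}> : {n} bits read from {dim} amplitudes")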
III. BIG DATA ANALYTICS CHALLENGES
The term "big data" refers to the massive volumes of data that are being created at a fast pace. Optimising customer services is more important than maximising consumption of the data obtained from diverse sources, and big data from biomedical research and healthcare also falls into this category. Big data presents a tremendous difficulty because of the sheer amount of information it contains. For the scientific community, data must be kept in a manner that is readily accessible and usable for efficient analysis. The implementation of high-end computing tools, protocols, and hardware in the clinical environment is another key difficulty in the context of healthcare data. Accomplishing this aim requires experts from several fields, including biology and information technology, as well as statisticians and mathematicians. Pre-installed software solutions provided by analytic-tool makers may make sensor data accessible on a storage cloud, and AI specialists have built data mining and machine learning features into these instruments that can turn data into knowledge. Their implementation would improve the efficiency of healthcare big data collection and analysis, as well as the display of such data. The primary goal is to annotate, integrate, and display this complicated material in a way that facilitates comprehension. Research in biomedicine is hampered by a lack of clear information about the available data. Finally, computer graphics designers have devised visualisation tools that can effectively convey this new information. Big data analysis must also contend with the problem of data heterogeneity: big data in healthcare is difficult to make sense of because of its enormous quantity and highly varied nature. High-power computer clusters accessible through grid computing infrastructures are the most prevalent platforms for executing the software frameworks that support big data analysis. Because of its virtualized storage and dependable services, cloud computing has become a popular choice for businesses; high scalability, high reliability, and complete self-sufficiency are just a few of the benefits that come with such systems. Such platforms may operate as a receiver of data from omnipresent sensors, as a computer to analyse and interpret the data, and as a web-based visualisation tool for the user. Mobile edge computing cloudlets and fog computing may be used in the IoT to process massive data closer to the data source. Applying ML and AI methods to large-scale data analysis on computer clusters requires advanced algorithms, which could be written in a programming language suited to coping with large amounts of data (a sketch follows at the end of this section). As a result, handling the massive amounts of data generated by biomedical research requires both biology and IT expertise; bioinformaticians often have exactly this dual skill set.
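As one illustration of such cluster-based analysis, the hedged sketch below uses PySpark (our choice; the article names no specific framework, and the dataset path and column names are hypothetical) to aggregate biometric sensor readings across a cluster.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("healthcare-big-data").getOrCreate()

# Hypothetical table of biometric sensor readings; Spark splits the file
# into partitions and processes them in parallel across the cluster.
readings = spark.read.parquet("hdfs:///data/sensor_readings.parquet")

# Per-patient aggregation runs as a distributed job, not on one machine.
summary = (
    readings.groupBy("patient_id")
            .agg(F.avg("heart_rate").alias("avg_heart_rate"),
                 F.count("*").alias("n_readings"))
)
summary.show()
```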
IV. BIG DATA AND QUANTUM COMPUTING
An efficient linear and non-linear binary classifier may be implemented on a quantum computer using a support vector machine, with exponential speedups in the size of the vectors and the number of training instances [18]. At the heart of the approach is performing principal component analysis and matrix inversion efficiently, relying on a non-sparse matrix simulation technique. Weinstein discusses the strongly visual approach of dynamic quantum clustering (DQC), which works with large and high-dimensional data: because it uses differences in data density (in feature space) and uncovers hidden subsets, it can handle large, high-dimensional datasets, and a DQC analysis produces a video demonstrating how and why data points are identified as cluster members, with correlations among all the variables assessed. Rebentrost et al. [18] developed a quantum-computer-implemented optimal binary classifier with logarithmic complexity in vector size and number of training examples. The Generalized Eigenvalue Proximal SVM (GEPSVM) was introduced by Marghny et al. [19] to address the SVM complexity problem; real-world data is affected by errors or noise, and dealing with such data can be a difficult challenge, which their approach, named DSA-GEPSVM, addresses. Using quantum computing, Anguita et al. [20] explored how to overcome the challenge of effective SVM training, particularly in the case of digital implementation; experiments in synthetic and real-world scenarios support the theoretical understanding of the behavioural characteristics of standard and improved SVMs, and their study examines the similarities and contrasts between quantum-based optimization and quadratic programming.
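For orientation, the sketch below trains the classical kernel SVM that these quantum proposals aim to accelerate. It is a scikit-learn baseline on synthetic data (our own choice of library, dataset, and parameters, not taken from [18]).

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for a large feature matrix.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Kernel SVM training scales polynomially with the number of samples and
# features; this cost is the bottleneck the quantum SVM of [18] targets.
clf = SVC(kernel="rbf").fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```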
A. Future Research Challenges in Big Data and Quantum Computing
Companies are always working to create new methods for managing and analysing massive amounts of data in order to better incorporate that data into their operations. In spite of this, the wide variety of products available has made it difficult to share data. These are a few of the issues we'll touch on briefly in this section.
Data storage is one of the main issues. Many firms are satisfied with storing their own data on their own premises: control over security, access, and uptime are just a few of the benefits. However, scaling and maintaining an on-premises server network can be costly and time-consuming. Cloud-based storage employing IT infrastructure appears to be a more cost-effective and reliable alternative for most healthcare firms, and organizations should only work with cloud service providers that are cognizant of the need for security. For these reasons and more, cloud storage is becoming increasingly popular, and a hybrid approach to data storage may be the most adaptable and viable option for providers with different data access and storage demands.
Cleaning is another issue: after collection, data must be cleaned or scrubbed to guarantee its precision, correctness, consistency, relevance, and purity. Automated logic rules may be used to achieve high levels of correctness and integrity in this cleaning procedure, and machine learning methods with more complex and accurate tools may be used to decrease its cost and duration and to prevent bad data from derailing big data initiatives. Managing large amounts of data is very challenging, particularly when imperfect data is involved, and storing, processing, and sharing data necessitates the creation of a uniform format.
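A minimal sketch of such rule-based cleaning, assuming pandas and a hypothetical table of patient records (the column names and thresholds are illustrative only):

```python
import pandas as pd

# Hypothetical patient records with typical quality problems.
records = pd.DataFrame({
    "patient_id": [1, 2, 2, 3, 4],
    "heart_rate": [72.0, 88.0, 88.0, None, 700.0],  # duplicate, missing, implausible
})

# Automated logic rules applied in sequence.
clean = (
    records.drop_duplicates()              # consistency
           .dropna(subset=["heart_rate"])  # completeness
)
clean = clean[clean["heart_rate"].between(25, 250)]  # physiological plausibility
print(clean)
```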
Protection breaches, hacks, phishing assaults, and even ransomware attacks have made data security a top responsibility for every firm. After a number of vulnerabilities were discovered, a set of technological protections was built for protected stored data. These regulations, known as the HIPAA Security Rules, guide organizations on storage, transmission, authentication procedures, and controls over access, integrity, and auditing. Up-to-date antivirus software, firewalls, encryption of sensitive data, and multi-factor authentication can save a great deal of time and money in the long term.
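As a sketch of encrypting a sensitive record before storage (assuming the Python cryptography package, our choice; key management is deliberately simplified and the record is hypothetical):

```python
from cryptography.fernet import Fernet

# In practice the key would come from a key-management service, not be
# generated inline next to the data it protects.
key = Fernet.generate_key()
cipher = Fernet(key)

record = b'{"patient_id": 42, "diagnosis": "confidential"}'
token = cipher.encrypt(record)  # ciphertext safe to write to disk or cloud

assert cipher.decrypt(token) == record
```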
Having comprehensive, accurate, and up-to-date metadata on all of the stored data is essential to a successful data governance strategy. The metadata would include information such as the date of creation, the purpose of the data, and who was accountable for it. Later scientific research and precise benchmarking might benefit from analysts being able to reproduce prior queries; as a result, data is more usable and "data dumpsters" full of useless data are less likely to be created.
With the use of metadata, businesses would be able to query their data and come up with answers. However, query tools may not be able to access a complete repository of data if datasets are not properly interoperable. Moreover, a full picture of a patient's health may not emerge if distinct dataset components are not adequately integrated, linked, and readily available.
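A minimal sketch of such a metadata record, covering exactly the fields named above (the class and field names are illustrative, not a standard schema):

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class DatasetMetadata:
    """One governance record per stored dataset."""
    name: str
    created: date  # date of creation
    purpose: str   # why the data was collected
    steward: str   # who is accountable for the data

meta = DatasetMetadata(
    name="icu_vitals_2022",
    created=date(2022, 3, 1),
    purpose="Real-time monitoring of ICU patients",
    steward="data-governance@example.org",
)
print(meta)
```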
Visualizing data with charts, heatmaps, and histograms to show contrasts, together with precise labelling, can make it much simpler for humans to absorb the information and apply it correctly. There are a variety of other data visualisations as well, such as bar charts, pie charts, and scatterplots.
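A minimal plotting sketch (assuming matplotlib, our choice, and synthetic readings) showing the kind of labelled histogram described above:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
heart_rate = rng.normal(75, 12, size=1000)  # synthetic readings

# A labelled histogram lets a reader grasp the distribution at a glance.
plt.hist(heart_rate, bins=30)
plt.xlabel("heart rate (bpm)")
plt.ylabel("number of readings")
plt.title("Distribution of recorded heart rates")
plt.show()
```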
V. CONCLUSION
Scientific revolutions are set to enter a new phase because of Big Data's role in innovation, competitiveness, and productivity. It is a good thing that we will be able to observe this technology leapfrogging in the near future. In this article, we provide a quick introduction to the fundamentals and challenges of Big Data and quantum computing. These technologies are still in the early stages of development, but we are certain that we will see a number of major breakthroughs in the near future. Although Big Data analytics is still in the early stages of development, the current Big Data approaches and tools are unable to tackle all of the genuine Big Data challenges. Consequently, governments and businesses must invest in these technologies in order to reap the benefits of Big Data. More study in these sub-fields is needed to handle the problem of Big Data with the help of quantum computing.

REFERENCES
[1] B. B. Gupta, A. Gaurav, and D. Peraković, "A big data and deep learning based approach for DDoS detection in cloud computing environment," in 2021 IEEE 10th Global Conference on Consumer Electronics (GCCE). IEEE, 2021, pp. 287–290.
[2] C. L. Stergiou et al., "Secure machine learning scenario from big data in cloud computing via internet of things network," in Handbook of Computer Networks and Cyber Security. Springer, 2020, pp. 525–554.
[3] B. B. Gupta, S. Yamaguchi, and D. P. Agrawal, "Advances in security and privacy of multimedia big data in mobile and cloud computing," Multimedia Tools and Applications, vol. 77, no. 7, pp. 9203–9208, 2018.
[4] L. Gyongyosi and S. Imre, "A survey on quantum computing technology," Computer Science Review, vol. 31, pp. 51–71, 2019.
[5] C. P. Chen and C.-Y. Zhang, "Data-intensive applications, challenges, techniques and technologies: A survey on big data," Information Sciences, vol. 275, pp. 314–347, 2014.
[6] K. Yadav et al., "2021 hot topics in machine learning research."
[7] F. J. G. Peñalvo, T. Maan, S. K. Singh, S. Kumar, V. Arya, K. T. Chui, and G. P. Singh, "Sustainable stock market prediction framework using machine learning models," International Journal of Software Science and Computational Intelligence (IJSSCI), vol. 14, no. 1, pp. 1–15, 2022.
[8] K. T. Chui et al., "Enhancing electrocardiogram classification with multiple datasets and distant transfer learning," Bioengineering, vol. 9, no. 11, p. 683, 2022.
[9] D. Singh, "Captcha improvement: Security from DDoS attack," 2021.
[10] A. Gaurav, V. Arya, and D. Santaniello, "Analysis of machine learning based DDoS attack detection techniques in software defined network," Cyber Security Insights Magazine (CSIM), vol. 1, no. 1, pp. 1–6, 2022.
[11] B. Joshi et al., "A comparative study of privacy-preserving homomorphic encryption techniques in cloud computing," International Journal of Cloud Applications and Computing (IJCAC), vol. 12, no. 1, pp. 1–11, 2022.
[12] R. K. S. Rajput, D. Goyal, A. Pant, G. Sharma, V. Arya, and M. K. Rafsanjani, "Cloud data centre energy utilization estimation: Simulation and modelling with idr," International Journal of Cloud Applications and Computing (IJCAC), vol. 12, no. 1, pp. 1–16, 2022.
[13] F. J. G. Peñalvo, A. Sharma, A. Chhabra, S. K. Singh, S. Kumar, V. Arya, and A. Gaurav, "Mobile cloud computing and sustainable development: Opportunities, challenges, and future directions," International Journal of Cloud Applications and Computing (IJCAC), vol. 12, no. 1, pp. 1–20, 2022.
[14] K. Pathoee et al., "A cloud-based predictive model for the detection of breast cancer," International Journal of Cloud Applications and Computing (IJCAC), vol. 12, no. 1, pp. 1–12, 2022.
[15] B. B. Gupta, Modern Principles, Practices, and Algorithms for Cloud Security. IGI Global, 2019.
[16] P. S. Emani, J. Warrell, A. Anticevic, S. Bekiranov, M. Gandal, M. J. McConnell, G. Sapiro, A. Aspuru-Guzik, J. T. Baker, M. Bastiani et al., "Quantum computing at the frontiers of biological sciences," Nature Methods, vol. 18, no. 7, pp. 701–709, 2021.
[17] "The Lens - free & open patent and scholarly search," https://www.lens.org/, accessed: 2023-01-01.
[18] P. Rebentrost, M. Mohseni, and S. Lloyd, "Quantum support vector machine for big data classification," Physical Review Letters, vol. 113, no. 13, p. 130503, 2014.
[19] M. Marghny, R. M. A. ElAziz, and A. I. Taloba, "Differential search algorithm-based parametric optimization of fuzzy generalized eigenvalue proximal support vector machine," arXiv preprint arXiv:1501.00728, 2015.
[20] D. Anguita, S. Ridella, F. Rivieccio, and R. Zunino, "Quantum optimization for training support vector machines," Neural Networks, vol. 16, no. 5-6, pp. 763–770, 2003.
[21] T. A. Shaikh and R. Ali, "Quantum computing in big data analytics: A survey," in 2016 IEEE International Conference on Computer and Information Technology (CIT). IEEE, 2016, pp. 112–115.
[22] R. Dridi and H. Alghassi, "Homology computation of large point clouds using quantum annealing," arXiv preprint arXiv:1512.09328, 2015.