What Is Big Data

Veracity refers to uncertainty in data due to inconsistencies and incompleteness. For example, a table is missing some values and lists an improbably low minimum value. Big data can be used for product development by analyzing past product attributes and customer responses to predict demand. It can enable predictive maintenance by analyzing log entries, sensor data, and error messages to identify potential issues before failures occur. Big data also allows analyzing customer experience data from various sources to improve interactions and deliver personalized offers. While big data holds promise, challenges include the large data volumes that are doubling every two years, and the significant time (50-80%) data scientists spend preparing data for analysis. Additionally, big data technology is changing rapidly so

Uploaded by

Kamlesh Kumar
What Is Big Data

Veracity refers to doubt or uncertainty about the available data, caused by data inconsistency and
incompleteness. In the image below, you can see that a few values are missing in the table. A few
values are also hard to accept: for example, the 15000 minimum value in the 3rd row is not possible.
This inconsistency and incompleteness is Veracity.
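
A veracity check of this kind can be sketched in a few lines of Python. This is only a minimal illustration: the rows, the column names, and the plausible ranges are hypothetical and are not taken from the table in the image.

```python
# Hypothetical sample rows; None marks a missing value.
rows = [
    {"id": 1, "min_temp": 12.5, "max_temp": 30.1},
    {"id": 2, "min_temp": None, "max_temp": 28.4},
    {"id": 3, "min_temp": 15000.0, "max_temp": 31.0},
]

# Assumed plausible range (low, high) for each column.
PLAUSIBLE = {"min_temp": (-90.0, 60.0), "max_temp": (-90.0, 60.0)}


def veracity_issues(row):
    """Return a list of missing or implausible values found in one row."""
    issues = []
    for column, (low, high) in PLAUSIBLE.items():
        value = row.get(column)
        if value is None:
            issues.append(f"{column}: missing")
        elif not low <= value <= high:
            issues.append(f"{column}: implausible value {value}")
    return issues


# Map each row id to the issues found in that row.
report = {row["id"]: veracity_issues(row) for row in rows}
```

Here row 2 is flagged for a missing value and row 3 for a value outside the assumed range, while row 1 passes.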

Big Data Use Cases


Big data can help you address a range of business activities, from customer experience to
analytics. Here are just a few. (More use cases can be found at Oracle Big Data Solutions.)

Product Development: Companies like Netflix and Procter & Gamble use big data to anticip[...]
demand. They build predictive models for new products and services [...] attributes of past and
current products or services and modeling the re[...] between those attributes and the commercial
success of the offerings. [...] uses data and analytics from focus groups, social media, test
markets, [...] rollouts to plan, produce, and launch new products.

Predictive Maintenance: Factors that can predict mechanical failures may be deeply buried in s[...]
such as the year, make, and model of equipment, as well as in unstruc[...] covers millions of log
entries, sensor data, error messages, and engine [...] analyzing these indications of potential
issues before the problems ha[...] organizations can deploy maintenance more cost effectively and
max[...] equipment uptime.

Customer Experience: The race for customers is on. A clearer view of customer experience i[...]
now than ever before. Big data enables you to gather data from social [...] visits, call logs, and
other sources to improve the interaction experien[...] the value delivered. Start delivering
personalized offers, reduce custo[...] handle issues proactively.

Fraud and Compliance: When it comes to security, it’s not just a few rogue hackers; you’re u[...]
expert teams. Security landscapes and compliance requirements are c[...] evolving. Big data helps
you identify patterns in data that indicate fra[...] large volumes of information to make
regulatory reporting much faste[...]

Machine Learning: Machine learning is a hot topic right now. And data, specifically big [...] the
reasons why. We are now able to teach machines instead of progr[...] availability of big data to
train machine learning models makes that p[...]

Operational Efficiency: Operational efficiency may not always make the news, but it’s an are[...]
data is having the most impact. With big data, you can analyze and as[...] customer feedback and
returns, and other factors to reduce outages an[...] demands. Big data can also be used to improve
decision-making in lin[...] market demand.

Drive Innovation: Big data can help you innovate by studying interdependencies among [...]
institutions, entities, and process and then determining new ways to u[...] Use data insights to
improve decisions about financial and planning c[...] Examine trends and what customers want to
deliver new products and [...] Implement dynamic pricing. There are endless possibilities.

Big Data Challenges


While big data holds a lot of promise, it is not without its challenges.

First, big data is…big. Although new technologies have been developed for data storage, data volumes
are doubling in size about every two years. Organizations still struggle to keep pace with their data
and find ways to effectively store it.
But it’s not enough to just store the data. Data must be used to be valuable and that depends on
curation. Clean data, or data that’s relevant to the client and organized in a way that enables
meaningful analysis, requires a lot of work. Data scientists spend 50 to 80 percent of their
time curating and preparing data before it can actually be used.
Finally, big data technology is changing at a rapid pace. A few years ago, Apache Hadoop was the
popular technology used to handle big data. Then Apache Spark was introduced in 2014. Today, a
combination of the two frameworks appears to be the best approach. Keeping up with big data
technology is an ongoing challenge.
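
To make the storage challenge concrete, the "doubling about every two years" figure can be turned into a simple projection. The sketch below assumes smooth exponential growth, which is a simplification; the function name and the 100 TB starting point are illustrative only.

```python
def projected_volume(current_tb: float, years: float,
                     doubling_period_years: float = 2.0) -> float:
    """Project data volume, assuming it doubles every `doubling_period_years`."""
    return current_tb * 2 ** (years / doubling_period_years)


# Under this assumption, 100 TB today doubles three times in six years.
six_year_estimate = projected_volume(100, 6)  # 800.0
```

In other words, a storage plan sized for today's volume would need roughly eight times the capacity after six years if the doubling rate holds.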

How Big Data Works


Big data gives you new insights that open up new opportunities and business models. Getting
started involves three key actions:

1. Integrate

Big data brings together data from many disparate sources and applications. Traditional data
integration mechanisms, such as ETL (extract, transform, and load), generally aren’t up to the
task. It requires new strategies and technologies to analyze big data sets at terabyte, or even
petabyte, scale.
During integration, you need to bring in the data, process it, and make sure it’s formatted and
available in a form that your business analysts can get started with.
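
The "bring in, process, and format" steps can be illustrated at toy scale with the Python standard library. This is a sketch only: the two sources, their field names, and the target schema are hypothetical, and real big data integration would rely on distributed tooling rather than in-memory lists.

```python
import csv
import io
import json

# Hypothetical inputs: a CRM export in CSV and a web sign-up feed in JSON.
CRM_CSV = (
    "customer_id,full_name,signup\n"
    "101,Ada Lopez,2023-01-15\n"
    "102,Raj Patel,2023-02-03\n"
)
WEB_JSON = '[{"custId": "103", "name": "Mei Chen", "signupDate": "2023-03-09"}]'


def from_crm(text):
    """Bring in CSV rows and map them to the shared schema."""
    for record in csv.DictReader(io.StringIO(text)):
        yield {
            "customer_id": int(record["customer_id"]),
            "name": record["full_name"],
            "signup_date": record["signup"],
        }


def from_web(text):
    """Bring in JSON records and map them to the same shared schema."""
    for record in json.loads(text):
        yield {
            "customer_id": int(record["custId"]),
            "name": record["name"],
            "signup_date": record["signupDate"],
        }


# One uniformly formatted data set that analysts can start working with.
integrated = sorted(
    [*from_crm(CRM_CSV), *from_web(WEB_JSON)],
    key=lambda row: row["customer_id"],
)
```

Each source gets its own small adapter, so adding another source means writing one more function that yields rows in the shared schema.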

2. Manage

Big data requires storage. Your storage solution can be in the cloud, on premises, or both.
You can store your data in any form you want and bring your desired processing
requirements and necessary process engines to those data sets on an on-demand basis. Many
people choose their storage solution according to where their data is currently residing. The
cloud is gradually gaining popularity because it supports your current compute requirements
and enables you to spin up resources as needed.

3. Analyze

Your investment in big data pays off when you analyze and act on your data. Get new clarity
with a visual analysis of your varied data sets. Explore the data further to make new
discoveries. Share your findings with others. Build data models with machine learning and
artificial intelligence. Put your data to work.

Big Data Best Practices


To help you on your big data journey, we’ve put together some key best practices for you to
keep in mind. Here are our guidelines for building a successful big data foundation.

Align Big Data with Specific Business Goals: More extensive data sets enable you to make new
discoveries. To that end [...] base new investments in skills, organization, or infrastructure with
a strong [...] context to guarantee ongoing project investments and funding. To determ[...] right
track, ask how big data supports and enables your top business and IT [...] Examples include
understanding how to filter web logs to understand ecom[...] deriving sentiment from social media
and customer support interactions, a[...] statistical correlation methods and their relevance for
customer, product, m[...] engineering data.

Ease Skills Shortage with Standards and Governance: One of the biggest obstacles to benefiting from
your investment in big data [...] You can mitigate this risk by ensuring that big data technologies,
considera[...] are added to your IT governance program. Standardizing your approach wi[...] manage
costs and leverage resources. Organizations implementing big dat[...] strategies should assess
their skill requirements early and often and should [...] identify any potential skill gaps. These
can be addressed by training/cross-t[...] resources, hiring new resources, and leveraging
consulting firms.

Optimize Knowledge Transfer with a Center of Excellence: Use a center of excellence approach to
share knowledge, control oversight [...] project communications. Whether big data is a new or
expanding investme[...] hard costs can be shared across the enterprise. Leveraging this approach
c[...] data capabilities and overall information architecture maturity in a more s[...] systematic
way.

Top Payoff Is Aligning Unstructured with Structured Data: It is certainly valuable to analyze big
data on its own. But you can bri[...] business insights by connecting and integrating low density
big data w[...] data you are already using today.

Whether you are capturing customer, product, equipment, or environm[...] the goal is to add more
relevant data points to your core master and an[...] summaries, leading to better conclusions. For
example, there is a diff[...] distinguishing all customer sentiment from that of only your best
cust[...] why many see big data as an integral extension of their existing busin[...] capabilities,
data warehousing platform, and information architecture [...]

Keep in mind that the big data analytical processes and models can be [...] machine-based. Big data
analytical capabilities include statistics, spat[...] semantics, interactive discovery, and
visualization. Using analytical m[...] correlate different types and sources of data to make
associations and [...] discoveries.

Plan Your Discovery Lab for Performance: Discovering meaning in your data is not always
straightforward. Som[...] even know what we’re looking for. That’s expected. Management and [...]
support this “lack of direction” or “lack of clear requirement.”

At the same time, it’s important for analysts and data scientists to wo[...] business to understand
key business knowledge gaps and requirement[...] accommodate the interactive exploration of data
and the experimenta[...] algorithms, you need high-performance work areas. Be sure that sand[...]
have the support they need, and are properly governed.

Align with the Cloud Operating Model: Big data processes and users require access to a broad array
of resources f[...] experimentation and running production jobs. A big data solution includes [...]
including transactions, master data, reference data, and summarized data. [...] sandboxes should be
created on demand. Resource management is critical [...] of the entire data flow including pre- and
post-processing, integration, in-d[...] summarization, and analytical modeling. A well-planned
private and public [...] and security strategy plays an integral role in supporting these changing
re[...]

Applications of Big Data

We cannot talk about data without talking about the people who benefit from Big Data
applications. Almost every industry today leverages Big Data applications in one way or
another.

• Smarter Healthcare: Using petabytes of patient data, an organization can extract
meaningful information and then build applications that predict a patient’s
deteriorating condition in advance.

• Telecom: The telecom sector collects information, analyzes it, and provides solutions to
different problems. By using Big Data applications, telecom companies have been able
to significantly reduce data packet loss, which occurs when networks are overloaded,
and thus provide a seamless connection to their customers.

• Retail: Retail has some of the tightest margins and is one of the greatest beneficiaries
of big data. The beauty of using big data in retail is that it helps retailers understand
consumer behavior. For example, Amazon’s recommendation engine provides suggestions
based on the consumer’s browsing history.

• Traffic control: Traffic congestion is a major challenge for many cities globally.
Effective use of data and sensors will be key to managing traffic better as cities become
increasingly densely populated.

• Manufacturing: Analyzing big data in the manufacturing industry can reduce
component defects, improve product quality, increase efficiency, and save time and
money.

• Search Quality: Every time we extract information from Google, we simultaneously
generate data for it. Google stores this data and uses it to improve its search quality.

As the saying goes, “Not everything in the garden is rosy!” So far in this Big Data
tutorial, I have shown you only the rosy picture of Big Data. But if it were so easy to
leverage Big Data, don’t you think all organizations would invest in it? Let me tell you
upfront, that is not the case. Several challenges come along when you work with Big Data.

Now that you are familiar with Big Data and its various features, the next section of
this blog on Big Data Tutorial will shed some light on some of the major challenges
that come with Big Data.

Challenges with Big Data


Here are a few of the challenges that come with Big Data:

1. Data Quality – The problem here is the 4th V, i.e. Veracity. The data is very messy,
inconsistent, and incomplete. Dirty data costs companies in the United States $600
billion every year.

2. Discovery – Finding insights in Big Data is like finding a needle in a haystack.
Analyzing petabytes of data with extremely powerful algorithms to find patterns and
insights is very difficult.

3. Storage – The more data an organization has, the more complex the problems of
managing it can become. The question that arises here is “Where to store it?”. We
need a storage system that can easily scale up or down on demand.

4. Analytics – In the case of Big Data, most of the time we are unaware of the kind of
data we are dealing with, so analyzing that data is even more difficult.

5. Security – Since the data is huge in size, keeping it secure is another challenge. It
includes user authentication, restricting access on a per-user basis, recording data
access histories, proper use of data encryption, etc.

6. Lack of Talent – There are a lot of Big Data projects in major organizations, but
building a sophisticated team of developers, data scientists, and analysts who also have
sufficient domain knowledge is still a challenge.
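
To give a feel for two of the security tasks listed above, restricting access per user and recording data access histories, here is a minimal Python sketch. The roles, the data set names, and the `read_dataset` helper are all hypothetical, and the sketch deliberately leaves out authentication and encryption.

```python
from datetime import datetime, timezone

# Hypothetical mapping from role to the data sets that role may read.
PERMISSIONS = {"analyst": {"sales"}, "admin": {"sales", "payroll"}}

# Access history: one entry per attempt, whether it was allowed or not.
access_log = []


def read_dataset(user, role, dataset):
    """Check the role's permissions, record the attempt, then return the data."""
    allowed = dataset in PERMISSIONS.get(role, set())
    access_log.append({
        "time": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "dataset": dataset,
        "allowed": allowed,
    })
    if not allowed:
        raise PermissionError(f"{user} ({role}) may not read {dataset}")
    return f"contents of {dataset}"
```

Calling `read_dataset("ana", "analyst", "sales")` returns the placeholder contents, while the same user asking for `"payroll"` raises `PermissionError`. Both attempts end up in `access_log`, so denied requests are part of the history too.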

Hadoop to the Rescue


We have a savior to deal with Big Data challenges – it’s Hadoop. Hadoop is an open
source, Java-based programming framework that supports the storage and
processing of extremely large data sets in a distributed computing environment. It is
part of the Apache project sponsored by the Apache Software Foundation.

Hadoop, with its distributed processing, handles large volumes of structured and
unstructured data more efficiently than the traditional enterprise data warehouse.
Hadoop makes it possible to run applications on systems with thousands of commodity
hardware nodes, and to handle thousands of terabytes of data. Organizations are
adopting Hadoop because it is open source software and can run on commodity
hardware (such as an ordinary personal computer). The initial cost savings are dramatic
because commodity hardware is very cheap. As organizational data grows, you can
add more commodity hardware on the fly to store it, and hence
Hadoop proves to be economical. Additionally, Hadoop has a robust Apache
community behind it that continues to contribute to its advancement.
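
Hadoop's distributed processing is commonly based on the MapReduce model. To give you a feel for the idea, here is a minimal single-machine sketch in plain Python. It is not Hadoop's Java API, and the function names and sample chunks are only illustrative; in a real cluster, each chunk would be mapped on a different node.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: combine the values for each key into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


# Each string stands in for a block of data stored on a separate node.
chunks = ["big data is big", "data needs storage"]
mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
word_counts = reduce_phase(shuffle(mapped))
```

Because the map step looks at one chunk at a time, adding more chunks (and more nodes to process them) does not change the code, which is the property that lets this model scale out on commodity hardware.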

As promised earlier, through this blog on Big Data Tutorial, I have given you as much
insight into Big Data as possible. This is the end of the Big Data Tutorial. The next step
forward is to learn Hadoop. We have a series of Hadoop tutorial blogs which
will give you detailed knowledge of the complete Hadoop ecosystem.

All the best, Happy Hadooping!


