F.4. Data Analytics Part 1

Uploaded by

Kondreddi Saku
Copyright
© All Rights Reserved

1 Big data earned its name, because it is different from previous methods of data analysis
in four main ways: the "Four Vs" of big data. One way that big data is different is the
rapid production and processing of information. Which of the "Four Vs" of big data is
most applicable to this definition?

A. Valiant
B. Volume
C. Velocity
D. Veracity
Explanation

Choice "C" is correct. The "Four Vs" of big data are velocity (the speed at which data is
gathered and processed); volume (the amount of storage required to retain the data
gathered); variety (the spread of data types across sensor values, numbers, text,
pictures, etc.); and veracity (the accuracy of the data and the presumption that data
within the same dataset may be of different quality).

Velocity means speed, and often, big data applications create data very rapidly. Some
applications require that decisions be made quickly. This is seen in self-driving cars,
which have many sensors creating data that a computer uses to determine road
conditions and to react quickly with driving decisions.

Choice "A" is incorrect. Valiant is having or showing heroism or courage. This is not one
of the "Four Vs" of big data.

Choice "B" is incorrect. Volume is the size of the data. This question is most focused on
the speed of incoming data and the speed of making decisions based on that data, the
velocity.

Choice "D" is incorrect. Veracity is the accuracy of the data, and the expectation that
when a company gathers big data, some of that data will likely be of different quality.
Any processing that is done on the big data must be able to handle varying degrees of
data quality.
2 Big data earned its name, because it is different from previous methods of data analysis
in four main ways: the "Four Vs" of big data. One way that big data is different is the
automatic accumulation of data from different sources. The data may contain errors
from malfunctioning sensors or be collected by different vendors that have different
measures, methods, or definitions. The data may also vary in format, complicating the
work of accurately consolidating the data into a single system. Which of the "Four Vs" of
big data is described in this definition?

A. Volume
B. Veracity
C. Variety
D. Vicinity
Explanation

Choice "B" is correct. The "Four Vs" of big data are velocity (the speed at which data is
gathered and processed); volume (the amount of storage required to retain the data
gathered); variety (the spread of data types across sensor values, numbers, text,
pictures, etc.); and veracity (the accuracy of the data and the presumption that data
within the same dataset may be of different quality).

The situation above describes errors and inaccuracies, which come into the dataset
along with the desirable and accurate "good" data. This is a concern about the veracity
of the data. A company that analyzes big data must accept data of varying quality,
recognize the difference between undesirable and desirable data, and correct or discard
the undesirable data before using the data.

Choice "A" is incorrect. Volume is the size of the data. This question is more concerned
with quality than with quantity.

Choice "C" is incorrect. Variety is different types of data being collected and analyzed.
While the question does refer to a variety of data, the variety in the question is the result
of differing data sources, not differing types of data. Normalizing data from different data
sources is an issue of veracity.

Choice "D" is incorrect. Vicinity is the area near or surrounding a place. This is not one
of the "Four Vs" of big data.
3 Nova Vista Corporation wants to leverage big data in its fraud division by purchasing a
license to a large quantity (15 petabytes) of transaction records from other companies.
Nova Vista will analyze these to improve its ability to identify fraudulent transactions
from its own business partners and customers. In order to handle such a large dataset,
Nova Vista is comparing the costs of increasing the capacity of its data center versus
contracting a Cloud provider. Nova Vista is most concerned with which "Four Vs" aspect
of big data?

A. Velocity
B. Volume
C. Vehemence
D. Variety
Explanation

Choice "B" is correct. The "Four Vs" of big data are velocity (the speed at which data is
gathered and processed); volume (the amount of storage required to retain the data
gathered); variety (the spread of data types across sensor values, numbers, text,
pictures, etc.); and veracity (the accuracy of the data and the presumption that data
within the same dataset may be of different quality).

Due to the size of the dataset, volume is Nova Vista's main concern. Such large
datasets often require dedicated infrastructure to hold them, and this leads to Nova
Vista's decision to increase its own capacity or to rent someone else's.

Choice "A" is incorrect. Velocity is the speed at which the data is processed and
analyzed to make meaningful decisions. The question refers to a large quantity of data,
which would be volume.

Choice "C" is incorrect. Vehemence is a display of intensity, forcefulness, or strong
feelings, and it is not one of the "Four Vs" of big data. Nova Vista has specific concerns
about its capacity to hold very large datasets, which would be volume.

Choice "D" is incorrect. Variety is the different types of data being collected and
analyzed. Nova Vista lists concerns about the size of the dataset, which means it is
more worried about volume.
4 Bigotech Corporation wishes to collect the social media feeds of its customers and
potential customers to see if there are patterns it can use to attract new customers. The
marketing director is concerned that the targets use their social media for many different
activities. They post text, pictures, songs, videos, and links to other applications. In
order to find patterns, Bigotech needs a way to store, organize, and compare different
types of data with one another. With which of the "Four Vs" of big data is Bigotech most
concerned?

A. Volume
B. Victory
C. Veracity
D. Variety
Explanation

Choice "D" is correct. The "Four Vs" of big data are velocity (the speed at which data is
gathered and processed); volume (the amount of storage required to retain the data
gathered); variety (the spread of data types across sensor values, numbers, text,
pictures, etc.); and veracity (the accuracy of the data and the presumption that data
within the same dataset may be of different quality).

Variety is different types of data collected and included in big data analyses. Bigotech is
specifically concerned with how it is going to store, organize, and compare these
different types of data, so it's concerned about its data's variety.

Choice "A" is incorrect. Volume is the size of the data. The question refers to different
types of data, which is variety.

Choice "B" is incorrect. Victory is the condition of winning and is not one of the "Four
Vs" of big data. Bigotech has specific concerns about its ability to deal with data of
different types, which is variety.

Choice "C" is incorrect. Veracity is the accuracy of the data and the expectation that
when a company gathers big data, some of that data will likely be of different quality.
Any processing that is done on big data must be able to handle varying degrees of data
quality. Bigotech lists concerns about the variety of its data, because it includes so
many kinds of files.
5 Apricot Intentions Inc. plans to build the next-generation network of refueling stations
and wants to use big data collection methods on mobile phone data to determine
vehicle locations. At times, multiple communications towers will simultaneously report
the location of a given phone. This produces overcounts, which can distort the analysis,
so Apricot must make plans for how it will clean the data before analyzing it. With which
of the "Four Vs" of big data is Apricot most concerned?

A. Visibility
B. Variety
C. Veracity
D. Velocity
Explanation

Choice "C" is correct. The "Four Vs" of big data are velocity (the speed at which data is
gathered and processed); volume (the amount of storage required to retain the data
gathered); variety (the spread of data types across sensor values, numbers, text,
pictures, etc.); and veracity (the accuracy of the data and the presumption that data
within the same dataset may be of different quality).

Veracity is the accuracy of the data, and the expectation that when a company gathers
big data, some of that data will likely be of different quality. Any processing that is done
on the big data must be able to handle varying degrees of data quality. The company
must first determine how to clean the data, then determine if it's worth the expense to do
so in order to improve the veracity of the data before using it for analysis.

Choice "A" is incorrect. Visibility means that something can be seen. This is not one of
the "Four Vs" of big data. Apricot is concerned with the accuracy and quality of its data,
which is veracity.

Choice "B" is incorrect. Variety is different types of data being collected and analyzed.
Apricot is concerned with the accuracy and quality of its data, which is veracity.

Choice "D" is incorrect. Velocity is the speed at which the data is processed and
analyzed to make meaningful decisions. Apricot is concerned with the accuracy and
quality of its data, which is veracity.
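
The cleaning step in this scenario can be sketched in a few lines. This is only an illustration under stated assumptions: the `Ping` record, its field names, and the one-minute time bucket are hypothetical and do not come from the question. The sketch keeps one report per phone per time bucket, so simultaneous reports from several towers are not counted twice.

```python
from typing import NamedTuple


class Ping(NamedTuple):
    phone_id: str  # hypothetical device identifier
    minute: int    # hypothetical time bucket (minutes since midnight)
    tower_id: str  # tower that reported the phone


def deduplicate(pings: list[Ping]) -> list[Ping]:
    """Keep the first report per (phone, minute); drop repeat sightings."""
    seen: set[tuple[str, int]] = set()
    unique: list[Ping] = []
    for ping in pings:
        key = (ping.phone_id, ping.minute)
        if key not in seen:
            seen.add(key)
            unique.append(ping)
    return unique


raw = [
    Ping("phone-1", 600, "tower-A"),
    Ping("phone-1", 600, "tower-B"),  # same phone, same minute, second tower
    Ping("phone-2", 600, "tower-A"),
]
cleaned = deduplicate(raw)
print(len(raw), len(cleaned))  # 3 2
```

Whether a rule this simple is adequate, and whether the cleaning is worth its cost, is the veracity decision the explanation describes.
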
6 Baiem, LLC is analyzing stock market trading data. It intends to cease predicting future
stock price movement and wants to quickly analyze what the stock market is doing
every moment. By reacting quickly, Baiem will catch the rises and falls in stock prices
caused by other market actors. Baiem believes it will make more money by catching a
portion of every stock movement, rather than catching all of the movement each time it
correctly predicts a trend. With which of the "Four Vs" of Big Data is Baiem most
concerned?

A. Velocity
B. Volume
C. Variety
D. Vexation
Explanation

Choice "A" is correct. The "Four Vs" of Big Data are velocity (the speed at which data is
gathered and processed); volume (the amount of storage required to retain the data
gathered); variety (the spread of data types across sensor values, numbers, text,
pictures, etc.); and veracity (the accuracy of the data and the presumption that data
within the same data set may be of different quality).

Baiem's proposed strategy relies on speed. Baiem must receive data, analyze it,
produce a recommendation, and act on that recommendation before every other stock
trader. This challenge is addressed by velocity.

Choice "B" is incorrect. Volume is the size of the data. Baiem is less concerned with the
quantity of data coming out of the stock market than the speed at which it can react to it.

Choice "C" is incorrect. Variety is different types of data being collected and analyzed.
Variety is not Baiem's main concern; the data being analyzed includes a stock name, a
price, and a time stamp.

Choice "D" is incorrect. Vexation is the state of being annoyed, frustrated, or worried.
While Baiem may feel vexed about its stock trades, vexation is not one of the "Four Vs"
of big data.
7 What type of data file contains a strong pattern of identified components, listed in a
consistent order within each record, and clearly separated by a delimiter or a fixed size?

A. Unstructured
B. Semi-structured
C. Structured
D. Infra-structured
Explanation

Choice "C" is correct. Unstructured data is data with no underlying pattern, such as text
messages. Structured data is highly ordered with a strong underlying pattern of
identified components in a consistent order within each record and a clear separator.
Semi-structured data is in between, such as text messages with a string of hashtags or
other metadata intended to link similar articles together, or numbers with labels where
those numbers can only be processed after the labels are interpreted.

The question describes a file with a strong underlying pattern among its components:
the definition of structured data.

Choice "A" is incorrect. Unstructured data has no underlying pattern, such as text
messages, where the ideas in the messages are the desired data. The ideas may be
present, but because they are encoded in language, it is not easy for a program to
locate and evaluate them correctly.

Choice "B" is incorrect. Semi-structured data has a weak underlying pattern, one made
from metadata tags or highly variable labels.

Choice "D" is incorrect. Infra-structured data is not a type of data structure.


8 What kind of data is a file containing a repository of songs without titles and
artists, but with metadata tags for music styles intended to link similar music together?

A. Unstructured
B. Semi-structured
C. Structured
D. Meta-structured
Explanation

Choice "B" is correct. Unstructured data is data with no underlying pattern, such as text
messages. Structured data is highly ordered with a strong underlying pattern of
identified components in a consistent order within each record and a clear separator.
Semi-structured data is in between, such as text messages with a string of hashtags or
other metadata intended to link similar articles together, or numbers with labels where
those numbers can only be processed after the labels are interpreted.

The music does not contain components that can be read by an input procedure; it has
tags attached to it. These tags do not provide structure, but processing could use the
tags to create structure. This is semi-structured data.

Choice "A" is incorrect. Unstructured data has no underlying pattern, such as music
where the styles or moods in the music are the desired data. The criteria may be
present, but because they are encoded in the music, it is not easy for a program to
locate and evaluate them correctly.

Choice "C" is incorrect. Structured data has a strong underlying pattern—a discrete
series of identified components, always in the same order, with a clear separator.

Choice "D" is incorrect. Meta-structured data is not a type of data structure.
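
The idea that processing can use the tags to create structure can be sketched as follows. The file names and tags are invented for illustration. The code builds an index from each style tag to the files that carry it, which is one possible way to impose structure on tagged but otherwise untitled content.

```python
from collections import defaultdict

# Hypothetical records: untitled audio files that carry only style tags.
tracks = [
    {"file": "track_001.mp3", "tags": ["jazz", "instrumental"]},
    {"file": "track_002.mp3", "tags": ["rock"]},
    {"file": "track_003.mp3", "tags": ["jazz"]},
]


def index_by_tag(records):
    """Map each tag to the list of files that carry it."""
    index = defaultdict(list)
    for record in records:
        for tag in record["tags"]:
            index[tag].append(record["file"])
    return dict(index)


style_index = index_by_tag(tracks)
print(style_index["jazz"])  # ['track_001.mp3', 'track_003.mp3']
```
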


9 A file containing photographs with no identifying metadata, such as hashtags or image
descriptions, would be what kind of data?

A. Unstructured
B. Semi-structured
C. Structured
D. Mega-structured
Explanation

Choice "A" is correct. Unstructured data is data with no underlying pattern, such as text
messages. Structured data is highly ordered with a strong underlying pattern of
identified components in a consistent order within each record and a clear separator.
Semi-structured data is in between, such as text messages with a string of hashtags or
other metadata intended to link similar articles together, or numbers with labels where
those numbers can only be processed after the labels are interpreted.

The question describes a file containing images with no components within them to
subdivide or identify them. Any ideas or concepts one might want to analyze may be
present, but because they are encoded in an image, it is not easy for a program to
locate and evaluate them correctly.

Choice "B" is incorrect. Semi-structured data has a weak underlying pattern, one made
from metadata tags or vague labels requiring analysis.

Choice "C" is incorrect. Structured data has a strong underlying pattern—a discrete
series of identified components, always in the same order, with a clear separator.

Choice "D" is incorrect. Mega-structured data is not a type of data structure.


10 Which of the following terms describes the processing of large numbers of common
business events in a predefined, highly structured way?

A. Data processing
B. Transaction processing
C. Transaction controls
D. Information and communication
Explanation

Choice "B" is correct. Transaction processing is the term used to describe processing
large numbers of commonly occurring business events in a predefined, highly structured
way. Common transaction processing systems include sales, cash receipts, accounts
payable, etc.

Choice "A" is incorrect. The processing of data is part of transaction processing.

Choice "C" is incorrect. Transaction controls is not a term that describes the processing
of business events. Transaction controls include policies and procedures used to
safeguard assets, ensure accurate reporting, etc.

Choice "D" is incorrect. Information and communication is a component of the internal
control framework described by COSO; it covers the efforts of an entity to capture
information that will provide feedback on the effectiveness of controls.
11 A transaction file contains 1,000 records of sales data, including codes for store
numbers, salesperson identification numbers, sold items, discount codes, quantities,
date, time, and a text field for the customer, all in a specific order separated by
commas. What type of data is this?

A. Unstructured
B. Semi-structured
C. Structured
D. Super-structured
Explanation

Choice "C" is correct. Unstructured data is data with no underlying pattern, such as text
messages. Structured data is highly ordered with a strong underlying pattern of
identified components in a consistent order within each record and a clear separator,
such as a comma delimited transaction file. Semi-structured data is in between, such as
text messages with metadata (e.g., a string of hashtags).

The question describes a file with a strong underlying pattern among its components
(each of the codes in the question is a component), always in the same order (so the
system always knows the next component to expect), with a clear separator (so the
system can tell when one stops, and the next starts). It is structured data.

Choice "A" is incorrect. Unstructured data has no underlying pattern, such as text
messages, where the ideas communicated in the messages are the desired data. The
ideas may be present, but because they are encoded in language, it is not easy for a
program to locate and evaluate them correctly.

Choice "B" is incorrect. Semi-structured data has a weak underlying pattern, one made
from metadata tags or highly variable labels, such as account names from the financial
statements of different companies.

Choice "D" is incorrect. Super-structured data is not a type of data structure.
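
Because the components always arrive in the same order with a clear separator, a program can read such a file without interpreting anything. The sketch below uses Python's standard `csv` module. The field names and the two sample records are assumptions made up for illustration, loosely following the fields listed in the question.

```python
import csv
import io

# Hypothetical field order, modeled on the components named in the question.
FIELDS = ["store", "salesperson", "item", "discount",
          "quantity", "date", "time", "customer"]

# Two invented records; the second quotes a customer name that contains a comma.
sample = io.StringIO(
    'S01,E1007,SKU-88,D10,3,2024-01-15,09:30,Walk-in customer\n'
    'S02,E1012,SKU-15,,1,2024-01-15,09:42,"Smith, J."\n'
)

# The fixed order lets each position be labeled without inspecting its content.
records = [dict(zip(FIELDS, row)) for row in csv.reader(sample)]

total_quantity = sum(int(r["quantity"]) for r in records)
print(total_quantity)          # 4
print(records[1]["customer"])  # Smith, J.
```
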


12 A library file contains all text messages sent by all members of a regulatory
agency that is subject to regulations regarding transparency and accountability. What
type of data is this?

A. Unstructured
B. Semi-structured
C. Structured
D. Hyper-structured
Explanation

Choice "A" is correct. Unstructured data is data with no underlying pattern, such as text
messages. Structured data is highly ordered with a strong underlying pattern of
identified components in a consistent order within each record and a clear separator.
Semi-structured data is in between, such as text messages with a string of hashtags or
other metadata intended to link similar articles together, or numbers with labels where
those numbers can only be processed after the labels are interpreted.

The question shows a file containing text of varying length, with no components within.
A conversation or idea might be common to many different messages, but it would be
difficult for a program to evaluate this correctly, because these concepts are encoded in
language instead of being clearly recorded in a specific location. Free-form text is
unstructured data.

Choice "B" is incorrect. Semi-structured data has a weak underlying pattern, one made
from metadata tags or highly variable labels.

Choice "C" is incorrect. Structured data has a strong underlying pattern—a discrete
series of identified components, always in the same order, with a clear separator.

Choice "D" is incorrect. Hyper-structured data is not a type of data structure.


13 A data file contains images of income statements that were scanned using OCR (optical
character recognition). OCR is the electronic conversion of images of text into machine-
readable text. The files consist of accounts, dollar values, and subtotals; however, the
account names used contain a high degree of variability, so they cannot be reliably
processed one at a time. The whole file must be read to discover enough identifying
information to properly summarize the data. What type of data is this?

A. Unstructured
B. Semi-structured
C. Structured
D. Pseudo-structured
Explanation

Choice "B" is correct. Unstructured data is data with no underlying pattern, such as text
messages. Structured data is highly ordered with a strong underlying pattern of
identified components in a consistent order within each record and a clear separator,
such as a comma delimited transaction file. Semi-structured data is in between, such as
text messages with a string of hashtags or other metadata intended to link similar
articles together, or numbers with labels where those numbers can only be processed
after the labels are interpreted.

An income statement scanned using OCR has a structure, but it is a weak structure,
because the labels (i.e., account names) attached to the numbers are so varied that
interpreting the structure requires considerable processing. In addition, the structure can
only be identified after translating the image into text. This is one example of semi-
structured data regularly encountered by accountants.

Choice "A" is incorrect. Unstructured data has no underlying pattern, such as text
messages, where the ideas in the messages are the desired data. The ideas may be
present, but because they are encoded in language, it is not easy for a program to
locate and evaluate them correctly.

Choice "C" is incorrect. Structured data has a strong underlying pattern—a discrete
series of identified components, always in the same order, with a clear separator.

Choice "D" is incorrect. Pseudo-structured data is not a type of data structure.
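
The point that the numbers can only be processed after the labels are interpreted can be illustrated with a small sketch. It assumes the OCR step has already produced (label, amount) pairs. The synonym table and the sample figures are hypothetical, and a realistic mapping would need to be far larger and probably fuzzier.

```python
# Hypothetical synonym table mapping varied account names to one canonical label.
CANONICAL = {
    "revenue": "Revenue",
    "net sales": "Revenue",
    "sales": "Revenue",
    "cost of goods sold": "Cost of goods sold",
    "cost of sales": "Cost of goods sold",
}


def interpret(lines):
    """Sum amounts per canonical account; collect labels that cannot be mapped."""
    totals = {}
    unmatched = []
    for label, amount in lines:
        key = CANONICAL.get(label.strip().lower())
        if key is None:
            unmatched.append(label)
        else:
            totals[key] = totals.get(key, 0) + amount
    return totals, unmatched


# Invented statements from two companies that label the same accounts differently.
company_a = [("Net Sales", 500), ("Cost of Sales", 300)]
company_b = [("Revenue", 800), ("Cost of goods sold", 450), ("Misc. items", 20)]

totals, unmatched = interpret(company_a + company_b)
print(totals)     # {'Revenue': 1300, 'Cost of goods sold': 750}
print(unmatched)  # ['Misc. items']
```
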


14 Houston Corporation is planning to implement a new information system. Which of the
following statements is/are correct?

I. Houston's requirement for an accounting information system will preclude it from
implementing a transaction processing system.
II. If Houston implements an executive information system, the system will provide top
executives with access to information to assist those executives in monitoring business
conditions.
III. Whichever system Houston implements, the system may be implemented for a specific part
of the business or for the business as a whole as an enterprise system.
A. I, II, and III are correct.
B. I and II only are correct.
C. II and III only are correct.
D. II only is correct.
Explanation

Choice "C" is correct.

In this question, the examiners want to know which of a series of statements is/are
correct.

Statement I effectively says that Houston cannot implement an accounting information
system and a transaction processing system at the same time. The categories of
systems are not mutually exclusive. An accounting information system normally
processes accounting transactions of some type (e.g., journal entries, accounts
payable, accounts receivable) and thus is normally also a transaction processing
system. Statement I is thus incorrect, which means that choices "A" and "B" are
incorrect.

Statement II defines an executive information system. Statement II is correct.

Statement III says that systems may be implemented for a specific part of the business
or for the business as a whole. Statement III is correct, so choice "D" is also incorrect.
15 A major investment bank wants to monitor its key risk areas (regulatory, compliance,
and technology) in real time with transparency. How can the bank use data analytics to
monitor the key risk areas?

A. Use descriptive analytics to examine all journal entries processed for the entire year.
B. Develop a library of indicators to generate different score levels using real-time data analysis
tools.
C. Develop a forecasting model to predict what could happen in each area.
D. Use a deal optimization engine to analyze customer demographics and spending patterns.
Explanation

Choice "B" is correct.

Developing a library of key risk indicators to generate different score levels using real-
time data analysis tools will allow the major investment bank to monitor the regulatory,
compliance, and technology risk areas in real time with transparency.

Choice "A" is incorrect. Descriptive analytics describes events that have already
occurred. Because the bank wants to monitor key risk areas in real time, this type of
analytics is not appropriate.

Choice "C" is incorrect. Predictive analytics uses statistical techniques and forecasting
models to predict what could happen. The bank wants to monitor the key risk areas, not
predict what could happen.

Choice "D" is incorrect. Analyzing customer demographics and spending patterns with a
deal optimization engine does not help the investment bank monitor key risk areas.
16 Which of the following statements concerning data mining is(are) correct?

I. Data mining is the analysis of data in a data warehouse performed in order to attempt to
discover hidden patterns and trends in business.
II. Data mining assists managers in making business decisions and strategic planning.
III. Although it will take a little longer without a computer, a manager would be able to perform
data mining analysis manually.
A. I and II.
B. I and III.
C. II and III.
D. I, II, and III.
Explanation

Choice "A" is correct. Statements I and II are correct.

Statement I: This is a true statement. A major use of data warehouse databases is data
mining. Data mining is the analysis of data in a data warehouse in order to attempt to
discover hidden patterns and trends in historical business activities.

Statement II: This is a true statement. Data mining would help managers understand the
changes that are occurring in a business and would also assist in making strategic
business decisions in order to attempt to get a competitive advantage in the
marketplace.

Statement III is a false statement. Data mining is used to sift through large amounts of
data, sometimes several terabytes of information. Without the use of a computer, a
person would never be able to analyze this much data and uncover trends using
algorithms and other mathematical and statistical procedures.

Choices "B", "C", and "D" are incorrect, per the above explanation.
17 Analysis of large and diverse amounts of data included in data warehouses is often
referred to as:

A. Systems analysis.
B. Electronic Data Interchange (EDI).
C. Data mining.
D. Data processing.
Explanation

Choice "C" is correct. Data mining refers to the process of sifting through amounts of
data too large for individuals to analyze manually, searching for relationships among
the data as a means of achieving strategic or competitive advantage.

Choice "A" is incorrect. Systems analysis is the analytical evaluation of the manner in
which systems process data.

Choice "B" is incorrect. Electronic Data Interchange (EDI) is the transfer of data
between various systems in machine-readable formats.

Choice "D" is incorrect. Data processing is a generic term that describes the methods
and systems used to collect and process data and produce outputs.
18 Which of these statements is correct regarding primary keys?

A. Primary keys may be NULL if the foreign key exists.
B. Every primary key must match a foreign key on some other table.
C. A primary key may not be reused in other tables.
D. A primary key must be unique within a table.
Explanation

Choice "D" is correct. A primary key is the means by which the contents of a table are
grouped and referenced at the row level. A primary key must be unique within a table in
order to specifically identify a row within a single table of a relational database. The
primary key must not be NULL. The same primary key may be used in multiple tables.

Choice "A" is incorrect. Primary keys must not be NULL. The foreign key cannot match
a NULL primary key. A primary key must exist to be a unique identifier or for a foreign
key to match it.

Choice "B" is incorrect. Primary keys are not required to have associated foreign keys.
Foreign keys are used to create data relationships between tables; not every row of
data must have associated data on other tables.

Choice "C" is incorrect. Primary keys uniquely identify the rows within one table. Several
tables may each contain data, which is uniquely identified by the same primary key.
This occurs when data has several subgroups and a customer or process is more likely
to reference data within a subgroup than data from other subgroups. These subgroups
can form the basis for designing several tables within the relational database, each of
which uses the same primary key to uniquely identify its subset of data.
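
The two points above (a key must be unique within its table, yet the same key value may identify rows in several tables) can be checked with a small sketch using Python's built-in `sqlite3` module and an in-memory database. The table and column names are hypothetical. The sketch does not test NULL handling, which varies between database engines.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Two hypothetical tables that share the same primary key column.
conn.execute(
    "CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)
conn.execute(
    "CREATE TABLE customer_credit (customer_id INTEGER PRIMARY KEY, credit_limit INTEGER)"
)

conn.execute("INSERT INTO customer VALUES (1, 'Acme')")
# The same key value is accepted in a different table.
conn.execute("INSERT INTO customer_credit VALUES (1, 5000)")

# A second row with the same key in the same table is rejected.
try:
    conn.execute("INSERT INTO customer VALUES (1, 'Duplicate')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(duplicate_rejected)  # True
```
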
19 Which of the following SQL verbs is most frequently used when interacting with a
database?

A. DELETE
B. UPDATE
C. SELECT
D. WITHDRAW
Explanation

Choice "C" is correct. The six foundational actions in SQL have keyword verbs
associated with them. SELECT retrieves data, INSERT adds new data, UPDATE
changes existing data, and DELETE removes data. CREATE is used to make new
tables to hold data, and DROP removes tables, and thereby the associated data.

Retrieving data from a database is the most common action taken over the lifetime of a
typical database or a typical database analyst. In most queries, analysts perform
several SELECT statements to identify a precise group of data, even if they go on to
UPDATE or DELETE.

Choice "A" is incorrect. DELETE is less frequently used than SELECT, because
business practices discourage the destroying of data. Rows no longer needed are given
an effective date showing that they apply to the past, and old data is archived, making
actual DELETEs rare.

Choice "B" is incorrect. UPDATE is less frequently used than SELECT, because
UPDATE requires the data to exist before being changed. SELECT must be used first to
identify the proper records to update.

Choice "D" is incorrect. WITHDRAW is not ANSI-standard SQL. While it may exist in
custom SQL installations, it would not be more commonly used than ANSI-standard
SQL.
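
The six verbs named in the explanation can be seen together in a short sketch. It uses Python's built-in `sqlite3` module with an in-memory database. The `invoice` table and its rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# CREATE makes a table to hold data (hypothetical invoice table).
conn.execute(
    "CREATE TABLE invoice (invoice_id INTEGER PRIMARY KEY, customer TEXT, amount INTEGER)"
)

# INSERT adds new rows.
conn.executemany(
    "INSERT INTO invoice VALUES (?, ?, ?)",
    [(1, "Acme", 100), (2, "Birch", 250), (3, "Acme", 75)],
)

# SELECT retrieves data; here it identifies the rows of interest first.
acme_before = conn.execute(
    "SELECT invoice_id, amount FROM invoice WHERE customer = 'Acme' ORDER BY invoice_id"
).fetchall()

# UPDATE changes existing data; DELETE removes rows.
conn.execute("UPDATE invoice SET amount = 80 WHERE invoice_id = 3")
conn.execute("DELETE FROM invoice WHERE invoice_id = 2")

remaining = conn.execute(
    "SELECT invoice_id, amount FROM invoice ORDER BY invoice_id"
).fetchall()

# DROP removes the table and, with it, the remaining data.
conn.execute("DROP TABLE invoice")

print(acme_before)  # [(1, 100), (3, 75)]
print(remaining)    # [(1, 100), (3, 80)]
```

Even in this tiny script, SELECT appears more often than any other verb, which matches the point made in the explanation.
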
20 The ACID test is used to manage changes in a relational database while maintaining
data integrity.

Which of the following does NOT accurately describe an ACID component?

A. Atomic: every part of the change must succeed, or no change is made.
B. Consistent: all rules must be intact when the change is finished.
C. Isolated: only one change happens at a time.
D. Durable: the database persists, even if the computer is turned off.
Explanation

Choice "D" is correct. Relational Database Management Systems seek to have one
highly protected version of the data and maintain its integrity at all times. Applying the
ACID test is the primary change management technique to maintain data integrity. Its
components are: atomic (every part of the change must be successful, or no part of the
change is applied); consistent (all database rules, such as referential integrity, must be
intact when the change is finished); isolated (only one change happens at a time); and
durable (once a transaction is committed, it is permanent and unaffected by system
failure during processing).

Although "durable" is the "D" in ACID, the description of "durable" in choice "D" is
incorrect. Retaining data while powered down is a basic function of hardware, not of
relational database change management.

Choice "A" is incorrect. This is the correct definition of "atomic." Multi-part changes must
all be successful, or the database will reject the entire change request, even those
portions that had been successful individually.
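The all-or-nothing behavior of an atomic change can be sketched as follows. The "account" table, its CHECK rule, and the amounts are assumptions made up for illustration; the script uses Python's built-in sqlite3 module, whose connection context manager commits on success and rolls back on an exception.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: the CHECK rule forbids negative balances.
conn.execute(
    "CREATE TABLE account ("
    "id INTEGER PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.execute("INSERT INTO account VALUES (1, 100), (2, 50)")
conn.commit()

try:
    with conn:  # one transaction: commit if all parts succeed, else roll back
        # Part 1 succeeds on its own.
        conn.execute("UPDATE account SET balance = balance + 500 WHERE id = 2")
        # Part 2 violates the CHECK rule, so the whole change is rejected.
        conn.execute("UPDATE account SET balance = balance - 500 WHERE id = 1")
except sqlite3.IntegrityError:
    transfer_failed = True
else:
    transfer_failed = False

balances = conn.execute("SELECT id, balance FROM account ORDER BY id").fetchall()
print(transfer_failed, balances)
conn.close()
```

After the failure, account 2 is back at its original balance even though its own update had succeeded individually.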

Choice "B" is incorrect. This is the correct definition of "consistent." Relational Database
Management Systems have rules for data integrity, such as uniqueness of a primary
key. If a change is made to a primary key value, then all primary keys must be unique
after the change is made.

Choice "C" is incorrect. This is the correct definition of "isolated." Only one change
happens at a time, or concurrent changes behave as though they did. This prevents a
"dirty-read" scenario in which the early portions of an atomic change alter the
database and a second user reads the changed data. If later portions of the first
user's atomic change then fail, the database rolls back to the pre-change state, so the
data read by the second user never actually existed in the database.
21 What is the ANSI-standard SQL wildcard character that stands for any number of
occurrences (including zero) of any characters?

A. #
B. _
C. &
D. %
Explanation

Choice "D" is correct. SQL provides mechanisms, called wildcards, to enable loose
definitions of matching with the LIKE clause in queries. This allows the analyst to
specify which portions of the search parameter must match exactly and which portions
may be different and still satisfy the search.

The ANSI-standard SQL wildcard where any number (including zero) of characters can
be unspecified is the percent symbol (%). An SQL query written to find items LIKE
"version%" will find "Version1," "Version3," "Version10," "Versioning," and "Version
Control," but not "Tenth Version."
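The matching described above can be checked with a small script. It is a sketch using Python's built-in sqlite3 module and a hypothetical "item" table; note the assumption that the database compares LIKE patterns case-insensitively (SQLite does so for ASCII letters by default, while other databases may not).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (title TEXT)")  # hypothetical table
titles = [
    "Version1", "Version3", "Version10", "Versioning",
    "Version Control", "Tenth Version", "Version",
]
conn.executemany("INSERT INTO item VALUES (?)", [(t,) for t in titles])

# % stands for any number of characters, including zero.
any_suffix = [
    row[0]
    for row in conn.execute(
        "SELECT title FROM item WHERE title LIKE 'version%' ORDER BY rowid"
    )
]

# _ stands for exactly one character.
one_char = [
    row[0]
    for row in conn.execute(
        "SELECT title FROM item WHERE title LIKE 'version_' ORDER BY rowid"
    )
]

print(any_suffix)
print(one_char)
conn.close()
```

The bare title "Version" matches the % pattern because zero trailing characters are allowed, but it does not match the _ pattern.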

Choice "A" is incorrect. The character # is not an ANSI-standard SQL wildcard, but one
may find this character playing special roles in some SQL implementations, such as
designating temporary objects in SQL Server.

Choice "B" is incorrect. The character _ is used in ANSI-standard SQL in the LIKE
clause to stand for one and only one occurrence of any character. The question
specifies any number of occurrences.

Choice "C" is incorrect. The character & is not an ANSI-standard SQL wildcard, but in
some SQL implementations, this character may be used to override a standard SQL
word (like SELECT) to use that word as a variable name.
22 Which of these statements is correct regarding foreign keys?

A. A foreign key field may be NULL.
B. Every foreign key may have zero, one, or many matching primary keys from the same table.
C. A foreign key may not be reused in other tables.
D. A foreign key must be unique within a table.
Explanation

Choice "A" is correct. A foreign key serves as a data relationship identifier between two
tables within a relational database. When two tables contain data that needs to be
associated one with another, a new data attribute (the foreign key) is added to one of
the tables. This new data attribute references the primary key of the other table. In this
way, the contents of the second table can be selectively attached to the data from the
first table when needed. If a row from the first table has no associated data from the
second, then the foreign key attribute will be empty, or NULL.

Foreign keys may be NULL. This means that this data in the primary table is not
associated with data in the secondary table.
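A short sketch can illustrate these foreign key rules. The "orders" and "discount" tables below are hypothetical, and the script assumes Python's built-in sqlite3 module, where foreign key enforcement has to be switched on with a PRAGMA.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default

# Hypothetical tables: orders.discount_id references discount.discount_id.
conn.execute("CREATE TABLE discount (discount_id INTEGER PRIMARY KEY, pct REAL)")
conn.execute(
    "CREATE TABLE orders ("
    "order_id INTEGER PRIMARY KEY, "
    "discount_id INTEGER REFERENCES discount (discount_id))"
)
conn.execute("INSERT INTO discount VALUES (1, 10.0)")

conn.execute("INSERT INTO orders VALUES (100, 1)")     # matches discount 1
conn.execute("INSERT INTO orders VALUES (101, 1)")     # same value again: no uniqueness rule
conn.execute("INSERT INTO orders VALUES (102, NULL)")  # NULL foreign key is accepted

try:
    conn.execute("INSERT INTO orders VALUES (103, 99)")  # no discount 99 exists
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

order_fks = conn.execute(
    "SELECT order_id, discount_id FROM orders ORDER BY order_id"
).fetchall()
print(orphan_rejected, order_fks)
conn.close()
```

The NULL row and the repeated value are both stored, while the row pointing at a nonexistent primary key is refused.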

Choice "B" is incorrect. Foreign keys cannot match to zero primary keys. This condition
results in "orphaned" data, because the row in the secondary table cannot ever be
referenced by any primary data. It either doesn't belong in the database, or a
relationship that should exist has been broken. Foreign keys also cannot match many
primary keys, because primary keys must be unique within a table.

Choice "C" is incorrect. Foreign keys may be reused in other tables, if those tables
should be related to the first table.

Choice "D" is incorrect. Foreign keys are not required to be unique within a table, as
they do not uniquely identify a row in that table. The data that comprises the foreign key
also comprises the primary key of a different table, but that is a different environment
from where the foreign key resides.
23 A database contains two tables, one for orders and one for discounts. An analyst wants
to write a query that will return a list of all orders in the system. If a discount was used,
that data should be included. Assuming a relationship exists between the tables, how
should the two tables be joined?

A. Orders LEFT OUTER JOIN Discount
B. Discount INNER JOIN Orders
C. Orders RIGHT OUTER JOIN Discount
D. Orders OUTER JOIN Discount
Explanation

Choice "A" is correct. SQL queries requiring information that is spread across multiple
tables in a relational database must JOIN those tables during the query, according to
the data relationship that exists between those tables. Besides the data relationship, the
JOIN type affects the information received. The INNER JOIN returns only records that
have a match in both tables. The OUTER JOIN returns all records of a specified table
(LEFT or RIGHT), plus any matches from the unspecified table. If neither LEFT nor RIGHT
is specified (written FULL OUTER JOIN in ANSI-standard SQL), it returns all records of
both tables, so the rows that do not match can be identified by the empty fields from
the other table.

Orders LEFT OUTER JOIN Discount means that all orders will be returned, along with any
discount data related to those orders. Orders where no discount was used will show
blank (NULL) values in any discount fields.
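The difference between the LEFT OUTER JOIN and the INNER JOIN can be seen on a tiny data set. This is a sketch only: the table layouts, column names, and rows are assumptions, run through Python's built-in sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE discount (discount_id INTEGER PRIMARY KEY, code TEXT);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, discount_id INTEGER);
INSERT INTO discount VALUES (1, 'SPRING'), (2, 'UNUSED');
INSERT INTO orders VALUES (100, 1), (101, NULL);
""")

# All orders, with discount data where a discount was used.
left_rows = conn.execute("""
    SELECT o.order_id, d.code
    FROM orders o
    LEFT OUTER JOIN discount d ON o.discount_id = d.discount_id
    ORDER BY o.order_id
""").fetchall()

# Only the orders that used a discount.
inner_rows = conn.execute("""
    SELECT o.order_id, d.code
    FROM discount d
    INNER JOIN orders o ON o.discount_id = d.discount_id
    ORDER BY o.order_id
""").fetchall()

print(left_rows)
print(inner_rows)
conn.close()
```

Order 101 used no discount, so it appears only in the LEFT OUTER JOIN result, with None (NULL) in the discount field.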

Choice "B" is incorrect. Discount INNER JOIN Orders will return discount data and order
data only for those combinations where a discount and an order match. In this way, the
only order data returned by the query will be for those orders where discounts were
used, and this is less data than the analyst wants.

Choice "C" is incorrect. Orders RIGHT OUTER JOIN Discount means that all discounts
will be returned, and any order data related to those discounts. This will return any
discounts that were never used, and this will fail to return any orders that used no
discounts.

Choice "D" is incorrect. Orders OUTER JOIN Discount (written FULL OUTER JOIN in
ANSI-standard SQL) will return all order data and all discount data, including
discounts that were never used. This is not what the analyst wants.
24 A database contains two tables, one for customers and one for orders, and business
rules that say that neither table gets data unless it connects to some data in the other
table. An analyst wants a query that will test the integrity of the database relative to the
company's business rules. The analyst wants to know if there are any customers who
have not made an order, and if any orders exist without a customer. How should the two
tables be joined for such a query?

A. Orders LEFT OUTER JOIN Customers
B. Customers INNER JOIN Orders
C. Orders RIGHT OUTER JOIN Customers
D. Customers OUTER JOIN Orders
Explanation

Choice "D" is correct. SQL queries requiring information that is spread across multiple
tables in a relational database must JOIN those tables during the query according to the
data relationship that exists between those tables. Besides the data relationship, the
JOIN type affects the information received. The INNER JOIN returns only records that
have a match in both tables. The OUTER JOIN returns all records of a specified table
(LEFT or RIGHT) plus any matches from the unspecified table. If neither LEFT nor RIGHT
is specified (written FULL OUTER JOIN in ANSI-standard SQL), it returns all records of
both tables, so the rows that do not match can be identified by the empty fields from
the other table.

Customers OUTER JOIN Orders will return all customer data and all order data,
including rows that do not have associated data in the other table: customers who have
placed no orders and orders with no associated customer. Those rows show empty (NULL)
fields from the other table, so filtering on them will test the database's compliance
with business rules.
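An integrity check of this kind might look like the sketch below. Older SQLite versions lack FULL OUTER JOIN, so the script swaps in an equivalent formulation: two LEFT OUTER JOINs, each keeping only its unmatched rows, combined with UNION ALL. The table layouts and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER);
INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO orders VALUES (100, 1), (101, 3);
""")

violations = conn.execute("""
    -- Customers with no order: the order side of the join is NULL.
    SELECT c.customer_id, o.order_id
    FROM customers c
    LEFT OUTER JOIN orders o ON o.customer_id = c.customer_id
    WHERE o.order_id IS NULL
    UNION ALL
    -- Orders with no customer: the customer side of the join is NULL.
    SELECT c.customer_id, o.order_id
    FROM orders o
    LEFT OUTER JOIN customers c ON o.customer_id = c.customer_id
    WHERE c.customer_id IS NULL
""").fetchall()

print(violations)
conn.close()
```

Each returned row has a value on one side and None (NULL) on the other, which identifies the side that breaks the business rule.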

Choice "A" is incorrect. Orders LEFT OUTER JOIN Customers means that all orders will
be returned, and any customer data related to those orders. This will only serve half of
the analyst's request.

Choice "B" is incorrect. Customers INNER JOIN Orders will return customer data and
order data only for those combinations where a customer and an order match. This is
the data that conforms to the business rules, and the analyst wants to know if there is
any data that does not conform.

Choice "C" is incorrect. Orders RIGHT OUTER JOIN Customers means that all
customers will be returned, and any order data related to those customers. This will
serve half of the analyst's request.
25 Which of the following statements describes the difference between the WHERE and
the HAVING clause in ANSI-standard SQL?

A. There is no difference—either clause can be used interchangeably.
B. The WHERE clause can only reduce a query based on conditions that exist in the data, while
the HAVING clause can reduce a query based on summary or aggregate data.
C. There are no similarities between the WHERE and HAVING clauses.
D. The HAVING clause can only reduce a query based on conditions that exist in the data, while
the WHERE clause can reduce a query based on summary or aggregate data.
Explanation

Choice "B" is correct. Database analysts frequently do not want the entire table returned
by their query. To reduce the row-level query results, SQL has two clauses: WHERE
and HAVING. Using these clauses, an analyst can return only those rows from the
table(s) that satisfy specified conditions.

The WHERE clause is different from the HAVING clause, because the WHERE clause
limits the query output based on conditions that exist in the individual rows of data
(e.g., WHERE Price > 1000). The HAVING clause limits the query output based on
conditions that are derived from aggregated data (e.g., HAVING AVG(Price) > 1000).
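The two clauses can be compared side by side. The sketch below assumes a hypothetical "product" table with a category column and made-up prices, and runs both queries with Python's built-in sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (name TEXT, category TEXT, price REAL);
INSERT INTO product VALUES
    ('laptop', 'A', 1500), ('phone', 'A', 900),
    ('monitor', 'B', 1200), ('cable', 'B', 10);
""")

# WHERE tests each row's own price.
row_filter = conn.execute(
    "SELECT name FROM product WHERE price > 1000 ORDER BY name"
).fetchall()

# HAVING tests a value computed per group, here the average price.
group_filter = conn.execute("""
    SELECT category, AVG(price)
    FROM product
    GROUP BY category
    HAVING AVG(price) > 1000
""").fetchall()

print(row_filter)
print(group_filter)
conn.close()
```

Category B contains a product priced above 1000, yet its average is below 1000, so WHERE keeps the monitor while HAVING drops the whole category.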

Choice "A" is incorrect. These two clauses are different; the WHERE clause cannot be
used with summary or aggregate data.

Choice "C" is incorrect. There are similarities between the two clauses. They both serve
to reduce the results of a query. The major difference is that the WHERE clause cannot
be used with summary or aggregate data.

Choice "D" is incorrect. The WHERE clause is the one that cannot be used with
summary or aggregate data; the HAVING clause can.
26 What is the basic format of an SQL data retrieval query?

A. FROM {tables} SELECT {columns} WHERE {conditions}
B. SELECT {rows} FROM {tables} WHERE {conditions}
C. SELECT {columns} FROM {tables} WHERE {conditions}
D. WHERE {tables} SELECT {conditions} FROM {rows}
Explanation

Choice "C" is correct. To retrieve data from a database, analysts must specify the data
they want, the tables that contain those columns, and any other limiting conditions. SQL
is the language used to interact with the data in a database.

With a SELECT statement, analysts specify which columns of data they wish to retrieve.
The analysts must then specify which tables contain those columns using the FROM
clause. If the analysts do not want to retrieve the entire contents of a table, the WHERE
clause lists conditions that must be met for data to be retrieved by this query.

Choice "A" is incorrect. The SELECT clause must occur before the FROM clause in
SQL.

Choice "B" is incorrect. In SQL, the SELECT clause references data fields, which are
designated by the table's columns; it does not reference rows (the table's records)
directly.

Choice "D" is incorrect. The SELECT clause must occur before the FROM clause; the
WHERE clause must occur after both. In SQL, an analyst would SELECT columns
FROM tables WHERE conditions are met.
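The required clause order can be confirmed by running a correctly ordered query next to a misordered one. This is a sketch with a hypothetical "product" table in Python's built-in sqlite3 module; the exact error raised for bad syntax will differ between database products.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (name TEXT, price REAL)")  # hypothetical table
conn.executemany(
    "INSERT INTO product VALUES (?, ?)",
    [("pen", 2.0), ("lamp", 25.0)],
)

# SELECT {columns} FROM {tables} WHERE {conditions}
ordered_result = conn.execute(
    "SELECT name FROM product WHERE price > 10"
).fetchall()

# The same clauses with FROM placed first are rejected as a syntax error.
try:
    conn.execute("FROM product SELECT name WHERE price > 10")
    misordered_accepted = True
except sqlite3.OperationalError:
    misordered_accepted = False

print(ordered_result, misordered_accepted)
conn.close()
```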
27 A database installation has a product table that lists the name and price of each product
sold by the company, along with inventory data. Which of the following SQL queries will
create a list of only those product names for products that cost more than $10?

A. SELECT product.name, Product.price FROM Product WHERE Product.price > 10
B. SELECT name, FROM Price WHERE Price > 10
C. SELECT * FROM Product WHERE Price > 10
D. SELECT name FROM Product WHERE price > 10
Explanation

Choice "D" is correct. A basic SQL data retrieval query will SELECT {a list of specified
columns of data} FROM {a list of the tables that contain those columns} WHERE {any
specified conditions are met}. It is acceptable to use a column in the WHERE clause
that does not appear in the SELECT clause, as long as all those columns come from
tables specified in the FROM clause.

The only requested column of data is the name of the product, so that is the only item in
the SELECT clause. Only one table (i.e., the product table) needs to be referenced for
the one column in the SELECT clause. The condition for the WHERE clause is that the
product costs more than $10.

Choice "A" is incorrect. The analyst does not want the price values included in the
report. Specifying the source table along with each column reference is not necessary in
this case, but it is not harmful.

Choice "B" is incorrect. Price is not the name of the table that contains the data fields
"price" and "name."

Choice "C" is incorrect. The * symbol will retrieve all columns from the product table,
and the analyst requested only one column.
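The four answer choices can be run as written against a small test table. In this sketch the Product table's quantity column and all of its rows are assumptions standing in for the inventory data; the script uses Python's built-in sqlite3 module, which treats table and column names case-insensitively.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical Product table: name, price, and one inventory column.
conn.execute("CREATE TABLE Product (name TEXT, price REAL, quantity INTEGER)")
conn.executemany(
    "INSERT INTO Product VALUES (?, ?, ?)",
    [("pen", 2.0, 500), ("lamp", 25.0, 40), ("desk", 140.0, 5)],
)

# Choice A: correct rows, but the price column is included as well.
result_a = conn.execute(
    "SELECT product.name, Product.price FROM Product WHERE Product.price > 10"
).fetchall()

# Choice B: there is no Price table, and the stray comma is a syntax error.
try:
    conn.execute("SELECT name, FROM Price WHERE Price > 10")
    choice_b_failed = False
except sqlite3.OperationalError:
    choice_b_failed = True

# Choice C: correct rows, but every column is returned.
result_c = conn.execute("SELECT * FROM Product WHERE Price > 10").fetchall()

# Choice D: product names only.
result_d = conn.execute("SELECT name FROM Product WHERE price > 10").fetchall()

print(result_a, choice_b_failed, result_c, result_d, sep="\n")
conn.close()
```

Only choice D returns a single column holding the names of the products that cost more than 10.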
28 A decision tree analysis identifies subgroups of customers who are more likely to
respond to advertising so that a company can selectively advertise only to those
customers. Which type of analytic modeling does this describe?

A. Descriptive
B. Diagnostic
C. Predictive
D. Prescriptive
Explanation

Choice "D" is correct. This data analysis presumes that there is a relationship (or at
least a correlation) between the measured characteristics of a customer (such as age
and income) and whether or not they choose to purchase the company's product.

The analysis is performed so that the company may choose to take action prescribed by
the model to selectively advertise to those customers who are more receptive to the
company's products.

Choice "A" is incorrect. Decision tree analysis is not directed toward describing what
has occurred. The information present about who has purchased the company's
products and the associated data is not presented for its own value, but rather for what
29 Consider the following representation of a data set prepared for data mining.

Which of the following statements are true?

I. The grouping together of colored dots was done with regression analysis.
II. In order to assign grey dots to an existing cluster group, classification analysis must be used.
III. Colored dots were assigned using cluster analysis.
IV. In order to assign the grey dots to a group, the analyst must run a time-series analysis.

A. III only.
B. II and IV only.
C. II and III only.
D. I and II only.
