DA-Interview Reference Material

What is SQL? How is it different from MySQL?
SQL stands for Structured Query Language. It is the language used to communicate with relational databases.
MySQL, on the other hand, is a relational database management system (RDBMS) that uses SQL as its query language.
What are the different types of joins in SQL?
There are four main types of joins in SQL:
* Inner join: Returns only the rows from the two tables where the join condition is met.
* Full outer join: Returns all rows from both tables, with NULLs filling the columns of whichever side has
no match.
* Left join: Returns all rows from the left table, and the matching rows from the right table.
* Right join: Returns all rows from the right table, and the matching rows from the left table.
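The difference between an inner join and a left join is easiest to see on a tiny dataset. The following is a minimal sketch using Python's built-in sqlite3 module. The employees and departments tables, their columns, and their rows are hypothetical and chosen only for illustration. Right and full outer joins are left out because older SQLite versions do not support them.

```python
import sqlite3

# Hypothetical tables: employees and the departments they belong to.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (emp_name TEXT, dept_id INTEGER);
    CREATE TABLE departments (dept_id INTEGER, dept_name TEXT);
    INSERT INTO employees VALUES ('Asha', 1), ('Ravi', 2), ('Meena', NULL);
    INSERT INTO departments VALUES (1, 'IT'), (2, 'HR'), (3, 'Finance');
""")

# INNER JOIN keeps only employees that have a matching department.
inner_rows = conn.execute("""
    SELECT e.emp_name, d.dept_name
    FROM employees e
    INNER JOIN departments d ON e.dept_id = d.dept_id
    ORDER BY e.emp_name
""").fetchall()

# LEFT JOIN keeps every employee; unmatched ones get NULL (None) for dept_name.
left_rows = conn.execute("""
    SELECT e.emp_name, d.dept_name
    FROM employees e
    LEFT JOIN departments d ON e.dept_id = d.dept_id
    ORDER BY e.emp_name
""").fetchall()

print(inner_rows)
print(left_rows)
```

Meena has no department, so she appears only in the left join result, paired with None.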
How will the GROUP BY clause behave without an aggregate function?

When the GROUP BY clause is used in a SELECT statement without any aggregate function, it behaves
like the DISTINCT clause. For example, take the following table:
+------+---------+------------+------------+
| id   | Name    | Address    | Subject    |
+------+---------+------------+------------+
| 101  | YashPal | Amritsar   | History    |
| 105  | Gaurav  | Chandigarh | Literature |
| 125  | Raman   | Shimla     | Computers  |
| 130  | Ram     | Jhansi     | Computers  |
| 132  | Shyam   | Chandigarh | Economics  |
| 133  | Mohan   | Delhi      | Computers  |
| 150  | Saurabh | NULL       | Literature |
+------+---------+------------+------------+
mysql> Select DISTINCT ADDRESS from Student_info;
mysql> Select ADDRESS from Student_info GROUP BY Address;

Both queries return the same set of distinct addresses. The only possible difference is row order: MySQL
versions before 8.0 implicitly sorted the result of a GROUP BY query, while the result of a DISTINCT query
is not guaranteed to be sorted. MySQL 8.0 and later no longer sort GROUP BY results implicitly, so add an
ORDER BY clause whenever the order matters.
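The equivalence of the two queries can be checked on the sample table. This is a minimal sketch that uses SQLite through Python's sqlite3 module rather than MySQL, so it is an assumption that the same behaviour carries over. It compares the two results as sets because row order is not part of the claim.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Student_info (id INTEGER, Name TEXT, Address TEXT, Subject TEXT)"
)
conn.executemany(
    "INSERT INTO Student_info VALUES (?, ?, ?, ?)",
    [
        (101, "YashPal", "Amritsar", "History"),
        (105, "Gaurav", "Chandigarh", "Literature"),
        (125, "Raman", "Shimla", "Computers"),
        (130, "Ram", "Jhansi", "Computers"),
        (132, "Shyam", "Chandigarh", "Economics"),
        (133, "Mohan", "Delhi", "Computers"),
        (150, "Saurabh", None, "Literature"),
    ],
)

# Collect each result as a set so that row order does not affect the comparison.
distinct_rows = {
    row[0] for row in conn.execute("SELECT DISTINCT Address FROM Student_info")
}
grouped_rows = {
    row[0] for row in conn.execute("SELECT Address FROM Student_info GROUP BY Address")
}

print(distinct_rows == grouped_rows)
print(len(distinct_rows))
```

Both queries treat NULL as a single group, so the NULL address appears once (as None) in each result.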
Are these statements correct?
• The GROUP BY clause is placed after the WHERE clause. (TRUE)

• The GROUP BY clause is placed before the ORDER BY clause. (TRUE)
GROUP BY goes before the ORDER BY statement because the latter operates on the final result of the
query.
Write a SQL statement that selects all products with a price between 10 and 20. In addition, do not show products
with a CategoryID of 1, 2, or 3. The table name is Products. The columns are Price and CategoryID.
SELECT * FROM Products
WHERE Price BETWEEN 10 AND 20
AND CategoryID NOT IN (1,2,3);
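To see which rows this filter keeps, here is a minimal sketch that runs the same query in SQLite through Python's sqlite3 module. The ProductName column and all of the rows are hypothetical and added only so the result is readable. Note that BETWEEN is inclusive at both ends.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Products (ProductName TEXT, Price REAL, CategoryID INTEGER)")
conn.executemany(
    "INSERT INTO Products VALUES (?, ?, ?)",
    [
        ("Pen", 12, 1),    # price in range, but category excluded
        ("Mug", 15, 4),    # kept
        ("Lamp", 25, 5),   # price out of range
        ("Book", 10, 6),   # kept: BETWEEN includes the lower bound
        ("Bag", 20, 2),    # price in range, but category excluded
        ("Cap", 20, 7),    # kept: BETWEEN includes the upper bound
    ],
)

rows = conn.execute("""
    SELECT ProductName FROM Products
    WHERE Price BETWEEN 10 AND 20
    AND CategoryID NOT IN (1, 2, 3)
    ORDER BY ProductName
""").fetchall()

kept = [name for (name,) in rows]
print(kept)
```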
Write a SQL query to get the top 5 people from the IT department. Say we have a Customers table with
3 columns: cust_name, cust_id, and dept. Make use of the LIMIT clause.
Using the LIMIT clause (without an ORDER BY, which five rows are returned is not guaranteed):
SELECT * FROM Customers
WHERE dept = 'IT'
LIMIT 5;
Write a SQL statement using the ROWNUM pseudocolumn (Oracle) to return the 3 employees who were hired
earliest, with the result displayed in increasing order of their hire date. The columns are employee_id,
first_name, last_name, email, contact_num, hire_date, department_id, and salary.
Syntax:
SELECT [column_list], ROWNUM
FROM (SELECT [column_list]
FROM table_name
ORDER BY Top-N_column)
WHERE ROWNUM<=N;
SELECT ROWNUM as RANK, first_name, employee_id, hire_date
FROM (SELECT first_name, employee_id, hire_date
FROM Employee
ORDER BY hire_date)
WHERE ROWNUM<=3;
What will this query give me as a result:

SELECT *
FROM salesman a
CROSS JOIN customer b
WHERE a.city IS NOT NULL;
The above SQL query selects all columns (*) from the salesman table alias a and the customer table
alias b, and performs a cross join between the two tables. The query also includes a WHERE clause
that filters the results to only include rows from the salesman table where the 'city' column is not
null.
This means that the query will return all combinations of rows from the salesman table where the
'city' column is not null and the customer table, effectively creating a Cartesian product of the two
tables.
Suppose table A has 5 rows and table B has 6 rows. You perform a cross join on these two tables.
How many rows will it have?
30 rows, because a cross join pairs every row of A with every row of B (5 × 6 = 30).
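The row count of a cross join, and the effect of filtering one side of it, can be verified with a small experiment. This is a minimal sketch using SQLite through Python's sqlite3 module. The salesman and customer rows are hypothetical, with 5 salesmen (one with a NULL city) and 6 customers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE salesman (name TEXT, city TEXT)")
conn.execute("CREATE TABLE customer (cust_name TEXT)")

# 5 hypothetical salesmen, one of them with no city recorded.
conn.executemany(
    "INSERT INTO salesman VALUES (?, ?)",
    [("S1", "Delhi"), ("S2", "Shimla"), ("S3", None), ("S4", "Jhansi"), ("S5", "Amritsar")],
)
# 6 hypothetical customers.
conn.executemany(
    "INSERT INTO customer VALUES (?)",
    [(f"C{i}",) for i in range(1, 7)],
)

# Every salesman paired with every customer: 5 * 6 rows.
all_pairs = conn.execute(
    "SELECT COUNT(*) FROM salesman a CROSS JOIN customer b"
).fetchone()[0]

# Dropping the salesman with a NULL city leaves 4 * 6 rows.
filtered_pairs = conn.execute(
    "SELECT COUNT(*) FROM salesman a CROSS JOIN customer b WHERE a.city IS NOT NULL"
).fetchone()[0]

print(all_pairs, filtered_pairs)
```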
Power BI Questions:
What is DAX?
DAX stands for Data Analysis Expressions. It's a collection of functions, operators, and constants used in
formulas to calculate and return values. In other words, it helps you create new info from data you already
have.
Explain the FILTER function and the ALL function, and give the syntax for each.
· FILTER function: Returns a table that is a subset of another table or table expression, keeping only the
rows that satisfy the filter condition.
FILTER(<table>,<filter>)
· ALL function: Returns all the rows in a table, or all the values in a column, ignoring any filters that
might have been applied.
ALL(<table> or <column>)
Which data types are used in DAX?
DAX supports seven data types: whole number, decimal number, Boolean, text, date, currency, and
N/A (blank).
Can you have a table in the model which does not have any relationship with other tables?
Yes. There are two main reasons why you can have disconnected tables:
· The table is used to present the user with parameter values to be exposed and selected in slicers
· The table is used as a placeholder for metrics (measures) in the user interface

Define bi-directional cross filtering.
Bidirectional cross-filtering lets data modelers decide how filters flow between related tables in Power BI
Desktop. The filter context is transmitted to the related table on the other side of a table relationship, in
both directions. This helps data modelers solve the many-to-many problem without having to write
complicated DAX formulas.
Explain responsive slicers in Power BI.
On a Power BI report page, a developer can resize a responsive slicer to various sizes and shapes, and
the data in the container is rearranged to fit. If a visual becomes too small to be useful, an icon
representing the visual takes its place, saving space on the report page.
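What are the prerequisites for Append and UNION in Power BI?
The tables should have the same number of columns with matching data types. Matching column names are also
recommended: Power Query's Append aligns columns by name, while DAX UNION combines them by position.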
Improving Power BI report performance:
Refer to the link below:
https://zebrabi.com/power-bi-performance-optimization/
Excel interview questions:

What is the difference between COUNT, COUNTA, and COUNTBLANK?
COUNT: Counts the cells that contain numeric values. Cells with text, special characters, or no content
are ignored.
COUNTA: Counts the cells that contain any content. Blank cells are ignored.
COUNTBLANK: Counts only the blank cells, as the name implies. Cells with content are not counted.
Is it possible to create a Pivot Table using multiple sources of data?
Yes, a pivot table can be built from several sheets or tables. For this to work, the tables must share a
common column, which acts as the primary key of the first table and the foreign key of the second. Create
a relationship between the tables before constructing the pivot table.
Suppose you need to put the Pack size (Column C) values into different buckets:
Pack size less than or equal to 500 then “SMALL PACK”
Pack size between 500 and 1000 then “MEDIUM PACK”
Pack size between 1000 and 2000 then “LARGE PACK”
Anything above 2000 then “PACKAGE”
In this case you nest 3 IF statements, one for each of the first three conditions, with the last bucket as
the final "else" value.
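The nested-IF logic can be sketched in Python to make the order of the checks explicit. This is an illustration rather than the Excel formula itself. The function name pack_bucket is hypothetical, and the sketch assumes the MEDIUM bucket covers sizes above 500 up to 1000 and that each upper boundary belongs to the lower bucket.

```python
def pack_bucket(pack_size):
    """Return the bucket label for a pack size (boundaries are assumptions)."""
    # Each test is only reached if the previous ones failed, which is
    # exactly how three nested IF statements behave.
    if pack_size <= 500:
        return "SMALL PACK"
    elif pack_size <= 1000:
        return "MEDIUM PACK"
    elif pack_size <= 2000:
        return "LARGE PACK"
    # Final "else" branch: anything above 2000.
    return "PACKAGE"


for size in (250, 500, 750, 1500, 2500):
    print(size, pack_bucket(size))
```

Because the conditions are checked from smallest to largest, each branch needs only an upper bound.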
Tableau interview questions:
What is a parameter in Tableau?
A parameter is a variable (a number, string, or date) created to replace a constant value in calculations,
filters, or reference lines. For example, suppose you create a field that returns true if sales are greater
than 30,000 and false otherwise. A parameter can replace the constant (30,000 in this case) so that the
threshold can be changed dynamically. A parameter can accept values in the following ways:
• All: Simple text field
• List: List of possible values to select from
• Range: Select values from a specified range
What are the different ways to connect to a dataset?
There are two types of data connections in Tableau:
LIVE: A live connection queries the data source directly, so the workbook reflects real-time data.
Tableau sends queries to the database and retrieves the results into the workbook.
EXTRACT: An extract is a snapshot of the data, saved as a .tde or .hyper file. The data can come from a
relational database or from a static source such as an Excel spreadsheet. You can schedule refreshes of the
snapshot using Tableau Server. Once created, an extract can be used without a connection to the
database.
What are LOD expressions?
LOD (Level of Detail) expressions are used to compute aggregations at a level of detail different from that
of the view, either more granular or less granular. There are three types of LOD expressions: FIXED,
INCLUDE, and EXCLUDE.
What is the use of a data source filter in Tableau?
A data source filter restricts the data that users can see, which keeps sensitive information out of view
and reduces the amount of data fed into the workbook.
In what situation will you prefer to use a treemap over a heat map?
When we have to deal with large quantitative values having hierarchically structured data, we can prefer
treemaps. Each rectangular set on the same hierarchy level denotes a data table column.
Mention some significant ways of improving Tableau's performance.
There are different ways of improving Tableau's performance. Some well-known techniques are:
· We can minimize the scope of data and keep only the data that we need for our visualization. It
will eventually decrease the volume of data making Tableau's processing faster.
· To run our Tableau workbook faster, we can use the Extract.
· We can avoid using strings while dealing with numbers and prefer using Boolean and integer
values. It is because they are faster than strings.
· We can hide unnecessary or unused fields.
· We can also eliminate needless calculations and sheets.
Python Questions:
Which Python libraries are used for data analysis?
The reason for Python's popularity is its extensive collection of libraries. These libraries include various
functionalities and tools to analyze and manage data. The popular Python libraries for data science are:
• SciPy
• NumPy
• Pandas
• Matplotlib
• PyTorch
• Scrapy
• BeautifulSoup
Get good hands-on knowledge of these libraries, and understand how to manage data and perform data-related
operations with them.
In pandas what are the data structures you deal with?

• Series - A one-dimensional array-like structure with homogeneous data, meaning that all values in a
series share one data type. It can hold any data type, such as integers, floats, or strings. Its values are
mutable (they can be changed), but the size of a series is immutable (it cannot be changed).
• DataFrame - A two-dimensional array-like structure with heterogeneous data. It can contain columns of
different data types, and the data is aligned in a tabular manner. Both the size and the values of a
DataFrame are mutable.
How to create an empty Series:
The simplest series that can be created is an empty series. The Series() constructor of pandas is used to
create a series of any kind.
Code Example 1:
import pandas as pd

# Create an empty Series; an explicit dtype avoids the default-dtype
# warning that some pandas versions emit for an empty Series
ser = pd.Series(dtype="float64")
How to create a series from an array: pandas is built on top of the NumPy library. To create a series from
a NumPy array, import the NumPy module and use the numpy.array() function.
Code Example 2:
import pandas as pd
import numpy as np

# Build a simple NumPy array, then wrap it in a Series
data = np.array(['s', 'c', 'a', 'l', 'a', 'r'])
ser = pd.Series(data)
How to get the frequency count of unique items in a pandas DataFrame?

To get the frequency count of unique items, use the Series.value_counts() method on a Series (for a
DataFrame, call it on the column of interest).
import pandas as pd

# Create the series
s = pd.Series(data=[1, 2, 3, 4, 3, 5, 3, 7, 1])

# Display the series
print(s)

# Count how often each unique value occurs
print(s.value_counts())
Create a 1D array with values ranging from 0 to 9.

import numpy as np
arr = np.arange(10)
print(arr)
Statistics Questions:
How is the statistical significance of an insight assessed?
Hypothesis testing is used to assess the statistical significance of an insight. First, the null hypothesis and the alternative
hypothesis are stated, and a significance level (alpha) is chosen.
The p-value is then calculated under the assumption that the null hypothesis is true: it is the probability of observing a result
at least as extreme as the one obtained. If the p-value is less than alpha, the null hypothesis is rejected, and the result is
considered statistically significant.
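The decision rule can be illustrated with a small worked example. This is a minimal sketch of a two-sided one-sample z-test using only Python's standard library. The sample values, the null-hypothesis mean of 50, the assumed known population standard deviation of 5, and alpha = 0.05 are all hypothetical. In practice the choice of test depends on the data.

```python
from math import sqrt
from statistics import NormalDist, mean

sample = [52, 55, 49, 58, 61, 54, 57, 60, 53, 56]  # hypothetical observations
mu0 = 50      # mean under the null hypothesis (assumed)
sigma = 5     # population standard deviation, assumed known
alpha = 0.05  # significance level, chosen before looking at the data

# Test statistic: how many standard errors the sample mean lies from mu0.
z = (mean(sample) - mu0) / (sigma / sqrt(len(sample)))

# Two-sided p-value: probability of a result at least this extreme
# if the null hypothesis were true.
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

reject_null = p_value < alpha
print(round(z, 3), p_value, reject_null)
```

Here the p-value falls well below alpha, so the null hypothesis is rejected.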
What is an outlier? How can outliers be determined in a dataset?
Outliers are data points that differ markedly from the other observations in the dataset. Depending on the learning
process, an outlier can worsen the accuracy of a model and decrease its efficiency sharply.
Outliers are commonly identified using two methods:
· Standard deviation/z-score
· Interquartile range (IQR)
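Both methods can be sketched with Python's standard library. The data below is hypothetical, and the cut-offs used (1.5 × IQR beyond the quartiles, and an absolute z-score above 3) are common conventions rather than fixed rules.

```python
from statistics import mean, quantiles, stdev

# Hypothetical data with one obviously extreme value.
data = [10, 12, 11, 13, 12, 11, 14, 13, 12, 11, 12, 13, 10, 14, 12, 95]

# IQR method: flag points beyond 1.5 * IQR from the first or third quartile.
q1, _, q3 = quantiles(data, n=4)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
iqr_outliers = [x for x in data if x < lower or x > upper]

# z-score method: flag points more than 3 standard deviations from the mean.
mu, sd = mean(data), stdev(data)
z_outliers = [x for x in data if abs((x - mu) / sd) > 3]

print(iqr_outliers, z_outliers)
```

On small samples the z-score method can miss outliers, because the extreme value itself inflates the standard deviation. The IQR method is less sensitive to this.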
State a case where the median is a better measure than the mean.
When the data contains many outliers that skew it positively or negatively, the median is preferred because it is far less
affected by extreme values and so gives a more representative measure of the center.
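A short example shows how far a single extreme value can pull the mean while leaving the median almost unchanged. The salary figures below are hypothetical.

```python
from statistics import mean, median

# Hypothetical salaries (in thousands); the last value is an outlier.
salaries = [30, 32, 35, 31, 33, 400]

avg = mean(salaries)
mid = median(salaries)

print(avg, mid)
```

The mean lands far above what most people in the list earn, while the median stays close to the typical value.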
