January All SQL Questions Compiled 1682631354
PROBLEM STATEMENT: Find the email activity rank for each user. Email activity rank is defined by the total
number of emails sent. The user with the highest number of emails sent will have a rank of 1, and so on.
Output the user, total emails, and their activity rank. Order records by the total emails in descending order.
Sort users with the same number of emails in alphabetical order. In your rankings, return a unique value (i.e., a
unique rank) even if multiple users have the same number of emails.
google_gmail_emails
Field Type
id int
from_user varchar
to_user varchar
day int
SAMPLE TABLE:
MENTAL APPROACH:
1. Find the distinct users who have sent emails and count how many emails each of them has sent.
2. This gives us one row per sender together with that sender's email count.
3. Now rank the senders by the number of emails sent, using the user name as a tie-breaker.
NOTE: We rank by the number of emails in descending order and then by user name in ascending
order. The second key guarantees that every user gets a different rank, even when several users
have sent the same number of emails.
QUERY EXECUTION:
1. First the FROM clause is evaluated; it fetches the records from the table google_gmail_emails.
2. In the second step GROUP BY from_user groups the records by sender.
3. In the third step the aggregate function (here COUNT()) is computed for each group.
4. After that the window function (here DENSE_RANK()) is evaluated over the grouped rows, as specified
by its OVER clause.
5. Finally the SELECT list is produced and ORDER BY sorts the result.
6. The final output displays from_user, the number of emails sent and the rank assigned.
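A minimal sketch of a query that follows the steps above. The column aliases are taken from the sample output; the rest is one plausible formulation, not necessarily the original query.

```sql
-- Count emails per sender and rank senders uniquely.
-- from_user is unique after GROUP BY, so adding it as a second sort key
-- makes DENSE_RANK() return a different rank for every row.
SELECT from_user,
       COUNT(*) AS emails_sent,
       DENSE_RANK() OVER (ORDER BY COUNT(*) DESC, from_user ASC) AS activity_rank
FROM google_gmail_emails
GROUP BY from_user
ORDER BY emails_sent DESC, from_user ASC;
```

ROW_NUMBER() with the same OVER clause would give the same ranks here, since the sort keys never tie.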
SAMPLE OUTPUT:
from_user emails_sent activity_rank
32ded68d89443e808 19 1
ef5fe98c6b9f313075 19 2
5b8754928306a18b68 18 3
55e60cfcc9dc49c17e 16 4
91f59516cb9dee1e88 16 5
6edf0be4b2267df1fa 15 6
7cfe354d9a64bf8173 15 7
cbc4bd40cd1687754 15 8
e0e0defbb9ec47f6f7 15 9
8bba390b53976da0cd 14 10
a84065b7933ad01019 14 11
Problem Statement: Find the average effectiveness of each advertising channel in the period from
2017 to 2018 (both included). The effectiveness is calculated as the ratio of total money spent to total
customers acquired.
Output the advertising channel along with corresponding average effectiveness. Sort records by the
average effectiveness in ascending order.
uber_advertising
Field Type
year int
advertising_channel varchar
money_spent int
customer_acquired int
Query:
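A minimal sketch of a query for this problem, written against the uber_advertising schema above. Both columns are integers, so in SQL Server the division is integer division; that matches the whole numbers in the output, but it is an assumption about how the original result was produced.

```sql
-- Effectiveness per channel = total money spent / total customers acquired,
-- restricted to the years 2017 and 2018.
SELECT advertising_channel,
       SUM(money_spent) / SUM(customer_acquired) AS average_effectiveness
       -- multiply the numerator by 1.0 to keep the decimals instead
FROM uber_advertising
WHERE year BETWEEN 2017 AND 2018
GROUP BY advertising_channel
ORDER BY average_effectiveness ASC;
```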
Output:
advertising_channel average_effectiveness
radio 56
tv 58
busstops 82
celebrities 105
billboards 179
buses 827
PROBLEM STATEMENT: Meta/Facebook is developing a search algorithm that will allow users to search
through their post history. You have been assigned to evaluate the performance of this algorithm.
We have a table with the user's search term, search result positions, and whether or not the user clicked on
the search result.
Write a query that assigns ratings to the searches in the following way:
• If the search was not clicked for any term, assign the search with rating=1
• If the search was clicked but the top position of clicked terms was outside the top 3 positions, assign the
search a rating=2
• If the search was clicked and the top position of a clicked term was in the top 3 positions, assign the search a
rating=3
As a search ID can contain more than one search term, select the highest rating for that search ID. Output the
search ID and its highest rating.
Example: The search_id 1 was clicked (clicked = 1) and its position is outside of the top 3 positions
(search_results_position = 5), therefore its rating is 2.
FIELD TYPE
search_id int
search_term varchar
clicked int
search_results_position int
MENTAL APPROACH:
1. First assign a rating to every row of the table according to the given conditions.
2. Then find the maximum rating for each distinct search id to get the desired output.
NOTE: The same search id appears many times in the table, so we take the maximum rating per
search id.
QUERY EXPLANATION:
1. With CASE WHEN we assign a rating to each row according to the given conditions.
search_id rating
1 2
2 2
2 2
3 3
3 2
5 3
2. We wrap the CASE expression in MAX and group by search id, so that only the highest rating of each
search id is returned.
search_id rating
1 2
2 2
3 3
5 3
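The two steps can be sketched as a single query. The table name is not given in the text above, so fb_search_events is an assumed name; the column names come from the schema.

```sql
-- Rate every row with CASE, then keep the best rating per search_id.
SELECT search_id,
       MAX(CASE
             WHEN clicked = 0 THEN 1                                  -- not clicked
             WHEN clicked = 1 AND search_results_position > 3 THEN 2  -- clicked, outside top 3
             WHEN clicked = 1 AND search_results_position <= 3 THEN 3 -- clicked, inside top 3
           END) AS rating
FROM fb_search_events   -- assumed table name
GROUP BY search_id
ORDER BY search_id;
```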
Problem Statement: You are given a table, BST, containing two columns: N and P, where N represents the
value of a node in Binary Tree, and P is the parent of N.
Write a query to find the node type of Binary Tree ordered by the value of the node. Output one of the
following for each node:
• Root: If node is root node.
• Leaf: If node is leaf node.
• Inner: If node is neither root nor leaf node.
Sample Input
Sample Output
1 Leaf
2 Inner
3 Leaf
5 Root
6 Leaf
8 Inner
9 Leaf
SOLUTION
Mental Approach:
1. If a node N has no value in P (its parent is NULL), it is the Root node.
2. If a node N has a parent and is itself the parent of other nodes, it is an Inner node.
3. Every remaining node is a Leaf node. Put differently, if N is not the parent of any other node,
it is a Leaf node.
QUERY:
2. In the first condition of the CASE statement we look for the Root node: if the parent of a node
is NULL, the node is the Root.
3. In the second condition we look for the Leaf nodes: N must not be present in the parent
column P.
Note: The subquery excludes NULL values of P, because NOT IN never evaluates to true when the
list it compares against contains a NULL.
4. The ELSE branch returns Inner, because any node that is neither Root nor Leaf must be an Inner
node.
N P Type
1 2 Leaf
2 4 Inner
3 2 Leaf
4 15 Inner
5 6 Leaf
6 4 Inner
7 6 Leaf
8 9 Leaf
9 11 Inner
10 9 Leaf
11 15 Inner
12 13 Leaf
13 11 Inner
14 13 Leaf
15 NULL Root
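A minimal sketch of the CASE expression described in the explanation, using the table BST and the columns N and P from the problem statement. The alias node_type is illustrative.

```sql
-- Classify every node as Root, Leaf or Inner.
SELECT N,
       CASE
         WHEN P IS NULL THEN 'Root'                    -- no parent
         WHEN N NOT IN (SELECT P
                        FROM BST
                        WHERE P IS NOT NULL) THEN 'Leaf'  -- never appears as a parent
         ELSE 'Inner'                                  -- has a parent and children
       END AS node_type
FROM BST
ORDER BY N;
```

Without the WHERE P IS NOT NULL filter the NOT IN test would never be true, and every non-root node would fall through to Inner.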
Introduction
Danny seriously loves Japanese food so in the beginning of 2021, he decides to embark
upon a risky venture and opens up a cute little restaurant that sells his 3 favourite foods:
sushi, curry and ramen.
Danny’s Diner is in need of your assistance to help the restaurant stay afloat - the
restaurant has captured some very basic data from their few months of operation but have
no idea how to use their data to help them run the business.
Problem Statement
Danny wants to use the data to answer a few simple questions about his customers,
especially about their visiting patterns, how much money they’ve spent and also which
menu items are their favourite. Having this deeper connection with his customers will help
him deliver a better and more personalised experience for his loyal customers.
He plans on using these insights to help him decide whether he should expand the
existing customer loyalty program - additionally he needs help to generate some basic
datasets so his team can easily inspect the data without needing to use SQL.
Danny has provided you with a sample of his overall customer data due to privacy issues -
but he hopes that these examples are enough for you to write fully functioning SQL
queries to help him answer his questions!
Danny has shared with you 3 key datasets for this case study:
• sales
• menu
• members
You can inspect the entity relationship diagram and example data below.
Table 1: sales
The sales table captures all customer_id level purchases with a
corresponding order_date and product_id information for when and what menu items were
ordered.
Table 2: menu
Table 3: members
The final members table captures the join_date when a customer_id joined the beta version
of the Danny’s Diner loyalty program.
10. In the first week after a customer joins the program (including their join date) they earn 2x
points on all items, not just sushi - how many points do customer A and B have at the end
of January?
--1. What is the total amount each customer spent at the restaurant?
SELECT s.customer_id,
SUM(m.price) total_amount
FROM sales s
LEFT JOIN menu m
ON s.product_id=m.product_id
GROUP BY s.customer_id
ORDER BY s.customer_id;
customer_id total_amount
A 76
B 74
C 36
customer_id no_of_days_visited
A 4
B 6
C 2
--3. What was the first item from the menu purchased by each customer?
SELECT customer_id,
STRING_AGG(product_name,' , ') first_purchased_product
FROM (SELECT DISTINCT s.customer_id,
m.product_name,
RANK() OVER (PARTITION BY s.customer_id ORDER BY s.order_date) rnk
FROM sales s
LEFT JOIN menu m
ON s.product_id=m.product_id
) subquery
WHERE rnk=1
GROUP BY customer_id;
customer_id first_purchased_product
A curry , sushi
B curry
C ramen
--4. What is the most purchased item on the menu and how many times was it purchased by all customers?
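A minimal sketch of a query for question 4, assuming SQL Server (TOP) like the other queries in this case study. The alias matches the header of the result shown next.

```sql
-- Count purchases per menu item and keep the most frequent one.
SELECT TOP 1
       m.product_name,
       COUNT(*) AS number_of_times_purchased
FROM sales s
INNER JOIN menu m
        ON s.product_id = m.product_id
GROUP BY m.product_name
ORDER BY number_of_times_purchased DESC;
```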
product_name number_of_times_purchased
ramen 8
--5. Which item was the most popular for each customer?
SELECT customer_id
,STRING_AGG(product_name,' , ') popular_dish
FROM (
SELECT s.customer_id
,m.product_name
,COUNT(1) no_of_orders
,RANK() OVER (PARTITION BY s.customer_id ORDER BY COUNT(1) DESC) rnk
FROM sales s
INNER JOIN menu m
ON s.product_id=m.product_id
GROUP BY s.customer_id
,m.product_name
) subquery
WHERE rnk=1
GROUP BY customer_id;
customer_id popular_dish
A ramen
B sushi , curry , ramen
C ramen
--6. Which item was purchased first by the customer after they became a member?
WITH rank_cte AS (
SELECT s.customer_id,
m.product_name,
s.order_date,
DENSE_RANK() OVER (PARTITION BY s.customer_id ORDER BY order_date
ASC) rnk
FROM sales s
INNER JOIN menu m
ON s.product_id=m.product_id
INNER JOIN members mb
ON s.customer_id=mb.customer_id
WHERE s.order_date>=join_date
)
SELECT customer_id,
product_name
FROM rank_cte
WHERE rnk =1
--7. Which item was purchased just before the customer became a member?
WITH rank_cte AS (
SELECT s.customer_id,
m.product_name,
s.order_date,
-- DESC so that rnk = 1 is the last order date before joining
DENSE_RANK() OVER (PARTITION BY s.customer_id ORDER BY s.order_date DESC) rnk
FROM sales s
INNER JOIN menu m
ON s.product_id=m.product_id
INNER JOIN members mb
ON s.customer_id=mb.customer_id
WHERE s.order_date<mb.join_date
)
SELECT customer_id,
STRING_AGG(product_name,' , ') items_purchased_before_joining
FROM rank_cte
WHERE rnk =1
GROUP BY customer_id;
customer_id items_purchased_before_joining
A sushi , curry
B sushi
--8. What is the total items and amount spent for each member before they became a member?
SELECT s.customer_id,
COUNT(1) total_items,
SUM(m.price) amount_spent
FROM sales s
INNER JOIN menu m
ON s.product_id=m.product_id
INNER JOIN members mb
ON s.customer_id=mb.customer_id
WHERE s.order_date<join_date
GROUP BY s.customer_id;
customer_id points
A 860
B 940
C 360
--10. In the first week after a customer joins the program (including their join date) they earn 2x points on all items,
--not just sushi - how many points do customer A and B have at the end of January?
SELECT s.customer_id,
-- join date plus 6 days covers the first week, join date included
SUM(CASE WHEN s.order_date BETWEEN mb.join_date AND DATEADD(DAY,6,mb.join_date) THEN m.price*20
WHEN m.product_name='sushi' THEN m.price*20
ELSE m.price*10
END) points
FROM sales s
INNER JOIN members mb
ON s.customer_id=mb.customer_id
INNER JOIN menu m
ON s.product_id=m.product_id
WHERE MONTH(order_date)=1
GROUP BY s.customer_id;
customer_id points
A 1370
B 820
PROBLEM STATEMENT: Julia asked her students to create some coding challenges. Write a query to print
the hacker_id, name, and the total number of challenges created by each student. Sort your results by the
total number of challenges in descending order. If more than one student created the same number of
challenges, then sort the result by hacker_id. If more than one student created the same number of challenges
and the count is less than the maximum number of challenges created, then exclude those students from the
result.
Input Format
The following tables contain challenge data:
• Hackers: The hacker_id is the id of the hacker, and name is the name of the hacker.
• Challenges: The challenge_id is the id of the challenge, and hacker_id is the id of the student who created the
challenge.
Sample Input 0
Hackers Table:
Challenges Table:
Sample Output 0
21283 Angela 6
88255 Patrick 5
96196 Lisa 1
Sample Input 1
Hackers Table:
Challenges Table:
Sample Output 1
12299 Rose 6
34856 Angela 6
79345 Frank 4
80491 Patrick 3
81041 Lisa 1
Explanation
For Sample Case 0, we can get the following details:
Two students created the same number of challenges, but that number is below the maximum number of
challenges created (6), so these students are excluded from the result.
For Sample Case 1, we can get the following details:
Students 12299 and 34856 both created 6 challenges. Because 6 is the maximum number of challenges
created, these students are included in the result.
MENTAL APPROACH:
1. First count the number of challenges created by each hacker.
2. Now look for hackers having the same number of challenges created by them.
3. Find the maximum number of challenges that were created by any hacker in the whole list.
4. Let us find the hackers who have created the same number of challenges.
5. If multiple hackers created the same number of challenges and it is not equal to the maximum number
of challenges created then we will drop those records.
QUERY:
QUERY EXPLANATION:
1. In CTE, we count the number of challenges created by each hacker and rank them on the basis of the
number of challenges created by each hacker in descending order.
Here, we gave descending order so that we can rank the higher number of challenges created at the top.
What it will give as output:
4. Now combine (using UNION) the first SELECT query with a second SELECT query that returns the
records whose rank is greater than 1. Its WHERE clause uses a subquery to keep only those challenge
counts that are not shared by several hackers.
Basically, the subquery groups by the number of challenges created and keeps the groups whose count
is equal to 1, that is, the challenge counts that appear only once.
5. Finally we order the result by the number of challenges created in descending order and, when that
number is the same, by hacker id in ascending order.
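The filtering rule can also be sketched without UNION, as a single WHERE clause over a CTE. This is an alternative formulation of the same logic, not the query described above; the CTE name and aliases are illustrative.

```sql
-- Keep a hacker if their challenge count is the maximum,
-- or if no other hacker has the same count.
WITH challenge_counts AS (
    SELECT h.hacker_id,
           h.name,
           COUNT(c.challenge_id) AS challenges_created
    FROM Hackers h
    INNER JOIN Challenges c
            ON c.hacker_id = h.hacker_id
    GROUP BY h.hacker_id, h.name
)
SELECT hacker_id, name, challenges_created
FROM challenge_counts
WHERE challenges_created = (SELECT MAX(challenges_created) FROM challenge_counts)
   OR challenges_created IN (SELECT challenges_created
                             FROM challenge_counts
                             GROUP BY challenges_created
                             HAVING COUNT(*) = 1)   -- counts that occur only once
ORDER BY challenges_created DESC, hacker_id ASC;
```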
What it will give as output (without UNION) :
BY MANISH KUMAR CHAUDHARY
PROBLEM STATEMENT: Write a query that identifies cities with higher than average home prices when
compared to the national average. Output the city names.
Table : zillow_transactions
FIELD TYPE
id int
state varchar
city varchar
street_address varchar
mkt_price int
MENTAL APPROACH:
1. First, we must determine the national average. This can be accomplished by calculating the average of all
mkt_price values.
2. Now compute the average of mkt_price for each city.
It will assist us in determining the average home prices in each city.
3. Now compare them and list the records where the average home price in the city exceeds the national
average home price.
QUERY:
QUERY EXPLANATION:
1. We only need the city name as output, thus city is in the SELECT statement.
2. We have now grouped them by city so that we can receive records for each particular city.
3. Using the having clause, we can only have records where each city's average home price is more than the
national average home price.
Basically, in execution, AVG(mkt_price) is computed for each city and then compared to the
AVG(mkt_price) calculated by a subquery.
The subquery returns the national, or overall, average home price in this case.
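A minimal sketch of the query this explanation describes. Multiplying by 1.0 is an added precaution: mkt_price is an int, and in SQL Server AVG over an int column returns a truncated integer.

```sql
-- Cities whose average home price exceeds the national average.
SELECT city
FROM zillow_transactions
GROUP BY city
HAVING AVG(mkt_price * 1.0) > (SELECT AVG(mkt_price * 1.0)
                               FROM zillow_transactions)  -- national average
ORDER BY city;
```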
WHAT HAPPENS IN BACKGROUND:
INSIGHTS:
1. We can see from the table above that just three cities have average home prices that are higher than the national
average home price.
FINAL OUTPUT:
city
Mountain View
San Francisco
Santa Clara
Visit my LinkedIn page by clicking here for more answers to interview questions.
19 January 2023 18:39
FIELD TYPE
business_id int
business_name varchar
business_address varchar
business_city varchar
business_state varchar
business_postal_code float
business_latitude float
business_longitude float
business_location varchar
business_phone_number float
inspection_id varchar
inspection_date datetime
inspection_score float
inspection_type varchar
violation_id varchar
violation_description varchar
risk_category varchar
MENTAL APPROACH:
1. We look for particular words such as 'restaurant', 'cafe', etc. in the business name and
assign the business type according to the word found.
QUERY:
QUERY EXPLANATION:
1. This is a classification problem, so we use CASE WHEN to classify the business type of
each business.
In the CASE statement we use the LIKE operator to search for a particular word, so that we
can find the business type of that particular business.
2. As there are multiple businesses with the same name, we want only distinct ones.
To get distinct names we use GROUP BY business_name instead of the DISTINCT keyword in the
SELECT clause. Both return the same rows, and most query optimizers treat them alike, so
this is a stylistic choice rather than a guaranteed performance gain.
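A minimal sketch of such a classification. The table name sf_restaurant_health_violations is an assumption (the text does not name the table), and the keyword list is inferred from the sample output, where names containing "Coffee" are classified as cafe.

```sql
-- Classify each distinct business name by keywords found in it.
SELECT business_name,
       CASE
         WHEN LOWER(business_name) LIKE '%restaurant%' THEN 'restaurant'
         WHEN LOWER(business_name) LIKE '%cafe%'
           OR LOWER(business_name) LIKE '%coffee%'     THEN 'cafe'
         WHEN LOWER(business_name) LIKE '%school%'     THEN 'school'
         ELSE 'other'
       END AS business_type
FROM sf_restaurant_health_violations   -- assumed table name
GROUP BY business_name
ORDER BY business_name;
```

LOWER() makes the match case-insensitive, which is needed for names such as BALBOA HIGH SCHOOL on a case-sensitive collation.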
SAMPLE OUTPUT:
business_name business_type
24 Hour Fitness Club, #273 other
24th and Folsom Eatery other
Akira Japanese Restaurant restaurant
Allstars Cafe Inc cafe
Andersen Bakery other
Angel Cafe and Deli cafe
Annie's Hot Dogs & Pretzels other
Antonelli Brothers Meat, Fish, and Poultry Inc. other
AT&T - COMMISARY KITCHEN [145184] other
AT&T Park - Coffee and Ice Cream (5A+5B) cafe
Azalina's other
BALBOA HIGH SCHOOL school
PROBLEM STATEMENT: You did such a great job helping Julia with her last coding contest
challenge that she wants you to work on this one, too!
The total score of a hacker is the sum of their maximum scores for all of the challenges. Write a query
to print the hacker_id, name, and total score of the hackers ordered by the descending score. If more
than one hacker achieved the same total score, then sort the result by ascending hacker_id. Exclude
all hackers with a total score of 0 from your result.
Input Format
The following tables contain contest data:
• Hackers: The hacker_id is the id of the hacker, and name is the name of the hacker.
• Submissions: The submission_id is the id of the submission, hacker_id is the id of the hacker who
made the submission, challenge_id is the id of the challenge that the submission belongs to,
and score is the score of the submission.
Sample Output
4071 Rose 191
74842 Lisa 174
84072 Bonnie 100
4806 Angela 89
26071 Frank 85
80305 Kimberly 67
49438 Patrick 43
MENTAL APPROACH:
1. First combine both tables on their common column, hacker_id.
2. Now find the hacker id and name together with the total score.
NOTE: The total score is the sum of a hacker's scores over all the different challenges. For a
hacker who has submitted to the same challenge multiple times, only the maximum score obtained in
that challenge counts.
3. After this, order the hackers by total score in descending order.
4. If hackers have the same total score, order them by hacker_id in ascending order.
QUERY:
2. Now we will query hacker id, hacker name and SUM of maximum scores from the CTE so that for a
particular hacker we can find the overall total score scored by them for all the challenges.
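A minimal sketch of the CTE-based query outlined here. The CTE name and aliases are illustrative; the table and column names come from the input format.

```sql
-- Step 1: best score per hacker per challenge.
WITH max_scores AS (
    SELECT hacker_id,
           challenge_id,
           MAX(score) AS max_score
    FROM Submissions
    GROUP BY hacker_id, challenge_id
)
-- Step 2: sum the best scores per hacker and drop totals of 0.
SELECT h.hacker_id,
       h.name,
       SUM(ms.max_score) AS total_score
FROM Hackers h
INNER JOIN max_scores ms
        ON ms.hacker_id = h.hacker_id
GROUP BY h.hacker_id, h.name
HAVING SUM(ms.max_score) > 0
ORDER BY total_score DESC, h.hacker_id ASC;
```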
PROBLEM STATEMENT: Find the top 10 users that have traveled the greatest distance. Output their id, name and
a total distance traveled.
lyft_rides_log
FIELD TYPE
id int
user_id int
distance int
lyft_users
FIELD TYPE
id int
name varchar
MENTAL APPROACH:
1. For each user look for the distance traveled by them and sort them in descending order according to
distance.
2. Now pick the top 10 users who have traveled the maximum distance.
QUERY:
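A minimal sketch of a query for this problem, assuming SQL Server (TOP); other engines would use LIMIT 10 or FETCH FIRST 10 ROWS ONLY instead. The join on lyft_rides_log.user_id = lyft_users.id is inferred from the schemas above.

```sql
-- Total distance per user, ten largest totals first.
SELECT TOP 10
       u.id,
       u.name,
       SUM(r.distance) AS distance
FROM lyft_users u
INNER JOIN lyft_rides_log r
        ON r.user_id = u.id
GROUP BY u.id, u.name
ORDER BY distance DESC;
```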
OUTPUT:
id name distance
144 Barbara Larson 98
197 Christina Price 98
173 Crystal Berg 96
133 Christopher Schmitt 96
185 Kimberly Potter 95
115 Pamela Cox 94
147 Barbara Larson 94
101 Patrick Gutierrez 93
154 Dennis Douglas 93
158 Sean Parker 92
PROBLEM STATEMENT: Find the number of matches played by each team along with the number of matches
won and matches lost by each team.
icc_world_cup
In table icc_world_cup Team_1 and Team_2 are teams who are playing against each other and the winner
column lists those who won that match.
MENTAL APPROACH:
1. To get all teams that played in icc_world_cup we need the team names from both team_1
and team_2.
2. A team appears several times because it plays against different teams, so we report each
team only once.
3. For the total matches played by each team, we go through both team columns and count how
many times each team appears.
4. To count the number of matches won, we count the team's appearances in the winner column.
5. For matches lost, we subtract matches won from the total matches played.
QUERY:
QUERY EXPLANATION:
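1. In the CTE we use UNION ALL to stack team_1 and team_2 into one column, keeping the winner next
to each row. UNION ALL is needed because the duplicate rows must be kept, as the table below shows.
This lets us find the total matches played, as well as the matches won and lost by each team.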
team winner
India India
SL Aus
SA Eng
Eng NZ
Aus India
SL India
Aus Aus
Eng Eng
NZ NZ
India India
2. In the SELECT clause after the CTE we query the team, COUNT() and two SUMs over CASE WHEN
expressions.
COUNT(team) gives the total matches played by each team.
The first CASE WHEN flags 1 if the team is the same as the team in the winner column and 0 otherwise;
summing the flags gives the matches won by each team.
The second CASE WHEN flags 1 if the team is not the same as the team in the winner column and 0
otherwise; summing the flags gives the matches lost by each team.
QUERY EXPLANATION:
1. In the CTE we flag 1 for a win and 0 for a loss, while stacking team_1 and team_2 into one single
column using UNION ALL.
2. After the CTE we select the required output from it: COUNT for the total matches played, SUM of the
win_flag column for the total matches won by a team, and the difference between the two for the
matches lost by each team.
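A minimal sketch of the win_flag approach just described. The column names Team_1, Team_2 and Winner follow the problem statement; the CTE name and output aliases are illustrative.

```sql
-- One row per team per match, flagged 1 when that team won.
WITH team_results AS (
    SELECT Team_1 AS team,
           CASE WHEN Team_1 = Winner THEN 1 ELSE 0 END AS win_flag
    FROM icc_world_cup
    UNION ALL          -- keep duplicates: every row is one match played
    SELECT Team_2 AS team,
           CASE WHEN Team_2 = Winner THEN 1 ELSE 0 END AS win_flag
    FROM icc_world_cup
)
SELECT team,
       COUNT(*)                 AS matches_played,
       SUM(win_flag)            AS matches_won,
       COUNT(*) - SUM(win_flag) AS matches_lost
FROM team_results
GROUP BY team
ORDER BY team;
```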
INSERT INTO icc_world_cup values('SA','Eng','Eng');
INSERT INTO icc_world_cup values('Eng','NZ','NZ');
INSERT INTO icc_world_cup values('Aus','India','India');
For more such questions visit Ankit Sir's Youtube channel by clicking here.
For more such questions with written explanations visit my LinkedIn profile by clicking here.
09 January 2023 11:14
Problem Statement: Write a query that'll identify returning active users. A returning active user is a user that
has made a second purchase within 7 days of any other of their purchases. Output a list of user_ids of these
returning active users.
amazon_transactions
Field Type
id int
user_id int
item varchar
created_at datetime
revenue int
Sample Table :
Mental Approach:
1. To find users who bought their next product within 7 days of a purchase, we need the
difference between the two purchase dates.
2. We compute the difference between two consecutive purchase dates (sorted in ascending
order) for the same user_id. (In SQL we do this by using the LEAD window function inside the
DATEDIFF function to get the number of days between them.)
3. After getting the difference we keep only the rows where it is 7 days or less. (In SQL
we do this by computing the difference in a subquery and filtering in the outer query.)
4. A user can have more than two consecutive purchases that qualify, so to get each user once we
keep only unique values of user_id. (In SQL we do this using DISTINCT.)
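The steps above can be sketched as follows, assuming SQL Server's DATEDIFF(DAY, start, end) argument order. The subquery alias gaps is illustrative.

```sql
-- Days between each purchase and the same user's next purchase;
-- keep users with at least one gap of 7 days or less.
SELECT DISTINCT user_id AS returning_active_users
FROM (
    SELECT user_id,
           DATEDIFF(DAY,
                    created_at,
                    LEAD(created_at) OVER (PARTITION BY user_id
                                           ORDER BY created_at)) AS diff
    FROM amazon_transactions
) gaps
WHERE diff <= 7          -- NULL diffs (last purchase of a user) are dropped here
ORDER BY returning_active_users;
```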
user_id diff
100 6
100 5
100 11
100
109 1
109 19
109
113
returning_active_users
100
109
returning_active_users
100
103
105
109
110
111
112
114
117
120
122
128
129
130
131
133
141
143
150
Problem Statement: Write a query to find which gender gives a higher average review score when writing
reviews as guests. Use the from_type column to identify guest reviews. Output the gender and their average
review score.
airbnb_reviews
FIELD TYPE
from_user int
to_user int
from_type varchar
to_type varchar
review_score int
airbnb_guests
FIELD TYPE
guest_id int
nationality varchar
gender varchar
age int
Mental Approach
1. Join the two tables.
2. Keep only the reviews written by guests, using the from_type column. (In SQL we do this using the WHERE clause.)
3. Now compute the average review_score for each gender. (In SQL we do this using the GROUP BY clause.)
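A minimal sketch of these three steps. Two details are assumptions: that reviews join to guests on from_user = guest_id, and that guest reviews are marked with from_type = 'guest'.

```sql
-- Average review score given by guests, per gender.
SELECT g.gender,
       AVG(r.review_score * 1.0) AS avg_review_score  -- * 1.0 avoids integer averaging
FROM airbnb_reviews r
INNER JOIN airbnb_guests g
        ON g.guest_id = r.from_user     -- assumed join key
WHERE r.from_type = 'guest'             -- assumed marker value
GROUP BY g.gender
ORDER BY avg_review_score DESC;
```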
Output
gender avg_review_score
M 5.526
F 5.009
08 January 2023 07:49
RIGHT(): This function returns the specified number of characters from the right side of the given expression.
SYNTAX:
LEFT(): This function returns the specified number of characters from the left side of the given expression.
SYNTAX:
The Name column only contains uppercase (A-Z) and lowercase (a-z) letters.
Sample Input
Sample Output
Ashley
Julia
Belvet
Explanation
Only Ashley, Julia, and Belvet have Marks >75 . If you look at the last three characters of each of their
names, there are no duplicates and 'ley' < 'lia' < 'vet'.
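A short sketch of both functions and of how RIGHT() fits the ordering in the explanation. The table name STUDENTS and the ID column used as a tie-breaker are assumptions; only Name and Marks are mentioned in the text.

```sql
-- General form: RIGHT(expression, n) and LEFT(expression, n)
SELECT RIGHT('Ashley', 3) AS last_three,    -- 'ley'
       LEFT('Ashley', 3)  AS first_three;   -- 'Ash'

-- Names with Marks above 75, ordered by the last three characters.
SELECT Name
FROM STUDENTS              -- assumed table name
WHERE Marks > 75
ORDER BY RIGHT(Name, 3), ID;   -- ID is an assumed tie-breaker column
```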
PROBLEM STATEMENT: Given the users purchase history write a query to print users who have done purchase
on more than 1 day and products purchased on a given day are never repeated on any other day.
MENTAL APPROACH:
1. Find the users who have purchased on more than one date.
2. Among them, keep the users who never bought the same product again on any
other day.
STEPWISE SOLUTION
1. To get the users who have bought on different days we use the HAVING clause, which filters
out the users who have bought on only one day.
userid count_of_days
1 2
2 2
4 2
2. To keep only users with non-repeated products, we add a second condition to the HAVING clause
with AND.
Here we compare the COUNT of distinct productid (the number of different products bought by a
user) with the COUNT of productid (the number of all products bought by that user). If both match,
all products are distinct; otherwise some product was repeated on another day by that user, and
the user is filtered out.
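Both HAVING conditions can be sketched in one query. The table name purchase_history and the date column purchasedate are assumptions; userid and productid appear in the explanation above.

```sql
-- Users who bought on more than one day and never repeated a product.
SELECT userid
FROM purchase_history                         -- assumed table name
GROUP BY userid
HAVING COUNT(DISTINCT purchasedate) > 1       -- assumed date column
   AND COUNT(DISTINCT productid) = COUNT(productid);
```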
OUTPUT:
userid
1
4
Table : sf_transactions
FIELD TYPE
id int
created_at datetime
value int
purchased_id int
MENTAL APPROACH:
1. Convert created_at to the required YYYY-MM format and group the records by it, so that each
group covers one year and month.
2. Determine the month-over-month percentage change in revenue, that is, (this month's revenue -
previous month's revenue) / previous month's revenue * 100.
QUERY:
QUERY EXPLANATION:
1. In the CTE, we change the format of the created_at column to YYYY-MM and retrieve the prior
month's revenue using the LAG and SUM functions.
We use the SUM function here to get the total revenue for each month.
2. In the SELECT statement following the CTE, we apply the formula to determine the
month-over-month % change in revenue for all months and round it to two decimal places.
In this case, we use SUM in our calculation because the rows are grouped by year_month.
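A minimal sketch of the calculation, assuming SQL Server's FORMAT() for the YYYY-MM conversion and that the value column holds the revenue. It sums per month in the CTE and applies LAG in the outer query, which is a slightly different split from the one described above.

```sql
-- Monthly revenue, then percentage change against the previous month.
WITH monthly AS (
    SELECT FORMAT(created_at, 'yyyy-MM') AS year_month,
           SUM(value)                    AS revenue
    FROM sf_transactions
    GROUP BY FORMAT(created_at, 'yyyy-MM')
)
SELECT year_month,
       ROUND((revenue - LAG(revenue) OVER (ORDER BY year_month)) * 100.0
             / LAG(revenue) OVER (ORDER BY year_month), 2) AS revenue_diff_pct
FROM monthly
ORDER BY year_month;
```

The first month has no previous month, so LAG returns NULL and its revenue_diff_pct is NULL, matching the empty first row of the final output.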
SOLVING IT IN EXCEL
STEPS:
1. To get the YYYY-MM date format from the created_at column, first add a column called year_month.
2. Now, create a pivot table using the entire table.
3. Place the newly created year_month column in the rows field.
4. Put the revenue in values field. (It will automatically add them up based on year_month.)
5. However, under the value field settings, change the name to revenue_diff_pct in the custom name field,
and then go to the show value as tab.
6. Select % Difference From in the show value as tab, with the base field as year month and the base item as
(previous).
This will tell us the revenue percentage change month over month.
FINAL OUTPUT:
year_month revenue_diff_pct
2019-01
2019-02 -28.56%
2019-03 23.35%
2019-04 -13.84%
2019-05 13.49%
2019-06 -2.78%
2019-07 -6.00%
2019-08 28.36%
2019-09 -4.97%
2019-10 -12.68%
2019-11 1.71%
2019-12 -2.11%
PROBLEM STATEMENT: Find how many orders are made by new customers and by repeat customers on each
date. The output should contain order_date, new_customer_count and repeat_customer_count.
Customer_orders Table:
MENTAL APPROACH:
1. First we need to find the first order date of every customer. It helps us
to find the number of new customers on each date.
2. Once we know the first order date of each customer, we can also find the
number of repeat customers on each date.
NOTE: We compare the order date of each order with the customer's first order date to
decide whether it is a repeat order or the customer's first order.
QUERY:
QUERY EXPLANATION:
1. With the help of a CTE we find the minimum order date of each customer.
This gives us the first order date of each customer.
When the order date and the first order date are not the same, the customer is a
repeat customer. So we flag 1 when this condition is met and 0 otherwise, and add the flags up.
This gives the count of repeat customers.
FINAL OUTPUT:
QUERY EXPLANATION:
1. In CTE first order date for each customer is extracted.
customer_id first_order_date
100 01-01-2022
200 01-01-2022
300 01-01-2022
400 02-01-2022
500 02-01-2022
600 03-01-2022
2. In the SELECT statement after the CTE we flag rows with 1 and 0 using CASE WHEN.
We join the CTE back to the orders table because the CTE does not contain the order date, and
we need the order date to count new and repeat customers for each date.
3. Now to count new and repeat customers for each date we will SUM the flagged numbers.
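A minimal sketch of the CTE, join and flag sums described in these steps. The table name customer_orders and the columns customer_id and order_date follow the text; the CTE name is illustrative.

```sql
-- First order date per customer.
WITH first_orders AS (
    SELECT customer_id,
           MIN(order_date) AS first_order_date
    FROM customer_orders
    GROUP BY customer_id
)
-- Flag each order as new or repeat and sum the flags per date.
SELECT co.order_date,
       SUM(CASE WHEN co.order_date = fo.first_order_date THEN 1 ELSE 0 END) AS new_customers,
       SUM(CASE WHEN co.order_date <> fo.first_order_date THEN 1 ELSE 0 END) AS repeat_customers
FROM customer_orders co
INNER JOIN first_orders fo
        ON fo.customer_id = co.customer_id
GROUP BY co.order_date
ORDER BY co.order_date;
```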
FINAL OUTPUT:
order_date new_customers repeat_customers
01-01-2022 3 0
02-01-2022 2 1
03-01-2022 1 2
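The steps above can be sketched as follows. This is a minimal sketch, not the original query: it runs the SQL through Python's sqlite3 module, the customer_orders rows are hypothetical values chosen to reproduce the first order dates and final output shown above, and the dates are stored in ISO format (YYYY-MM-DD) so that they compare correctly as text.

```python
import sqlite3

def new_vs_repeat_customers():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customer_orders (order_id INTEGER, customer_id INTEGER, order_date TEXT)"
    )
    # Hypothetical rows, chosen to match the first order dates listed above.
    conn.executemany("INSERT INTO customer_orders VALUES (?, ?, ?)", [
        (1, 100, "2022-01-01"), (2, 200, "2022-01-01"), (3, 300, "2022-01-01"),
        (4, 100, "2022-01-02"), (5, 400, "2022-01-02"), (6, 500, "2022-01-02"),
        (7, 100, "2022-01-03"), (8, 400, "2022-01-03"), (9, 600, "2022-01-03"),
    ])
    query = """
        WITH first_visit AS (
            -- first order date of each customer
            SELECT customer_id, MIN(order_date) AS first_order_date
            FROM customer_orders
            GROUP BY customer_id
        )
        SELECT co.order_date,
               SUM(CASE WHEN co.order_date = fv.first_order_date THEN 1 ELSE 0 END) AS new_customers,
               SUM(CASE WHEN co.order_date != fv.first_order_date THEN 1 ELSE 0 END) AS repeat_customers
        FROM customer_orders co
        JOIN first_visit fv ON fv.customer_id = co.customer_id
        GROUP BY co.order_date
        ORDER BY co.order_date
    """
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows

print(new_vs_repeat_customers())
```

The join back to customer_orders is what brings the order date next to the first order date, so that the two CASE WHEN flags can be summed per date.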
For more such interview questions visit Ankit Bansal Sir's Channel by clicking here.
For more such written explanations of interview questions visit my LinkedIn by clicking here.
25 January 2023 11:59
PROBLEM STATEMENT: Amber's conglomerate corporation just acquired some new companies. Each of the
companies follows this hierarchy:
Given the table schemas below, write a query to print the company_code, founder name, total number
of lead managers, total number of senior managers, total number of managers, and total number
of employees. Order your output by ascending company_code.
Note:
• The tables may contain duplicate records.
• The company_code is string, so the sorting should not be numeric. For example, if the company_codes are
C_1, C_2, and C_10, then the ascending company_codes will be C_1, C_10, and C_2.
Input Format
The following tables contain company data:
• Company: The company_code is the code of the company and founder is the founder of the company.
• Lead_Manager: The lead_manager_code is the code of the lead manager, and the company_code is the code
of the working company.
• Senior_Manager: The senior_manager_code is the code of the senior manager, the lead_manager_code is the
code of its lead manager, and the company_code is the code of the working company.
• Employee: The employee_code is the code of the employee, the manager_code is the code of its manager,
the senior_manager_code is the code of its senior manager, the lead_manager_code is the code of its lead
manager, and the company_code is the code of the working company.
Sample Input
Company Table:
Senior_Manager Table:
Manager Table:
Employee Table:
Sample Output
C1 Monika 1 2 1 2
C2 Samantha 1 1 2 2
MENTAL APPROACH:
1. We want the total number of lead managers, senior managers, managers, and employees for each
company, so we will count each designation from its respective table.
QUERY:
QUERY EXPLANATION:
1. We are joining all the tables and counting the required fields.
Here, we have used distinct in count because same manager can be manager for multiple employees and
similarly same senior manager can be senior manager for multiple managers and likewise lead manager
also.
NOTE: We could have used the employee table directly over here for getting the required fields but in a
real scenario, it would not be the appropriate way. It is because there may be new employees who have
not been assigned to any manager. Thus we are joining all tables and then finding the required fields.
2. We are grouping by company code because we want the required field for each company
3. At last we are ordering on the basis of ascending order of company code.
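A minimal sketch of such a query is shown below, run through Python's sqlite3 module. The column names follow the input format above; the Manager table columns and all sample rows are assumptions chosen to reproduce the sample output, and one lead manager and one employee are duplicated on purpose to show why DISTINCT is needed. LEFT JOINs are one possible choice here, so that a senior manager without managers is still counted.

```python
import sqlite3

def company_headcounts():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Company (company_code TEXT, founder TEXT);
        CREATE TABLE Lead_Manager (lead_manager_code TEXT, company_code TEXT);
        CREATE TABLE Senior_Manager (senior_manager_code TEXT, lead_manager_code TEXT, company_code TEXT);
        CREATE TABLE Manager (manager_code TEXT, senior_manager_code TEXT,
                              lead_manager_code TEXT, company_code TEXT);
        CREATE TABLE Employee (employee_code TEXT, manager_code TEXT, senior_manager_code TEXT,
                               lead_manager_code TEXT, company_code TEXT);
        -- Assumed sample rows; LM1 and E1 appear twice on purpose.
        INSERT INTO Company VALUES ('C1', 'Monika'), ('C2', 'Samantha');
        INSERT INTO Lead_Manager VALUES ('LM1', 'C1'), ('LM1', 'C1'), ('LM2', 'C2');
        INSERT INTO Senior_Manager VALUES ('SM1', 'LM1', 'C1'), ('SM2', 'LM1', 'C1'), ('SM3', 'LM2', 'C2');
        INSERT INTO Manager VALUES ('M1', 'SM1', 'LM1', 'C1'), ('M2', 'SM3', 'LM2', 'C2'),
                                   ('M3', 'SM3', 'LM2', 'C2');
        INSERT INTO Employee VALUES ('E1', 'M1', 'SM1', 'LM1', 'C1'), ('E1', 'M1', 'SM1', 'LM1', 'C1'),
                                    ('E2', 'M1', 'SM1', 'LM1', 'C1'), ('E3', 'M2', 'SM3', 'LM2', 'C2'),
                                    ('E4', 'M3', 'SM3', 'LM2', 'C2');
    """)
    query = """
        SELECT c.company_code, c.founder,
               COUNT(DISTINCT lm.lead_manager_code),
               COUNT(DISTINCT sm.senior_manager_code),
               COUNT(DISTINCT m.manager_code),
               COUNT(DISTINCT e.employee_code)
        FROM Company c
        LEFT JOIN Lead_Manager lm ON lm.company_code = c.company_code
        LEFT JOIN Senior_Manager sm ON sm.lead_manager_code = lm.lead_manager_code
        LEFT JOIN Manager m ON m.senior_manager_code = sm.senior_manager_code
        LEFT JOIN Employee e ON e.manager_code = m.manager_code
        GROUP BY c.company_code, c.founder
        ORDER BY c.company_code  -- text sort, so C_10 would come before C_2
    """
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows

print(company_headcounts())
```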
SAMPLE OUTPUT:
Problem Statement: Pivot the Occupation column in OCCUPATIONS so that each Name is sorted
alphabetically and displayed underneath its corresponding Occupation. The output column headers
should be Doctor, Professor, Singer, and Actor, respectively.
Note: Print NULL when there are no more names corresponding to an occupation.
Input Format
The OCCUPATIONS table is described as follows:
Occupation will only contain one of the following values: Doctor, Professor, Singer or Actor.
Sample Input
Sample Output
Doctor Professor Singer Actor
Jenny Ashley Meera Jane
Samantha Christeen Priya Julia
NULL Ketty NULL Maria
SOLUTION
Mental Approach:
1. Basically at first we will make unique column for each distinct occupation that is available in occupation
column.
2. We will traverse for the particular Name that belongs to the particular Occupation.
3. Now we will put that Names for their corresponding Occupations one by one.
Output we get:
Query Steps: Creating different columns with CASE WHEN and then inserting the Name values for matching
WHEN condition.
Problem: In output, we can see that a single Name is coming in all rows for different occupations but actually a
row should contain Names for all columns unless there is no available Name for that particular Occupation.
Solution: So, here we must aggregate the Name so that we can group them together.
Output of Subquery:
Occupation Name rn
Actor Eve 1
Actor Jennifer 2
Actor Ketty 3
Actor Samantha 4
Doctor Aamina 1
Doctor Julia 2
Doctor Priya 3
Professor Ashley 1
Professor Belvet 2
Professor Britney 3
Professor Maria 4
Professor Meera 5
Professor Naomi 6
Professor Priyanka 7
Singer Christeen 1
Singer Jane 2
Singer Jenny 3
Singer Kristeen 4
1. Here with ROW_NUMBER firstly we are partitioning on the basis of Occupation so that Names for every
Occupation get separated.
2. After partitioning, we are providing a row number to each row on the basis of the Name column by
providing the Name in the ORDER BY clause within the ROW_NUMBER function.
3. Thus, we see row number on the basis of ascending order of Name for that particular occupation starting
with row number 1 for each new occupation.
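The two ideas above (numbering the names within each occupation, then aggregating the CASE WHEN columns per row number) can be combined as in the sketch below. It is only an illustration under assumptions: it uses Python's sqlite3 module (window functions need SQLite 3.25 or newer), the table and column names follow the problem statement, and the rows are taken from the sample output of the problem.

```python
import sqlite3

def pivot_occupations():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE occupations (name TEXT, occupation TEXT)")
    # Rows taken from the sample output of the problem statement.
    conn.executemany("INSERT INTO occupations VALUES (?, ?)", [
        ("Samantha", "Doctor"), ("Julia", "Actor"), ("Maria", "Actor"),
        ("Meera", "Singer"), ("Ashley", "Professor"), ("Ketty", "Professor"),
        ("Christeen", "Professor"), ("Jane", "Actor"), ("Jenny", "Doctor"),
        ("Priya", "Singer"),
    ])
    query = """
        SELECT MAX(CASE WHEN occupation = 'Doctor' THEN name END) AS Doctor,
               MAX(CASE WHEN occupation = 'Professor' THEN name END) AS Professor,
               MAX(CASE WHEN occupation = 'Singer' THEN name END) AS Singer,
               MAX(CASE WHEN occupation = 'Actor' THEN name END) AS Actor
        FROM (
            -- number the names alphabetically inside each occupation
            SELECT name, occupation,
                   ROW_NUMBER() OVER (PARTITION BY occupation ORDER BY name) AS rn
            FROM occupations
        ) AS ranked
        GROUP BY rn   -- one output row per row number
        ORDER BY rn
    """
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows

for row in pivot_occupations():
    print(row)
```

MAX is used only to collapse each group to its single non-NULL name; Python shows the SQL NULL values as None.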
Output:
PROBLEM STATEMENT: Harry Potter and his friends are at Ollivander’s with Ron, finally replacing
Charlie’s old broken wand. Hermione decides the best way to choose is by determining the minimum number of
gold galleons needed to buy each non-evil wand of high power and age. Write a query to print the id, age,
coins_needed, and power of the wands that Ron’s interested in, sorted in order of descending power. If more
than one wand has same power, sort the result in order of descending age.
Input Format
The following tables contain data on the wands in Ollivander’s inventory:
• Wands: The id is the id of the wand, code is the code of the wand, coins_needed is the total number of
gold galleons needed to buy the wand, and power denotes the quality of the wand (the higher the power,
the better the wand is).
• Wands_Property: The code is the code of the wand, age is the age of the wand, and is_evil denotes
whether the wand is good for the dark arts. If the value of is_evil is 0, it means that the wand is not evil.
The mapping between code and age is one-one, meaning that if there are two pairs, (code1, age1) and (code2, age2), then code1 ≠ code2 and age1 ≠ age2.
Sample Input
Wands Table:
Wands_Property Table:
Sample Output
9 45 1647 10
12 17 9897 10
1 20 3688 8
15 40 6018 7
19 20 7651 6
11 40 7587 5
10 20 504 5
18 40 3312 3
20 17 5689 3
5 45 6020 2
14 40 5408 1
EXPLANATION:
The data for wands of age 45 (code 1):
QUERY:
QUERY EXPLANATION:
1. We're using the SUBQUERY "ranking" to rank the wands by coins_needed in ascending order within each
partition formed on the basis of age and power.
We ordered in ascending order here because we want the fewest coins needed (gold galleons) at the top of
each partition.
2. Using the outer query, we are now choosing the appropriate columns using a filter condition to obtain
records where rank is 1 for all separate partitions.
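A minimal sketch of this ranking approach follows, run through Python's sqlite3 module. The subquery name "ranking" comes from the explanation above; the use of ROW_NUMBER, the alias rnk, and the sample rows are assumptions made for illustration.

```python
import sqlite3

def cheapest_wands():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Wands (id INTEGER, code INTEGER, coins_needed INTEGER, power INTEGER);
        CREATE TABLE Wands_Property (code INTEGER, age INTEGER, is_evil INTEGER);
        -- Hypothetical rows: code 3 is evil, wands 1 and 2 share age and power.
        INSERT INTO Wands_Property VALUES (1, 45, 0), (2, 40, 0), (3, 20, 1);
        INSERT INTO Wands VALUES (1, 1, 500, 10), (2, 1, 300, 10), (3, 2, 700, 10),
                                 (4, 3, 100, 9), (5, 2, 900, 5);
    """)
    query = """
        SELECT id, age, coins_needed, power
        FROM (
            SELECT w.id, p.age, w.coins_needed, w.power,
                   -- cheapest wand first inside each (age, power) partition
                   ROW_NUMBER() OVER (PARTITION BY p.age, w.power
                                      ORDER BY w.coins_needed ASC) AS rnk
            FROM Wands w
            JOIN Wands_Property p ON p.code = w.code
            WHERE p.is_evil = 0
        ) AS ranking
        WHERE rnk = 1
        ORDER BY power DESC, age DESC
    """
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows

print(cheapest_wands())
```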
26 January 2023 11:22
PROBLEM STATEMENT : Write a query to print all prime numbers less than or equal to 1000 . Print your result on
a single line, and use the ampersand (&) character as your separator (instead of a space).
For example, the output for all prime numbers less than or equal to 10 would be:
2&3&5&7
QUERY:
QUERY EXPLANATION:
1. With a recursive CTE we create the series of numbers from 2 to 1000. The first SELECT statement simply returns 2
(because 2 is the smallest prime number), and the second SELECT statement recursively calls the CTE, increasing the value of n
by 1 each time.
NOTE: Here, the recursive CTE acts like a loop, that is, it keeps executing until the stopping condition is met.
2. Now after CTE we are simply using STRING_AGG function in SELECT statement to get the result as per our requirement.
3. To get the prime number we are using the NOT EXISTS in WHERE condition along with subquery.
4. In subquery we are selecting all n where value of n from c1 is divisible by value of n from another table c2 and value of n from
table c1 is not equal to value of n from table c2.
NOTE: We do this to check whether the number n is divisible by any number other than itself.
5. So, NOT EXISTS keeps only those numbers for which the subquery returns no rows, i.e. the prime numbers.
NOTE: Different databases use different syntax for combining the values of multiple rows into one row. In MS
SQL Server we can use STRING_AGG.
2&3&5&7&11&13&17&19&23&29&31&37&41&43&47&53&59&61&67&71&73&79&83&89
&97&101&103&107&109&113&127&131&137&139&149&151&157&163&167&173&179&
181&191&193&197&199&211&223&227&229&233&239&241&251&257&263&269&271&
277&281&283&293&307&311&313&317&331&337&347&349&353&359&367&373&379&
383&389&397&401&409&419&421&431&433&439&443&449&457&461&463&467&479&
487&491&499&503&509&521&523&541&547&557&563&569&571&577&587&593&599&
601&607&613&617&619&631&641&643&647&653&659&661&673&677&683&691&701&
709&719&727&733&739&743&751&757&761&769&773&787&797&809&811&821&823&
827&829&839&853&857&859&863&877&881&883&887&907&911&919&929&937&941&
947&953&967&971&977&983&991&997
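The recursive CTE and NOT EXISTS steps described above can be tried out with the sketch below. It is an approximation under assumptions: it uses Python's sqlite3 module instead of MS SQL Server, the CTE name numbers and the function name are hypothetical, and the final joining with "&" is done in Python rather than with STRING_AGG, so that the order of the values is guaranteed.

```python
import sqlite3

def primes_up_to(limit=1000):
    conn = sqlite3.connect(":memory:")
    query = """
        WITH RECURSIVE numbers(n) AS (
            SELECT 2                      -- anchor: smallest prime
            UNION ALL
            SELECT n + 1 FROM numbers     -- recursive step, acts like a loop
            WHERE n < ?
        )
        SELECT c1.n
        FROM numbers c1
        WHERE NOT EXISTS (
            -- any other number in the series that divides c1.n
            SELECT 1 FROM numbers c2
            WHERE c2.n != c1.n AND c1.n % c2.n = 0
        )
        ORDER BY c1.n
    """
    rows = conn.execute(query, (limit,)).fetchall()
    conn.close()
    # STRING_AGG(n, '&') would do this step inside SQL Server.
    return "&".join(str(n) for (n,) in rows)

print(primes_up_to(10))
```

Calling primes_up_to() with the default limit produces the single-line list for 1000.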
QUERY:
QUERY EXPLANATION:
1. There are three different CTEs: floor_visited, total_visits, and agg_cte.
By CTE floor_visited, we can determine how many times a floor has been visited by a specific user and
rank them accordingly in decreasing order.
In order to determine how many times a particular user has visited the premises, use the CTE total_visits.
Using CTE agg_cte, you can combine the different resources used by a particular user.
2. SELECT statement after CTE selects required fields and filters with WHERE condition of rnk=1.
We are using a WHERE condition here so that we only include records from users who have visited a
particular floor the most number of times.
01 January 2023 09:10
Problem Statement : Query all columns for all American cities in the CITY table with populations larger
than 100000. The CountryCode for America is USA.
The CITY table is described as follows:
CITY
Field Type
ID NUMBER
NAME VARCHAR2(17)
COUNTRYCODE VARCHAR2(3)
DISTRICT VARCHAR2(20)
POPULATION NUMBER
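First Step: To select all columns we will list the column names explicitly instead of using *. Listing the columns
is generally preferred because SELECT * can fetch columns that are not needed and its output changes whenever the table definition changes.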
This will list the output for all the cities without any filter.
Sample output
Second Step: We need to filter above query so that we only see output of American cities.
Sample Output
Third Step: Now we need to filter above query to provide output for those records only which are
having population greater than 100000
This query will give us the final output, that is all details of American cities having population greater than 100000.
Sample Output
Problem Statement : Query the NAME field for all American cities in the CITY table with populations
larger than 120000. The CountryCode for America is USA.
The CITY table is described as follows:
Field Type
ID NUMBER
NAME VARCHAR2(17)
COUNTRYCODE VARCHAR2(3)
DISTRICT VARCHAR2(20)
POPULATION NUMBER
First Step: For selecting NAME column we will specify NAME column in SELECT statement.
This will list all the NAMES of cities irrespective of any particular country.
Sample Output
NAME
Rotterdam
Scottsdale
Corona
Concord
Cedar Rapids
Coral Springs
Fairfield
Boulder
Fall River
Second Step: We need to filter the above query so that we only see output of American cities.
This will give all NAMES of American cities.
Sample Output
NAME
Scottsdale
Corona
Concord
Cedar Rapids
Coral Springs
Fairfield
Boulder
Fall River
Insight: From this, we can see that Rotterdam is a city of another country so it has been filtered out.
Third Step: Now we need to filter above query to provide output for those records only which are having
population greater than 120000
This query will give us the final output, that is the NAME of American cities having population greater than
120000.
Sample Output
NAME
Scottsdale
Corona
Concord
Cedar Rapids
Insight: From this, we can see that the populations of Coral Springs, Fairfield, Boulder and Fall River are less than
120000, so they are filtered out of our final output.
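The three steps can be reproduced with the sketch below, which runs the final query through Python's sqlite3 module. The city names come from the sample outputs above; the ids, districts and population figures are illustrative assumptions, chosen only so that the same cities are filtered out.

```python
import sqlite3

def american_city_names(min_population=120000):
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE CITY (ID INTEGER, NAME TEXT, COUNTRYCODE TEXT, DISTRICT TEXT, POPULATION INTEGER)"
    )
    # Illustrative rows; populations are assumptions, not data from the notes.
    conn.executemany("INSERT INTO CITY VALUES (?, ?, ?, ?, ?)", [
        (1, "Rotterdam", "NLD", "Zuid-Holland", 593321),
        (2, "Scottsdale", "USA", "Arizona", 202705),
        (3, "Corona", "USA", "California", 124966),
        (4, "Concord", "USA", "California", 121780),
        (5, "Cedar Rapids", "USA", "Iowa", 120758),
        (6, "Coral Springs", "USA", "Florida", 117549),
        (7, "Fairfield", "USA", "California", 92256),
        (8, "Boulder", "USA", "Colorado", 91238),
        (9, "Fall River", "USA", "Massachusetts", 90555),
    ])
    query = """
        SELECT NAME                 -- first step: only the NAME column
        FROM CITY
        WHERE COUNTRYCODE = 'USA'   -- second step: American cities only
          AND POPULATION > ?        -- third step: population filter
        ORDER BY ID
    """
    names = [name for (name,) in conn.execute(query, (min_population,))]
    conn.close()
    return names

print(american_city_names())
```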
03 January 2023 09:22
Problem Statement: Write a query that calculates the difference between the highest salaries
found in the marketing and engineering departments. Output just the absolute difference in
salaries.
db_employee
Field Type
id int
first_name varchar
last_name varchar
salary int
department_id int
db_dept
Field Type
id int
department varchar
FIRST APPROACH
1. To find the MAX salary for marketing (dept_id =4) department
Here we could have used department_id = 4 directly in the WHERE clause without a JOIN, but in the real world
there may be many departments whose ids are hard to remember, so filtering on the department name through a JOIN is safer.
3. Now we need to find difference of both max_salary_marketing and max_salary_engineering.
There are different ways to do this.
We can directly use both SELECT statements as subqueries, subtract one from the other, and use the ABS function to
get the absolute value.
e.g (5-9) = - 4
e.g ABS(5-9) = 4
Output
difference
2400
INSIGHT: This approach may take more time because it has to run multiple SELECT statements as subqueries,
find the maximum for each, and then finally take their difference.
SECOND APPROACH
1. Find MAX salary for all departments
Output:
department salary
customer care 49926
engineering 45787
human resource 46356
marketing 48187
operation 49488
sales 47657
2. Now we will filter for department for which we need to find the difference of maximum salary. So, we will use
WHERE clause with IN
Output:
department salary
engineering 45787
marketing 48187
3. As we have now got the maximum salary of both the required departments, we can get the difference of the MAX
and MIN salary from this result by using it as a subquery.
Output:
salary_difference
2400
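Both approaches can be compared with the sketch below, run through Python's sqlite3 module. The table and column names follow the schema above; the department ids other than marketing = 4 and all employee rows are assumptions, chosen so that the maximum salaries match the 48187 and 45787 shown above.

```python
import sqlite3

def salary_difference():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE db_dept (id INTEGER, department TEXT);
        CREATE TABLE db_employee (id INTEGER, first_name TEXT, last_name TEXT,
                                  salary INTEGER, department_id INTEGER);
        -- Hypothetical rows; only the two maximum salaries mirror the notes.
        INSERT INTO db_dept VALUES (1, 'engineering'), (4, 'marketing'), (6, 'sales');
        INSERT INTO db_employee VALUES
            (1, 'A', 'One', 45787, 1), (2, 'B', 'Two', 41000, 1),
            (3, 'C', 'Three', 48187, 4), (4, 'D', 'Four', 30000, 4),
            (5, 'E', 'Five', 47657, 6);
    """)
    # First approach: two scalar subqueries and ABS.
    first = conn.execute("""
        SELECT ABS(
            (SELECT MAX(e.salary) FROM db_employee e
             JOIN db_dept d ON d.id = e.department_id
             WHERE d.department = 'marketing')
          - (SELECT MAX(e.salary) FROM db_employee e
             JOIN db_dept d ON d.id = e.department_id
             WHERE d.department = 'engineering')
        ) AS difference
    """).fetchone()[0]
    # Second approach: max per department, then MAX - MIN over the two rows.
    second = conn.execute("""
        SELECT MAX(salary) - MIN(salary) AS salary_difference
        FROM (
            SELECT d.department, MAX(e.salary) AS salary
            FROM db_employee e
            JOIN db_dept d ON d.id = e.department_id
            WHERE d.department IN ('engineering', 'marketing')
            GROUP BY d.department
        ) AS dept_max
    """).fetchone()[0]
    conn.close()
    return first, second

print(salary_difference())
```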
03 January 2023 08:16
Problem Statement: Query all columns for a city in CITY with the ID 1661.
The CITY table is described as follows:
Field Type
ID NUMBER
NAME VARCHAR2(17)
COUNTRYCODE VARCHAR2(3)
DISTRICT VARCHAR2(20)
POPULATION NUMBER
First Step: We will select all columns that is required within SELECT statement.
Sample Output
Second Step: Now we will filter for ID for which we need all details.
Insight: In this query we have filtered for ID 1661. So it will list record that is having ID as 1661 only.
Sample Output:
15 January 2023 11:51
PROBLEM STATEMENT: You are given two tables: Students and Grades. Students contains three
columns ID, Name and Marks.
Sample Output
Maria 10 99
Jane 9 81
Julia 9 88
Scarlet 8 78
NULL 7 63
NULL 7 68
Note
Print "NULL" as the name if the grade is less than 8.
Explanation
Consider the following table with the grades assigned to the students:
SOLUTION
MENTAL APPROACH:
1. Output Name, Grade and Marks from both table by combining them together.
2. The Grades table gives the minimum and maximum marks for each grade. For every student, we find the
grade whose range contains the student's marks, i.e. marks between the minimum and maximum marks
of that grade.
3. Now we will sort the result in descending order of Grade. If several students have the same Grade, we
will sort them on the basis of their Name.
4. At last we will replace the student Name with "NULL" for those students whose Grade is less than
8.
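A minimal sketch of these steps is given below, run through Python's sqlite3 module. The column names min_mark and max_mark, the grade ranges (grade 1 for 0-9 up to grade 10 for 90-100) and the student rows are assumptions chosen to reproduce the sample output; sorting the "NULL" rows by marks is also an assumption, since the steps above do not say how those ties are broken.

```python
import sqlite3

def grade_report():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Students (id INTEGER, name TEXT, marks INTEGER)")
    conn.execute("CREATE TABLE Grades (grade INTEGER, min_mark INTEGER, max_mark INTEGER)")
    # Assumed grade bands: 1 -> 0-9, 2 -> 10-19, ..., 10 -> 90-100.
    grades = [(g, (g - 1) * 10, (g * 10 - 1) if g < 10 else 100) for g in range(1, 11)]
    conn.executemany("INSERT INTO Grades VALUES (?, ?, ?)", grades)
    conn.executemany("INSERT INTO Students VALUES (?, ?, ?)", [
        (1, "Julia", 88), (2, "Samantha", 68), (3, "Maria", 99),
        (4, "Scarlet", 78), (5, "Ashley", 63), (6, "Jane", 81),
    ])
    query = """
        SELECT CASE WHEN g.grade < 8 THEN 'NULL' ELSE s.name END AS shown_name,
               g.grade,
               s.marks
        FROM Students s
        JOIN Grades g ON s.marks BETWEEN g.min_mark AND g.max_mark
        ORDER BY g.grade DESC, shown_name, s.marks
    """
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows

for row in grade_report():
    print(row)
```

The join condition uses BETWEEN instead of equality, which is how each mark is matched to the range it falls in.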
QUERY:
OUTPUT:
PROBLEM STATEMENT: Julia just finished conducting a coding contest, and she needs your help
assembling the leaderboard! Write a query to print the respective hacker_id and name of hackers
who achieved full scores for more than one challenge. Order your output in descending order by the
total number of challenges in which the hacker earned a full score. If more than one hacker received
full scores in same number of challenges, then sort them by ascending hacker_id.
Input Format
The following tables contain contest data:
• Hackers: The hacker_id is the id of the hacker, and name is the name of the hacker.
• Difficulty: The difficult_level is the level of difficulty of the challenge, and score is the score of the
challenge for the difficulty level.
• Challenges: The challenge_id is the id of the challenge, the hacker_id is the id of the hacker who
created the challenge, and difficulty_level is the level of difficulty of the challenge.
• Submissions: The submission_id is the id of the submission, hacker_id is the id of the hacker who
made the submission, challenge_id is the id of the challenge that the submission belongs to,
and score is the score of the submission.
Difficulty Table:
Submissions Table:
Sample Output
90411 Joe
Explanation
Hacker 86870 got a score of 30 for challenge 71055 with a difficulty level of 2, so 86870 earned a full
score for this challenge.
Hacker 90411 got a score of 30 for challenge 71055 with a difficulty level of 2, so 90411 earned a full
score for this challenge.
Hacker 90411 got a score of 100 for challenge 66730 with a difficulty level of 6, so 90411 earned a
full score for this challenge.
Only hacker 90411 managed to earn a full score for more than one challenge, so we print
their hacker_id and name as space-separated values.
SOLUTION
MENTAL APPROACH:
1. Firstly we will combine all the tables one by one together so that we can get our desired output easily
and efficiently.
2. Now we will filter for submissions in which the hacker earned the full score. (In SQL we will do this by using the WHERE
clause)
3. As we want hackers who earned full scores in more than one challenge, we will keep only those hackers. (Using the HAVING
clause)
4. We will now order our output in descending order of the number of challenges in which each hacker earned a full score.
5. If multiple hackers earned full scores in the same number of challenges, then we will order them on the
basis of ascending order of hacker id.
QUERY:
QUERY EXECUTION:
1. FROM and JOIN clauses will be executed in the first step.
NOTE: Here, we are using the submissions table as the main table for joining other tables because the
details of challenges submitted by hackers are provided in this table.
2. Filtering on the basis of the WHERE clause will be executed by making sure that score in the submissions
table is equal to the score present in the difficulty table. Now we will be left with records of those
hackers who have scored full scores in the challenge.
Here, we have used the score from the difficulty table to compare it with the score from the submissions
table because it contains the maximum score for a particular difficulty level.
3. Now GROUP BY will get executed and it will group the above result on the basis of hacker id.
Here, it was done to get distinct hackers, as a hacker id repeats when the same hacker has made multiple submissions.
4. After GROUPING now HAVING clause will get executed and it will filter out the records on the basis of
the condition provided.
Here, we are basically filtering so that we get records of only those hackers who have earned full scores in more
than one challenge.
5. Now the SELECT query will be executed and we will get hacker_id and hacker name as output.
NOTE: Here, we are using the aggregate function MAX() for the hacker name because we have to group
the records on the basis of the hacker id to get the desired output.
6. At last we are using the ORDER BY clause to order our output.
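The execution order above corresponds to a query shaped like the sketch below, run here through Python's sqlite3 module. The table and column names follow the input format; the sample rows are assumptions built from the explanation (hacker 86870 has one full score, hacker 90411 has two), and the name Todd and the challenge creators are hypothetical.

```python
import sqlite3

def full_score_hackers():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Hackers (hacker_id INTEGER, name TEXT);
        CREATE TABLE Difficulty (difficulty_level INTEGER, score INTEGER);
        CREATE TABLE Challenges (challenge_id INTEGER, hacker_id INTEGER, difficulty_level INTEGER);
        CREATE TABLE Submissions (submission_id INTEGER, hacker_id INTEGER,
                                  challenge_id INTEGER, score INTEGER);
        -- Assumed rows based on the explanation of the sample.
        INSERT INTO Hackers VALUES (86870, 'Todd'), (90411, 'Joe');
        INSERT INTO Difficulty VALUES (2, 30), (6, 100);
        INSERT INTO Challenges VALUES (71055, 86870, 2), (66730, 90411, 6);
        INSERT INTO Submissions VALUES
            (1, 86870, 71055, 30), (2, 90411, 71055, 30),
            (3, 90411, 66730, 100), (4, 86870, 66730, 40);
    """)
    query = """
        SELECT s.hacker_id, MAX(h.name) AS name
        FROM Submissions s
        JOIN Challenges c ON c.challenge_id = s.challenge_id
        JOIN Difficulty d ON d.difficulty_level = c.difficulty_level
        JOIN Hackers h ON h.hacker_id = s.hacker_id
        WHERE s.score = d.score              -- full score only
        GROUP BY s.hacker_id
        HAVING COUNT(s.challenge_id) > 1     -- more than one full score
        ORDER BY COUNT(s.challenge_id) DESC, s.hacker_id ASC
    """
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows

print(full_score_hackers())
```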
SAMPLE OUTPUT:
hacker_id name
27232 Phillip
28614 Willie
15719 Christina
43892 Roy
14246 David
14372 Michelle
18330 Lawrence
26133 Jacqueline
26253 John
30128 Brandon
35583 Norma
PROBLEM STATEMENT: Calculate each user's average session time. A session is defined as the time difference
between a page_load and page_exit. For simplicity, assume a user has only 1 session per day and if there are
multiple of the same events on that day, consider only the latest page_load and earliest page_exit. Output the
user_id and their average session time.
FIELD TYPE
user_id int
timestamp datetime
action varchar
TABLE:
NOTE: Here, latest page load time means if same user for same day is having different page load time
then we should choose one which is the most recent (or we can say maximum time out of all available
time)
2. Now similarly find the page exit time for all different users on different days. But here we should
consider whichever is the earliest one (that is, the minimum time out of all available times).
3. We want session time which is difference of page exit time and page load time divided by the total
number of days (here it is basically calculating average)
Here, we are told to consider that in one day user logins one time only, so it means one day here we
consider as one session
4. As we have got page load and page exit time for all users for all different days, now we will calculate the
difference of page exit time and page load time for all users for all different days.
5. Now we need to count the number of sessions for each user. (which is basically count number of days a
user has logged in)
6. Finally, we will calculate the session time by using step 3 formula for each users.
QUERY:
2. In the first CTE, timestamp_cte, we split the page load and page exit times into separate columns.
Here, we remove the actions other than page load and page exit because they are of no use. At
last, we group the result by user and by day.
4. At last we are extracting user id and average session time from page_load_exit_cte and filtering out
those users whose average session time is NULL.
user_id average_session_time
0 1883.5
1 35
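The approach can be sketched as below with Python's sqlite3 module. The CTE names timestamp_cte and page_load_exit_cte come from the explanation above; the table name web_log, the use of strftime('%s', ...) to get seconds, and every sample row are assumptions, with the rows chosen so that the averages match the output shown (1883.5 and 35).

```python
import sqlite3

def average_session_times():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE web_log (user_id INTEGER, timestamp TEXT, action TEXT)")
    # Hypothetical events; user 2 never exits, so its average is NULL.
    conn.executemany("INSERT INTO web_log VALUES (?, ?, ?)", [
        (0, "2019-04-25 13:29:00", "page_load"),
        (0, "2019-04-25 13:30:00", "page_load"),   # latest load of the day
        (0, "2019-04-25 14:01:23", "page_exit"),   # earliest exit of the day
        (0, "2019-04-25 14:05:00", "page_exit"),
        (0, "2019-04-26 09:00:00", "page_load"),
        (0, "2019-04-26 09:31:24", "page_exit"),
        (1, "2019-04-25 10:00:00", "page_load"),
        (1, "2019-04-25 10:00:20", "scroll_down"),
        (1, "2019-04-25 10:00:35", "page_exit"),
        (2, "2019-04-25 08:00:00", "page_load"),
    ])
    query = """
        WITH timestamp_cte AS (
            SELECT user_id, date(timestamp) AS day,
                   MAX(CASE WHEN action = 'page_load' THEN timestamp END) AS load_ts,
                   MIN(CASE WHEN action = 'page_exit' THEN timestamp END) AS exit_ts
            FROM web_log
            WHERE action IN ('page_load', 'page_exit')
            GROUP BY user_id, date(timestamp)
        ),
        page_load_exit_cte AS (
            SELECT user_id,
                   AVG(CAST(strftime('%s', exit_ts) AS INTEGER)
                     - CAST(strftime('%s', load_ts) AS INTEGER)) AS average_session_time
            FROM timestamp_cte
            GROUP BY user_id
        )
        SELECT user_id, average_session_time
        FROM page_load_exit_cte
        WHERE average_session_time IS NOT NULL
        ORDER BY user_id
    """
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows

print(average_session_times())
```

AVG already divides the total session time by the number of days with a session, so the count of sessions does not have to be computed separately.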
05 January 2023 08:26
COUNT() Function
COUNT() function is used to count the number of records in a table.
SYNTAX :
COUNT(*) : It will count all the records present in a table irrespective of NULL values which means it will count all the NULL
and NON-NULL values.
Output
number_of_employees
5
This query will count the number of employees who have a last name (i.e. whose last_name is not NULL).
Output
number_of_employees
3
Insight: It is showing 3 as output because it has filtered out the null records and count the non-null records (i.e Kumar,
Gupta, Yadav) from the last_name columns. So basically now it is using below table to count the records.
NOTE : If we want distinct first name from employee table then we will use DISTINCT FIRST NAME inside the COUNT()
function.
COUNT(DISTINCT column_name): This will count unique records present in a particular column.
Output
number_of_employees
4
Insight: It is showing 4 as output because it is counting only distinct or unique records (i.e Prayash, Ranya, Anusha, Salone)
from the first_name column. It has removed the duplicate record for Ranya. It is using below table to count distinct records.
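The three forms of COUNT() can be compared side by side with the sketch below, run through Python's sqlite3 module. The first and last names are the ones mentioned in the insights above; the table name employee and the pairing of first names with last names are assumptions.

```python
import sqlite3

def count_variants():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employee (first_name TEXT, last_name TEXT)")
    # Five rows: two without a last name, and Ranya appears twice.
    conn.executemany("INSERT INTO employee VALUES (?, ?)", [
        ("Prayash", "Kumar"),
        ("Ranya", "Gupta"),
        ("Anusha", "Yadav"),
        ("Salone", None),
        ("Ranya", None),
    ])
    row = conn.execute("""
        SELECT COUNT(*),                    -- all rows, NULLs included
               COUNT(last_name),            -- only rows with a non-NULL last_name
               COUNT(DISTINCT first_name)   -- unique first names
        FROM employee
    """).fetchone()
    conn.close()
    return row

print(count_variants())
```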
Problem Statement: Find the difference between the total number of CITY entries in the table and the number of
distinct CITY entries in the table.
The STATION table is described as follows:
where LAT_N is the northern latitude and LONG_W is the western longitude.
For example, if there are three records in the table with CITY values 'New York', 'New York', 'Bengalaru', there are
2 different city names: 'New York' and 'Bengalaru'. The query returns 1, because
total number of records - number of unique city names = 3 - 2 = 1.
Query
Output
difference
13
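A minimal sketch of this calculation, using Python's sqlite3 module and only the three example rows from the problem statement (so it returns 1, not the 13 obtained on the full STATION data):

```python
import sqlite3

def city_count_difference(cities):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE STATION (CITY TEXT)")
    conn.executemany("INSERT INTO STATION VALUES (?)", [(c,) for c in cities])
    # total CITY entries minus distinct CITY entries
    (difference,) = conn.execute(
        "SELECT COUNT(CITY) - COUNT(DISTINCT CITY) AS difference FROM STATION"
    ).fetchone()
    conn.close()
    return difference

print(city_count_difference(["New York", "New York", "Bengalaru"]))
```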
06 January 2023 08:09
LIKE OPERATOR
In SQL like operator is used in WHERE clause to search for any type of pattern within a column.
SYNTAX:
Problem Statement: Query the list of CITY names starting with vowels (i.e., a, e, i, o, or u)
from STATION. Your result cannot contain duplicates.
Input Format
The STATION table is described as follows:
where LAT_N is the northern latitude and LONG_W is the western longitude.
Mental Approach
1. We find all the CITY names first. (in sql we use SELECT clause)
2. Now we will only choose the CITY names which are starting with a,e,i,o,u. (in sql we use WHERE clause
with LIKE operator)
3. We now find the distinct CITY name out of above list because we don't want duplicate records. (in sql
we do this using DISTINCT in SELECT clause)
Sample Output
NAME
Arlington
Turner
Slidell
Negreet
Glencoe
Alisa
Chignik Lagoon
Pelahatchie
Ellen
Dorrance
Albany
Usain
Final Query
Sample Output
Name
Arlington
Alisa
Ellen
Albany
Usain
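The final query can be sketched as below with Python's sqlite3 module. The city names are the ones from the first sample output (with Albany inserted twice to show the effect of DISTINCT); the ORDER BY is an addition to make the result order predictable. Note that LIKE in SQLite is case-insensitive for ASCII letters by default, which may differ in other databases.

```python
import sqlite3

def cities_starting_with_vowel():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE STATION (CITY TEXT)")
    cities = ["Arlington", "Turner", "Slidell", "Negreet", "Glencoe", "Alisa",
              "Chignik Lagoon", "Pelahatchie", "Ellen", "Dorrance", "Albany",
              "Usain", "Albany"]
    conn.executemany("INSERT INTO STATION VALUES (?)", [(c,) for c in cities])
    query = """
        SELECT DISTINCT CITY
        FROM STATION
        WHERE CITY LIKE 'a%' OR CITY LIKE 'e%' OR CITY LIKE 'i%'
           OR CITY LIKE 'o%' OR CITY LIKE 'u%'
        ORDER BY CITY
    """
    names = [name for (name,) in conn.execute(query)]
    conn.close()
    return names

print(cities_starting_with_vowel())
```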
What To Study
06 January 2023 09:47
PROBLEM STATEMENT: Query the list of CITY names from STATION that do not
start with vowels. Your result cannot contain duplicates.
Input Format
The STATION table is described as follows:
where LAT_N is the northern latitude and LONG_W is the western longitude.
SAMPLE TABLE:
SAMPLE OUTPUT :
CITY
Kissee Mills
Loma Mar
Sandy Hook
Tipton
Turner
Slidell
Negreet
INSIGHTS: All names starting with a,e,i,o,u are removed from provided data.
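One possible way to write this filter is with NOT LIKE, as in the sketch below (Python's sqlite3 module). The non-vowel city names come from the sample output above; the three vowel-initial rows and the duplicate are hypothetical additions so that the filter and DISTINCT have something to remove.

```python
import sqlite3

def cities_not_starting_with_vowel():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE STATION (CITY TEXT)")
    cities = ["Kissee Mills", "Loma Mar", "Sandy Hook", "Tipton", "Turner",
              "Slidell", "Negreet", "Arlington", "Ellen", "Usain", "Tipton"]
    conn.executemany("INSERT INTO STATION VALUES (?)", [(c,) for c in cities])
    query = """
        SELECT DISTINCT CITY
        FROM STATION
        WHERE CITY NOT LIKE 'a%' AND CITY NOT LIKE 'e%' AND CITY NOT LIKE 'i%'
          AND CITY NOT LIKE 'o%' AND CITY NOT LIKE 'u%'
        ORDER BY CITY
    """
    names = [name for (name,) in conn.execute(query)]
    conn.close()
    return names

print(cities_not_starting_with_vowel())
```

The conditions are joined with AND here, because a city must fail every vowel pattern to be kept.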
Problem Statement: Query the Western Longitude (LONG_W) for the largest Northern Latitude (LAT_N) in STATION that is less than
137.2345. Round your answer to 4 decimal places.
Input Format
The STATION table is described as follows:
where LAT_N is the northern latitude and LONG_W is the western longitude.
Mental Approach:
1. We will search for a maximum value of LAT_N which is less than 137.2345. (In SQL query we will use WHERE clause for it)
2. Now corresponding to that value we will look LONG_W value. (In SQL query we do it using SELECT clause)
3. At last we will round the LONG_W value to 4 decimal places. (In SQL within SELECT clause we round the value using some
ROUND functions)
Query:
Output:
LONG_W
117.2465
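A minimal sketch of the three steps, run through Python's sqlite3 module. The subquery form is one possible way to write it, and the three STATION rows are hypothetical values chosen so that the answer rounds to the 117.2465 shown above.

```python
import sqlite3

def long_w_for_largest_lat(limit=137.2345):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE STATION (ID INTEGER, CITY TEXT, LAT_N REAL, LONG_W REAL)")
    # Hypothetical rows for illustration only.
    conn.executemany("INSERT INTO STATION VALUES (?, ?, ?, ?)", [
        (1, "Alpha", 137.0193, 117.24651),
        (2, "Beta", 140.5, 50.123456),
        (3, "Gamma", 90.25, 80.5),
    ])
    query = """
        SELECT ROUND(LONG_W, 4)
        FROM STATION
        WHERE LAT_N = (SELECT MAX(LAT_N) FROM STATION WHERE LAT_N < ?)
    """
    (value,) = conn.execute(query, (limit,)).fetchone()
    conn.close()
    return value

print(long_w_for_largest_lat())
```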
Problem Statement: A median is defined as a number separating the higher half of a data set from
the lower half. Query the median of the Northern Latitudes (LAT_N) from STATION and round your
answer to 4 decimal places.
Input Format
The STATION table is described as follows:
where LAT_N is the northern latitude and LONG_W is the western longitude.
Mental Approach:
1. We will sort the LAT_N from smallest to largest
2. Now we will check whether count of LAT_N is odd or even.
3. If it is odd then we will go the middle value that will be our median.
4. If it is even then we need to find the average of middle two values.
Generic Method:
Overall Output:
median
83.8913
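The odd/even logic above can be written without branching, as in the sketch below (Python's sqlite3 module, SQLite 3.25 or newer for window functions). It is only one possible generic method: the row numbers (cnt + 1) / 2 and (cnt + 2) / 2 use integer division, so they point at the same middle row when the count is odd and at the two middle rows when it is even. The LAT_N values passed in are hypothetical.

```python
import sqlite3

def median_lat_n(lat_values):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE STATION (LAT_N REAL)")
    conn.executemany("INSERT INTO STATION VALUES (?)", [(v,) for v in lat_values])
    query = """
        WITH ordered AS (
            SELECT LAT_N,
                   ROW_NUMBER() OVER (ORDER BY LAT_N) AS rn,   -- position after sorting
                   COUNT(*) OVER () AS cnt                     -- total number of rows
            FROM STATION
        )
        SELECT ROUND(AVG(LAT_N), 4) AS median
        FROM ordered
        WHERE rn IN ((cnt + 1) / 2, (cnt + 2) / 2)
    """
    (median,) = conn.execute(query).fetchone()
    conn.close()
    return median

print(median_lat_n([40.75, 10.5, 90.0, 30.0, 20.25]))        # odd count
print(median_lat_n([40.75, 10.5, 90.0, 30.0, 20.25, 50.0]))  # even count
```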
Problem Statement: Find the titles of workers that earn the highest salary. Output the highest-paid title or
multiple titles that share the highest salary.
worker table
Field Type
worker_id int
first_name varchar
last_name varchar
salary int
joining_date datetime
department varchar
title table
Field Type
worker_ref_id int
worker_title varchar
affected_from datetime
Table
worker table
worker_id first_name last_name salary joining_date department
1 Monika Arora 100000 20-02-2014 09:00 HR
2 Niharika Verma 80000 11-06-2014 09:00 Admin
3 Vishal Singhal 300000 20-02-2014 09:00 HR
4 Amitah Singh 500000 20-02-2014 09:00 Admin
5 Vivek Bhati 500000 11-06-2014 09:00 Admin
6 Vipul Diwan 200000 11-06-2014 09:00 Account
7 Satish Kumar 75000 20-01-2014 09:00 Account
8 Geetika Chauhan 90000 11-04-2014 09:00 Admin
Mental Approach:
1. First combine both table together. (In SQL we do this by using JOINS)
2. Now look for highest salary in the table. (In SQL we do this using aggregate function MAX() )
3. Now corresponding to that highest salary search for worker_title. (For this we will just query for
worker_title in SELECT statement)
Query:
max_salary
500000
Final Output:
worker_title
Asst. Manager
Manager
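A minimal sketch of this approach follows, run through Python's sqlite3 module. The worker rows are taken from the table above (without the joining_date column); the title rows are assumptions, chosen so that workers 4 and 5, who share the highest salary, hold the two titles in the final output.

```python
import sqlite3

def highest_paid_titles():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE worker (worker_id INTEGER, first_name TEXT, last_name TEXT,
                             salary INTEGER, department TEXT);
        CREATE TABLE title (worker_ref_id INTEGER, worker_title TEXT);
        INSERT INTO worker VALUES
            (1, 'Monika', 'Arora', 100000, 'HR'), (2, 'Niharika', 'Verma', 80000, 'Admin'),
            (3, 'Vishal', 'Singhal', 300000, 'HR'), (4, 'Amitah', 'Singh', 500000, 'Admin'),
            (5, 'Vivek', 'Bhati', 500000, 'Admin'), (6, 'Vipul', 'Diwan', 200000, 'Account'),
            (7, 'Satish', 'Kumar', 75000, 'Account'), (8, 'Geetika', 'Chauhan', 90000, 'Admin');
        -- Assumed title rows (the title table is not shown in the notes).
        INSERT INTO title VALUES
            (1, 'Manager'), (2, 'Executive'), (3, 'Lead'), (4, 'Asst. Manager'),
            (5, 'Manager'), (6, 'Lead'), (7, 'Executive'), (8, 'Executive');
    """)
    query = """
        SELECT t.worker_title
        FROM worker w
        JOIN title t ON t.worker_ref_id = w.worker_id
        WHERE w.salary = (SELECT MAX(salary) FROM worker)   -- highest salary
        ORDER BY t.worker_title
    """
    titles = [title for (title,) in conn.execute(query)]
    conn.close()
    return titles

print(highest_paid_titles())
```

Comparing against the MAX(salary) subquery, rather than sorting and taking one row, is what lets several titles appear when the top salary is shared.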