Introduction
My full name is Anurag Srivastava. I have been working as a Data Engineer for the past 3 years.
I started my career in 2021, when I was assigned to a Cloud Data Warehouse project, which
was essentially a data migration project. There I got the opportunity to work in 2 different
teams, both of which were core data teams.
Then explain the project briefly and quantify the data worked on, for example
the loading of 975 PB of data.
Roles and Responsibilities
As mentioned, I worked in 2 teams:
DDL Team: we were responsible for converting objects such as tables, views,
procedures, macros, and triggers to GCP-compatible definitions, and for supporting the
cross-functional teams with any issues in testing, ETL jobs, or object definitions.
DMU Team: we were responsible for loading the history and incremental data. Tables
were divided into Full Load (<=100 GB) and Full Large Load (>100 GB). We also helped the
teams with batch incremental loads and helped cross-functional teams with data
validation and data analysis.
Follow-up Questions on Projects
1. What tech stack were you using in the project?
2. How did you store the data after extracting it?
3. What was the intermediate layer before loading to the target?
4. How fast was the load, and what was the volume of data?
SQL Questions
1. What is indexing in SQL, how does it help, and is it logical or physical?
2. What is a nested field in BigQuery?
3. What are repeated fields in BigQuery?
4. What optimisation techniques should be used when writing complex SELECT queries?
5. What is a materialised view in SQL?
SQL Questions
What is the difference between the following queries?
1. select count(*) from emp;
2. select count(1) from emp;
3. select count(distinct 2) from emp;
4. select 1 from emp;
5. select 1;
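One way to check the answers is to run the five queries above against a small table. The sketch below uses Python's built-in sqlite3 module rather than BigQuery or Teradata, and the emp table contents are made up for illustration, so treat it as an assumption-laden demo of standard SQL behaviour rather than the expected interview answer.

```python
import sqlite3

# Hypothetical emp table with three rows; the columns and values are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (id INTEGER, name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?)",
    [(1, "A", 100), (2, "B", 200), (3, "C", None)],
)


def run(sql):
    """Return every row produced by a query as a list of tuples."""
    return conn.execute(sql).fetchall()


# COUNT(*) counts rows, including rows that contain NULLs.
count_star = run("SELECT COUNT(*) FROM emp")

# COUNT(1) evaluates the constant 1 for each row, so it also counts every row.
count_one = run("SELECT COUNT(1) FROM emp")

# COUNT(DISTINCT 2) counts distinct values of the constant 2: one value.
count_distinct = run("SELECT COUNT(DISTINCT 2) FROM emp")

# SELECT 1 FROM emp returns the constant 1 once per row in the table.
one_per_row = run("SELECT 1 FROM emp")

# SELECT 1 with no table returns a single row holding the constant 1.
single_one = run("SELECT 1")

print(count_star, count_one, count_distinct, one_per_row, single_one)
```

With three rows in emp, the first two queries both return 3, the third returns 1, the fourth returns three rows, and the fifth returns one row. On an empty table the first three would return 0 and the fourth would return no rows.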
Python Questions
Write Python code to generate a Fibonacci-like series in which each number is the sum of the previous 3 numbers
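A minimal sketch of one possible answer follows. The question does not say which starting values to use, so the seeds 0, 1, 1 and the function name sum_of_three_series are assumptions; an interviewer may expect different seeds such as 0, 0, 1.

```python
def sum_of_three_series(n, seeds=(0, 1, 1)):
    """Return the first n terms of a series where each term is the
    sum of the previous three terms.

    The default seeds (0, 1, 1) are an assumption; pass other seeds if needed.
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    if len(seeds) != 3:
        raise ValueError("exactly three seed values are required")

    # Start from the seeds, truncated when fewer than three terms are requested.
    series = list(seeds)[:n]

    # Each new term is the sum of the last three terms already in the list.
    while len(series) < n:
        series.append(series[-1] + series[-2] + series[-3])
    return series


first_ten = sum_of_three_series(10)
print(first_ten)
```

The loop keeps the whole series in a list, which is fine for interview-sized inputs; for a very long series, a generator that tracks only the last three values would use less memory.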
SQL Question
Write a query to find the second-highest salary
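One common approach is a subquery that takes the maximum salary strictly below the overall maximum. The sketch below runs that query through Python's built-in sqlite3 module; the emp table, its columns, and the sample salaries are assumptions made for illustration.

```python
import sqlite3

# Hypothetical emp table; two employees share the top salary on purpose.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?)",
    [("A", 300), ("B", 500), ("C", 500), ("D", 400)],
)

# Highest salary that is strictly lower than the overall maximum,
# so duplicate top salaries do not count as the second highest.
SECOND_HIGHEST_SQL = """
SELECT MAX(salary)
FROM emp
WHERE salary < (SELECT MAX(salary) FROM emp)
"""

second_highest = conn.execute(SECOND_HIGHEST_SQL).fetchone()[0]
print(second_highest)
```

If the table has fewer than two distinct salaries, the query returns NULL (None in Python) instead of failing. A window function such as DENSE_RANK is another common way to answer this when the Nth highest salary is needed.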
BigQuery Architecture
Questions on Migration
1. Why did the migration happen?
2. What are the disadvantages of using Teradata compared to GCP?
3. Why didn't we use any data transfer tools?
PySpark Questions
1. What is PySpark? Why is it used?
2. How do we do data processing using GCP?
3. How do we read a CSV file in Spark?
4. What happens if we don't set inferSchema=True?