
Uploaded by

Kartik Sharma

Micron Interview Questions Summary

# Question 1: Parsing the HTML webpages


For parsing the HTML pages I used Beautiful Soup along with pandas for the DataFrame objects. I read the tables in the section relevant to the question using the soup HTML parser, then located the rows and the cell data associated with them. To process the column headers and values, I separated the two, cleaned them of stray whitespace, and merged them afterwards. Finally, as required, I packed the lists containing the column headers and values into a dict.
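The steps above can be sketched roughly as follows. The HTML snippet and its column names are placeholders of my own, not the actual page from the question:

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for the webpage from the question.
html = """
<table>
  <tr><th> Car Name </th><th> Price </th></tr>
  <tr><td>Toyota Corolla</td><td>50000</td></tr>
  <tr><td>Honda Civic</td><td>48000</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find("table")

# Separate headers and cell values, cleaning stray whitespace.
headers = [th.get_text(strip=True) for th in table.find_all("th")]
rows = [
    [td.get_text(strip=True) for td in tr.find_all("td")]
    for tr in table.find_all("tr")
    if tr.find_all("td")
]

# Pack the column headers and values into a dict of lists.
data = {h: [row[i] for row in rows] for i, h in enumerate(headers)}
print(data)
```

From here the dict can be handed straight to `pandas.DataFrame(data)` for further processing.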

# Question 2
a. Check if there are new lines in 'NewData.csv', and append them to the existing 'MasterDB.csv', as long as the 'Status' in the row is 'Available', and the 'Price' and 'COE' columns are not 'N.A.' (i.e. have a value).

Initially read the data and check the condition given for appending the new data

rows_to_be_updated = new_data[(new_data['COE'] != "N.A.") &
    (new_data['Price'] != "N.A.") & (new_data['Status'] == 'Available')]

After fetching the above records, the missing values have to be treated. To compare the rows from the master data with the fetched rows, pandas offers a `compare` method, which I avoided because it is resource-intensive and not supported in some pandas versions, which could be a bottleneck. Instead, I took the last index of the master data and appended the fetched rows after it.
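A minimal sketch of this filter-and-append step, using small in-memory frames in place of MasterDB.csv and NewData.csv (the column names follow the question; the sample values are my own):

```python
import pandas as pd

# Hypothetical stand-ins for MasterDB.csv and NewData.csv.
master = pd.DataFrame({
    "Car Name": ["Toyota Corolla"],
    "Price": ["50000"], "COE": ["30000"], "Status": ["Available"],
})
new_data = pd.DataFrame({
    "Car Name": ["Honda Civic", "Mazda 3"],
    "Price": ["48000", "N.A."], "COE": ["29000", "28000"],
    "Status": ["Available", "Available"],
})

# Keep only rows that satisfy the stated conditions.
rows_to_be_updated = new_data[
    (new_data["COE"] != "N.A.") &
    (new_data["Price"] != "N.A.") &
    (new_data["Status"] == "Available")
]

# Continue the index from the last index of the master data,
# then append the fetched rows after it.
start = master.index[-1] + 1
rows_to_be_updated.index = range(start, start + len(rows_to_be_updated))
master = pd.concat([master, rows_to_be_updated])
print(master)
```

Here "Mazda 3" is dropped because its 'Price' is 'N.A.', and "Honda Civic" is appended at index 1.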

b. For the existing lines, see if 'NewData.csv' contains any changes. If yes, update the changes in 'MasterDB.csv'.

I used a left outer join to compare the overlapping rows and removed the unwanted rows. Several alternative methods could have achieved the same result.
c. If the column 'Status' in 'NewData.csv' is 'Sold', then remove those lines from 'MasterDB.csv'.

I simply checked the condition 'Status' not equal to 'Sold' and kept the remaining rows in the master.
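A short sketch of this removal, again with hypothetical frames and 'Car Name' assumed as the key:

```python
import pandas as pd

# Hypothetical stand-ins for MasterDB.csv and NewData.csv.
master = pd.DataFrame({"Car Name": ["Toyota Corolla", "Honda Civic"]})
new_data = pd.DataFrame({"Car Name": ["Honda Civic"], "Status": ["Sold"]})

# Collect the names marked 'Sold' in the new file, then keep only
# the master rows that are not among them.
sold = new_data.loc[new_data["Status"] == "Sold", "Car Name"]
master = master[~master["Car Name"].isin(sold)].reset_index(drop=True)
print(master)
```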

# Question 3
a. Develop a script that can split Column ‘Car Name’ to get the following
attributes

i. Car Make

ii. Car Model Name

iii. COE End Date

I used lambda functions to split the column on spaces. For the COE end date, I fetched the last elements and extracted the date from them. Lambda functions can be used in plain Python as well as in Spark, which provides better performance.
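A sketch of the split, assuming (my assumption, not stated in the question) that 'Car Name' looks like "&lt;Make&gt; &lt;Model...&gt; (COE &lt;date&gt;)", with the make as the first token and the date in parentheses at the end:

```python
import pandas as pd

# Hypothetical sample data; the exact column format is an assumption.
df = pd.DataFrame({"Car Name": [
    "Toyota Corolla Altis (COE 12/2027)",
    "Honda Civic (COE 05/2026)",
]})

# Split on spaces with a lambda, then pick the pieces apart.
parts = df["Car Name"].apply(lambda s: s.split())
df["Car Make"] = parts.apply(lambda p: p[0])
df["Car Model Name"] = parts.apply(lambda p: " ".join(p[1:-2]))
# The date sits in the last element; strip the closing parenthesis.
df["COE End Date"] = parts.apply(lambda p: p[-1].rstrip(")"))
print(df[["Car Make", "Car Model Name", "COE End Date"]])
```

If the real format differs (e.g. multi-word makes), the slice boundaries would need adjusting.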

b. Build a statistical model for every car make (e.g. Toyota)

i. Mean, Median, Mode

ii. +- 3 Sigma Value

In order to extract all the above statistics, I formatted the data into the appropriate dtypes. Then I used pandas groupby together with aggregation to compute all the values.
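The groupby-and-aggregate step can be sketched as below; the sample prices and column names are my own placeholders:

```python
import pandas as pd

# Hypothetical price data per car make.
df = pd.DataFrame({
    "Car Make": ["Toyota", "Toyota", "Toyota", "Honda", "Honda"],
    "Price": ["50000", "52000", "52000", "48000", "46000"],
})

# Cast to a numeric dtype first, then aggregate per make.
df["Price"] = pd.to_numeric(df["Price"])
stats = df.groupby("Car Make")["Price"].agg(
    mean="mean",
    median="median",
    mode=lambda s: s.mode().iloc[0],  # first mode if there are ties
    std="std",
)

# +/- 3 sigma bounds around the mean.
stats["upper_3sigma"] = stats["mean"] + 3 * stats["std"]
stats["lower_3sigma"] = stats["mean"] - 3 * stats["std"]
print(stats)
```

Note that `mode` can return several values when there are ties; taking the first is one simple convention.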
