SQL Server Interview Questions (SSIS): How To Check Quality of Data?
Taken from my book SQL Server interview questions by Shivprasad Koirala: http://www.flipkart.com/sqlserver-interview-questions-8183331033/p/itmdyuqz2a6tzhjw

Many times you get raw data (like the sample shown below) and you would like to understand the quality of that data. For example, for the data below you would probably like to know:

- How many null values exist in the name field?
- What types of contact information are present: email, phone, address, etc.?
- What salary range exists?
- And so on.
Name: Shiv, Raju, Ajay, Kumar, Neeraj, Vishal, sharma, Yadav, Dinesh
Contact: shiv_koirala@yahoo.com, 91-022-2130928933, shaam@yahoo.com, ajay@yahoo.in, kumar@gmail.in, neeraj@yahoo.com, suraj@yahoo.com, vishal@gmail.in, sharma@yahoo.com, 91-022-2130928933, dinesh@yahoo.com
DOB: 3/12/1980, 11/2/1975, 3/16/1988, 5/22/1986, 9/24/1977, 4/16/1971, 2/19/1973, 6/24/1978, 3/26/1976, 8/13/1983, 1/17/1966
Salary: 1000, 1500, 1000, 1000, 6000, 4000, 8000, 3000, 2000, 1000, 5000
Country: IND, IND, NEP, IND, USA, USA, IND, USA, IND, IND, IND
EMP Code: E001, E002, E003, E004, E005, E006, E006, AMDK, E005, AQPR, E007
Pan card: D001, D002, D003, D004, D005, D006, D007, D008, D009, D010, D011
CountryTaxcode: IND, IND, NEP, IND, USA, USA, IND, USA, IND, IND, IND
Tax%: 5, 5, 2, 5, 6, 6, 5, 6, 5, 5, 5
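To make the questions about this sample concrete, here is a minimal Python sketch that answers them by hand for three of the columns. It is only an illustration and not part of SSIS. The function names are hypothetical, the two regular expressions are deliberately simple, and the two blank names (and their positions) are an assumption, since the sample shows nine names against eleven rows.

```python
import re
from statistics import mean

# Sample columns copied from the data above.
# Assumption: nine names are listed for eleven rows, so two blanks (None)
# are assumed here and their positions are arbitrary.
names = ["Shiv", "Raju", None, "Ajay", "Kumar", "Neeraj", None,
         "Vishal", "sharma", "Yadav", "Dinesh"]
contacts = [
    "shiv_koirala@yahoo.com", "91-022-2130928933", "shaam@yahoo.com",
    "ajay@yahoo.in", "kumar@gmail.in", "neeraj@yahoo.com",
    "suraj@yahoo.com", "vishal@gmail.in", "sharma@yahoo.com",
    "91-022-2130928933", "dinesh@yahoo.com",
]
salaries = [1000, 1500, 1000, 1000, 6000, 4000, 8000, 3000, 2000, 1000, 5000]


def null_ratio(values):
    """Return (null count, null fraction) for a column."""
    nulls = sum(1 for v in values if v is None or v == "")
    return nulls, nulls / len(values)


def contact_kind(value):
    """Classify one contact value using two simple illustrative patterns."""
    if re.fullmatch(r"[^@\s]+@[^@\s]+\.[a-z]+", value):
        return "email"
    if re.fullmatch(r"[\d-]+", value):
        return "phone"
    return "other"


def contact_kinds(values):
    """Count how many values fall into each contact kind."""
    counts = {}
    for v in values:
        kind = contact_kind(v)
        counts[kind] = counts.get(kind, 0) + 1
    return counts


def column_stats(values):
    """Return (minimum, maximum, average) of a numeric column."""
    return min(values), max(values), mean(values)


print(null_ratio(names))
print(contact_kinds(contacts))
print(column_stats(salaries))
```

Doing this by hand for every column quickly becomes tedious, which is why a dedicated profiling tool is useful.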
This can be achieved with the Data Profiling task, which is available in the SSIS control flow toolbox. The following steps need to be followed:

1. Create a profile request in the Data Profiling task.
2. Run the Data Profiling task; it creates an XML output file.
3. View the XML output using the Data Profile Viewer.

The Data Profile Viewer is located in the C:\Program Files (x86)\Microsoft SQL Server\100\DTS\Binn directory.
Below are more details on the kind of data analysis performed by the 8 profile requests that the Data Profiling task offers.

Profile request: Type of data analysis

Column Null Ratio profile request: How many NULL values exist in a column?
Column Pattern profile request: Detects what kind of pattern the data follows, e.g. email address, website URL, etc.
Column Statistics profile request: What are the minimum, maximum and average values in a column?
Column Length Distribution profile request: What are the distinct lengths of string values? For instance, for a country code column you would like to ensure that the length is always 3 (IND, USA). If other lengths turn up, you can take the necessary action.
Column Value Distribution profile request: Finds out how many distinct values exist for a column.
Functional Dependency profile request: How much does one column depend on another column? It helps you find out in how many places the dependency has been violated.
Candidate Key profile request: Which columns are good candidates for primary keys?
Value Inclusion profile request: Checks whether values overlap between two columns. Helps to detect a likely foreign key column.
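As a rough illustration of what the length, value distribution, dependency, key and inclusion checks look for, here is a minimal Python sketch run against the sample columns from the start of this article. The function names are hypothetical and the measures are simplified stand-ins, not the exact formulas the Data Profiling task uses.

```python
from collections import Counter, defaultdict

# Sample columns copied from the data at the start of the article.
country = ["IND", "IND", "NEP", "IND", "USA", "USA", "IND", "USA", "IND", "IND", "IND"]
emp_code = ["E001", "E002", "E003", "E004", "E005", "E006", "E006", "AMDK", "E005", "AQPR", "E007"]
pan_card = ["D001", "D002", "D003", "D004", "D005", "D006", "D007", "D008", "D009", "D010", "D011"]
tax_code = ["IND", "IND", "NEP", "IND", "USA", "USA", "IND", "USA", "IND", "IND", "IND"]
tax_pct = [5, 5, 2, 5, 6, 6, 5, 6, 5, 5, 5]


def length_distribution(values):
    """Count how many values have each string length."""
    return Counter(len(v) for v in values)


def value_distribution(values):
    """Count how often each distinct value occurs."""
    return Counter(values)


def distinct_fraction(values):
    """Fraction of rows that are distinct; 1.0 suggests a candidate key."""
    return len(set(values)) / len(values)


def dependency_violations(determinant, dependent):
    """Count rows whose dependent value differs from the most common
    dependent value seen for the same determinant value."""
    groups = defaultdict(Counter)
    for key, value in zip(determinant, dependent):
        groups[key][value] += 1
    return sum(sum(c.values()) - max(c.values()) for c in groups.values())


def inclusion_fraction(subset, superset):
    """Fraction of distinct subset values that also appear in superset."""
    distinct = set(subset)
    return len(distinct & set(superset)) / len(distinct)


print(length_distribution(country))              # lengths of country codes
print(value_distribution(country))               # rows per country
print(distinct_fraction(emp_code))               # repeated codes lower this
print(distinct_fraction(pan_card))               # all values distinct
print(dependency_violations(tax_code, tax_pct))  # CountryTaxcode -> Tax%
print(inclusion_fraction(country, tax_code))     # Country within CountryTaxcode
```

On this sample, EMP Code contains repeated values (E005 and E006), so it is a weaker key candidate than Pan card, while every CountryTaxcode maps to a single Tax% value.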
Here's an awesome SQL Server interview question: How does an index make your search faster? http://youtu.be/rtmeNwn4mEg?hd=1

Do not forget to see our .NET interview question videos and SQL Server interview questions at www.questpond.com