Data Subsetting and Masking
Agenda
BUT IS IT SECURE?
1,800 Exabytes in 2011 (Source: IDC, 2008)
Test System Setup
Full production copies for test systems are not cost-effective
Producing a relationally intact subset is hard but necessary
Error-prone, manual process
Financials
Supply Chain Management
Human Capital Management
Customer Relationship Management
Agenda
Sensitive Data Identification
Data Relationship Modeling
Data Subsetting
Data Masking
Test System Setup
Sensitive Data Identification
Import from masking templates
Set columns as sensitive manually
Data Subsetting
What?
Production: application data and application metadata
Subset criteria: REGION = NORTH AMERICA AND FISCAL_YEAR = 2009
Test: application data and application metadata
Why?
Reduce the storage overhead created by production data copies in various application environments
Allow developers to perform real-world application development by using production-class data
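As a rough illustration of the subset criteria on this slide, the sketch below filters rows with REGION = NORTH AMERICA AND FISCAL_YEAR = 2009. It uses Python's built-in sqlite3 module as a stand-in for a production database. The `orders` table, its rows and the `subset_ids` helper are hypothetical and only serve the example.

```python
import sqlite3

# Hypothetical table and rows; only the two criteria come from the slide.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, fiscal_year INTEGER)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, "NORTH AMERICA", 2009),
        (2, "NORTH AMERICA", 2008),
        (3, "ASIA", 2009),
        (4, "NORTH AMERICA", 2009),
    ],
)


def subset_ids(conn, region, fiscal_year):
    """Return the ids of rows matching REGION = ? AND FISCAL_YEAR = ?."""
    rows = conn.execute(
        "SELECT id FROM orders WHERE region = ? AND fiscal_year = ? ORDER BY id",
        (region, fiscal_year),
    )
    return [row[0] for row in rows]


print(subset_ids(conn, "NORTH AMERICA", 2009))  # -> [1, 4]
```

Only rows that satisfy both conditions reach the test copy; rows 2 and 3 each fail one of them.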
Financials
Supply Chain Management
Human Capital Management
Customer Relationship Management
Dimension (Region: Asia)
Space (Size: 10%)
Export = Writing subset data via Data Pump
Production -> Export -> Data Pump export file -> Import -> Test
In-Place subset = Deleting data in the same database
Clone of Production -> In-Place subset -> Test
Database size, Subset size, Time*
1 Terabyte
1 Terabyte
*2 nodes, Intel Xeon 6-core X5675 processors with 216 GB memory, running OEL 5.5
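The in-place method can be sketched as "clone, then delete everything outside the subset". The example below is a minimal illustration using sqlite3, not Data Pump or any Oracle tooling. The `sales` table, its rows and the function names are assumptions made for the example.

```python
import sqlite3


def make_production():
    """Build a small hypothetical 'production' database in memory."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, region TEXT)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [(1, "ASIA"), (2, "EUROPE"), (3, "ASIA"), (4, "AMERICAS")],
    )
    conn.commit()
    return conn


def in_place_subset(production, region):
    """Clone the database, then delete rows outside the subset in the clone."""
    clone = sqlite3.connect(":memory:")
    production.backup(clone)  # full copy; production itself is never modified
    clone.execute("DELETE FROM sales WHERE region <> ?", (region,))
    clone.commit()
    return clone


production = make_production()
test_db = in_place_subset(production, "ASIA")
print(test_db.execute("SELECT id, region FROM sales ORDER BY id").fetchall())
```

The export method would instead write only the matching rows to a file and import them into an empty test database, so the full clone is never needed.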
HR.EMPLOYEES
NAME     JOB_ID  SALARY
AGUILAR  SA_MAN  40000
BENSON   SA_REP  60000

HR.JOBS
JOB_ID  JOB           Min_SAL
SA_MAN  Sales Mgr     10000
SA_REP  Sales Repres  20000

Test/Staging
HR.EMPLOYEES
NAME     JOB_ID  SALARY
AGUILAR  SA_MAN  40000

HR.JOBS
JOB_ID  JOB        Min_SAL
SA_MAN  Sales Mgr  10000

Create Application Data Model
Create Data Subset Definition
Schemas, tables, relationships collected
Create Test Database
EM
Extract Data Subset: 2 methods
Schemas, tables, relationships retrieved
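The HR.EMPLOYEES and HR.JOBS example can be sketched in code to show what "relationally intact" means: when an employee row is selected, the job row it references has to come along. This is a minimal sketch using sqlite3, with the schema prefix dropped. The `extract_subset` function is hypothetical and is not how Enterprise Manager implements subsetting.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE jobs (job_id TEXT PRIMARY KEY, job TEXT, min_sal INTEGER);
    CREATE TABLE employees (
        name TEXT,
        job_id TEXT REFERENCES jobs(job_id),
        salary INTEGER
    );
    INSERT INTO jobs VALUES ('SA_MAN', 'Sales Mgr', 10000),
                            ('SA_REP', 'Sales Repres', 20000);
    INSERT INTO employees VALUES ('AGUILAR', 'SA_MAN', 40000),
                                 ('BENSON', 'SA_REP', 60000);
    """
)


def extract_subset(conn, employee_names):
    """Select employees by name, then follow the JOB_ID relationship to jobs."""
    marks = ",".join("?" for _ in employee_names)
    employees = conn.execute(
        f"SELECT name, job_id, salary FROM employees "
        f"WHERE name IN ({marks}) ORDER BY name",
        list(employee_names),
    ).fetchall()

    # Collect the parent keys referenced by the selected child rows.
    job_ids = sorted({row[1] for row in employees})
    marks = ",".join("?" for _ in job_ids)
    jobs = conn.execute(
        f"SELECT job_id, job, min_sal FROM jobs "
        f"WHERE job_id IN ({marks}) ORDER BY job_id",
        job_ids,
    ).fetchall()
    return employees, jobs


employees, jobs = extract_subset(conn, ["AGUILAR"])
print(employees)  # -> [('AGUILAR', 'SA_MAN', 40000)]
print(jobs)       # -> [('SA_MAN', 'Sales Mgr', 10000)]
```

Without the second query the test database would hold an employee whose JOB_ID points at a missing row, which is the kind of breakage a data model of the relationships is meant to prevent.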
Data Masking

Production
LAST_NAME  SSN          SALARY
AGUILAR    203-33-3234  60,000
BENSON     323-22-2943  40,000

Test
LAST_NAME  SSN          SALARY
SMITH      111-23-1111  60,000
MILLER     222-34-1345  40,000
Compound Masking
Sets of related columns masked together, e.g. Address, City, State, Zip, Phone
Condition-based Masking
Specify a separate mask format for each condition, e.g. driver's license format for each state
SQL-expression based masking
Use SQL functions, e.g. UPPER, SUBSTR, TO_CHAR, to generate mask values, e.g. SUBSTR(%ORIG_VALUE%,1,3)||'111-1111'
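The three mask formats can be approximated in a few lines of Python to make the differences concrete. This is only a sketch under stated assumptions: the license formats per state, the replacement addresses and all function names are hypothetical, and none of it reflects how the product implements these formats.

```python
import random
import string


def expression_mask(orig_value):
    """Python analogue of SUBSTR(%ORIG_VALUE%,1,3)||'111-1111'."""
    return orig_value[:3] + "111-1111"


# Hypothetical formats: 'A' stands for a random letter, '9' for a random digit.
LICENSE_FORMATS = {"CA": "A9999999", "NY": "999999999"}
DEFAULT_FORMAT = "99999999"


def conditional_mask(state, rng):
    """Condition-based masking: pick the mask format by the row's state."""
    fmt = LICENSE_FORMATS.get(state, DEFAULT_FORMAT)
    return "".join(
        rng.choice(string.ascii_uppercase) if c == "A" else rng.choice(string.digits)
        for c in fmt
    )


# Hypothetical replacement values; each entry is internally consistent.
REPLACEMENT_ADDRESSES = [
    {"city": "Springfield", "state": "IL", "zip": "62701"},
    {"city": "Columbus", "state": "OH", "zip": "43215"},
]


def compound_mask(row, rng):
    """Compound masking: replace related columns together as one unit."""
    masked = dict(row)
    masked.update(rng.choice(REPLACEMENT_ADDRESSES))
    return masked


rng = random.Random(0)  # seeded so the sketch is repeatable
print(expression_mask("650-555-0100"))  # -> 650111-1111
print(conditional_mask("CA", rng))
print(compound_mask({"name": "AGUILAR", "city": "Austin", "state": "TX", "zip": "73301"}, rng))
```

Masking city, state and zip from a single replacement entry keeps the combination plausible, which column-by-column masking would not guarantee.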
Non-Oracle Databases
Diagram: Production (non-Oracle), Database Gateway, Staging (Oracle), Test (Oracle), Test (non-Oracle); connections labeled monitor and manage
Flat file masking with Oracle Data Integrator
Incremental masking using Oracle Data Integrator
Pre-Mask: Reverse engineer flat file into tables
Mask
Post-Mask: Write masked incremental transactions to Test DB
Unmask
Reverse the masked data back to its original value with the same key
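The idea of reversing a mask with the same key can be illustrated with a toy scheme: each digit is shifted by a key-derived amount, and unmasking shifts it back. This is an assumption-laden sketch, not the algorithm used by any Oracle product and not a vetted encryption scheme. The `mask` and `unmask` names are hypothetical.

```python
import hashlib
import hmac

DIGITS = "0123456789"


def _keystream(key, length):
    """Derive `length` pseudo-random bytes from the key with HMAC-SHA256."""
    out = []
    counter = 0
    while len(out) < length:
        block = hmac.new(key, counter.to_bytes(4, "big"), hashlib.sha256).digest()
        out.extend(block)
        counter += 1
    return out[:length]


def _shift_digits(value, key, direction):
    """Shift every digit by a key-derived offset; leave other characters alone."""
    stream = _keystream(key, len(value))
    result = []
    for ch, k in zip(value, stream):
        if ch in DIGITS:
            result.append(str((int(ch) + direction * k) % 10))
        else:
            result.append(ch)
    return "".join(result)


def mask(value, key):
    return _shift_digits(value, key, +1)


def unmask(value, key):
    return _shift_digits(value, key, -1)


key = b"example-key"  # hypothetical key
masked = mask("203-33-3234", key)
print(masked)
print(unmask(masked, key))  # -> 203-33-3234
```

The masked value keeps the original layout (digits stay digits, dashes stay in place), and only someone holding the same key can recover the original.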
Database Replay
Capture Workload
Replay Workload
Deploy Replay Clients