Optimizing Data Loading
Agenda
The Three Layers of ETL
Think about Hardware
Think about your Access Methods
Hardware Trends and ETL
Parallel Loading and Partitioning
Q&A
Balanced Configuration
The weakest link defines the throughput
CPU quantity and speed dictate the number of HBAs and the capacity of the interconnect
[Diagram labels: FC-Switch1, FC-Switch2]
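The "weakest link" rule can be illustrated with a small calculation. This is a minimal sketch: the component names and the MB/s figures are hypothetical values chosen for illustration, not measurements from any real system.

```python
def system_throughput(stage_mb_per_s):
    """Return (bottleneck_name, throughput) for a chain of components.

    End-to-end throughput of a pipeline is capped by its slowest stage.
    """
    bottleneck = min(stage_mb_per_s, key=stage_mb_per_s.get)
    return bottleneck, stage_mb_per_s[bottleneck]


# Hypothetical per-component bandwidth in MB/s (illustrative only).
stages = {"cpu": 2000, "hba": 1600, "fc_switch": 1600, "disk": 1200}

name, mb_per_s = system_throughput(stages)
print(f"bottleneck: {name}, sustained throughput: {mb_per_s} MB/s")
```

With these assumed numbers, adding CPUs or HBAs would not help until the disk layer is upgraded, which is the point of a balanced configuration.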
Spend your time wisely and try to achieve the biggest improvements that can be made
Minimize staging data (writes are expensive)
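One way to avoid an extra staging write is to read records straight out of a compressed file instead of first writing an uncompressed copy to disk. The sketch below assumes a gzip-compressed, comma-delimited file; the function name `stream_records` is hypothetical and the approach is only one possible illustration of the idea.

```python
import csv
import gzip


def stream_records(path, delimiter=","):
    """Yield parsed records from a gzip-compressed delimited file.

    The file is decompressed on the fly, so no uncompressed
    staging copy is ever written to disk.
    """
    with gzip.open(path, "rt", newline="") as f:
        yield from csv.reader(f, delimiter=delimiter)
```

A caller can iterate over `stream_records("orders.csv.gz")` (a hypothetical file name) and hand each record to the loader as it arrives.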
Bulk Performance
Flat Files
Common Methods
JDBC, ODBC, Gateways and DBLinks
XML Files
Web Services
Heterogeneous
File Formats
Use a format that allows positionable and seekable scans
Delimit fields clearly and use a well-known record terminator to allow for automatic granulation
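To show why a seekable format with a well-known record terminator matters, here is a minimal sketch of splitting a flat file into byte-range granules that independent workers could scan in parallel. It assumes a newline record terminator; the function names and the granule size are hypothetical, and this is not how any particular database implements granulation.

```python
import os


def byte_range_granules(path, granule_size):
    """Split a file into (start, end) byte ranges ending on record boundaries.

    Assumes every record ends with a newline, so each granule can be
    extended forward to the next terminator without parsing the file.
    """
    file_size = os.path.getsize(path)
    granules = []
    with open(path, "rb") as f:
        start = 0
        while start < file_size:
            target = min(start + granule_size, file_size)
            if target < file_size:
                # Seek near the target offset and read up to the next terminator.
                f.seek(target - 1)
                f.readline()
                end = f.tell()
            else:
                end = file_size
            granules.append((start, end))
            start = end
    return granules


def read_granule(path, start, end):
    """Read one granule; each worker can do this independently."""
    with open(path, "rb") as f:
        f.seek(start)
        return f.read(end - start)
```

Because every granule starts and ends on a record boundary, no worker ever sees a partial record. A format without seekable offsets or an unambiguous terminator would force a serial scan to find the split points.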
BCP Unload
FTP
External Tables
Oracle Source
FTP
Compress Uncompress
Hardware Trends
Commodity hardware platforms
Intel chips
64-bit Linux OS
Clustered environments
Increasing CPU counts
Increasing memory sizes available
Larger systems
A lot more data
Compute power you didn't think you could have
FUSE
Filesystem in Userspace (http://fuse.sourceforge.net/)
Combining DBFS with FUSE offers mountable filesystems for Linux x64 (e.g. Database Machine!)
Create a mount point for the file system, owned by the oracle OS user (e.g. /data)
Grant unlimited quota on the tablespace to the user
Create the actual filesystem using the script $ORACLE_HOME/rdbms/admin/dbfs_create_filesystem_advanced.sql
External Tables
Oracle Source
SCP