Storing and Using Objects in A Relational Database
All content following this page was uploaded by Berthold Reinwald on 24 November 2015.
In today's heterogeneous development environments, application programmers have the responsibility to segment their application data and to store those data in different types of stores. That means relational data will be stored in RDBMSs (relational database management systems), C++ objects in OODBMSs (object-oriented database management systems), SOM (System Object Model) objects in OMG (Object Management Group) persistent stores, and OpenDoc™ or OLE™ (Object Linking and Embedding) compound documents in document files. In addition, application programmers must deal with multiple server systems with different query languages as well as large amounts of heterogeneous data. This paper describes SMRC (shared memory-resident cache), an RDBMS extender that provides the ability to store objects created in external type systems like C++ or SOM in a relational database, coresident with existing relational or other heterogeneous data. Using SMRC, applications can store and retrieve objects via SQL (structured query language), and invoke methods on the objects, without requiring any modifications to the original object definitions. Furthermore, the stored objects fully participate in all the characteristic features of the underlying relational database, e.g., transactions, backup, and authorization. SMRC is implemented on top of IBM's DB2® Common Server for AIX® relational database system and heavily exploits the DB2 user-defined types (UDTs), user-defined functions (UDFs), and large objects (LOBs) technology. In this paper, the C++ type system is used as a sample external type system to exemplify the SMRC approach, i.e., storing C++ objects in relational databases. Similar efforts are required for SOM or OLE objects.

©Copyright 1996 by International Business Machines Corporation. Copying in printed form for private use is permitted without payment of royalty provided that (1) each reproduction is done without alteration and (2) the Journal reference and IBM copyright notice are included on the first page. The title and abstract, but no other portions, of this paper may be copied or distributed royalty free without further permission by computer-based and other information-service systems. Permission to republish any other portion of this paper must be obtained from the Editor.

In recent years, object-oriented (OO) technology has achieved wide acceptance, maturity, and market presence. An OO application development project often starts with established OO tools, class libraries, and object frameworks,1 followed by a customization step, and then is enhanced and refined by using features such as inheritance and encapsulation. This new programming paradigm has significantly improved both the programmer's productivity and the timeliness and cost of application development. It is the growing interest in OO applications, coupled with the attractive features of relational database management systems (RDBMSs), that led to the advent of extended RDBMSs, e.g., systems like Postgres and Starburst, as well as object-oriented database management systems (OODBMSs), e.g., systems like ObjectStore**, O2**, GemStone**, and Versant**.2-4 Since these systems were established, OODBMSs have matured significantly, creating a market presence and increased market share. At the same time, RDBMS vendors saw some of the same OO trends and subsequently developed object-relational database management systems (ORDBMSs), e.g., systems like UniSQL**, Illustra**, and DB2*. RDBMSs continue to dominate the database market, and market analysts expect that this trend will continue.

Many users of RDBMSs are expanding toward applications that require more effective handling of nontraditional data, such as text, voice, image, and financial data. It is no surprise then, that most users
172 REINWALD ET AL. 0018-8670/96/$5.00 © 1996 IBM. IBM SYSTEMS JOURNAL, VOL 35, NO 2, 1996
also desire their OO data to be stored in their databases without compromising the essential industrial-strength features of RDBMSs that they already rely upon. Such features include robustness, high performance, standards compliance, and support for open systems, security, bulk I/O capabilities, and different levels of concurrency and isolation. As a result, there is constant pressure on RDBMS vendors to provide additional functionality for storing objects that were created in the external type system of an OO programming language. This functionality goes beyond user-defined types (UDTs), user-defined functions (UDFs), and large objects (LOBs) in SQL3. UDTs extend the relational type system with new data types, based on the relational built-in data types. The UDF mechanism provides a way to add functions to the existing base of relational built-in functions. LOBs give the RDBMS a way to manipulate large data objects, typically for multimedia applications. Although the addition of UDTs, UDFs, and LOBs to an RDBMS increases its functionality, these new features do not match the functionality of classes, methods, and objects in an OO programming language like C++.

This paper describes the shared memory-resident cache (SMRC) prototype implementation, at the IBM Almaden Research Center, that stores C++ objects in an RDBMS (e.g., DB2 Common Server for AIX*) by exploiting the UDT, UDF, and LOB technology. The design and implementation of SMRC was especially driven by the following requirements and goals:

• The approach must be compatible with existing class libraries; thus there is no opportunity to inherit persistence properties from a common root object and modify class definitions to include additional constructors or add methods to support persistence properties.
• The objects must be accessible in SQL (structured query language) queries in the same way as the existing relational data.
• The methods of acquired class libraries must be usable within SQL queries.
• The performance of queries involving objects must be reasonable. This is a particular concern where methods, used within query predicates, are applied to millions of database records. If the predicate evaluator is inefficient in invoking methods of objects, then when invoked millions of times on objects, the response time will be unacceptable.

SMRC (mostly)12 achieves the above goals by exploiting advanced features of RDBMSs and by providing an efficient binding to bridge the gap between objects of external type systems and RDBMSs in an attractive and inexpensive way. By external type system, we refer to types defined in C++, which are different from the tables and fields defined in SQL. Using SMRC, C++ objects are stored in the database in the same binary format as they were created in the C++ client application language. Thus, no translation of C++ class definitions to relational schemata and no data conversion needs to be performed. Standard SQL is used to store and retrieve the C++ objects in the relational database. When retrieving an object from the database to client memory, SMRC performs pointer swizzling (due to the relocation of the object in the client memory). Swizzling is the conversion of persistent database pointers into main memory address pointers. Whereas schema mapper products are useful to provide an object-oriented view of existing relational data, SMRC provides persistence for new OO data that need to be stored in relational databases. In this sense, SMRC is complementary to schema mapper products like Persistence** (see the section on traditional approach and related work) that require substantial data transformation between the relational representation and C++ objects.

An alternative to using SMRC for making C++ objects persistent might be to use one of the above-mentioned OODBMSs. OODBMSs provide many features that are not available in most relational databases, such as a rich object-oriented C++ data model, less impedance mismatch, fast navigational access, etc. However, OODBMSs offer these features at the cost of introducing their own server environment in addition to an existing RDBMS environment, and thus burden the user with managing multiple database servers. In fact, SMRC does not compete directly with OODBMSs. OODBMSs target different market segments and work best for those users who have mostly OO applications and need only persistence and simple query facilities for their OO data. OODBMSs offer smaller, faster servers for OO data, and can handle varying granularities of data with ease. In contrast, SMRC supports a tight C++ language binding as well as clustering and pointer swizzling for fast pointer browsing, seemingly as part of the existing RDBMS that users already depend on. SMRC is designed to allow users of an RDBMS to incorporate OO data into their existing relational tables and applications. Using SMRC is similar to using an OODBMS, but SMRC uses a two-level store model rather than the traditional single-level store of most OODBMSs.
[Figure 1: (A) ADT mapping, shipping sample; (B) BLOB mapping, PERT sample. Legend: instance of shipment; UDT of type shipping; UDT of type PERT.]
Project Evaluation and Review Technique, charts illustrate critical paths for completion of project tasks.) A class hierarchy includes a super class activity and two subclasses for sub and end activities. The figure shows C++ objects of three PERT charts allocated in SMRC heaps, which are mapped into the schedule column of the projects table. The sample also shows the usefulness of external pointers, as one of the objects in the PERT chart for project p2 refers to project p1 as a subproject. The implementation of a "real" BLOB mapping application (and the related experiences) using SMRC is described in a paper referenced earlier.

SMRC supports additional functionality for external pointers (as opposed to internal pointers). An external pointer contains all the information required to retrieve the referenced object from the database. SMRC is able to fault in the referenced object from the database automatically when the external pointer is dereferenced. (When an object is referenced, but is not in main memory, a fault condition occurs resulting in retrieving the object from the database. The terminology used for this event is fault in. Dereferencing a pointer results in the value at the location that the pointer points to.) In the case of the ADT mapping, only the one referenced object is faulted in, whereas in the case of the BLOB mapping, the whole heap containing the referenced object is installed in main memory.

The application program determines how C++ objects should be mapped to database containers. The application program creates objects either in the pro-
The SMRC persistence schema. A SMRC persistence schema is a collection of application type descriptions created by the SMRC schema compiler. The schema essentially describes the layout of all of the persistent C++ objects for the application, which is needed for memory management and pointer swizzling. The schema includes structural information (in particular, size and pointer offset information) that contains the type information of embedded structures, unions, and dynamic arrays. In addition, the type information contains the offsets of the hidden

SMRC tracks type and relocation information for pointer swizzling purposes. The type information provides the pointer offsets to achieve addressability of the pointer data members in the objects. The relocation information provides the basics to calculate the load differences of the objects required for pointer swizzling. Since the database system does not know about C++ class definitions (and C++ does not support run-time type information), SMRC attaches type tags to the objects before they are stored in the database system. After an object is retrieved
Get the type of an object: smrc_object_type(hvptr). When object instances of a class hierarchy are stored in a column, it is useful to be able to dynamically identify the type of a particular object in the column. The smrc_object_type() call returns a character string identifying the type of the object currently retrieved into the SQL host variable hvptr.

The following steps show the use of the ADT mapping API for the shipping sample application in Figure 1A. We start with the database description and then insert and retrieve objects to and from the table.

Create table/add additional column. The objects of the C++ class hierarchy in Figure 1A are stored in a table column delivery based on a distinct type. A distinct type essentially is a renamed built-in database type. The size of the distinct type is the size of the largest class in the class hierarchy (plus 4 bytes for the type tag). The following statements can be performed in dynamic SQL in order to determine the 830 varchar size (size of class overseas + 4) and de-

Insert objects into the database. Objects are created via the standard C++ new operator and inserted into the database via the standard SQL insert statement. In the sample smrc_tag call, "overseas" is the type name within the "shipping" application schema. An object is created, tagged in an SQL host variable hv of an appropriate size, and inserted into a table.

    struct {unsigned short len; char data[830];} hv;
    overseas *delptr = new overseas();
    ...
    smrc_tag (delptr, "overseas", "shipping", &hv);
    exec sql insert into
        orders (ordno, prodno, quantity, delivery)
        values (10, 20, 10, :hv);

Retrieve objects from the database. Objects are retrieved using the standard SQL select statement. We do not impose any additional restrictions on such statements. These statements can be dynamic, or static for better performance, and can flow across any supported API, such as DRDA, ODBC, etc. They can also be interactive or embedded in applications.
APPLICATION PROGRAM:
    shipment *sp;
    exec sql select prodno, delivery into :pno, :ship_hv
        from orders where ordno = 20;
    sp = swizzle (&ship_hv);
    printf ("Shipment of %i weighs %i", pno, sp->pkg->weight);
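The column sizing and tagging used in the ADT mapping insert step can be sketched as follows. This is only an illustration under stated assumptions: the class members, the numeric tag value, the `[tag][object bytes]` layout, and the helpers `tag_into` and `tag_of` are hypothetical and are not the SMRC API (the real `smrc_tag` call takes type and schema names and also handles classes with hidden pointers, which this sketch avoids by using plain structs).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical stand-ins for two classes of the shipping hierarchy.
struct shipment { int ordno; double weight; };
struct overseas : shipment { char itinerary[64]; };

// Column size = size of the largest class in the hierarchy + 4-byte type tag.
constexpr std::size_t kTagBytes = 4;
constexpr std::size_t kColumnBytes =
    std::max(sizeof(shipment), sizeof(overseas)) + kTagBytes;

// VARCHAR-style host variable: a length prefix followed by raw bytes.
struct HostVar {
    unsigned short len;
    char data[kColumnBytes];
};

// Assumed layout for this sketch: [4-byte type tag][object bytes].
template <typename T>
void tag_into(const T& obj, std::uint32_t type_tag, HostVar& hv) {
    static_assert(std::is_trivially_copyable<T>::value, "raw byte copy only");
    static_assert(sizeof(T) + kTagBytes <= kColumnBytes, "object too large");
    static_assert(sizeof(std::uint32_t) == kTagBytes, "tag must be 4 bytes");
    std::memcpy(hv.data, &type_tag, kTagBytes);
    std::memcpy(hv.data + kTagBytes, &obj, sizeof(T));
    hv.len = static_cast<unsigned short>(sizeof(T) + kTagBytes);
}

// Read the tag back, e.g., to decide which class the bytes belong to.
std::uint32_t tag_of(const HostVar& hv) {
    assert(hv.len >= kTagBytes);
    std::uint32_t tag;
    std::memcpy(&tag, hv.data, kTagBytes);
    return tag;
}
```

A caller would fill a `HostVar` with `tag_into(obj, tag, hv)` before binding it to the insert statement; the length prefix then reports the object size plus the four tag bytes.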
ple below, SMRC sets up the SQL host variable (hv) to store the heap into the database (the 500k size in the declaration of the host variable is required by the RDBMS for range checking).

    sql type is blob(500k) *hv;
    hv = hp->pack();   // pack heap hp and set up hv
    insert into projects (schedule) values (:*hv);

Retrieve heaps from database: SMRC heaps are retrieved from the database into an SQL host variable.

    sql type is blob(500k) hv;
    select schedule into :hv
        from projects;

Swizzle heaps: A retrieved heap in a host variable (hv) is swizzled and assigned to a SMRC heap variable. After this, all the SMRC heap methods can be applied (e.g., get the entry point of the heap with hp->get_root()).

    smrc_heap *hp;
    hp = swizzle (&hv);
    objptr = (activity *) hp->get_root();
    // Now the application can access objects in the heap
    // via (pure) C++ pointer browsing.

Working with external pointers and object caching. SMRC supports external pointers, which extend the scope of pointers and refer to objects stored in other fields of the same column, other columns in the same table, and even columns in other tables. Figure 2 shows an extension of the shipping ADT sample application. Class shipment contains an external pointer pkg to class package. The shipment objects are stored in column delivery of table orders and the package objects in column wrapping of table posting. The sample application code first shows retrieving and swizzling of a shipment object from the orders table. From an application programmer's point of view, the external pointer pkg behaves exactly like an internal pointer. But in a normal C++ application, the dereferencing of the pkg pointer in the shipment object would cause a segmentation violation, as the appropriate package object might not be resident in memory. However, as the pkg pointer is declared as a SMRC external pointer, SMRC is able to catch this violation, automatically query the database for the referenced package object, swizzle the retrieved object, and install it in main memory so that the object can be referenced by C++.
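The fault-in behavior of an external pointer can be approximated in portable C++ with a small wrapper class. Note that this sketch swaps in a different technique: SMRC catches the memory violation raised by an ordinary pointer, whereas the hypothetical `ExternalPtr` below intercepts the dereference through `operator->`. The loader callback stands in for the database query and swizzling step; all names are assumptions made for illustration.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <utility>

// Hypothetical external pointer: resolves its target on first dereference.
template <typename T>
class ExternalPtr {
public:
    using Loader = std::function<std::unique_ptr<T>()>;

    explicit ExternalPtr(Loader loader) : loader_(std::move(loader)) {}

    // True once the referenced object has been installed in memory.
    bool resident() const { return static_cast<bool>(target_); }

    T* operator->() { return fault_in(); }
    T& operator*() { return *fault_in(); }

private:
    T* fault_in() {
        if (!target_) {
            // Stands in for: query the database, swizzle, install in memory.
            target_ = loader_();
            assert(target_ && "loader must return an object");
        }
        return target_.get();
    }

    Loader loader_;
    std::unique_ptr<T> target_;
};

// Simplified versions of the classes in the shipping sample.
struct package { int weight; };
struct shipment {
    int ordno;
    ExternalPtr<package> pkg;  // behaves like a plain pointer for the caller
};
```

With this wrapper, an expression such as `sp->pkg->weight` triggers the loader on first use and reuses the installed object afterwards, which mirrors the programmer's view described above while making the interception explicit in the type.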
[Figure 3: panel labels include C++ application, SMRC API, heap manager, swizzler, cache manager, SQL API, RDBMS, and server.]
objects referenced via external pointers. At the server side (Figure 3B), C++ objects can participate in SQL queries by registering the class methods as UDFs. Since the UDFs are executed on the server side, SMRC performs pointer swizzling before the methods are applied.

SMRC heap manager. The SMRC heap manager is the key component for the BLOB mapping. It supports the functionality of a full-fledged heap manager on the client side, including main memory management of all the objects that should be stored within the same field of a relational table. A SMRC heap is segment-oriented and grows dynamically in size.

SMRC maintains two auxiliary data structures for the management of the objects within a heap: a type table and an object table for each type. The type table refers to the complete type description in the schema database and thus provides the heap manager with the required object layout information. The type table is built at heap creation time and is related to the persistence schema specified at heap creation time. The object table for each type is updated during each object allocation or deletion in a heap. The object tables grow dynamically. The entries in the object table refer to the objects within the heap. Type table and object tables are persistent, along with the objects in a heap. They provide addressability of each object and pointer within the objects in a heap, which is required for pointer swizzling. Thus, a heap is completely self-contained; it can be shipped in client/server environments and interpreted at each destination.

Pointer swizzling. When objects are retrieved from disk and reloaded into main memory, all main memory pointers within the objects must be swizzled due to object relocation. SMRC supports three different approaches for pointer swizzling; all three approaches are implemented to support either the ADT mapping, the BLOB mapping, or external pointers.

Deswizzle pointers: All the pointers within an object are deswizzled, i.e., the current object address is subtracted from all the pointer addresses before an object is saved on disk, thereby making them offsets to the beginning of the object. After object retrieval, the pointers are swizzled by adding the new object address to all the pointer addresses.

Save previous object load address: The previous object load address is saved on disk along with the object. After object retrieval from disk, the pointers are swizzled by the difference between the previous and the new object load address.
[Figure: memory layout of the shipment, overseas, and part objects with byte offsets, including the itinerary[0] member.]
Ontos' approach (Vbase), making the vtables persistent as well, does not seem to be appropriate. A constructor approach is exploited by O++ (Ode),40 that introduces a "faked" new operator (does not allocate memory). The new operator triggers the execution of a constructor that fixes all the hidden pointers. As no data members should be initialized with the constructor, all the default class constructors of an application have to be rewritten in order to distinguish whether they are used for pointer fixing or usual object initialization. This approach is not useful for SMRC, as it would require a recompilation of parts of a class library, although the constructor rewriting can be triggered automatically by a C++ precompiler.

Object cache and OIDs. The object cache, similar to the SMRC heap (a superset, really), is part of the application address space and can grow dynamically in size. SMRC uses the object cache to manage objects that are faulted in automatically via external pointers. SMRC maintains an in-memory object table (hash table) with the object identifications (OIDs) of all the loaded objects in an object cache. An OID uniquely identifies an object in the database and thus can be used for the following two purposes:

1. Object residency checks: Before an object is faulted in automatically, SMRC must check whether or not the referenced object is already loaded in the object cache.
2. Object faulting: When an object must be faulted in, SMRC needs to be able to retrieve it from the database.

SMRC launches an "under the cover" SQL statement to retrieve objects from the database:

    select (object-column)
    from (table)
    where (predicate)

The SQL statement takes the OID as an input, which is kept along with the external pointer causing the object fault. To provide all the input for the select statement, an OID contains the following information (20-byte structure):

    OID = {table-id, column-id, row-id}

Table-id and column-id are created out of the database catalogs for the tables and columns in a database. Database catalog information is cached to quickly translate the table-ids and column-ids in OIDs to the corresponding table and column names for the setup of the SQL statements. A row-id is similar to a system-generated primary key, but is not reusable. It uniquely identifies a record within a table and contains physical information to speed up database access. By having a row-id as part of the OID in an external pointer, the faulting in of referenced objects can be very fast.
In the meantime, we developed the following experiment. We compared the SMRC ADT mapping approach (which maps an object to a single column) to an approach that completely "flattens" the C++ class definitions and stores all the data members in additional table columns. In both cases, however, we required that the language object be available so that it can be passed to the UDF (time) to compute the query predicate. In the SMRC case, the object can simply be retrieved and passed to the time method. In the flattened case, however, the object must be reassembled before it can be passed to the time method (this work of reassembling the object is done in the UDF before the method call).

We populated the orders table from Figure 1A with 2000 C++ objects and executed a query that did a table scan and invoked the time method on the C++ objects. We made the query result empty, to factor out the client/server communication costs and thus focus on the overhead of running SMRC in the server.

Table 1 shows the performance of the two queries. In both cases (the SMRC approach and the class flattening approach) the same original C++ time method was executed as part of the UDF invocation. The experiment shows that SMRC is able to preserve the C++ object nature; C++ methods can be applied after object retrieval from the database and object relocation in main memory. SMRC also performs slightly better (approximately 7 percent in the experiment) in comparison to a class flattening approach. Before conducting this experiment, we thought that the SMRC approach might be faster than a normalized approach, mostly because of the overhead in restoring the objects from the normalized tables. However, this experiment compares the SMRC approach against an unnormalized table, and SMRC was still faster.

BLOB mapping performance. The BLOB mapping approach was applied in a nontrivial sample application.10 The persistence schema contained more than 160 SMRC flagged class definitions with approximately 260 persistent pointer definitions and several definitions for unions, dynamic arrays, and function pointers (see Table 2). The type table in a heap had more than 230 type entries (it included the embedded types) with more than 610 pointer definitions (including the transient pointers). A typical heap contained approximately 950 objects that allocated 85 kilobytes of object memory and contained 4300 swizzled pointers. Given the size of the application, swizzling of the entire heap was completed in a respectable 83 milliseconds of elapsed time. If we were to flatten this data rather than use the BLOB mapping, the equivalent relational operation would involve multiple joins across many tables to reassemble the C++ objects. Actually, the worst case of normalizing all the data types, which could result in 160 tables (and could require a 160-way join), would probably cause the database system to run out of memory.

Table 2 (fragment)
    Sample query
        heap size            85 kilobytes
        number of objects    950
        swizzled pointers    4300
        elapsed time         83 milliseconds

In contrast to the ADT mapping, which maps an object to a single container, the BLOB mapping maps a heap of (possibly) heterogeneous objects to a sin-
Vibby Gottemukkala IBM Research Division, Thomas J. Watson Research Center, P.O. Box 704, Yorktown Heights, New York 10598 (electronic mail: vibby@watson.ibm.com). Dr. Gottemukkala is a research staff member in the Data Intensive Systems department at the IBM Thomas J. Watson Research Center, where he is currently investigating issues in interfacing parallel applications and databases, integrating database systems with tertiary storage, and designing databases for 64-bit architectures. His research interests also include storage architectures for parallel and distributed databases, distributed shared memory and its application in database systems, and database concurrency and coherence control. He received a Ph.D. in computer science from the Georgia Institute of Technology in 1995.