Data Obfuscation and Encryption
Data Obfuscation and Encryption
Jared Still
Certified Oracle DBA and Part Time Perl Evangelist
jkstill@cybcon.com
Data encryption is a hot topic these days. This is hardly a new subject, but has
received and increasing amount of attention, largely due to ecommerce. Protecting
credit card numbers, medical data and other sensitive information has become more
important than ever before, and on a larger scale.
Please keep in mind that encryption in this article refers to some method of modifying
data so that it is meaningless and unreadable in it’s encrypted form. It also must be
reasonably secure, that is it must not be easily decrypted without the proper key.
Anything less than that will be referred to as obfuscation. This is data that is rendered
unusable by some means, but is not considered as a serious form of encryption.
Examples will follow.
Obfuscation
Why would you want to merely obfuscate data, rather than use a strong encryption
algorithm?
A good example would be an audit report on a medical system. This report may be
generated for an external auditor, and contain sensitive information. The auditor will be
examining the report for information that indicates possible cases of fraud or abuse.
Assume that management has required that Names, Social Security Numbers and other
personal information should not be available to the auditor except on an as needed
basis.
The data needs to be presented to the auditor, but in a way that allows the examination
of all data, so that patterns in the data may be detected.
Encryption would be a poor choice in this case, as the data would be rendered into
ASCII values outside of the range of normal ASCII characters. This would be
impossible to read.
A better choice might be to obfuscate the data with a simple substitution cipher. While
this is not considered encryption, it may be suitable for this situation.
When the auditor finds a possible case of abuse, he will need the real name and SSN of
the party involved. He could obtain this by calling a customer service representative at
the insurance company that supplied the report, and ask for the real information.
The obfuscated data is read to the customer service rep, who then inputs it into an
application that supplies the real data.
end;
/
xlate_from varchar2(62) :=
‘0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz’;
xlate_to varchar2(62) :=
‘nopqrstuvwxyz0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklm’;
end;
/
As you can see, it wouldn’t be very difficult to decipher this scheme given enough data.
A somewhat more effective method involves chopping the text into segments and
rearranging it as well as obfuscating it. Below is some sample output from this
algorithm.
OBFS OBFS
---------- ----------
540407786 &24B23B&Z
542800170 -4B*23&&&
542802063 -4Z&23-&_
541466830 *2_423ZZ&
While this is still not encryption, this data would be more difficult to decipher without the
key. Source code for this in PL/SQL is available at the URL provided at the end of this
article.
Another way to hide sensitive data is through masking. This is different from the
previous example in that the clear text cannot be reconstructed from the displayed data.
This is useful in situations where it is only necessary to display a portion of the data. A
good case for this method is the receipts printed at gas stations and convenience
stores. When a purchase is made with a credit card, the last 4 digits of the credit are
often displayed as clear text, while the rest of the credit card number has been masked
with a series of X’s.
This method can also be used for reports where the person reading the report requires
only a portion of the sensitive data. This method is also commonly used for the account
numbers on printed transactions from ATM’s.
Encryption
Ok, you’ve decided that your data is too sensitive to trust to anything other than a real
encryption algorithm. You must use some form of encryption to give reasonable
assurance that sensitive information will be hidden from anyone other than those
entitled to it.
The choices of algorithms and implementation methods you can make use of are
dependent on the version of Oracle that your database uses. All solutions discussed
here will be either home grown software, or solutions provided with the Oracle RDBMS,
and operate inside of the database.
If you need to encrypt data in a version 7 database, your choices are somewhat limited.
The only solution outside of third party software is to use an external program to encrypt
and decrypt data.
While a program written in C could do this very quickly, it precludes the ability to encrypt
and decrypt data in a SQL statement, limiting the usefulness of encryption.
The only solution within the database is to write an encryption routine in PL/SQL.
This works well for limited amounts of data, but may be unwieldy for large amounts of
data. This is because PL/SQL is quite a bit slower than a language such as C for this
type of operation.
TABLE 1
Table CRYPT_TEST
Column Name Type
------------ ----------------------------
SOC_SEC_NBR VARCHAR2(9)
CIPHER_SSN VARCHAR2(16)
The complete PL/SQL source code is too lengthy to include in this article, but I will show
the package header. The complete source code with examples can be found at the
URL at the end of the article.
end rc4;
As of Oracle 8, it is possible to write external library routines in C, and call them from
within a SQL statement or PL/SQL block. A share library native to the Os must be
created. With NT this would be a DLL. In many versions of Unix this is a .so file. This
method will provide much faster results than a pure PL/SQL implementation.
External libraries are somewhat difficult to setup however. When an external callout is
made, Oracle starts a dedicated process for each session to interface to the shared
1
RC4 is a very fast stream cipher created by Ronald Rivest, one of the founders of RSA
library. The listener.ora, sqlnet.ora and tnsnames.ora files must all be configured
properly to launch this process.
The SQL*Net modifications that worked for this on our Solaris 2.6 with an Oracle 8.1.6
database are included in a README file that is available at the URL at the end of this
article.
For my tests, I again used the streaming cipher RC4, this time coded in C. The
performance of this test was 24 times faster than RC4 as implemented in PL/SQL.
1000 rows of data were update in .75 seconds, as compared to about 18 seconds for
the PL/SQL version of RC4.
While this is quite a bit faster, I was actually expecting better performance than that.
More on that later.
There is though one caveat on this RC4 implementation. While writing this article, I
discovered that my C implementation of RC4 as modified for use with Oracle is less
than perfect.
It has the nasty habit of occasionally truncating the encrypted data at some point. So I
have to make the disclaimer that this code is FOR DEMONSTRATION PURPOSES
ONLY.
Its purpose is to demonstrate the concept of using an external library to encrypt data in
the database, and for comparing the performance of this method to other methods.
This source code is also posted on my web site. It will include a notification stating if
the bug still exists.
v_data varchar2(500);
clear_len integer;
crypt_len integer;
begin
v_data := data_in;
clear_len := length(v_data);
dummy := rc4p( key_in, v_data );
return v_data;
end;
end;
One of the chief advantages of using this package is that it does not require any of the
setup that is required for using external libraries. You must be using at least version
8.1.6 of the Oracle RDBMS to make use of this package.
Creating it is as simple as logging into the database as SYS and running the package
$ORACLE_HOME/rdbms/admin/catobtk.sql.
It is also a good performer. This implementation of DES was able to update 1000 rows
in about 0.8 seconds in testing. Even though it is fast, the performance was still a
concern. This can be seen by the timing results in Table 1.
Doing an UPDATE of 1000 rows, and encrypting a single column in each row, the DES
encryption took about three times longer to do the update than if no form of encryption
was used.
During SELECT’s though, the performance difference was much less noticeable, as
decrypting a column added about 40% to the time required for a query without any
encryption.
My motivation for using RC4 to encrypt data was to increase the performance of
updating encrypted data. RC4 is about 10 times faster than DES, yet the results show
that RC4 has a very small performance advantage over DES, at least as implemented
in an Oracle database.
This difference leads me to believe that most of the extra overhead in encrypting data is
not in the actual encryption routine, but is in the interface between the Oracle engine
and the encryption routines. I haven’t attempted to prove this yet, but the observed
performance during testing suggests it.
One small advantage that RC4 has over DES is that data can be encrypted in place in a
table. In the text below Table 1 you will see the encrypted column is larger than the
column for the clear text.
DES is a block cipher, and Oracle’s implementation of it requires that the length of the
data be a multiple of 8. The functions I used to interface to the DES routines will pad
the incoming data so that the length is a multiple of 8. Therefore, the size of any
encrypted column will need to be rounded up to the next multiple of 8.
i
Armed with these few tools, this article can serve as springboard for further exploration
of encryption in the database.
This article is not exhaustive by any means. There is a lot of background material
available for cryptography if you are interested in learning more about this.
Web sites:
http://www.mach5.com/crypto
http://www.counterpane.com/sites.html
The definitive source for making application of cryptography is the book Applied
Cryptography by Bruce Schneier.