2008 International Conference on Information Security and Assurance

An Inter-Classes Obfuscation Method For Java Program

Xuesong Zhang, Fengling He, Wanli Zuo


College of Computer Science and Technology, Jilin University
Changchun, 130012, P.R.China
xs_zhang@126.com, Hefl@jlu.edu.cn, wanli@jlu.edu.cn

Abstract

Software is a valuable form of data that represents significant intellectual property, and reverse engineering of software code by competitors may reveal important technological secrets. The problem becomes more serious for a platform-independent format such as Java byte code. We introduce an inter-classes software obfuscation technique which extracts the code of some methods in user-defined classes and embeds it into the methods of other objects in an object pool. Since all objects in the object pool are upcast to their common base type, which object's method will actually execute can only be ascertained at runtime, and this drastically obscures the program flow. Combined with some enhancement mechanisms, the technique can even resist dynamic analysis to a certain extent. Experimental results show that there is little influence on execution efficiency.

1. Introduction

To achieve platform independence, Java uses a form of symbolic linking, which stores all type and interface information in the class file's constant pool. This information enables on-demand loading and code mobility, but it also facilitates decompilation. Faced with such a neutral byte code language, traditional software encryption technology has little effect. In distributed computing, hardware-assisted encryption cannot provide code mobility and greatly limits the interoperability of Java applications. Encryption through a user-defined class loader cannot provide effective protection either, because of the way Java programs are executed: in order to carry out security checks, all byte code of a class must be completely loaded into memory before it executes, so malicious users can directly extract the entire decrypted code. At the same time, since byte code cannot access the stack directly and is not able to modify its own code dynamically, the self-encryption techniques used for native code cannot be applied to Java programs.

Under these circumstances, software obfuscation provides a more effective means of protection. Obfuscation can prevent end users from understanding a program's design and code, and therefore makes reverse engineering uneconomical. The general idea of our proposed obfuscation scheme is as follows: an instance method invocation in Java can be interpreted as a special kind of unconditional jump at the assembly language level, and all such invocations can be transformed to a unified style so long as they have the same parameter and return types. This transformation leads to a strong obfuscation result when it is further combined with aliasing and method polymorphism. We call this kind of obfuscation algorithm inter-classes obfuscation. In Figure 1, the code of some methods in user-defined classes is extracted and embedded into the methods of objects in the object pool. All the objects in the object pool inherit from the same super class, and their relations are either parent-child or sibling. Each object's DoIt method is the merger of two or more methods from user-defined classes. When the program is about to execute a merged method that was originally defined in a user-defined class, a request is sent to the object pool, and the object pool returns one object, selected according to the request parameter, whose method is executed instead. Since objects in the object pool are upcast to their common base type, which object's DoIt method will actually execute can only be ascertained at runtime.

The rest of this paper is organized as follows: Section 2 reviews related work. Section 3 describes the inter-classes obfuscation scheme in detail. Section 4 discusses some enhanced obfuscation mechanisms. Section 5 gives experimental results. Finally, Section 6 concludes the paper and outlines future work.
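
The general idea above can be illustrated with a minimal sketch. It is not the authors' implementation: the names Job, PoolObject1, PoolObject2 and getObject are hypothetical, the merged bodies are toy arithmetic, and a plain list stands in for the object pool that Section 3.3 builds from universal hashing. It only shows that a caller holding the common base type cannot tell statically which merged body will run.

```java
import java.util.ArrayList;
import java.util.List;

public class ObjectPoolSketch {
    // Common base type: callers only ever see this interface (hypothetical name).
    public interface Job {
        int doIt(int selector, int x);
    }

    // Pool object holding the bodies of two unrelated, hypothetical methods.
    static class PoolObject1 implements Job {
        public int doIt(int selector, int x) {
            if (selector == 0) {
                return x + 1;   // body taken from a hypothetical A.inc(x)
            }
            return x * 2;       // body taken from a hypothetical B.twice(x)
        }
    }

    static class PoolObject2 implements Job {
        public int doIt(int selector, int x) {
            if (selector == 0) {
                return x * x;   // body taken from a hypothetical C.square(x)
            }
            return -x;          // body taken from a hypothetical C.negate(x)
        }
    }

    // The pool stores every object upcast to Job.
    private static final List<Job> POOL = new ArrayList<>();
    static {
        POOL.add(new PoolObject1());
        POOL.add(new PoolObject2());
    }

    // Returns the pool object selected by the request parameter.
    public static Job getObject(int request) {
        return POOL.get(request);
    }

    public static void main(String[] args) {
        // Replaces a direct call such as a.inc(10).
        System.out.println(getObject(0).doIt(0, 10));
        // Replaces a direct call such as c.negate(10).
        System.out.println(getObject(1).doIt(1, 10));
    }
}
```

In this sketch the request parameter is a constant list index, which a static analyzer could resolve easily; the scheme described in the paper hides that value behind computed hash keys.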

978-0-7695-3126-7/08 $25.00 © 2008 IEEE
DOI 10.1109/ISA.2008.49
[Figure 1 Object pool model: each original call (A.Job1, B.Job3, ..., K.Job7) becomes GetObject() followed by O.DoIt(), where O is one of the objects Ob1, Ob2, ..., Obn in the object pool.]

2. Related work

The first formal definition of obfuscation was given by Collberg et al. [1]. They defined an obfuscator in terms of a semantic-preserving transformation function T which maps a program P to a program P' such that if P fails to terminate or terminates with an error, then P' may or may not terminate. Otherwise, P' must terminate and produce the same output as P.

Obfuscating transformations can be classified into four categories:

Lexical obfuscation: Changes or removes useful information from the intermediate language code or source code, e.g. removing debugging information and comments, and scrambling or renaming identifiers. Lexical obfuscation alone is not sufficient, because a determined attacker can infer the meaning of program identifiers from the context.

Layout obfuscation: Weakens the inherent logic between program elements, e.g. changing the inheritance relations of classes, and method in-lining, splitting and merging.

Data obfuscation: Targets the data and data structures contained in the program, complicating their operations and obscuring their usage, e.g. changing data encoding, and splitting and merging variables and arrays.

Control-flow obfuscation: Alters the flow of control within the code, e.g. reordering statements, methods and loops, and hiding the actual control flow behind irrelevant conditional statements.

Because pointers and the goto statement are not supported in Java, some obfuscating techniques that offer strong protection [2, 3] cannot be applied to Java programs straightforwardly. Hence, obfuscation of Java programs needs a more refined design. Currently proposed methods include the following.

Chan et al. [4] bring forward an advanced identifier scrambling algorithm. They exploit the hierarchical structure of jar packages, i.e. a sub-package or a top-level type (class or interface) may have the same name as the enclosing package. Sequentially generated identifiers replace the original identifiers in a package, and the generation of identifiers is restarted for every package. They also use the gap between what a Java compiler and a Java virtual machine accept to construct violations of source-code-level rules, such as illegal identifiers and nested type names. However, this kind of source-code-level violation can be repaired at the byte code level by automated tools [5], so it offers only limited protection.

Sosonkin et al. [6] present high-level transformations of Java program structure, which they call design obfuscation. Class coalescing replaces several classes with a single class. Class splitting replaces a single class with multiple classes. If class splitting is used in tandem with class coalescing, the program structure changes very significantly, which can hide the design concept and make the program harder to understand.

Sakabe et al. [7] concentrate on the object-oriented nature of Java. Using polymorphism, they encapsulate all method parameters and return types in newly defined classes to hide information, and introduce opaque predicates around new object instantiations to confuse the true type of the object. Our proposed method is similar to this technique. However, their method focuses on obfuscation inside a single object; as the scope of protection expands, their approach leads to a sharp increase in program size and a sharp decline in performance. In our scheme, more effort is spent on cross-obfuscation among multiple objects, and the influence on program size and performance is relatively low.

3. Inter-classes obfuscation

In Java, an application consists of one or more packages, and it may also use packages from the standard library or other proprietary libraries. The part of a program that will be obfuscated is called the obfuscation scope. In this paper, the obfuscation scope refers only to those packages developed by the programmer.

3.1. Invocation format unification

Different methods in a program usually have different parameters and return types. In order to perform inter-classes merging, their invocation formats must be unified. The two classes introduced in Figure 2 are used for this purpose. They encapsulate the parameters and the return value of any method. In these two classes, all non-primitive data types are represented by items in the object array aO. ParamObject has a few more Boolean fields than ReturnObject; they are used to select the execution branch after multiple methods have been merged. There are three flags in ParamObject, and thus at most four different methods can be merged into one DoIt method. The interface DoJob declares only one method, DoIt, which uses ParamObject as its formal parameter

and ReturnObject as its return type. All the methods to be merged will eventually be embedded into the DoIt method of some class that implements DoJob.

    public class ParamObject {
        public double[] aD; public float[] aF;
        public long[] aL; public int[] aI;
        public short[] aS; public byte[] aY;
        public char[] aC; public boolean[] aB;
        public Object[] aO;
        // branch selectors for the merged methods
        boolean flag1, flag2, flag3;
    }
    public class ReturnObject {
        public double[] aD; public float[] aF;
        public long[] aL; public int[] aI;
        public short[] aS; public byte[] aY;
        public char[] aC; public boolean[] aB;
        public Object[] aO;
    }
    public interface DoJob {
        public ReturnObject DoIt(ParamObject p);
    }

Figure 2 Unification of invocation format

    public class A {
        public int DoJobA1(int x) { . . . }
        public long DoJobA2(double x, double y) { . . . }
    }
    public class B {
        public int DoJobB1(int x, int y) { . . . }
        public char DoJobB2(String s) { . . . }
        public boolean DoJobB3(boolean b, char c) { . . . }
    }
    {
        A a = new A(); B b = new B();
        a.DoJobA1(10); b.DoJobB3(false, 'A');
    }

Figure 3 Original classes definition and method invocation

    public class DoJob1 implements DoJob {
        public ReturnObject DoIt(ParamObject p) {
            ReturnObject o = new ReturnObject();
            if (p.flag1) { // DoJobB2 }
            else if (p.flag2) { // DoJobA1 }
            else { // Garbage code }
            return o;
        }
    }

Figure 4 Inter-classes method merging

3.2. Inter-classes merging

In determining which methods can be merged, factors such as inheritance relations and method dependency relations must be taken into consideration. Methods in the following categories should not be obfuscated.
• A method inherited from a super class or super interface outside the obfuscation scope, i.e. one that implements an abstract method or overrides an inherited method.
• Constructors, callback functions, native methods and finalize methods.
• A method declared with a throws clause.
• A method which accesses an inner class.
• A method which invokes a non-public method inherited from a super class or super interface outside the obfuscation scope.

Two classes, A and B, are defined in Figure 3. Figure 4 shows a possible merging instance. The method DoJobA1 of class A and the method DoJobB2 of class B are merged into one DoIt method. The flag fields of ParamObject that are not used can control the execution of garbage code, which forms a kind of obfuscating enhancement. Garbage code here refers to code that executes normally but does not disturb the data or control flow of the program.

Method polymorphism: If the merged method is inherited from a super class or super interface inside the obfuscation scope, all methods with the same signature (method name, and number and types of formal parameters) in the inheritance chain should also be extracted and embedded into DoIt methods.

Method dependency: The merged method invokes another method defined in the current class or its super class, e.g. there is an invocation of DoJobA2 inside method DoJobA1. There are two approaches:
• If DoJobA2 is a user-defined method, it can be merged as well; otherwise change its access modifier to public and proceed as in the second approach.
• Put the instance a of class A into the object array of ParamObject as a parameter, and use explicit binding inside the corresponding DoIt method, i.e. DoJobA2 is converted to a.DoJobA2.

Field dependency: The merged method uses a field defined in the current class or its super class. Again there are two approaches:
• Qualify the fields accessed by this method with the owning instance, similar to the second approach for method dependency. This approach is not suitable for non-public fields inherited from a super class outside the obfuscation scope.
• Add GetFields and SetFields methods to each class. GetFields returns an instance of ReturnObject

which includes the fields used by all methods that are to be merged, and this instance is put into the object array of the ParamObject. Code in the DoIt method can use this parameter to refer to the fields of the original class. After DoIt has executed, an instance of ReturnObject is passed back by invoking the SetFields method, which applies the changes to the fields of the original class.

3.3. Object pool

Many collection types provided by the JDK, such as List, Set and Map, can be used to construct the object pool. However, these existing types have standard operation modes, which would divulge some of the inner logic of the program. We therefore use universal hashing to construct the object pool.

The main idea behind universal hashing [8] is to select the hash function at random from a carefully designed class of functions at the beginning of execution. Randomization guarantees that no single input will always evoke worst-case behavior. Because of the randomization, the algorithm can behave differently on each execution, even for the same input, guaranteeing good average-case performance for any input.

Definition 1: Let $H$ be a finite collection of hash functions that map a given universe $U$ of keys into the range $\{0, 1, \ldots, m-1\}$. Such a collection is said to be universal if for each pair of distinct keys $k, l \in U$, the number of hash functions $h \in H$ for which $h(k) = h(l)$ is at most $|H| / m$.

One construction of a universal class of hash functions is as follows. Choose a prime number $p$ large enough that every possible key $k$ is in the range 0 to $p-1$ inclusive, and greater than the number of slots $m$ in the hash table. Let $Z_p$ denote the set $\{0, 1, \ldots, p-1\}$, and let $Z_p^*$ denote the set $\{1, 2, \ldots, p-1\}$. For any $a_1 \in Z_p^*$ and $a_2 \in Z_p$, the following equation defines a member of a universal collection of hash functions:

$h_{a_1,a_2}(k) = ((a_1 k + a_2) \bmod p) \bmod m$

We use a universal hash table to store all instances of the classes that implement DoJob. If a collision occurs, a second-level hash table is established in the corresponding slot. The randomization in universal hashing enables us to assign different values to a1 and a2 each time the program starts. Expressions constructed from a1 and a2 are used as the keys to access the hash table. Under this condition, the key is no longer a constant, and better information hiding is obtained. Figure 5 shows the structure of the hash table. An instance of class DoJob9 is stored with key key_l, and the same instance is retrieved with key key_m. Notice that the key used to store an object is different from the key used to request the same object; their relation, and the hash table itself, may be protected by other data or control obfuscating algorithms. In Figure 6, the invocation of class A's method DoJobA1 is replaced by an invocation of the DoIt method of one of DoJob's subclasses.

[Figure 5 Structure of hashing table: UHash.Add(key_l, DoJob9) stores an instance in a two-level table. Level 1 has slots Slot1, Slot2, ..., Slotk that hold DoJob references; a slot with collisions points to a level 2 table with its own slots. DoJob o = UHash.Get(key_m) retrieves the instance.]

    DoJob1 job1 = new DoJob1();
    UHash.Add(217, job1);
    ...
    // replaces a.DoJobA1(10)
    ParamObject p = new ParamObject();
    int[] i = new int[1]; i[0] = 10;
    p.aI = i; p.flag2 = true;
    DoJob dojob = UHash.Get(3 * a1 + a2 + 13);
    ReturnObject r = dojob.DoIt(p);

Figure 6 Invocation to class A's method DoJobA1 is replaced by invocation to DoIt method in one of DoJob's subclass

However, using a key to access the hash table directly causes a problem in the face of method polymorphism. Consider the inheritance relation in Figure 7. If all methods in class A are extracted and embedded into DoIt methods, extraction must also be performed for each overriding method in the subclasses of A. Due to the complexity of points-to analysis, it is hard to determine which instance a refers to in the statement a.DoJob1. As a result, it cannot be determined which key should be used to access the hash table. In this situation, a method GetIDs should be added to the super class A. GetIDs returns an array of all the keys corresponding to those methods of the current class that have been merged into the object pool. If a subclass overrides any method of its parent class, GetIDs must be overridden as well. Figure 8 shows the arrays returned by each class in Figure 7. The ID of an overriding method occupies the same position in the array as the ID of the original method in the super class. In this way, the statement a.DoJob1 can be replaced by an invocation of UHash.Get with the first element of the array as the parameter.
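
The GetIDs mechanism can be sketched as follows. This is an illustration under assumptions rather than the paper's code: the integers 1 to 7 stand in for ID1 to ID7 of Figure 8, the helper keyFor is hypothetical, and D is assumed to extend C because its array in Figure 8 shares C's first two entries. The pool lookup itself is left out; the sketch only shows how the key is chosen by dynamic dispatch on GetIDs, with no points-to analysis of the receiver.

```java
public class GetIdsSketch {
    // Positions in the ID array: 0 = DoJob1, 1 = DoJob2, 2 = DoJob3.
    public static class A {
        public int[] getIDs() { return new int[] {1, 2, 3}; }
    }

    // B overrides DoJob1 only, so only position 0 changes.
    public static class B extends A {
        @Override public int[] getIDs() { return new int[] {4, 2, 3}; }
    }

    // C overrides DoJob1 and DoJob2.
    public static class C extends A {
        @Override public int[] getIDs() { return new int[] {5, 6, 3}; }
    }

    // D (assumed to extend C) overrides DoJob3 and inherits C's other entries.
    public static class D extends C {
        @Override public int[] getIDs() { return new int[] {5, 6, 7}; }
    }

    // Key that would be passed to the pool lookup in place of a call to the
    // method at the given position (hypothetical helper).
    public static int keyFor(A receiver, int position) {
        return receiver.getIDs()[position];
    }

    public static void main(String[] args) {
        A a = new D();                      // static type A, runtime type D
        System.out.println(keyFor(a, 0));   // key replacing a.DoJob1()
        System.out.println(keyFor(a, 2));   // key replacing a.DoJob3()
    }
}
```

Because the array is produced by a virtual call, the obfuscator can emit the same replacement code at every call site regardless of the receiver's runtime type.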

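
Returning to the hash family of Section 3.3, a minimal sketch of one randomly chosen member might look like this. It assumes the standard Carter-Wegman parameter ranges (multiplier a1 drawn from 1 to p-1, offset a2 from 0 to p-1), takes the primality of p on trust from the caller, and omits the second-level tables and the Add and Get operations of the paper's UHash; the class name and its methods are hypothetical.

```java
import java.util.Random;

public class UniversalHashSketch {
    private final int p;   // prime larger than every key (not verified here)
    private final int m;   // number of slots, with m < p
    private final int a1;  // multiplier in {1, ..., p-1}
    private final int a2;  // offset in {0, ..., p-1}

    public UniversalHashSketch(int p, int m, int a1, int a2) {
        if (m < 1 || m >= p || a1 < 1 || a1 >= p || a2 < 0 || a2 >= p) {
            throw new IllegalArgumentException("parameters out of range");
        }
        this.p = p;
        this.m = m;
        this.a1 = a1;
        this.a2 = a2;
    }

    // Picks a1 and a2 at random, as would happen once at program start.
    public static UniversalHashSketch random(int p, int m, Random rnd) {
        int a1 = 1 + rnd.nextInt(p - 1);  // 1 .. p-1
        int a2 = rnd.nextInt(p);          // 0 .. p-1
        return new UniversalHashSketch(p, m, a1, a2);
    }

    // h(k) = ((a1 * k + a2) mod p) mod m, for keys in 0 .. p-1.
    public int hash(int k) {
        if (k < 0 || k >= p) {
            throw new IllegalArgumentException("key out of range");
        }
        long mixed = ((long) a1 * k + a2) % p;  // long avoids int overflow
        return (int) (mixed % m);
    }

    public static void main(String[] args) {
        UniversalHashSketch h = new UniversalHashSketch(17, 6, 3, 4);
        // (3 * 8 + 4) mod 17 = 11, and 11 mod 6 = 5
        System.out.println(h.hash(8));
    }
}
```

Since a1 and a2 change on every run, a key expression such as 3 * a1 + a2 + 13 in Figure 6 evaluates to a different value each time, which is why it is not a constant an attacker can read from the byte code.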
[Figure 7 Method polymorphism: class A declares DoJob1(), DoJob2() and DoJob3(). Subclass B overrides DoJob1(). Subclass C overrides DoJob1() and DoJob2(). Class D, a subclass of C according to its ID array in Figure 8, overrides DoJob3(). Client code: A a; ... a.DoJob1(); a.DoJob2(); a.DoJob3();]

    Class   DoJob1   DoJob2   DoJob3
    A       ID1      ID2      ID3
    B       ID4      ID2      ID3
    C       ID5      ID6      ID3
    D       ID5      ID6      ID7

Figure 8 Array returned from different class by calling its GetIDs method

4. Obfuscating enhancement

To mount an effective attack, an attacker must determine precisely which instance each DoJob reference points to. Since all objects added to the object pool have been upcast to their super type DoJob, and different keys are used to store and to access the same object in the hash table, this is not feasible by static analysis alone. However, the frequent accesses to the hash table and the if-else blocks partitioned by flag in the DoIt method still leak some useful information to a malicious end user. This information may be used as the starting point of a dynamic attack. There are several mechanisms to hide it.

Multi-duplication: Make multiple duplicates of some merged methods. Each duplicate is transformed by a different algorithm to give it a distinct appearance, such as parameter type conversion, variable or array compressing and decompressing, and splitting and merging. Every time a merged method executes, a different object with the same functionality is selected.

Method interleaving: Methods merged into the same DoIt are selected by the flag value and therefore show an obvious block structure. Further action may be taken to interleave these blocks, obscuring their boundaries. The Boolean flags can be obscured at the same time, e.g. by introducing opaque predicates, or by replacing one Boolean type with two or more integer types.

Random assignment of parameters: Since only some of the fields in ParamObject are used during one execution of a merged method, malicious users may exploit this to perform a pattern matching attack. The unused fields can be assigned random values before DoIt is invoked, and garbage code which uses these fields is added.

Hashing table extension: Extend some slots to insert new objects. The parameters used in the DoIt methods of these newly inserted objects are different from those used in the DoIt methods of the existing objects. When the given key locates a slot that holds more than one object, one of them is selected at random and returned. Before entering the execution branch selected by the given flag, a check is made to ensure that the parameters in ParamObject are valid: fields used by the following instructions must not be null, and fields not used must be null. If a parameter mismatch is found, an exception is thrown. The DoIt invocation is now enclosed in a loop block (Figure 9), and the following instructions will not execute until an invocation succeeds.

    while (true) {
        try {
            dojob = UHash.Get(572);
            r = dojob.DoIt(p);
            break;
        } catch (ParamMismatchException e) {
            continue;
        }
    }

Figure 9 The DoIt method invocation model after hashing table extension

Dynamic adjusting of object pool: Multiple rules can be adopted to construct the object pool. One operation rule is selected at random at the start, and a new thread is introduced which readjusts the operation rule once in a while. The key used to access the object pool must also be modified along with the rule change. Clearly, combined with the previous mechanism, this enhancing measure can withstand dynamic analysis to a certain extent.

5. Experimental result

We extended the refactoring plugin in Eclipse and applied our scheme to five Java programs. Currently, only the basic obfuscating method is implemented, without the enhanced mechanisms such as multi-duplication and method interleaving. Some of the programs are Java applets which never terminate without user action. We therefore measure their execution

time over a finite number of cycles. For example, the ASM program simulates the stock market forever, and the corresponding execution time given in Table 1 is based on the first thirty cycles.

The experimental environment is as follows.
• Processor: Intel Pentium IV, 2.4GHz
• Memory: 1G RAM
• OS: Windows 2003 Service Pack 1
• Compiler: j2sdk-1_6_0_10

Table 1 Experimental result by using only the basic obfuscation method

    Program     Description                               Methods merged  Metric                Before obfuscation  After obfuscation  Ratio
    WizCrypt    File encryption tool                      8               Jar file size (byte)  13755               21892              1.58
                                                                          Execution time (sec)  50.46               51.25              1.02
    MultiSort   Collection of fifteen sorting algorithms  17              Jar file size (byte)  14558               23497              1.62
                                                                          Execution time (sec)  102.06              107.83             1.06
    Draw        Draw random graphs                        11              Jar file size (byte)  16772               26123              1.56
                                                                          Execution time (sec)  6.12                6.23               1.02
    ASM         Artificial stock market                   29              Jar file size (byte)  87796               97149              1.11
                                                                          Execution time (sec)  31.20               32.38              1.04
    DataReport  Report generator for electric power data  22              Jar file size (byte)  59030               68555              1.17
                                                                          Execution time (sec)  8.71                9.15               1.05

From Table 1, the ratio by which program size increases after obfuscation lies between 1.11 and 1.62. As the size of the original program grows, the ratio shows a downward trend. This is because the newly inserted code is mainly used for the definition and operation of the object pool, while the code used for method merging and invocation is relatively small. The largest increase in execution time is no more than 6%. In fact, some of the merged methods are invoked more than 10000 times, such as the join method in MultiSort. However, since all objects in the object pool are initialized soon after the program starts, accessing the object pool only returns an object which has already been instantiated. At the same time, the classes ParamObject and ReturnObject inherit directly from Object; apart from the loading, linking and initialization needed at their first creation, subsequent instantiation involves only a small amount of work. Thus, our proposed scheme has little influence on the performance of the original program.

6. Conclusion

In this paper, we introduced a software obfuscation scheme based on inter-classes method merging, which is applicable to any object-oriented language. Next, we will implement the enhanced mechanisms discussed in Section 4 and apply various forms of attacks to evaluate their actual security. Source-code-level obfuscation can be regarded as a special kind of software refactoring; how to apply more refactoring techniques to software obfuscation also needs further research.

References

[1] C. Collberg, C. Thomborson, and D. Low, A Taxonomy of Obfuscating Transformations. Technical Report #148, 36 pp., Department of Computer Science, The University of Auckland, New Zealand, 1997.
[2] C. Wang, J. Hill, J.C. Knight, and J.W. Davidson, Protection of software-based survivability mechanisms. In Proceedings of the 2001 Conference on Dependable Systems and Networks, IEEE Computer Society, pp. 193-202, 2001.
[3] S. Chow et al., An Approach to the Obfuscation of Control-Flow of Sequential Computer Programs. Proc. 4th Int'l Conf. Information Security, LNCS 2200, Springer-Verlag, pp. 144-155, 2001.
[4] J.T. Chan and W. Yang, Advanced obfuscation techniques for Java bytecode. Journal of Systems and Software 71, No. 2, pp. 1-11, 2004.
[5] S. Cimato, A. De Santis, and U. Ferraro Petrillo, Overcoming the obfuscation of Java program by identifier renaming. Journal of Systems and Software 78, pp. 60-72, 2005.
[6] M. Sosonkin, G. Naumovich, and N. Memon, Obfuscation of design intent in object-oriented applications. In DRM '03: Proceedings of the 3rd ACM Workshop on Digital Rights Management, pp. 142-153, New York, NY, USA, ACM Press, 2003.
[7] Y. Sakabe, M. Soshi, and A. Miyaji, Java obfuscation with a theoretical basis for building secure mobile agents. In Communications and Multimedia Security, pp. 89-103, 2003.
[8] J.L. Carter and M.N. Wegman, Universal classes of hash functions. Journal of Computer and System Sciences, 18(2), pp. 143-154, 1979.
