HPJava Seminar Report
HPJava
1. Introduction
The explosion of Java over the last year has been driven largely by its role in bringing a new generation of interactive web pages to the World Wide Web. Undoubtedly various features of the language, compactness, byte-code portability, security, and so on, make it particularly attractive as an implementation language for applets embedded in web pages. But it is clear that the ambitions of the Java development team go well beyond enhancing the functionality of HTML documents. Java is designed to meet the challenges of application development in the context of heterogeneous, network-wide distributed environments. Paramount among these challenges is secure delivery of applications that consume the minimum of system resources, can run on any hardware and software platform, and can be extended dynamically.
Several of these concerns are mirrored in developments in the High Performance Computing world over a number of years. A decade ago the focus of interest in the parallel computing community was on parallel hardware. A parallel computer was typically built from specialized processors linked through a proprietary high-performance communication switch. If the machine also had to be programmed in a proprietary language, that was an acceptable price for the benefits of using a supercomputer. This attitude was not sustainable as one parallel architecture gave way to another, and the cost of porting software became exorbitant. For several years now, portability across platforms has been a central concern in parallel computing.
Dept. of CSE 1 MESCE, Kuttippuram
Seminar Report03
HPJava is a programming language extended from Java to support parallel programming, especially (but not exclusively) data parallel programming on message passing and distributed memory systems, from multi-processor systems to workstation clusters.
Although it has a close relationship with HPF, the design of HPJava does not inherit the HPF programming model. Instead the language introduces a high-level structured SPMD programming style, the HPspmd model. A program written in this kind of language explicitly coordinates well-defined process groups. These cooperate in a loosely synchronous manner, sharing logical threads of control. As in a conventional distributed-memory SPMD program, only a process owning a data item such as an array element is allowed to access the item directly. The language provides special constructs that allow programmers to meet this constraint conveniently.
Besides the normal variables of the sequential base language, the language model introduces classes of global variables that are stored collectively across process groups. Primarily, these are distributed arrays. They provide a global name space in the form of globally subscripted arrays, with assorted distribution patterns. This helps to relieve programmers of error-prone activities such as the local-to-global and global-to-local subscript translations which occur in data parallel applications.
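To see why these translations are error-prone, here is a minimal Python sketch (not HPJava, and the function names are illustrative only) of the arithmetic a programmer must otherwise write by hand for a simple block distribution:

```python
# Block distribution of N global indices over P processes:
# process p owns global indices [p*b, min((p+1)*b, N)), where b = ceil(N/P).

def block_size(N, P):
    return (N + P - 1) // P  # ceiling division

def global_to_local(g, N, P):
    """Map a global subscript to (owner process, local subscript)."""
    b = block_size(N, P)
    return g // b, g % b

def local_to_global(p, l, N, P):
    """Map (process, local subscript) back to a global subscript."""
    return p * block_size(N, P) + l

# Round trip for every global index of a 10-element array on 4 processes.
N, P = 10, 4
for g in range(N):
    p, l = global_to_local(g, N, P)
    assert local_to_global(p, l, N, P) == g
```

Getting the ceiling division, block boundaries, and edge cases right in every loop of a real application is exactly the bookkeeping the globally subscripted arrays take over.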
In addition to special data types the language provides special constructs to facilitate both data parallel and task parallel programming. Through these constructs, different processors can either work
simultaneously on globally addressed data, or independently execute complex procedures on locally held data. The conversion between these phases is seamless.
In the traditional SPMD mold, the language itself does not provide implicit data movement semantics. This greatly simplifies the task of the compiler, and should encourage programmers to use algorithms that exploit locality. Data on remote processors is accessed exclusively through explicit library calls. In particular, the initial HPJava implementation relies on a library of collective communication routines originally developed as part of an HPF runtime library. Other distributed-array-oriented communication libraries may be bound to the language later. Due to the explicit SPMD programming model, low-level MPI communication is always available as a fall-back. The language itself only provides basic concepts to organize data arrays and process groups. Different communication patterns are implemented as library functions, so if a new communication pattern is needed, it can be integrated relatively easily through a new library.
2. Overview of HPJava
HPJava stands for High Performance Java. Java already provides parallelism through threads, but that model of parallelism can only be easily exploited on shared memory computers. HPJava is targeted at distributed memory parallel computers (most likely, networks of PCs and workstations).
The HPJava project started around 1997, growing out of the group's earlier HPF work. People were starting to talk about Java Grande and the possibility that eventually Java could be a high-performance language. The reasoning was that Java's syntax was considerably simpler, and perhaps there was scope to extend it with the features wanted for data-parallel computing.
3. Characteristics
We have explored the practicality of doing parallel computing in Java, and of providing Java interfaces to High Performance Computing software. For various reasons, the success of this exercise was not a foregone conclusion. Java sits on a virtual machine model that is significantly different to the hardware-oriented model which C or Fortran exploit directly. Java discourages or prevents direct access to some of the fundamental resources of the underlying hardware (most extremely, its memory).
Which is the better strategy? In the long term Java may become a major implementation language for large software packages like MPI. It certainly has advantages in respect of portability that could simplify implementations dramatically. In the immediate term recoding these packages does not appear so attractive. Java wrappers to existing software look more sensible. On a cautionary note, our experience with MPI suggests that interfacing Java to non-trivial communication packages may be less easy than it sounds. Nevertheless, we intend in the future to create a Java interface to an existing run-time library for data parallel computation.
It still has to be demonstrated that Java can be compiled to code of efficiency comparable with C or Fortran. Many avenues are being
followed simultaneously towards a higher performance Java. Besides the Java chip effort of Sun, it has been reported at this workshop that IBM is developing an optimizing Java compiler which produces binary code directly, that Rice University and Rochester University are working on optimization and restructuring of bytecode generated by javac, and that Indiana University is working on source restructuring to parallelize Java. Parallel interpretation of bytecode is also an emerging practice. For example, the IBM JVM, an implementation of JVM on shared memory architectures, was released in spring 1996, and UIUC has recently started work aimed at parallel interpretation of Java bytecode for distributed memory systems.
Another promising approach under investigation is to integrate interpretation and compilation techniques for parallel execution of Java programs. In such a system, a partially ordered set of interpretive frames is generated by an II/CVM compiler. A frame is a description of some subtask, whose granularity may range from a single scalar assignment statement to a solver for a system of equations. Under supervision of the virtual machine (II/CVM), the actions specified in a frame may be performed in one of three ways:
The virtual machine interprets the actions specified in the frame directly.

Some precompiled computational library function is invoked locally to accomplish the task; this function may be executed sequentially or in parallel.

The frame is sent to some registered remote system, which will get the work done, once again either sequentially or in parallel.
With this approach, optimized binary codes for well formed computation subtasks exist in runtime libraries, supporting a high level interpretive environment. Task parallelism is observed among different frames executed by the three mechanisms simultaneously, while data parallelism is observed in the execution of some of the runtime functions. Presuming these efforts satisfactorily address the performance issue, the second aspect in question concerns expressiveness of the Java language. Our final interface to MPI is quite elegant, and provides much of the functionality of the standard C and Fortran bindings. But creating this interface was a more difficult process than one might hope, both in terms of getting a good specification, and in terms of making the implementation work. The lack of features like C++ templates (or any form of parametric polymorphism) and user-defined operator
overloading (available in many modern languages, from functional programming languages to Fortran) made it difficult to produce a completely satisfying interface to a data parallel library. The Java language as currently defined imposes various limits on the creativity of the programmer. In many respects Java is undoubtedly a better language than Fortran. It is object-oriented to the core and highly dynamic, and there is every reason to suppose that such features will be as valuable in scientific computing as in any other programming discipline. But to displace established scientific programming languages Java will surely have to acquire some of the facilities taken for granted in those languages. Popular acclaim aside, there are some reasons to think that Java may be a good language for scientific and parallel programming.
Java is a descendant of C++. C and C++ are used increasingly in scientific programming; they are already used almost universally by implementers of parallel libraries and compilers. In recent years numerous variations on the theme of C++ for parallel computing have appeared.
Java omits various features of C and C++ that are considered difficult, notably pointers. Poor compiler analysis has often been blamed on these features. The inference is that Java, like Fortran, may be a suitable source language for highly optimizing compilers (although direct evidence for this belief is still lacking).
Java comes with built-in multithreading. Independent threads may be scheduled on different processors by a suitable runtime. In any case multithreading can be very convenient in explicit message-passing styles of parallel programming. HPJava is a language for parallel programming, especially suitable for data-parallel programming on distributed-memory machines.
The type-signatures and constructors of the multidimensional array use double brackets to distinguish them from ordinary arrays:
int [[,]] a = new int [[5, 5]] ;
float [[,,]] b = new float [[10, n, 20]] ;
int [[]] c = new int [[100]] ;

a, b and c are respectively two-, three- and one-dimensional arrays. Of course c is very similar in structure to the standard array d, created by

int [] d = new int [100] ;

c and d are not identical, though.
Access to individual elements of a multidimensional array goes through a subscripting operation involving single brackets, for example

for(int i = 0 ; i < 4 ; i++)
    a [i, i + 1] = i + c [i] ;
For reasons that will become clearer in later sections, this style of subscripting is called local subscripting. In the current sequential context, apart from the fact that a single pair of brackets may include several comma-separated subscripts, this kind of subscripting works just like ordinary Java array subscripting. Subscripts always start at zero, in the ordinary Java or C style (there is no Fortran-like lower bound). In general our language has no idea of Fortran-like array assignments. Consider

int [[,]] e = new int [[n, m]] ;
...
a = e ;
The assignment simply copies a handle to the object referenced by e into a. There is no element-by-element copy involved. Similarly we introduce no idea of elemental arithmetic or elemental function application. If e and a are arrays, the expressions

e + a
Math.cos(e)

are type errors.
Our HPJava does import a Fortran-90-like idea of array regular sections. The syntax for section subscripting is different to the syntax for local subscripting. Double brackets are used. These brackets can include scalar subscripts or subscript triplets.
A section is an object in its own right; its type is that of a suitable multi-dimensional array. It describes some subset of the elements of the parent array. This is slightly different to the situation in Fortran, where sections cannot usually be captured as named entities.

int [[]] e = a [[2, 2 :]] ;
foo(b [[ : , 0, 1 : 10 : 2]]) ;

e becomes an alias for the third row of elements of a. The procedure foo should expect a two-dimensional array as argument. It can read or write to the set of elements of b selected by the section. As in Fortran, upper or lower bounds can be omitted in triplets, defaulting to the actual bounds of the parent array, and the stride entry of the triplet is optional. The subscripts of e, like those of any other array, start at 0, although the first element is identified with a [2, 2].
In our language, unlike Fortran, it is not allowed to use vectors of integers as subscripts. The only sections recognized are regular sections defined through scalar and triplet subscripts.
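The aliasing behaviour of regular sections can be mimicked in a short Python sketch (illustrative only, not HPJava syntax or implementation): a section records an offset, stride and extent into the parent's storage, so writes through the section are visible in the parent.

```python
class Section1:
    """A 1-D regular section: a view on parent storage, defined by
    (offset, stride, extent), analogous to a triplet lo : hi : step."""
    def __init__(self, data, offset, stride, extent):
        self.data, self.offset, self.stride, self.extent = data, offset, stride, extent

    def get(self, i):
        return self.data[self.offset + i * self.stride]

    def set(self, i, v):
        self.data[self.offset + i * self.stride] = v

a = list(range(10))          # parent "array" holding 0..9
s = Section1(a, 2, 3, 3)     # like the triplet 2 : 9 : 3 -> elements 2, 5, 8

s.set(1, 99)                 # a write through the section...
# ...is visible in the parent: a[5] is now 99
```

No elements are copied when the section is created; only the (offset, stride, extent) description is stored, which is why sections are cheap to pass to procedures.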
The language provides a library of functions for manipulating its arrays, closely analogous to the array transformational intrinsic functions of Fortran 90:

int [[,]] f = new int [[5, 5]] ;
HPJlib.shift(f, a, -1, 0, CYCL) ;
float g = HPJlib.sum(b) ;
The shift operation with shift-mode CYCL executes a cyclic shift on the data in its second argument, copying the result to its first argument, an array of the same shape. In the example the shift amount is -1, and the shift is performed in dimension 0 of the array, the first of its two dimensions. The sum operation simply adds all elements of its argument array. The copy operation copies the elements of its second argument to its first; it is something like an array assignment. These functions may have to be overloaded to apply to some finite set of array types, e.g. they may be defined for arrays with elements of any suitable Java primitive type, up to some maximum rank of array. Alternatively the type hierarchy for arrays can be defined in a way that allows these functions to be more polymorphic.
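As a rough Python model of what a cyclic shift computes (the direction convention and names here are assumptions for illustration, not the real HPJlib semantics):

```python
def cyclic_shift(src, amount, dim):
    """Return a copy of the 2-D list src cyclically shifted by `amount`
    along dimension `dim` (0 = rows, 1 = columns).  The element at index
    i in the result comes from index (i - amount) mod extent in src."""
    rows, cols = len(src), len(src[0])
    if dim == 0:
        return [src[(i - amount) % rows] for i in range(rows)]
    return [[row[(j - amount) % cols] for j in range(cols)] for row in src]

m = [[1, 2], [3, 4]]
# A shift of -1 in dimension 0 moves rows "up" cyclically under this
# convention: the result is [[3, 4], [1, 2]].
print(cyclic_shift(m, -1, 0))
```

On a distributed array the same index arithmetic determines which elements must travel between neighbouring processes, which is why shift implies communication there.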
An abstract base class Procs has subclasses Procs1, Procs2, ..., representing one-dimensional process arrays, two-dimensional process arrays, and so on.

Procs2 p = new Procs2(2, 2) ;
Procs1 q = new Procs1(4) ;
These declarations set p to represent a 2 by 2 process array and q to represent a 4-element, one-dimensional process array. In either case the object created describes a group of 4 processes. At the time the Procs constructors are executed the program should be executing on four or more processes. Either constructor selects four processes from this set and identifies them as members of the constructed group.
Procs has a member function called member, returning a boolean value. This is true if the local process is a member of the group, false otherwise.

if(p.member()) {
    ...
}
The code inside the if is executed only if the local process is a member of p. We will say that inside this construct the active process group is restricted to p.
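The way a process grid carves a pool of processes into members with coordinates can be sketched in Python (class and field names are hypothetical, and the row-major rank-to-coordinate rule is an assumption; the real adJava classes differ):

```python
class Grid2:
    """Sketch of a 2-D process grid over ranks 0 .. nprocs-1.
    The first rows*cols ranks become grid members, in row-major order."""
    def __init__(self, rows, cols, my_rank):
        self.rows, self.cols, self.my_rank = rows, cols, my_rank

    def member(self):
        return self.my_rank < self.rows * self.cols

    def coords(self):
        assert self.member()
        return divmod(self.my_rank, self.cols)   # (row coord, col coord)

# On a pool of 5 processes, a 2 x 2 grid uses ranks 0..3; rank 4 is idle.
p = Grid2(2, 2, my_rank=3)
assert p.member() and p.coords() == (1, 1)
assert not Grid2(2, 2, my_rank=4).member()
```

In the SPMD model every process runs this same constructor; each one simply computes a different answer to member() and coords() from its own rank.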
The multi-dimensional structure of a process array is reflected in its set of process dimensions. An object is associated with each dimension. These objects are accessed through the inquiry member dim:

Dimension x = p.dim(0) ;
Dimension y = p.dim(1) ;
Dimension z = q.dim(0) ;

The object returned by the dim inquiry has class Dimension. The members of this class include the inquiry crd. This returns the coordinate of the local process with respect to the process dimension. The result is only well-defined if the local process is a member of the parent process array. The inner body code in

if(p.member())
    if(x.crd() == 0)
        if(y.crd() == 0) {
            ...
        }

will only execute on the first process from p, with coordinates (0, 0).
Each dimension, or range, of a distributed array is represented by an object of class Range. A Range object defines a range of integer subscripts, and defines how they are mapped into a process array dimension. In fact the Dimension class introduced in the previous section is a subclass of Range. In this case the integer range is just the range of coordinate values associated with the dimension. Each value in the range is mapped, of course, to the process (or slice of processes) with that coordinate. This kind of range is also called a primitive range. More complex subclasses of Range implement more elaborate maps from integer ranges to process dimensions. Some of these will be introduced in later sections. For now we concentrate on arrays constructed with Dimension objects as their distributed ranges.
The syntax of section 2 is extended in the following way to support distributed arrays:
A distributed range object may appear in place of an integer extent in the ``constructor'' of the array (the expression following the new keyword).
If a particular dimension of the array has a distributed range, the corresponding slot in the type signature of the array should include a # symbol.
In general the constructor of the distributed array must be followed by an on clause, specifying the process group over which the array is distributed. Distributed ranges of the array must be distributed over distinct dimensions of this group.
Assume p, x and y are declared as in the previous section; then

float [[#,#,]] a = new float [[x, y, 100]] on p ;

defines a as a 2 by 2 by 100 array of floating point numbers. Because the first two dimensions of the array are distributed ranges (dimensions of p), a is actually realized as four segments of 100 elements, one in each of the processes of p. The process in p with coordinates i, j holds the section a [[i, j, :]]. The distributed array a is equivalent in terms of storage to four local arrays defined by

float [] b = new float [100] ;

But because a is declared as a collective object we can apply collective operations to it. The HPJlib functions introduced in section 2 apply equally well to distributed arrays, but now they imply interprocessor communication.

float [[#,#,]] a = new float [[x, y, 100]] on p,
              b = new float [[x, y, 100]] on p ;
The shift operation causes the local values of a to be overwritten with values of b from a processor adjacent in the x dimension. There is a catch in this. When subscripting the distributed dimensions of an array it is simply disallowed to use subscripts that refer to off-processor elements. While this:

int i = x.crd(), j = y.crd() ;
a [i, j, 20] = a [i, j, 21] ;

is allowed, this:

int i = x.crd(), j = y.crd() ;
a [i, j, 20] = b [(i + 1) % 2, j, 20] ;

is forbidden. The second example could apparently be implemented using a nearest neighbour communication, quite similar to the shift
example above. But our language imposes a strict policy distinguishing it from most data parallel languages: while library functions may introduce communications, language primitives such as array subscripting never imply communication.
If subscripting distributed dimensions is so restricted, why are the i, j subscripts on the arrays needed at all? In the examples of this section these subscripts are allowed only one value on each processor. Well, the inconvenience of specifying the subscripts will be reduced by language constructs introduced later, and the fact that only one subscript value is local is a special feature of the primitive ranges used here. The higher-level distributed ranges introduced later map multiple elements to individual processes; subscripting will no longer look so redundant.
An on(p) construct specifically changes the value of the active process group (APG) to p. On exit from the construct, the APG is restored to its value on entry.
Elevating the active process group to a part of the language allows some simplifications. For example, it provides a natural default for the on clause in array constructors. More importantly, formally defining the active process group simplifies the statement of various rules about what operations are legal inside distributed control constructs like on.
The framework described is much more powerful than space allows us to demonstrate here. This power comes in part from the flexibility to add features by extending the libraries associated with the language. We have only illustrated the simplest kinds of distribution
format. But any HPF 1.0 array distribution format, plus various others, can be incorporated by extending the Range hierarchy in the run-time library. We have only illustrated shift and writeHalo operations from the communication library, but the library also includes much more powerful operations for remapping arrays and performing irregular data accesses. Our intention is to provide minimal language support for distributed arrays, just enough to facilitate further extension through construction of new libraries.
For a more complete description of a slightly earlier version of the proposed language, see
The HPJava runtime interface is implemented on top of the NPAC PCRC runtime library, which has a kernel implemented in C++ and a Java interface implemented in Java and C++.
4.1.1. Java packages for HPspmd programming

The current runtime interface for HPJava is called adJava. It consists of two Java packages. The first is the HPspmd runtime proper. It includes the classes needed to translate language constructs. The second package provides communication and some simple I/O functions. These two packages will be outlined in this section. The classes in the first package include an environment class, distributed array ``container classes'', and related classes describing process groups and index ranges. The environment class SpmdEnv provides functions to initialize and finalize the underlying
communication library (currently MPI). Constructors call native functions to prepare the lower level communication package. An important field, apg, defines the group of processes that is cooperating in ``loose synchrony'' at the current point of execution. The other classes in this package correspond directly to HPJava built-in classes. The first hierarchy is based on Group. A group, or process group, defines some subset of the processes executing the SPMD program. Groups have two important roles in HPJava. First they are used to describe how program variables such as arrays are distributed or replicated across the process pool. Secondly they are used to specify which subset of processes execute a particular code fragment. Important members of the adJava Group class include the pair on() and no(), used to translate the on construct.
The most common way to create a group object is through the constructor for one of the subclasses representing a process grid. The subclass Procs represents a grid of processes and carries information on process dimensions: in particular, an inquiry function dim(r) returns a range object describing the r-th process dimension. Procs is further subclassed by Procs0, Procs1, Procs2, ..., which provide simpler constructors for fixed-dimensionality process grids. The class hierarchy of groups and process grids is shown in figure 1.
The second hierarchy in the package is based on Range. A range is a map from an integer interval into some process dimension (i.e., some dimension of a process grid). Ranges are used to parametrize distributed arrays and the distributed loop construct.
The most common way to create a range object is to use the constructor for one of the subclasses representing ranges with specific distribution formats. The current class hierarchy is given in figure 2. Simple block distribution format is implemented by BlockRange, while CyclicRange and BlockCyclicRange represent other standard
distribution formats of HPF. The subclass CollapsedRange represents a sequential (undistributed) range. Finally, DimRange represents the range of coordinates of a process dimension itself; just one element is mapped to each process.
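The division of labour among the Range subclasses can be illustrated with a small Python hierarchy (a sketch of the idea only; the real adJava classes have a richer interface, and the exact block-size rule here is an assumption): each subclass maps a global subscript to a process coordinate and a local subscript under its own distribution format.

```python
class Range:
    """Map global subscripts 0..size-1 onto nprocs process coordinates."""
    def __init__(self, size, nprocs):
        self.size, self.nprocs = size, nprocs

    def location(self, i):
        """Return (process coordinate, local subscript) for global index i."""
        raise NotImplementedError

class BlockRange(Range):
    def location(self, i):
        b = (self.size + self.nprocs - 1) // self.nprocs   # block size
        return i // b, i % b

class CyclicRange(Range):
    def location(self, i):
        return i % self.nprocs, i // self.nprocs

class CollapsedRange(Range):               # sequential: everything local
    def location(self, i):
        return 0, i

# A range of 8 subscripts over 4 processes:
assert BlockRange(8, 4).location(5) == (2, 1)   # blocks: [0,1][2,3][4,5][6,7]
assert CyclicRange(8, 4).location(5) == (1, 1)  # round-robin deal
```

Code that is written against the abstract location() inquiry works unchanged whichever distribution format is plugged in, which is how new HPF-style formats can be added purely in the library.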
The related adJava class Location represents an individual location in a particular distributed range. Important members of the adJava Range class include the function location(i), which returns the i-th location in a range, and its inverse, idx(l), which returns the global subscript associated with a given location. Important members of the Location class include the pair at() and ta(), used in the implementation of the HPJava at construct.
Finally in this package we have the rather complex hierarchy of classes representing distributed arrays. HPJava global arrays declared using [[ ]] are represented by Java objects belonging to classes such as:

Array1dI, Array1cI, Array2ddI, Array2dcI, Array2cdI, Array2ccI, ...
Array1dF, Array1cF, Array2ddF, Array2dcF, Array2cdF, Array2ccF, ...
Generally speaking, the class ArrayN...T represents an N-dimensional distributed array with elements of type T, where T is currently one of I, F, ..., meaning int, float, and so on. The penultimate part of the class name is a string of N characters, each c or d, recording whether the corresponding array dimension is collapsed (sequential) or distributed. These correlate with the presence or absence of an asterisk in slots of the HPJava type signature. The concrete Array... classes implement a series of abstract interfaces. These follow a similar naming convention, but the root of their names is Section rather than Array (so Array2dcI, for example, implements Section2dcI). The hierarchy of Section interfaces is illustrated in figure 3.
The need to introduce the Section interfaces should be evident from the hierarchy diagram. The type hierarchy of HPJava involves a kind of multiple inheritance. The array type int [[*, *]], for example, is a specialization of both the types int [[*, ]] and int [[, *]]. Java allows ``multiple inheritance'' only from interfaces, not classes.
Here we mention some important members of the Section interfaces. The inquiry dat() returns an ordinary one-dimensional Java array used to store the locally held elements of the distributed array. The member pos(i, ...), which takes one argument per array dimension, returns the local offset of the element specified by its list of arguments. Each argument is either a location (if the corresponding dimension is distributed) or an integer (if it is collapsed). The inquiry grp() returns the group over which elements of the array are distributed. The inquiry rng(d) returns the d-th range of the array.
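The role of dat() and pos(...) can be pictured with a Python sketch (the names echo the text, but the layout rule here, row-major storage with precomputed strides, is an assumption for illustration): locally held elements live in a flat array, and pos computes an offset into it from per-dimension local subscripts.

```python
class LocalBlock:
    """Flat local storage for the locally held patch of a distributed
    array, with row-major strides, plus a pos() offset computation."""
    def __init__(self, *extents):
        self.extents = extents
        # Row-major strides: the last dimension varies fastest.
        self.strides = []
        acc = 1
        for e in reversed(extents):
            self.strides.insert(0, acc)
            acc *= e
        self.dat = [0] * acc          # cf. the dat() inquiry

    def pos(self, *subs):
        """Local offset of the element with these local subscripts."""
        return sum(s * st for s, st in zip(subs, self.strides))

blk = LocalBlock(2, 3)        # a 2 x 3 local patch
blk.dat[blk.pos(1, 2)] = 7    # address the last element of the patch
```

Keeping the offset arithmetic behind an inquiry like this is what lets the same generated code address patches with different extents on different processes.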
The second package in adJava is the communication library. The adJava communication package includes classes corresponding to the various collective communication schedules provided in the NPAC PCRC kernel. Most of them provide a constructor to establish a schedule, and an execute method, which carries out the data movement specified by the schedule.
The collective communication schedules can be used directly by the programmer or invoked through certain wrapper functions. A class named Adlib is defined with static members that create and execute communication schedules and perform simple I/O functions. This class includes, for example, the following methods, each implemented by constructing the appropriate schedule and then executing it:

static public void remap(Section dst, Section src)
static public void shift(Section dst, Section src, int shift, int dim, int mode)
static public void copy(Section dst, Section src)
static public void writeHalo(Section src, int [] wlo, int [] whi, int [] mode)

Use of these functions will be illustrated in later examples. Polymorphism is achieved by using arguments of class Section.
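What writeHalo accomplishes can be shown with a one-dimensional Python sketch (the segment layout and the cyclic edge treatment are assumptions for illustration, not the library's actual semantics): each local segment carries ghost cells at both ends, which are filled with copies of the neighbouring segments' edge elements.

```python
def write_halo(segments, width):
    """Fill `width` ghost cells on each end of every local segment with
    the edge elements of the cyclically neighbouring segments.
    Each segment is laid out as [ghost..., interior..., ghost...]."""
    n = len(segments)
    for p, seg in enumerate(segments):
        left  = segments[(p - 1) % n]
        right = segments[(p + 1) % n]
        seg[:width]  = left[-2 * width:-width]   # neighbour's last interior cells
        seg[-width:] = right[width:2 * width]    # neighbour's first interior cells

# Two processes, interiors [1,2,3] and [4,5,6], one ghost cell each side.
segs = [[0, 1, 2, 3, 0], [0, 4, 5, 6, 0]]
write_halo(segs, 1)
# segs -> [[6, 1, 2, 3, 4], [3, 4, 5, 6, 1]]
```

After the halo update, a stencil loop on each process can read its neighbours' edge values through purely local subscripts, which is exactly the point of ghost regions.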
In HPJava, variable names are divided into two sets. In general those declared using ordinary Java syntax represent local variables and those declared with [[ ]] represent global variables. The two sectors are independent. In the implementation of HPJava the global variables have special data descriptors associated with them, defining how their components are divided or replicated across processes. The significance of the data descriptor is most obvious when dealing with procedure calls.
Passing array sections to procedure calls is an important component in the array processing facilities of Fortran90 [1]. The data descriptor of Fortran90 will include stride information for each array dimension. One can assume that HPF needs a much more complex kind of data descriptor to allow passing distributed arrays across procedure boundaries. In either case the descriptor is not visible to the programmer. Java has a more explicit data descriptor concept; its arrays are considered as objects, with, for example, a publicly accessible length field. In HPJava, the data descriptors for global data are similar to those used in HPF, but more explicitly exposed to programmers. Inquiry functions such as grp(), rng() have a similar role in global data to the field length in an ordinary Java array.
Keeping two data sectors seems to complicate the language and its syntax, but it provides convenience for both task and data parallel processing. There is no need for things like the LOCAL mechanism in HPF to call a local procedure on the node processor. The descriptors for ordinary Java variables are unchanged in HPJava. On each node processor ordinary Java data will be used as local variables, as in an MPI program.
A limited number of Java operators are overloaded. A group object can be restricted by a location using the / operation, and a subrange or location can be obtained from a range using the [ ] operator enclosing a triplet expression or an integer. These pieces of syntax can be considered as shorthand for suitable constructors in the corresponding classes. This is comparable to the way Java provides special syntax support for the String class constructors.
Another kind of overloading occurs in location shift, which is used to support ghost regions. A shift operator + is defined between a location and an integer. It will be illustrated in the examples in the next section. This is a restricted operation--it has meaning (and is legal) only in an array subscript expression.
We will mention two related projects. Spar [11] is a Java-based language for array-parallel programming. Like our language it introduces multi-dimensional arrays, array sections, and a parallel loop. There are some similarities in syntax, but semantically Spar is very different to our language. Spar expresses parallelism but not explicit data placement or communication; it is a higher level language. ZPL [10] is a new programming language for scientific computations. Like Spar, it is an array language. It has an idea of performing computations over a region, or set of indices. Within a compound statement prefixed by a region specifier, aligned elements of arrays distributed over the same region can be accessed. This idea has certain similarities to our over construct. Communication is more explicit than in Spar, but not as explicit as in the language discussed in this article.
5.1. Implementation of Collectives
These schedules, involving particular data arrays and other parameters, are created by the class constructors. Executing a schedule initiates the communications required to effect the operation. A single schedule may be executed many times, repeating the same communication pattern. In this way, especially for iterative programs, the cost of the computations and negotiations involved in constructing a schedule can often be amortized over many executions. This paradigm was pioneered in the CHAOS/PARTI libraries [8]. If a communication pattern is to be executed only once, simple wrapper functions can be made available to construct a schedule, execute it, then destroy it. The overhead of creating the schedule is essentially unavoidable, because even in the single-use case individual data movements generally have to be sorted and aggregated for efficiency, and the associated data structures are just those associated with schedule construction. The constructor and public methods of the remap schedule for distributed arrays of float elements can be described as follows:
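A plausible outline of that interface (a sketch only; rank annotations are fixed to one case and method bodies are omitted, since the exact signatures are not reproduced here):

```java
public class Remap {
    // Enumerates and stores the sends, receives and internal copies implied
    // by the distribution formats of the two arrays.
    public Remap(float [[-,-]] destination, float [[-,-]] source) { /* ... */ }

    // Executes the stored communication pattern; may be invoked repeatedly.
    public void execute() { /* ... */ }
}
```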
The remap schedule combines two functionalities: it reorganizes data in the way indicated by the distribution formats of the source and destination arrays, and, if the destination array has a replicated distribution format, it broadcasts data to all copies of the destination. Here we will concentrate on the former aspect, which is handled by an object of class RemapSkeleton contained in every Remap object. During
construction of a RemapSkeleton schedule, all send messages, receive messages, and internal copy operations implied by execution of the schedule are enumerated and stored in lightweight data structures. These messages have to be sorted before sending, to allow message agglomeration and to ensure a deadlock-free communication schedule. These algorithms, and maintenance of the associated data structures, are dealt with in a base class of RemapSkeleton called BlockMessSchedule. The API for the superclass is outlined in Figure 11. To set up such a low-level schedule, one makes a series of calls to sendReq and recvReq to define the required messages. Messages are characterized by an offset in some local array segment, and a set of strides and extents parameterizing a multi-dimensional patch of the (flat Java) array. Finally, the build() operation does any necessary processing of the message lists. The schedule is executed in a "forward" or "backward" direction by invoking gather() or scatter().
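As a rough, plain-Java illustration of this superclass pattern (greatly simplified: no real communication takes place, the method names sendReq, recvReq and build follow the text, and everything else, including the Req class of offset, strides and extents, is invented for the sketch):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Simplified sketch of a BlockMessSchedule-style message list.
class MessSchedule {
    // One message request: a peer process rank, an offset into the flat local
    // array segment, and stride/extent pairs describing a rectangular patch.
    static final class Req {
        final int rank, offset;
        final int[] strides, extents;
        Req(int rank, int offset, int[] strides, int[] extents) {
            this.rank = rank; this.offset = offset;
            this.strides = strides; this.extents = extents;
        }
    }

    private final List<Req> sends = new ArrayList<>();
    private final List<Req> recvs = new ArrayList<>();

    void sendReq(int rank, int offset, int[] strides, int[] extents) {
        sends.add(new Req(rank, offset, strides, extents));
    }

    void recvReq(int rank, int offset, int[] strides, int[] extents) {
        recvs.add(new Req(rank, offset, strides, extents));
    }

    // Sort requests by peer rank so messages to the same process become
    // adjacent (ready to be agglomerated) and are issued in a fixed global
    // order, one simple way of keeping the schedule deadlock-free.
    void build() {
        sends.sort(Comparator.comparingInt(r -> r.rank));
        recvs.sort(Comparator.comparingInt(r -> r.rank));
    }

    int sendCount()     { return sends.size(); }
    int firstSendRank() { return sends.get(0).rank; }
}
```

In the real library, build() would also merge adjacent requests, and gather() or scatter() would then move the data; only the sorting step that the text emphasizes is shown here.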
5.2. Titanium
The Titanium language is designed to support high-performance scientific applications. Historically, few languages that made such a claim have achieved a significant degree of serious use by scientific programmers. Among the reasons are the high learning curve for such languages, the dependence on heroic parallelizing compiler technology and the consequent absence of compilers and tools, and incompatibilities with the languages used for libraries. The goal of Titanium is to provide a language that gives its users access to modern program structuring through object-oriented technology, that enables its users to write explicitly parallel code to exploit their understanding of the computation, and that has a compiler that uses optimizing compiler technology where it is reliable and gives predictable results. The starting design point for Titanium is Java, chosen for several reasons. The Titanium project extends the Java language for scientific computing but compiles down to C or C++. This approach offers the hope of tuning for higher ultimate performance, but sacrifices various benefits of the full Java platform.
5.3. FIDIL
The multidimensional array support in HPJava is strongly influenced by FIDIL's maps and domains [6, 11]. HPJava, however, sacrifices expressiveness for performance: FIDIL maps have arbitrary shapes, but because FIDIL has only a general domain type, it is harder to optimize code that uses the more common rectangular kind.
5.4. Split-C
The parallel execution model and global address space support in HPJava are closely related to Split-C [5] and AC [3]. HPJava shares a common communication layer with Split-C on distributed memory machines, which we have extended as part of the HPJava project to run on shared memory machines. Split-C differs from HPJava in that the default pointer type is local rather than global; a local pointer default simplifies interfacing to existing sequential code, but a global default makes it easier to port shared memory applications to distributed memory machines. Split-C uses sequential consistency as its default consistency model, but provides explicit operators to allow non-blocking operations to be used. In AC the compiler introduces non-blocking memory operations automatically, using only dependence information, not parallel program analysis.
6. Conclusion
Our experience thus far is that Java is a good choice as a base language: it is easy to extend, and its safety features greatly simplify the compiler writer's task. We also believe that extending Java is easier than obtaining high performance within Java's strict language specification (assuming the latter is feasible at all). Many of the features of HPJava would be hard or impossible to achieve as Java libraries, and the compiler would not be able to perform static analysis and optimizations on them. HPJava will be most helpful for problems that have some degree of regularity; to a first approximation, HPJava's domain is similar to HPF's. Many of the most challenging problems in modern computational science have irregular structure, so the value of our language features in those domains is more controversial.
CONTENTS
1 Introduction
2 Overview of HPJava
   2.1 HPJava History
   2.2 HPJava Philosophy
3 Characteristics
   3.1 Multidimensional arrays
   3.2 Process arrays
   3.3 Distributed arrays
   3.4 The on construct and the active process group
   3.5 Other features
4 Considerations in HPJava language design
   4.1 Translation scheme
      4.1.1 Java packages for HPspmd programming
   4.2 Issues in the language design
      4.2.1 Extending the Java language
      4.2.2 Datatypes in HPJava
      4.2.3 Programming convenience
5 Discussion and related work
   5.1 Implementation of Collectives
   5.2 Titanium
   5.3 FIDIL
   5.4 Split-C
6 Conclusion
7 References
ABSTRACT
The idea that Java may enable new programming environments, combining attractive user interfaces with high performance computation, is gaining increasing attention amongst computational scientists. Java boasts a direct simplicity reminiscent of Fortran, but also incorporates many of the important ideas of modern object-oriented programming. Of course, it comes with an established track record in the domains of Web and Internet programming.
The language outlined here provides HPF-like distributed arrays as language primitives, and new distributed control constructs to facilitate access to the local elements of these arrays. In the SPMD mold, the model allows processors the freedom to independently execute complex procedures on local elements: it is not limited by SIMD-style array syntax. All access to non-local array elements must go through library functions--typically collective communication operations. This puts an extra onus on the programmer; but making communication explicit encourages the programmer to write algorithms that exploit locality, and simplifies the task of the compiler writer. On the other hand, by providing distributed arrays as language primitives we are able to simplify error-prone tasks such as converting between local and global array subscripts and determining which processor holds a particular element. As in HPF, it is possible to write programs at a natural level of abstraction where the meaning is insensitive to the detailed mapping of elements. Lower-level styles of programming are also possible.
ACKNOWLEDGEMENTS
I express my sincere thanks to Prof. M. N. Agnisarman Namboothiri (Head of the Department, Computer Science and Engineering, MESCE) and Mr. Zainul Abid (staff in charge) for their kind cooperation in presenting the seminar.
I also extend my sincere thanks to all other members of the faculty of Computer Science and Engineering Department and my friends for their co-operation and encouragement.
Rony V John