<!DOCTYPE html PUBLIC ""
"">
<html><head><meta charset="UTF-8" /><title>Dimensions and Bytecode Generation</title><link rel="stylesheet" type="text/css" href="css/default.css" /><link rel="stylesheet" type="text/css" href="highlight/solarized-light.css" /><script type="text/javascript" src="highlight/highlight.min.js"></script><script type="text/javascript" src="js/jquery.min.js"></script><script type="text/javascript" src="js/page_effects.js"></script><script>hljs.initHighlightingOnLoad();</script></head><body><div id="header"><h2>Generated by <a href="https://github.com/weavejester/codox">Codox</a> with <a href="https://github.com/xsc/codox-theme-rdash">RDash UI</a> theme</h2><h1><a href="index.html"><span class="project-title"><span class="project-name">dtype-next</span> <span class="project-version">6.00-beta-11</span></span></a></h1></div><div class="sidebar primary"><h3 class="no-link"><span class="inner">Project</span></h3><ul class="index-link"><li class="depth-1 "><a href="index.html"><div class="inner">Index</div></a></li></ul><h3 class="no-link"><span class="inner">Topics</span></h3><ul><li class="depth-1 "><a href="buffered-image.html"><div class="inner"><span>Buffered Image Support</span></div></a></li><li class="depth-1 "><a href="cheatsheet.html"><div class="inner"><span>Cheatsheet</span></div></a></li><li class="depth-1 "><a href="datatype-to-dtype-next.html"><div class="inner"><span>Why dtype-next?</span></div></a></li><li class="depth-1 current"><a href="dimensions-bytecode-gen.html"><div class="inner"><span>Dimensions and Bytecode Generation</span></div></a></li></ul><h3 class="no-link"><span class="inner">Namespaces</span></h3><ul><li class="depth-1"><div class="no-link"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>tech</span></div></div></li><li class="depth-2"><div class="no-link"><div class="inner"><span 
class="tree"><span class="top"></span><span class="bottom"></span></span><span>v3</span></div></div></li><li class="depth-3"><a href="tech.v3.datatype.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>datatype</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.datatype.argops.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>argops</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.datatype.bitmap.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>bitmap</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.datatype.datetime.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>datetime</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.datatype.errors.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>errors</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.datatype.functional.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>functional</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.datatype.jna.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>jna</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.datatype.list.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>list</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.datatype.mmap.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>mmap</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.datatype.native-buffer.html"><div class="inner"><span class="tree"><span 
class="top"></span><span class="bottom"></span></span><span>native-buffer</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.datatype.nippy.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>nippy</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.datatype.packing.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>packing</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.datatype.reductions.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>reductions</span></div></a></li><li class="depth-4"><a href="tech.v3.datatype.rolling.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>rolling</span></div></a></li><li class="depth-3"><div class="no-link"><div class="inner"><span class="tree" style="top: -424px;"><span class="top" style="height: 433px;"></span><span class="bottom"></span></span><span>libs</span></div></div></li><li class="depth-4 branch"><a href="tech.v3.libs.buffered-image.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>buffered-image</span></div></a></li><li class="depth-4"><a href="tech.v3.libs.neanderthal.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>neanderthal</span></div></a></li><li class="depth-3"><div class="no-link"><div class="inner"><span class="tree" style="top: -83px;"><span class="top" style="height: 92px;"></span><span class="bottom"></span></span><span>parallel</span></div></div></li><li class="depth-4"><a href="tech.v3.parallel.for.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>for</span></div></a></li><li class="depth-3"><a href="tech.v3.tensor.html"><div class="inner"><span 
class="tree" style="top: -52px;"><span class="top" style="height: 61px;"></span><span class="bottom"></span></span><span>tensor</span></div></a></li><li class="depth-4 branch"><a href="tech.v3.tensor.color-gradients.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>color-gradients</span></div></a></li><li class="depth-4"><a href="tech.v3.tensor.dimensions.html"><div class="inner"><span class="tree"><span class="top"></span><span class="bottom"></span></span><span>dimensions</span></div></a></li></ul></div><div class="document" id="content"><div class="doc"><div class="markdown"><h1><a href="#dimensions-and-bytecode-generation" name="dimensions-and-bytecode-generation"></a>Dimensions and Bytecode Generation</h1>
<h2><a href="#translating-from-n-dimensional-space-to-1-dimensional-space" name="translating-from-n-dimensional-space-to-1-dimensional-space"></a>Translating from N-Dimensional Space to 1 Dimensional Space</h2>
<p>We want to be able to take 1 dimensional buffers of data and represent N-dimensional hyper-rectangular spaces. Lots of things are n-dimensional and hyper-rectangular, such as images, datasets, volumes, and 3d objects, so let’s take a moment to talk about the abstractions required to efficiently present this interface.</p>
<p><code>dimensions</code> objects describe the N-dimensional addressing scheme and perform the translation from our n-dimensional ‘global’ space into our 1-dimensional ‘local’ dense address spaces. In other words they describe how to address randomly addressable data buffers using an n-dimensional addressing scheme that supports numpy style in-place slicing, transpose, broadcasting, reshape, and APL style rotations. In this way they are designed to add ‘ND’ operations to any randomly addressed system.</p>
<p>Some examples of dimension object operations:</p>
<pre><code class="clojure">user> (require '[tech.v3.tensor :as dtt])
user> (def tens (dtt/->tensor (partition 2 (range 4))))
#'user/tens
user> tens
#tech.v3.tensor<object>[2 2]
[[0 1]
[2 3]]
user> (tens 0)
#tech.v3.tensor<object>[2]
[0 1]
user> (tens 1)
#tech.v3.tensor<object>[2]
[2 3]
user> ;;Select selects subregions and can reorder dimension indexes
user> (dtt/select tens 1)
#tech.v3.tensor<object>[2]
[2 3]
user> (dtt/select tens 1 [1 0])
#tech.v3.tensor<object>[2]
[3 2]
user> ;;Transpose reorders dimensions
user> (dtt/transpose tens [1 0])
#tech.v3.tensor<object>[2 2]
[[0 2]
[1 3]]
user> ;;Broadcasting duplicates dimensions
user> (dtt/broadcast tens [2 6])
#tech.v3.tensor<object>[2 6]
[[0 1 0 1 0 1]
[2 3 2 3 2 3]]
</code></pre>
<p>These operations are all supported by the <a href="../src/tech/v3/tensor/dimensions.clj">dimensions</a> object. Given an address in the (possibly n-dimensional) input space, this object is responsible for producing an address in the linearly-addressed ‘local’ space of the buffer.</p>
<pre><code class="clojure">user> (dtt/tensor->dimensions tens)
{:shape [2 2], :strides #list<int64>[2]
[2, 1, ], :n-dims 2, :shape-ecounts [2 2], :shape-ecount-strides #list<int64>[2]
[2, 1, ],
:overall-ecount 4,
:buffer-ecount 4,
:reduced-dims
{:shape [4],
:strides [1],
:offsets [0],
:shape-ecounts [4],
:shape-ecount-strides [1]},
:broadcast? false,
:offsets? false,
:shape-direct? true,
:native? true,
:access-increasing? true,
:global->local #<Delay@6e42efc3: [0 1 2 3]>,
:local->global #<Delay@4efebdd8: :not-delivered>}
</code></pre>
<h2><a href="#linearizing-the-n-dimensional-global-address-space" name="linearizing-the-n-dimensional-global-address-space"></a>Linearizing the N-Dimensional Global Address Space</h2>
<p>If you have used linear algebra libraries in the past you know they describe an operation as either ‘row-major’ or ‘column-major’. Fortran presents a ‘column-major’ abstraction while images (jpeg, png) are ‘row-major’.</p>
<p>These terms describe how an N-dimensional space is linearized onto the underlying storage layer. The datatype library is ‘row-major’, unlike Fortran. This means that:</p>
<pre><code class="clojure">user> ;; A 2x2 matrix in row major
user> (dtt/->tensor (partition 2 (range 4)))
#tech.v3.tensor<object>[2 2]
[[0 1]
[2 3]]
user> ;;A 2x2 matrix in column major
user> (dtt/transpose (dtt/->tensor (partition 2 (range 4)))
[1 0])
#tech.v3.tensor<object>[2 2]
[[0 2]
[1 3]]
</code></pre>
<p>If we are going to iterate through every element in a matrix or N-dimensional object and we start at index 0 and go to index <code>(- element-count 1)</code> then we have linearized the N-dimensional address space. This operation happens a <em>lot</em> in the datatype library as it forms the foundation of copying data and applying elementwise operations.</p>
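<p>As a minimal sketch of what this linearization means, the helper below walks a row-major linear index back to its N-dimensional index. <code>linear->nd</code> is a hypothetical name used only for illustration; it is plain Clojure and not part of the library.</p>
<pre><code class="clojure">(defn linear->nd
  "Convert a row-major linear index into an N-dimensional index for `shape`.
  Hypothetical helper for illustration; not a dtype-next function."
  [shape idx]
  (loop [dims (reverse shape)   ;; start from the fastest-changing dimension
         idx  (long idx)
         acc  ()]
    (if (empty? dims)
      (vec acc)
      (let [d (long (first dims))]
        ;; The remainder is the index within this dimension,
        ;; the quotient carries over to the next slower dimension.
        (recur (rest dims) (quot idx d) (conj acc (rem idx d)))))))

(map (partial linear->nd [2 2]) (range 4))
;; => ([0 0] [0 1] [1 0] [1 1])
</code></pre>
<p>The last dimension changes fastest as the linear index increases, which is what ‘row-major’ means here.</p>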
<p>There are some observations we want to make about this linearization that are very useful for many optimizations.</p>
<p>Transforming into this linear space is very simple: it is the sum, over all dimensions, of the input index at each dimension multiplied by the stride at that dimension:</p>
<pre><code class="clojure">user> (dtt/tensor->dimensions (dtt/->tensor (partition 2 (range 4))))
{:shape [2 2],
:strides [2 1],
:offsets [0 0],
:max-shape [2 2],
:dense? true,
:global->local #<Delay@4a0f7470: :not-delivered>,
:local->global #<Delay@70144938: :not-delivered>}
user> (def dims (dtt/tensor->dimensions (dtt/->tensor (partition 2 (range 4)))))
#'user/dims
user> (+ (* 1 (nth (:strides dims) 0))
(* 0 (nth (:strides dims) 1)))
2
</code></pre>
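<p>The same summation can be sketched without the library. The two helpers below are hypothetical names written for illustration and assume a dense, row-major layout with no offsets or broadcasting: <code>shape->strides</code> derives strides from a shape and <code>nd->local</code> applies the index-times-stride sum.</p>
<pre><code class="clojure">(defn shape->strides
  "Row-major strides for a dense shape; the last dimension has stride 1.
  Hypothetical helper for illustration only."
  [shape]
  (vec (reverse (reductions * 1 (reverse (rest shape))))))

(defn nd->local
  "Sum of each index multiplied by the stride of its dimension."
  [strides nd-idx]
  (reduce + (map * nd-idx strides)))

(shape->strides [2 2])           ;; => [2 1]
(shape->strides [2 3 4])         ;; => [12 4 1]
(nd->local [2 1] [1 0])          ;; => 2
(nd->local [8192 4 1] [1 0 0])   ;; => 8192
</code></pre>
<p>The last line uses the strides of a row cropped out of a wider image: stepping one row down skips 8192 elements even though the cropped row itself is shorter.</p>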
<h2><a href="#reduced-dimensions" name="reduced-dimensions"></a>Reduced Dimensions</h2>
<p><a href="../src/tech/v3/tensor/dimensions/analytics.clj">tech.v3.tensor.dimensions.analytics</a> presents ways to gain insight into some properties of the dimension objects. One operation it presents is the ability to reduce the dimensionality of an object while keeping the row-major iteration order of global->local indexes the same.</p>
<p>Reducing dimensions allows us to define a minimum number of operations required to go from a global address space to a local address space.</p>
<p>Here are some examples of reducing dimensions.</p>
<pre><code class="clojure">user> (require '[tech.v3.tensor.dimensions :as dims])
nil
user> (require '[tech.v3.tensor.dimensions.analytics :as dims-analytics])
nil
user> (dims-analytics/reduce-dimensionality
(dims/dimensions [4 4]))
{:shape [16],
:strides [1],
:offsets [0],
:shape-ecounts [16],
:shape-ecount-strides #list<int64>[1]
[1, ]}
user> (dims-analytics/reduce-dimensionality
(dims/dimensions [2 2 4]))
{:shape [16],
:strides [1],
:offsets [0],
:shape-ecounts [16],
:shape-ecount-strides #list<int64>[1]
[1, ]}
user> (dims-analytics/reduce-dimensionality
(dims/dimensions [2 2 2 2]))
{:shape [16],
:strides [1],
:offsets [0],
:shape-ecounts [16],
:shape-ecount-strides #list<int64>[1]
[1, ]}
user> ;;Broadcasting changes :shape-ecounts but not :shape
user> (dims-analytics/reduce-dimensionality
(dims/broadcast (dims/dimensions [4 4]) [8 4]))
{:shape [16],
:strides [1],
:offsets [0],
:shape-ecounts [32],
:shape-ecount-strides #list<int64>[1]
[1, ]}
</code></pre>
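<p>To make the idea concrete, here is a minimal sketch of one way adjacent dimensions can be collapsed. It is not the library’s implementation: <code>collapse-dims</code> is a hypothetical helper that assumes numeric shapes, no offsets, and no broadcasting, and it fuses an outer dimension with its inner neighbour whenever the outer stride equals the inner shape multiplied by the inner stride.</p>
<pre><code class="clojure">(defn collapse-dims
  "Merge adjacent dimensions that are contiguous in memory.
  Illustrative sketch only: numeric shapes, no offsets, no broadcasting."
  [shape strides]
  (let [merged (reduce
                (fn [acc [n stride]]
                  (let [[m inner-stride] (peek acc)]
                    (if (and m (= stride (* m inner-stride)))
                      ;; The outer dimension steps exactly over the inner one,
                      ;; so both can be addressed as a single dimension.
                      (conj (pop acc) [(* n m) inner-stride])
                      (conj acc [n stride]))))
                []
                ;; Walk from the innermost dimension outwards.
                (reverse (map vector shape strides)))]
    {:shape   (vec (reverse (map first merged)))
     :strides (vec (reverse (map second merged)))}))

(collapse-dims [4 4] [4 1])
;; => {:shape [16], :strides [1]}
(collapse-dims [2 2 2 2] [8 4 2 1])
;; => {:shape [16], :strides [1]}
(collapse-dims [256 256 4] [8192 4 1])
;; => {:shape [256 1024], :strides [8192 1]}
</code></pre>
<p>The last call shows a case that cannot collapse fully: the row stride of 8192 is larger than the 1024 contiguous elements in each row, so two dimensions remain.</p>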
<p>Here are two of many important properties of reduced dimensions:</p>
<ul>
<li>
<p>If the size of the shape array is 1, the shape is a number, the stride is 1, and there is no broadcasting or offsets, then the underlying buffer is represented ‘natively’, which means that instead of using the tensor for an elementwise operation you can substitute the underlying buffer. This means that copying data into or out of the tensor can use fast paths such as <code>System/arraycopy</code> and <code>memcpy</code>. In this case our address space operator simply returns the input.</p></li>
<li>
<p>If the last stride is 1 and the last shape entry is a number then this object’s last dimension is accessed natively and there is a fast <code>row-copy</code> type operation that can be used to copy, for instance, sub-images into or out of larger images, thus forming the basis of rendering sprites. In this case the address operator’s last (most rapidly changing) dimension operation is a ‘remainder’ operation of the input global address space.</p></li>
</ul>
<p>These types of properties do not fall out of full dimensions because many distinct non-reduced dimensions all correspond to the same global->local address space transformation.</p>
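<p>The two properties listed above can be restated as predicates over a reduced-dimensions map. This is only a sketch: the function names are hypothetical, the keys mirror the printed output shown earlier, and treating a difference between <code>:shape</code> and <code>:shape-ecounts</code> as the sign of broadcasting is an assumption drawn from the broadcast example rather than from the library source.</p>
<pre><code class="clojure">(defn native-access?
  "True when the buffer itself can stand in for the tensor.
  Hypothetical predicate; the library computes its own :native? flag."
  [{:keys [shape strides offsets shape-ecounts]}]
  (and (= 1 (count shape))
       (number? (first shape))
       (= 1 (first strides))
       (every? zero? offsets)
       ;; Assumption: broadcasting shows up as shape-ecounts differing from shape.
       (= shape shape-ecounts)))

(defn row-copy-access?
  "True when the last dimension is contiguous and directly addressed."
  [{:keys [shape strides]}]
  (and (number? (last shape))
       (= 1 (last strides))))

(native-access? {:shape [16] :strides [1] :offsets [0] :shape-ecounts [16]})
;; => true
(native-access? {:shape [16] :strides [1] :offsets [0] :shape-ecounts [32]})
;; => false
(row-copy-access? {:shape [256 1024] :strides [8192 1]})
;; => true
</code></pre>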
<h2><a href="#building-an-addressing-operator" name="building-an-addressing-operator"></a>Building An Addressing Operator</h2>
<p>Regardless of how much we reduce our dimensionality problem, we can’t make it completely go away and we will need to have an operator that, given the reduced dimensions, can transform an index in global linearized space into local linearized space.</p>
<p>The old pathway we took had us attempting to spot specific optimizations that would hit various fast paths and build operators for exactly that condition. This worked OK but there were just a lot of possible interactions we couldn’t optimize for. So we decided to build an addressing operator by first producing an Abstract Syntax Tree (AST) implementation of the reduced dimension pathway and then producing an implementation using purely Java bytecode.</p>
<p>This reduces the special case pathway to a set of general conditions that we can test much more thoroughly as well as producing objects that are tailored specifically to the addressing scheme presented by the reduced dimensions.</p>
<h3><a href="#step-1-an-abstract-syntax-tree" name="step-1-an-abstract-syntax-tree"></a>Step 1 - An Abstract Syntax Tree</h3>
<p>Our first step, once we have correct reduced dimensions, is to produce an AST that describes the global->local transformation. We can build our transformation out of the reduced dimensions directly along with one more variable, <code>max-shape-strides</code>, which is a strides array created out of the max-shape variable (it appears to correspond to <code>:shape-ecount-strides</code> in the printed output).</p>
<p>The example below represents an image with dimensions [height width n-channels] that was cropped from a larger image with concrete dimensions [2048 2048 4]. That means that width and channels are contiguous in memory while each row is strided. Put another way, height is non-contiguously strided by a value larger than <code>(* width n-channels)</code> and thus we cannot collapse it.</p>
<pre><code class="clojure">user>;;Image dimensions when you have a 2048x2048 image and you
user>;;want to crop a 256x256 sub-image out of it.
user> (def src-dims (dims/dimensions [256 256 4] [8192 4 1]))
#'user/src-dims
user> ;;Because we are cropping out of a larger image, we have strided rows
user> ;;but data within a row is contiguous.
user> (def reduced-dims (dims-analytics/reduce-dimensionality src-dims))
#'user/reduced-dims
user> reduced-dims
{:shape [256 1024],
:strides [8192 1],
:offsets [0 0],
:shape-ecounts [256 1024],
:shape-ecount-strides #list<int64>[2]
[1024, 1, ]}
</code></pre>
<p>Once we have a reduced expression of our dimension space, we can build an AST that expresses exactly the transformation required to go from the global index space into the local index space:</p>
<pre><code class="clojure">user> (require '[tech.v3.tensor.dimensions.global-to-local :as gtol])
nil
user> (def signature (gtol/reduced-dims->signature reduced-dims))
#'user/signature
user> signature
{:n-dims 2,
:direct-vec [true true],
:offsets? false,
:broadcast? false,
:trivial-last-stride? true}
user> (def test-ast (gtol/signature->ast signature))
#'user/test-ast
user> test-ast
{:signature
{:n-dims 2,
:direct-vec [true true],
:offsets? false,
:broadcast? false,
:trivial-last-stride? true},
:ast
(+
(*
(quot idx {:ary-name :shape-ecount-stride, :dim-idx 0})
{:ary-name :stride, :dim-idx 0})
(rem idx {:ary-name :shape, :dim-idx 1}))}
</code></pre>
<p>The AST above efficiently implements the global->local address space translation for exactly those reduced dims. The AST can be completely recreated given only the signature thus we can cache compiled AST representations by signature.</p>
<p>Keeping in mind that the least rapidly changing dimension, <code>height</code>, is dimension 0 and that width and channels have been collapsed into a single contiguous dimension we can write that AST in a slightly more human readable way:</p>
<pre><code class="clojure">'(+ (* (quot idx max-shape-stride-height) stride-height)
(rem idx shape-widthchan))
</code></pre>
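<p>To check the formula against concrete numbers, the sketch below interprets the printed AST directly. <code>eval-ast</code> and the <code>arrays</code> map are hypothetical and exist only for illustration; the library compiles the AST to bytecode rather than interpreting it, and this interpreter handles only the arithmetic nodes shown above.</p>
<pre><code class="clojure">(def ops {'+ + '* * 'quot quot 'rem rem})

(defn eval-ast
  "Interpret an AST node. `arrays` maps each :ary-name to a vector and
  `idx` is the global index. Hypothetical interpreter for illustration."
  [ast arrays idx]
  (cond
    (= 'idx ast) idx
    ;; A map leaf names an array and the dimension to read from it.
    (map? ast)   (get-in arrays [(:ary-name ast) (:dim-idx ast)])
    (seq? ast)   (apply (ops (first ast))
                        (map (fn [node] (eval-ast node arrays idx)) (rest ast)))
    :else        (throw (ex-info "Unrecognized AST node" {:node ast}))))

(def cropped-ast
  '(+ (* (quot idx {:ary-name :shape-ecount-stride :dim-idx 0})
         {:ary-name :stride :dim-idx 0})
      (rem idx {:ary-name :shape :dim-idx 1})))

;; Values taken from the reduced dimensions printed earlier.
(def arrays {:shape               [256 1024]
             :stride              [8192 1]
             :shape-ecount-stride [1024 1]})

(map (fn [i] (eval-ast cropped-ast arrays i)) (range 1020 1030))
;; => (1020 1021 1022 1023 8192 8193 8194 8195 8196 8197)
</code></pre>
<p>The jump from 1023 to 8192 is the row stride taking effect once the index crosses the 1024 contiguous elements of the first row.</p>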
<h3><a href="#step-2-a-class-definition" name="step-2-a-class-definition"></a>Step 2 - A Class Definition</h3>
<p>Now we start interacting with Justin Conklin’s excellent <a href="https://github.com/jgpc42/insn">insn</a> library. <code>insn</code> is great because it wraps the bytecode generation facilities of <a href="https://asm.ow2.io/">ow2.asm</a> in a functional, declarative abstraction <em>and stops right there</em> :-). This is a significant foundational piece for building a great bytecode compiler because it is easy to visually inspect the bytecode before it is sent to the actual compiler. It is also easy to build as it is simply an abstraction built out of some of the fundamental types of Clojure. This is really nice because this AST gets automatic visualization via the REPL, thus solving one of the problems with ASTs, namely that they can be opaque and difficult to debug.</p>
<p>The <a href="https://en.wikipedia.org/wiki/Java_bytecode_instruction_listings">bytecode</a> translation of our AST is:</p>
<pre><code class="clojure">user> (def class-def (gtol/gen-ast-class-def test-ast))
#'user/class-def
user> class-def
{:name tech.v3.datatype.GToL2TTOffFBcastFTrivLastST,
:interfaces [tech.v3.datatype.LongReader],
:fields
[{:flags #{:public :final}, :name "shape1", :type :long}
{:flags #{:public :final}, :name "shapeEcountStride0", :type :long}
{:flags #{:public :final}, :name "stride0", :type :long}
{:flags #{:public :final}, :name "nElems", :type :long}],
:methods
[{:flags #{:public},
:name :init,
:desc [[Ljava.lang.Object; [J [J [J [J :void],
:emit
[[:aload 0]
[:invokespecial :super :init [:void]]
[:aload 0]
[:aload 1]
[:ldc 1]
[:aaload]
[:checkcast java.lang.Long]
[:invokevirtual java.lang.Long "longValue"]
[:putfield :this "shape1" :long]
[:aload 0]
[:aload 5]
[:ldc 0]
[:laload]
[:putfield :this "shapeEcountStride0" :long]
[:aload 0]
[:aload 2]
[:ldc 0]
[:laload]
[:putfield :this "stride0" :long]
[:aload 0]
[:aload 4]
[:ldc 0]
[:laload]
[:aload 5]
[:ldc 0]
[:laload]
[:lmul]
[:putfield :this "nElems" :long]
[:return]]}
{:flags #{:public},
:name "lsize",
:desc [:long],
:emit [[:aload 0] [:getfield :this "nElems" :long] [:lreturn]]}
{:flags #{:public},
:name "readLong",
:desc [:long :long],
:emit
[[:lload 1]
[:aload 0]
[:getfield :this "shapeEcountStride0" :long]
[:ldiv]
[:aload 0]
[:getfield :this "stride0" :long]
[:lmul]
[:lload 1]
[:aload 0]
[:getfield :this "shape1" :long]
[:lrem]
[:ladd]
[:lreturn]]}]}
</code></pre>
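<p>The <code>readLong</code> instruction list is the AST written in postfix order for a stack machine. As a rough sketch of how those instructions evaluate, the toy interpreter below runs them on an operand stack. <code>run-emit</code> is hypothetical and models only the handful of instructions that appear in this method; a real JVM also tracks types and two-slot longs, which are ignored here.</p>
<pre><code class="clojure">(defn run-emit
  "Run a small subset of JVM-style instructions on an operand stack.
  `fields` stands in for the object's fields, `idx` for local variable 1.
  Toy model for illustration only."
  [emit fields idx]
  (loop [instrs emit
         stack  []]
    (let [instr (first instrs)
          ;; Pop two operands, push the result of (f a b).
          binop (fn [f]
                  (let [b (peek stack)
                        a (peek (pop stack))]
                    (conj (pop (pop stack)) (f a b))))]
      (case (first instr)
        :lload    (recur (rest instrs) (conj stack idx))
        :aload    (recur (rest instrs) (conj stack :this))
        ;; getfield replaces the object reference with the field value.
        :getfield (recur (rest instrs) (conj (pop stack) (get fields (nth instr 2))))
        :ldiv     (recur (rest instrs) (binop quot))
        :lmul     (recur (rest instrs) (binop *))
        :lrem     (recur (rest instrs) (binop rem))
        :ladd     (recur (rest instrs) (binop +))
        :lreturn  (peek stack)))))

(def read-long-emit
  [[:lload 1]
   [:aload 0]
   [:getfield :this "shapeEcountStride0" :long]
   [:ldiv]
   [:aload 0]
   [:getfield :this "stride0" :long]
   [:lmul]
   [:lload 1]
   [:aload 0]
   [:getfield :this "shape1" :long]
   [:lrem]
   [:ladd]
   [:lreturn]])

;; Field values the constructor would store for the cropped image example.
(def fields {"shapeEcountStride0" 1024 "stride0" 8192 "shape1" 1024})

(map (fn [i] (run-emit read-long-emit fields i)) (range 1020 1030))
;; => (1020 1021 1022 1023 8192 8193 8194 8195 8196 8197)
</code></pre>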
<p>In order to formulate this, we simply wrote a couple of Java files and compiled them, then printed the bytecode with <a href="https://docs.oracle.com/javase/8/docs/technotes/tools/windows/javap.html">javap</a>.</p>
<p>It is important to note that the class derives from <a href="../java/tech/v3/datatype/LongReader.java">LongReader</a>. This is an interface that defines an [:int64->:int64] randomly addressable translation which is appropriate for our global->local address space pathway.</p>
<h3><a href="#defining-classes-and-the-rest" name="defining-classes-and-the-rest"></a>Defining Classes And The Rest</h3>
<p>We now compile the bytecode to a class and call that class’s constructor:</p>
<pre><code class="clojure">user> (require '[insn.core :as insn])
nil
user> (def class-obj (insn/define class-def))
#'user/class-obj
user> (def first-constructor (first (.getConstructors class-obj)))
#'user/first-constructor
user> (def idx-obj (.newInstance first-constructor
(gtol/reduced-dims->constructor-args reduced-dims)))
#'user/idx-obj
</code></pre>
<p>And what did we get back?</p>
<pre><code class="clojure">user> (instance? tech.v3.datatype.LongReader idx-obj)
true
user> (count idx-obj)
262144
user> ;;The image was 256x256x4
user> (* 256 256 4)
262144
user> ;;Due to striding, there is a discontinuity at index 1024
user> (map idx-obj (range 1020 1030))
(1020 1021 1022 1023 8192 8193 8194 8195 8196 8197)
</code></pre>
<p>This is great! We now have an implementation of LongReader compiled specifically for the equation needed to transform those dimensions (and any others with the same signature) as efficiently as possible into the local address space. This overall type translation also works if we do something like reverse the indexes of the last dimension:</p>
<pre><code class="clojure">user> (def src-dims (dims/dimensions [256 256 [3 2 1 0]]
:strides [8192 4 1]))
#'user/src-dims
user> (def reduced-dims (dims-analytics/reduce-dimensionality src-dims))
#'user/reduced-dims
user> reduced-dims
{:shape [256 256 [3 2 1 0]],
:strides [8192 4 1],
:offsets [0 0 0],
:shape-ecounts [256 256 4],
:shape-ecount-strides #list<int64>[3]
[1024, 4, 1, ]}
user> (def test-ast (-> (gtol/reduced-dims->signature reduced-dims)
(gtol/signature->ast)))
#'user/test-ast
user> test-ast
{:signature
{:n-dims 3,
:direct-vec [true true false],
:offsets? false,
:broadcast? false,
:trivial-last-stride? true},
:ast
(+
(*
(quot idx {:ary-name :shape-ecount-stride, :dim-idx 0})
{:ary-name :stride, :dim-idx 0})
(*
(rem
(quot idx {:ary-name :shape-ecount-stride, :dim-idx 1})
{:ary-name :shape, :dim-idx 1})
{:ary-name :stride, :dim-idx 1})
(.read
{:ary-name :shape, :dim-idx 2}
(rem idx {:ary-name :shape, :dim-idx 2, :fn-name :lsize})))}
user> (def class-def (gtol/gen-ast-class-def test-ast))
#'user/class-def
user> class-def
{:name tech.v3.datatype.GToL3TTFOffFBcastFTrivLastST,
:interfaces [tech.v3.datatype.LongReader],
:fields
[{:flags #{:public :final}, :name "shape1", :type :long}
{:flags #{:public :final}, :name "shape2", :type tech.v3.datatype.Buffer}
{:flags #{:public :final}, :name "shape2-lsize", :type :long}
{:flags #{:public :final}, :name "shapeEcountStride0", :type :long}
{:flags #{:public :final}, :name "shapeEcountStride1", :type :long}
{:flags #{:public :final}, :name "stride0", :type :long}
{:flags #{:public :final}, :name "stride1", :type :long}
{:flags #{:public :final}, :name "nElems", :type :long}],
:methods
[{:flags #{:public},
:name :init,
:desc [[Ljava.lang.Object; [J [J [J [J :void],
:emit
[[:aload 0]
[:invokespecial :super :init [:void]]
[:aload 0]
[:aload 1]
[:ldc 1]
[:aaload]
[:checkcast java.lang.Long]
[:invokevirtual java.lang.Long "longValue"]
[:putfield :this "shape1" :long]
[:aload 0]
[:aload 1]
[:ldc 2]
[:aaload]
[:checkcast tech.v3.datatype.Buffer]
[:putfield :this "shape2" tech.v3.datatype.Buffer]
[:aload 0]
[:aload 1]
[:ldc 2]
[:aaload]
[:checkcast tech.v3.datatype.Buffer]
[:invokeinterface tech.v3.datatype.Buffer "lsize"]
[:putfield :this "shape2-lsize" :long]
[:aload 0]
[:aload 5]
[:ldc 0]
[:laload]
[:putfield :this "shapeEcountStride0" :long]
[:aload 0]
[:aload 5]
[:ldc 1]
[:laload]
[:putfield :this "shapeEcountStride1" :long]
[:aload 0]
[:aload 2]
[:ldc 0]
[:laload]
[:putfield :this "stride0" :long]
[:aload 0]
[:aload 2]
[:ldc 1]
[:laload]
[:putfield :this "stride1" :long]
[:aload 0]
[:aload 4]
[:ldc 0]
[:laload]
[:aload 5]
[:ldc 0]
[:laload]
[:lmul]
[:putfield :this "nElems" :long]
[:return]]}
{:flags #{:public},
:name "lsize",
:desc [:long],
:emit [[:aload 0] [:getfield :this "nElems" :long] [:lreturn]]}
{:flags #{:public},
:name "readLong",
:desc [:long :long],
:emit
[[:lload 1]
[:aload 0]
[:getfield :this "shapeEcountStride0" :long]
[:ldiv]
[:aload 0]
[:getfield :this "stride0" :long]
[:lmul]
[:lload 1]
[:aload 0]
[:getfield :this "shapeEcountStride1" :long]
[:ldiv]
[:aload 0]
[:getfield :this "shape1" :long]
[:lrem]
[:aload 0]
[:getfield :this "stride1" :long]
[:lmul]
[:ladd]
[:aload 0]
[:getfield :this "shape2" tech.v3.datatype.Buffer]
[:lload 1]
[:aload 0]
[:getfield :this "shape2-lsize" :long]
[:lrem]
[:invokeinterface tech.v3.datatype.Buffer "readLong"]
[:ladd]
[:lreturn]]}]}
user> (def class-obj (insn/define class-def))
#'user/class-obj
user> (def first-constructor (first (.getDeclaredConstructors class-obj)))
#'user/first-constructor
user> (def reversed-idx-obj (.newInstance first-constructor
(gtol/reduced-dims->constructor-args reduced-dims)))
#'user/reversed-idx-obj
user> (map reversed-idx-obj (range 1020 1030))
(1023 1022 1021 1020 8195 8194 8193 8192 8199 8198)
</code></pre>
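<p>The reversed result can be reproduced by hand. The function below is a plain-Clojure sketch of the same three-term sum with the constants from this example written inline; <code>reversed-channel-address</code> is a hypothetical name, and the vector lookup stands in for the <code>Buffer</code> read in the generated class.</p>
<pre><code class="clojure">(defn reversed-channel-address
  "Global index to local address for shape [256 256 [3 2 1 0]] with
  strides [8192 4 1]. Constants are hard-coded for illustration."
  [idx]
  (let [channel-lookup [3 2 1 0]]
    (+ (* (quot idx 1024) 8192)          ;; row term, strided
       (* (rem (quot idx 4) 256) 4)      ;; column term
       (nth channel-lookup (rem idx 4))))) ;; channel term via index lookup

(map reversed-channel-address (range 1020 1030))
;; => (1023 1022 1021 1020 8195 8194 8193 8192 8199 8198)
</code></pre>
<p>Because the last dimension is now an index lookup rather than a plain number, the channel term can no longer be folded into the column term, which is why this version keeps three dimensions.</p>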
<h2><a href="#wrapping-up" name="wrapping-up"></a>Wrapping Up</h2>
<p>We covered a lot of ground so if you are still reading at this point, good on you!</p>
<p>The JVM presents a great platform for high performance computing, and being able to generate great code out of abstract syntax trees allows us to customize what we are doing to precisely the conditions present. Declaratively producing our bytecode allows us to easily visually debug what is going on. Our Clojure representation is strikingly close to the representation used by the javap disassembler, which allows us to easily compare what we are doing to what javac will do and thus bootstrap a new problem without having to be experts in JVM bytecode.</p>
<p>We hope this encourages you to explore what is possible with extremely late-bound and abstract transformations of your problem space! In this way we can take advantage of the highest levels of abstraction possible but not pay a large performance cost for using these abstractions which, put colloquially, just makes our programming lives better and more dynamic. For an example of a really well done system for producing late-bound but extremely fast code, allow us to direct you to CMUCL’s <a href="https://www.cons.org/cmucl/doc/different-compilers.html">compiler stack</a> :-).</p>
<p>Enjoy!</p></div></div></div></body></html>