This is a fork that publishes jars to com.cnuernber/jarrow on Clojars. Because this is LGPL code, users must be able to replace this jar at a late-binding stage.
Jarrow is a lightweight Java implementation for I/O of data stored in formats related to Apache Arrow. Currently, it only has support for the Arrow-related Feather format, but it may in future grow support for the Arrow IPC File format or other evolutions of Feather. Or it may not.
Why write this when there's already a Java implementation of Feather I/O provided by Apache? I wanted something without all those dependencies, and for which I had full control over the data access. I'm using it to provide Feather table I/O handlers in STIL/TOPCAT.
This library probably does less clever stuff than the Apache one but it's much more compact and has no external dependencies.
If you want the library, the best thing is just to pick up the pre-built jarrow.jar file from the release.
However, if you want to build it from source, there's a makefile. It may need editing since some targets contain references to directories you don't have. But basically to build the library you just need to run javac on all the Java files.
The source code is Java 1.6 compatible, and the distributed jarrow.jar file contains Java 1.6-compatible classes.
Only Feather files are currently supported. All Feather files can be read, but currently the following column types are not fully supported on input:
- CATEGORY: I haven't come across any Feather files with category column types, and it's not clear to me how to interpret the Feather format documentation for this type, so it's not supported.
- UINT64: There's no Java primitive or primitive-wrapper type that can represent unsigned 64-bit integers, so it is not supported.
- TIMESTAMP, DATE, TIME: These values can be read, but the type-specific metadata/unit information is not currently available.
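
The UINT64 limitation comes down to Java's `long` being signed: the 64 bits can be stored, but values at or above 2^63 come out negative. As a rough illustration only (this is not something Jarrow does, and the class and method names are hypothetical), one way to recover the unsigned value from the raw bits is to widen to a `BigInteger`:

```java
import java.math.BigInteger;

class Uint64Sketch {

    /** 2^64, the amount by which a negative long undershoots its unsigned value. */
    private static final BigInteger TWO_POW_64 = BigInteger.ONE.shiftLeft(64);

    /**
     * Interprets the 64 bits of a signed long as an unsigned integer.
     * Hypothetical helper for illustration; not part of the Jarrow API.
     */
    static BigInteger toUnsigned(long bits) {
        BigInteger signed = BigInteger.valueOf(bits);
        // Non-negative longs already carry the right value; negative ones
        // stand for bit patterns in the range 2^63 .. 2^64 - 1.
        return bits >= 0 ? signed : signed.add(TWO_POW_64);
    }

    public static void main(String[] args) {
        System.out.println(toUnsigned(42L));  // prints 42
        System.out.println(toUnsigned(-1L));  // prints 18446744073709551615
    }
}
```

The cost is an object allocation per value, which is presumably why a primitive-oriented reader would not want to do this by default.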
The reading is implemented using memory mapping (MappedByteBuffers).
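
For readers unfamiliar with the technique, here is a minimal sketch of memory-mapped reading using only the standard `java.nio` classes. It is not Jarrow's actual code: the class name, the helper methods, and the demo file layout (8 bytes of padding followed by three little-endian 32-bit integers) are all assumptions made for illustration.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

class MappedReadSketch {

    /** Maps the whole file read-only and sets little-endian byte order. */
    static MappedByteBuffer mapReadOnly(File file) throws IOException {
        RandomAccessFile raf = new RandomAccessFile(file, "r");
        try {
            FileChannel chan = raf.getChannel();
            MappedByteBuffer buf =
                chan.map(FileChannel.MapMode.READ_ONLY, 0, chan.size());
            buf.order(ByteOrder.LITTLE_ENDIAN);
            return buf;
        } finally {
            // The mapping stays valid after the channel is closed.
            raf.close();
        }
    }

    /** Reads element 'index' of an int32 array starting at 'arrayOffset'. */
    static int readInt32(ByteBuffer buf, int arrayOffset, int index) {
        // Absolute get: does not disturb the buffer's position.
        return buf.getInt(arrayOffset + 4 * index);
    }

    /** Writes a small demo file: 8 padding bytes, then the ints 10, -20, 30. */
    static File writeDemoFile() throws IOException {
        ByteBuffer out = ByteBuffer.allocate(8 + 3 * 4);
        out.order(ByteOrder.LITTLE_ENDIAN);
        out.putLong(0L);
        out.putInt(10).putInt(-20).putInt(30);
        File tmp = File.createTempFile("mapped", ".bin");
        tmp.deleteOnExit();
        FileOutputStream fos = new FileOutputStream(tmp);
        try {
            fos.write(out.array());
        } finally {
            fos.close();
        }
        return tmp;
    }

    public static void main(String[] args) throws IOException {
        MappedByteBuffer buf = mapReadOnly(writeDemoFile());
        System.out.println(readInt32(buf, 8, 1));  // prints -20
    }
}
```

With a mapping like this, random access to any row is just an offset calculation, and the operating system decides which pages of the file actually get loaded.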
The LARGE_UTF8 and LARGE_BINARY types defined in the Arrow but not in the Feather version of the flatbuffers metadata file are supported.
The flatbuffers java source files are generated by running the flatc compiler from Google Flatbuffers version 1.11.0 on the Arrow version of feather.fbs. I subsequently moved the generated source files into a different java package to avoid possible namespace clashes with external code that may use a different version of flatbuffers.
Comprehensive documentation is provided in the javadocs.
The classes in the package uk.ac.bristol.star.feather form the
usable parts of the I/O library. The classes in the
uk.ac.bristol.star.fbs.* packages are flatbuffer support files
that you shouldn't need to use.
To read a table, you can use the FeatherTable.fromFile(File) method; there are examples in FeatherTable.main.
To write a table, use FeatherWriter.write(OutputStream); this requires you to implement some FeatherColumnWriter objects in some way appropriate to the data structures in which your table data resides; there are examples in FeatherWriter.main.
I don't know whether anybody else will want to use this package. If you do, and if you are interested in features that are not currently present, please contact me (@mbtaylor).
This library includes Google Flatbuffers code which is licenced under the Apache 2.0 licence. I'm prepared to offer any licence to the original parts of this project that suits you and that's legally possible. For now, I assert that it's licenced under the LGPL. Unless somebody tells me I'm not allowed to do that.
- Version 1.0 (27 Feb 2020): Initial release