Commit c7d38bd

a more consistent implementation for diffs across the board
1 parent e716fb1 commit c7d38bd

27 files changed, +537 -209 lines changed

README.md

+50-13
Original file line numberDiff line numberDiff line change
@@ -4,7 +4,7 @@
44
<dependency>
55
<groupId>io.lacuna</groupId>
66
<artifactId>bifurcan</artifactId>
7-
<version>0.2.0-alpha1</version>
7+
<version>0.2.0-alpha4</version>
88
</dependency>
99
```
1010

@@ -16,13 +16,14 @@ This library provides high-quality Java implementations of mutable and immutable
1616
* customizable equality semantics
1717
* contiguous memory used wherever possible
1818
* performance equivalent to, or better than, existing alternatives
19+
* changes to a collection can be tracked in a **diff** data structure, which can be subsequently rebased onto a different collection
1920
* [ALPHA] durable (disk-backed) representations which share the API and asymptotic performance of their in-memory counterparts
2021

2122
Rather than using the existing collection interfaces in `java.util` such as `List` or `Map`, it provides its own interfaces (`IList`, `IMap`, `ISet`) that provide functional semantics - each update to a collection returns a reference to a new collection. Each interface provides a method (`toList`, `toMap`, `toSet`) for coercing the collection to a read-only version of the standard Java interfaces.
2223

2324
### what makes this better?
2425

25-
Some aspects of this library, like the inverted indices and durable collections, are unique.
26+
Some aspects of this library, like the inverted indices, diffs, and durable collections, are unique.
2627

2728
There are, however, many existing implementations of "functional" (aka persistent, immutable) data structures on the JVM. As shown in [these in-depth comparisons](https://github.com/lacuna/bifurcan/blob/master/doc/comparison.md), Bifurcan's performance is equivalent to the best existing implementations for basic operations, and significantly better for batch operations such as `union`, `intersection`, and `difference`.
2829

@@ -80,28 +81,64 @@ for (int i = 0; i < 1000; i++) {
8081
}
8182
```
8283

83-
If we call `forked()` on this collection, it will be wrapped in an immutable "diff" facade which tracks changes without touching the underlying collection. These facades have similar performance to typical collections, but do not support efficient set operations.
84+
If we call `forked()` on this collection, it will be wrapped in a **diff** facade, which is described below.
8485

8586
### virtual collections
8687

87-
These facades also allow us to define collections programmatically:
88+
Bifurcan offers a variety of collection implementations, but you can also create your own by implementing a handful of methods.
89+
90+
A list, at its base, is just a `size` and a function that, given an index, returns the corresponding element. This can be constructed using the `Lists.from` method:
8891

8992
```java
90-
// a list of numbers within [0,1e6)
9193
IList<Long> list = Lists.from(1_000_000, i -> i);
92-
93-
// the set of numbers within [0,1e6)
94-
ISet<Long> set = Sets.from(list, i -> (0 <= i && i < list.size()) ? OptionalLong.of(i) : OptionalLong.empty());
95-
96-
// a map of numbers within [0,1e6) onto their square
97-
IMap<Long, Long> map = Maps.from(set, i -> i * i);
9894
```
9995

100-
These collections are not realized in-memory, and can be used as a translation layer for other data structure implementations. Using our facades, however, we can still update them like any other collection, and only those changes will be directly represented in-memory.
96+
This creates a list of the numbers within `[0, 1e6)` without any of the elements being stored in memory. All of the other operations associated with lists (adding and removing elements, updating elements, concatenating other lists, and so on) have efficient default implementations, which will be discussed in the next section.
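The idea that a list is "just a `size` and a function" can be sketched without the library. The `VirtualList` class below is a hypothetical stand-in written only for illustration; it is not Bifurcan's `IList`, and it makes no claim about how `Lists.from` is implemented.

```java
import java.util.function.LongFunction;

// Hypothetical stand-in for a virtual list: nothing is stored except the
// size and the function that computes each element on demand.
final class VirtualList<V> {
  private final long size;
  private final LongFunction<V> elementAt;

  VirtualList(long size, LongFunction<V> elementAt) {
    this.size = size;
    this.elementAt = elementAt;
  }

  long size() {
    return size;
  }

  V nth(long idx) {
    if (idx < 0 || idx >= size) {
      throw new IndexOutOfBoundsException("index " + idx + " out of bounds for size " + size);
    }
    return elementAt.apply(idx);
  }

  public static void main(String[] args) {
    // The numbers within [0, 1e6), none of which are held in memory.
    VirtualList<Long> list = new VirtualList<>(1_000_000, i -> i);
    System.out.println(list.size());       // prints 1000000
    System.out.println(list.nth(999_999)); // prints 999999
  }
}
```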
97+
98+
An unsorted set is just a list of elements, plus a function that, given a value, returns an `OptionalLong` describing the index of that element:
99+
100+
```java
101+
Function<Long, OptionalLong> indexOf = n -> (0 <= n && n < list.size()) ? OptionalLong.of(n) : OptionalLong.empty();
102+
103+
ISet<Long> set = Sets.from(list, indexOf);
104+
```
105+
106+
A sorted set, conversely, is a list of elements, a comparator, and a function that, given a value, returns an `OptionalLong` describing the index of the closest element which is equal to or less than that value (referred to as the "floor index"):
107+
108+
```java
109+
Function<Double, OptionalLong> floorIndexOf = n -> indexOf.apply(n.longValue());
110+
111+
ISet<Double> sortedSet = Sets.from(list, Comparator.naturalOrder(), floorIndexOf);
112+
```
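As a library-independent illustration of the floor-index definition, the sketch below computes it for the virtual sorted set `{0, 1, ..., size - 1}`. The class and method names are hypothetical. The handling of values above the largest element (clamping to the last index) follows the stated definition and is an assumption about the intended behaviour, not a description of Bifurcan's code.

```java
import java.util.OptionalLong;

// Hypothetical helper: floor index for the virtual sorted set {0, 1, ..., size - 1}.
final class FloorIndexSketch {

  // Returns the index of the largest element <= value, or empty if there is none.
  static OptionalLong floorIndex(double value, long size) {
    if (size <= 0 || Double.isNaN(value) || value < 0) {
      return OptionalLong.empty(); // every element is greater than the value
    }
    long floor = (long) Math.floor(value);
    // Values past the end of the set floor to the last element.
    return OptionalLong.of(Math.min(floor, size - 1));
  }

  public static void main(String[] args) {
    System.out.println(floorIndex(2.5, 10));  // prints OptionalLong[2]
    System.out.println(floorIndex(42.0, 10)); // prints OptionalLong[9]
    System.out.println(floorIndex(-0.5, 10)); // prints OptionalLong.empty
  }
}
```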
113+
114+
Sorted and unsorted maps are just their corresponding sets, plus a function from key to value. These can be constructed using `Maps.from`, or by calling `zip` on a set:
115+
116+
```java
117+
IMap<Long, Double> squareRoots = set.zip(n -> Math.sqrt(n));
118+
```
119+
120+
### diffs
121+
122+
These virtual collections can be modified just like any other Bifurcan collection:
123+
124+
```java
125+
Lists.from(1, x -> 1).addLast(42)
126+
// [1, 42]
127+
```
128+
129+
This is made possible by [diffs](https://lacuna.io/docs/bifurcan/io/lacuna/bifurcan/IDiff.html), which track changes on an immutable **underlying** collection. Diff implementations exist for all variants of Bifurcan collections, and share the asymptotic performance of their normal counterparts. By calling `diff()` on any collection, we create a diff wrapper whose changes can then be **rebased** onto a new underlying collection:
130+
131+
```java
132+
IList<Integer> numDiff = List.of(1, 2, 3).diff().removeFirst().addLast(42);
133+
// [2, 3, 42]
134+
135+
IList<Integer> rebased = numDiff.rebase(List.of(4, 5, 6));
136+
// [5, 6, 42]
137+
```
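The rebase behaviour in this example can be reproduced with a small, self-contained model. `ListDiffSketch` is hypothetical and deliberately simplified: it records only removals from the front and additions at the back, and it uses `java.util.List` rather than Bifurcan's collections. It is a sketch of the idea, not the library's implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of a list diff; not Bifurcan's IDiffList.
final class ListDiffSketch<V> {
  private final List<V> underlying; // never modified
  private final int fromFront;      // how many leading elements of underlying are hidden
  private final List<V> suffix;     // elements appended after the underlying list

  private ListDiffSketch(List<V> underlying, int fromFront, List<V> suffix) {
    this.underlying = underlying;
    this.fromFront = fromFront;
    this.suffix = suffix;
  }

  static <V> ListDiffSketch<V> of(List<V> underlying) {
    return new ListDiffSketch<>(underlying, 0, new ArrayList<>());
  }

  ListDiffSketch<V> removeFirst() {
    if (fromFront < underlying.size()) {
      return new ListDiffSketch<>(underlying, fromFront + 1, suffix);
    }
    // Nothing of the underlying list remains visible, so drop from the suffix instead.
    List<V> rest = suffix.isEmpty() ? suffix : new ArrayList<>(suffix.subList(1, suffix.size()));
    return new ListDiffSketch<>(underlying, fromFront, rest);
  }

  ListDiffSketch<V> addLast(V value) {
    List<V> extended = new ArrayList<>(suffix);
    extended.add(value);
    return new ListDiffSketch<>(underlying, fromFront, extended);
  }

  // Keeps the recorded changes and swaps in a different underlying list.
  ListDiffSketch<V> rebase(List<V> newUnderlying) {
    return new ListDiffSketch<>(newUnderlying, fromFront, suffix);
  }

  List<V> toList() {
    List<V> result = new ArrayList<>();
    int start = Math.min(fromFront, underlying.size());
    result.addAll(underlying.subList(start, underlying.size()));
    result.addAll(suffix);
    return result;
  }

  public static void main(String[] args) {
    ListDiffSketch<Integer> numDiff = ListDiffSketch.of(List.of(1, 2, 3)).removeFirst().addLast(42);
    System.out.println(numDiff.toList());                          // prints [2, 3, 42]
    System.out.println(numDiff.rebase(List.of(4, 5, 6)).toList()); // prints [5, 6, 42]
  }
}
```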
101138

102139
### durable collections
103140

104-
All in-memory structures can be saved to disk, while retaining the same API and asymptotic performance. These durable collections are optimized for reads and batched writes, which means they are not a replacement for general-purpose databases, but they are still [useful in a variety of applications](doc/durable.md).
141+
All in-memory structures can also be saved to disk, while retaining the same API and asymptotic performance. These durable collections are optimized for reads and batched writes, which means they are not a replacement for general-purpose databases, but they are still [useful in a variety of applications](doc/durable.md).
105142

106143
### no lazy collections
107144

src/io/lacuna/bifurcan/FloatMap.java

+2-3
@@ -122,7 +122,7 @@ public OptionalLong floorIndex(double key) {
122122
}
123123

124124
@Override
125-
public OptionalLong floorIndex(Double key) {
125+
public OptionalLong inclusiveFloorIndex(Double key) {
126126
return floorIndex((double) key);
127127
}
128128

@@ -139,8 +139,7 @@ public FloatMap<V> slice(double min, double max) {
139139
return new FloatMap<>(map.slice(doubleToLong(min), doubleToLong(max)));
140140
}
141141

142-
@Override
143-
public FloatMap<V> slice(Double min, Double max) {
142+
public FloatMap<V> sliceReal(Double min, Double max) {
144143
return slice((double) min, (double) max);
145144
}
146145

src/io/lacuna/bifurcan/IDiff.java

+16-1
@@ -1,5 +1,20 @@
11
package io.lacuna.bifurcan;
22

3-
public interface IDiff<C extends ICollection<C, V>, V> {
3+
/**
4+
* A generic interface for diffs, which represent changes to an underlying collection, and which can be rebased atop
5+
* a new underlying collection.
6+
*/
7+
public interface IDiff<C> {
8+
9+
/**
10+
* The underlying collection
11+
*/
412
C underlying();
13+
14+
/**
15+
* Returns a new diff, which is rebased atop the new underlying collection. The returned diff may not reflect any
16+
* changes which cannot be applied to the new collection (removing an element which isn't present in the new
17+
* collection, for instance), and so {@code a.rebase(b).rebase(c) } is not necessarily equivalent to {@code a.rebase(c) }.
18+
*/
19+
IDiff<C> rebase(C newUnderlying);
520
}
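The Javadoc's remark that `a.rebase(b).rebase(c)` need not equal `a.rebase(c)` can be illustrated with a toy set diff. `SetDiffSketch` below is hypothetical: it models a diff as an `added` set and a `removed` set over `java.util.Set`, and it assumes that rebasing discards any removal whose element is absent from the new underlying set. Bifurcan's actual diff types may represent and prune changes differently.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical toy model of a set diff; not Bifurcan's IDiffSet.
final class SetDiffSketch<V> {
  private final Set<V> underlying; // never modified
  private final Set<V> added;
  private final Set<V> removed;

  private SetDiffSketch(Set<V> underlying, Set<V> added, Set<V> removed) {
    this.underlying = underlying;
    this.added = added;
    this.removed = removed;
  }

  static <V> SetDiffSketch<V> of(Set<V> underlying) {
    return new SetDiffSketch<>(underlying, new HashSet<>(), new HashSet<>());
  }

  SetDiffSketch<V> add(V value) {
    Set<V> newAdded = new HashSet<>(added);
    Set<V> newRemoved = new HashSet<>(removed);
    // Either undo an earlier removal, or record a genuinely new element.
    if (!newRemoved.remove(value) && !underlying.contains(value)) {
      newAdded.add(value);
    }
    return new SetDiffSketch<>(underlying, newAdded, newRemoved);
  }

  SetDiffSketch<V> remove(V value) {
    Set<V> newAdded = new HashSet<>(added);
    Set<V> newRemoved = new HashSet<>(removed);
    // Either undo an earlier addition, or hide an element of the underlying set.
    if (!newAdded.remove(value) && underlying.contains(value)) {
      newRemoved.add(value);
    }
    return new SetDiffSketch<>(underlying, newAdded, newRemoved);
  }

  SetDiffSketch<V> rebase(Set<V> newUnderlying) {
    Set<V> newRemoved = new HashSet<>(removed);
    newRemoved.retainAll(newUnderlying); // removals of absent elements are dropped
    Set<V> newAdded = new HashSet<>(added);
    newAdded.removeAll(newUnderlying);   // additions of present elements are no-ops
    return new SetDiffSketch<>(newUnderlying, newAdded, newRemoved);
  }

  Set<V> toSet() {
    Set<V> result = new HashSet<>(underlying);
    result.removeAll(removed);
    result.addAll(added);
    return result;
  }

  public static void main(String[] args) {
    SetDiffSketch<Integer> a = SetDiffSketch.of(Set.of(1, 2, 3)).remove(2).add(9);
    // Rebasing via {4, 5} loses the removal of 2, so 2 survives in the final set.
    System.out.println(a.rebase(Set.of(4, 5)).rebase(Set.of(2, 6)).toSet().contains(2)); // prints true
    // Rebasing directly onto {2, 6} still applies the removal of 2.
    System.out.println(a.rebase(Set.of(2, 6)).toSet().contains(2));                      // prints false
  }
}
```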

src/io/lacuna/bifurcan/IDiffList.java

+32-16
@@ -4,34 +4,50 @@
44

55
import java.util.Iterator;
66

7-
public interface IDiffList<V> extends IList<V>, IDiff<IList<V>, V> {
7+
public interface IDiffList<V> extends IList<V>, IDiff<IList<V>> {
88

9-
class Range {
10-
public final long start, end;
9+
interface Durable<V> extends IDiffList<V>, IDurableCollection {
10+
}
11+
12+
/**
13+
* A descriptor for the number of elements removed from the front and back of the underlying list.
14+
*/
15+
class Slice {
16+
public final long fromFront, fromBack;
17+
18+
public Slice(long fromFront, long fromBack) {
19+
this.fromFront = fromFront;
20+
this.fromBack = fromBack;
21+
}
1122

12-
public Range(long start, long end) {
13-
this.start = start;
14-
this.end = end;
23+
public long size(IList<?> underlying) {
24+
return Math.max(0, underlying.size() - (fromBack + fromFront));
1525
}
1626

17-
public long size() {
18-
return end - start;
27+
public <V> V nth(IList<V> underlying, long idx) {
28+
return underlying.nth(idx + fromFront);
29+
}
30+
31+
public <V> Iterator<V> iterator(IList<V> underlying, long startIdx) {
32+
return Iterators.range(fromFront + startIdx, size(underlying), underlying::nth);
1933
}
2034
}
2135

2236
IList<V> underlying();
2337

24-
Range underlyingSlice();
38+
Slice slice();
2539

2640
IList<V> prefix();
2741

2842
IList<V> suffix();
2943

44+
IDiffList<V> rebase(IList<V> newUnderlying);
45+
3046
@Override
3147
default IList<V> concat(IList<V> l) {
3248
IList<V> result = Lists.concat(
3349
prefix(),
34-
underlying().slice(underlyingSlice().start, underlyingSlice().end),
50+
underlying().slice(slice().fromFront, slice().fromBack),
3551
suffix(),
3652
l
3753
);
@@ -41,7 +57,7 @@ default IList<V> concat(IList<V> l) {
4157

4258
@Override
4359
default long size() {
44-
return prefix().size() + underlyingSlice().size() + suffix().size();
60+
return (prefix().size() + suffix().size() + slice().size(underlying()));
4561
}
4662

4763
@Override
@@ -51,20 +67,20 @@ default V nth(long idx) {
5167
}
5268
idx -= prefix().size();
5369

54-
if (idx < underlyingSlice().size()) {
55-
return underlying().nth(underlyingSlice().start + idx);
70+
long underlyingSize = slice().size(underlying());
71+
if (idx < underlyingSize) {
72+
return slice().nth(underlying(), idx);
5673
}
57-
idx -= underlyingSlice().size();
74+
idx -= underlyingSize;
5875

5976
return suffix().nth(idx);
6077
}
6178

6279
@Override
6380
default Iterator<V> iterator() {
64-
Range r = underlyingSlice();
6581
return Iterators.concat(
6682
prefix().iterator(),
67-
r.size() == underlying().size() ? underlying().iterator() : Iterators.range(r.start, r.end, underlying()::nth),
83+
slice().iterator(underlying(), 0),
6884
suffix().iterator()
6985
);
7086
}
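The arithmetic behind the new `Slice` descriptor, which counts elements removed from each end rather than storing absolute start and end positions, can be checked with a small stand-alone sketch. `SliceSketch` is hypothetical and operates on `java.util.List` instead of `IList`; the `visible` helper is an addition for illustration and does not correspond to a method in the diff.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical mirror of the Slice descriptor, using java.util.List.
final class SliceSketch {
  final long fromFront, fromBack;

  SliceSketch(long fromFront, long fromBack) {
    this.fromFront = fromFront;
    this.fromBack = fromBack;
  }

  // Number of underlying elements still visible; never negative.
  long size(List<?> underlying) {
    return Math.max(0, underlying.size() - (fromFront + fromBack));
  }

  // Translates an index within the slice to an index in the underlying list.
  <V> V nth(List<V> underlying, long idx) {
    return underlying.get(Math.toIntExact(idx + fromFront));
  }

  // Illustrative helper: materializes the visible elements in order.
  <V> List<V> visible(List<V> underlying) {
    List<V> result = new ArrayList<>();
    long n = size(underlying);
    for (long i = 0; i < n; i++) {
      result.add(nth(underlying, i));
    }
    return result;
  }

  public static void main(String[] args) {
    List<String> letters = List.of("a", "b", "c", "d", "e");
    SliceSketch slice = new SliceSketch(1, 2);
    System.out.println(slice.size(letters));    // prints 2
    System.out.println(slice.visible(letters)); // prints [b, c]
  }
}
```

Because the descriptor stores counts rather than positions, the same `SliceSketch` stays meaningful when applied to an underlying list of a different length, which is what rebasing requires.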

src/io/lacuna/bifurcan/IDiffMap.java

+7-2
@@ -13,7 +13,10 @@
1313
import java.util.function.BiPredicate;
1414
import java.util.function.ToLongFunction;
1515

16-
public interface IDiffMap<K, V> extends IMap<K, V>, IDiff<IMap<K, V>, IEntry<K, V>> {
16+
public interface IDiffMap<K, V> extends IMap<K, V>, IDiff<IMap<K, V>> {
17+
18+
interface Durable<K, V> extends IDiffMap<K, V>, IDurableCollection {
19+
}
1720

1821
/**
1922
* The baseline data structure.
@@ -30,6 +33,8 @@ public interface IDiffMap<K, V> extends IMap<K, V>, IDiff<IMap<K, V>, IEntry<K,
3033
*/
3134
ISortedSet<Long> removedIndices();
3235

36+
IDiffMap<K, V> rebase(IMap<K, V> newUnderlying);
37+
3338
@Override
3439
default ToLongFunction<K> keyHash() {
3540
return underlying().keyHash();
@@ -84,7 +89,7 @@ default Iterator<IEntry<K, V>> iterator() {
8489
}
8590

8691
@Override
87-
default Durable<K, V> save(IDurableEncoding encoding, Path directory) {
92+
default IMap.Durable<K, V> save(IDurableEncoding encoding, Path directory) {
8893
if (removedIndices().size() == 0 && added().size() == 0) {
8994
return underlying().save(encoding, directory);
9095
} else {

src/io/lacuna/bifurcan/IDiffSet.java

+6-1
@@ -4,10 +4,15 @@
44
import java.util.function.BiPredicate;
55
import java.util.function.ToLongFunction;
66

7-
public interface IDiffSet<V> extends ISet<V>, IDiff<IMap<V, Void>, IEntry<V, Void>> {
7+
public interface IDiffSet<V> extends ISet<V>, IDiff<IMap<V, Void>> {
8+
9+
interface Durable<V> extends IDiffSet<V>, IDurableCollection {
10+
}
811

912
IDiffMap<V, Void> diffMap();
1013

14+
IDiffSet<V> rebase(IMap<V, Void> newUnderlying);
15+
1116
default IMap<V, Void> underlying() {
1217
return diffMap().underlying();
1318
}

src/io/lacuna/bifurcan/IDiffSortedMap.java

+17-4
@@ -2,15 +2,28 @@
22

33
import io.lacuna.bifurcan.utils.Iterators;
44

5+
import java.util.Comparator;
56
import java.util.Iterator;
67
import java.util.OptionalLong;
78

8-
public interface IDiffSortedMap<K, V> extends ISortedMap<K, V> {
9+
public interface IDiffSortedMap<K, V> extends IDiff<ISortedMap<K, V>>, ISortedMap<K, V> {
10+
11+
interface Durable<K, V> extends IDiffSortedMap<K, V>, IDurableCollection {
12+
}
13+
14+
ISortedMap<K, V> underlying();
915

1016
ISortedMap<K, ISortedMap<K, V>> segments();
1117

1218
ISortedSet<Long> segmentOffsets();
1319

20+
IDiffSortedMap<K, V> rebase(ISortedMap<K, V> newUnderlying);
21+
22+
@Override
23+
default Comparator<K> comparator() {
24+
return underlying().comparator();
25+
}
26+
1427
@Override
1528
default long size() {
1629
return segments().size() == 0
@@ -19,18 +32,18 @@ default long size() {
1932
}
2033

2134
@Override
22-
default OptionalLong floorIndex(K key) {
35+
default OptionalLong inclusiveFloorIndex(K key) {
2336
ISortedMap<K, ISortedMap<K, V>> segments = segments();
2437
ISortedSet<Long> segmentOffsets = segmentOffsets();
2538

26-
OptionalLong oSegmentIdx = segments.floorIndex(key);
39+
OptionalLong oSegmentIdx = segments.inclusiveFloorIndex(key);
2740
if (!oSegmentIdx.isPresent()) {
2841
return OptionalLong.empty();
2942
}
3043
long segmentIdx = oSegmentIdx.getAsLong();
3144

3245
ISortedMap<K, V> segment = segments.nth(segmentIdx).value();
33-
OptionalLong oIdx = segment.floorIndex(key);
46+
OptionalLong oIdx = segment.inclusiveFloorIndex(key);
3447

3548
if (oIdx.isPresent()) {
3649
return OptionalLong.of(segmentOffsets.nth(segmentIdx) + oIdx.getAsLong());

src/io/lacuna/bifurcan/IDiffSortedSet.java

+13-3
@@ -6,10 +6,20 @@
66
import java.util.Iterator;
77
import java.util.OptionalLong;
88

9-
public interface IDiffSortedSet<V> extends ISortedSet<V> {
9+
public interface IDiffSortedSet<V> extends IDiff<ISortedSet<V>>, ISortedSet<V> {
10+
11+
interface Durable<V> extends IDiffSortedSet<V>, IDurableCollection {
12+
}
1013

1114
IDiffSortedMap<V, Void> diffMap();
1215

16+
IDiffSortedSet<V> rebase(ISortedSet<V> newUnderlying);
17+
18+
@Override
19+
default ISortedSet<V> underlying() {
20+
return diffMap().underlying().keys();
21+
}
22+
1323
@Override
1424
default Comparator<V> comparator() {
1525
return diffMap().comparator();
@@ -21,8 +31,8 @@ default long size() {
2131
}
2232

2333
@Override
24-
default OptionalLong floorIndex(V value) {
25-
return diffMap().floorIndex(value);
34+
default OptionalLong inclusiveFloorIndex(V value) {
35+
return diffMap().inclusiveFloorIndex(value);
2636
}
2737

2838
@Override
