Bayard is a full-text search and indexing server written in Rust and built on top of Tantivy. It implements the Raft consensus algorithm using raft-rs, and gRPC (HTTP/2 + Protocol Buffers) using grpc-rs and rust-protobuf.
It achieves consensus across all the nodes and ensures that every change made to the system is made to a quorum of nodes.
Bayard makes it easy for programmers to develop search applications with advanced features and high availability.
- Full-text search/indexing
- Index replication
- Bringing up a cluster
- Command line interface is available
$ make build
Running a node in standalone mode is easy. Use the following command:
$ ./bin/bayard serve
Index documents as follows:
$ ./bin/bayard set 1 '{"text":"Tantivy is a full-text search engine library inspired by Apache Lucene and written in Rust."}'
$ ./bin/bayard set 2 '{"text":"Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java."}'
$ ./bin/bayard set 3 '{"text":"Bleve is a modern text indexing library for go."}'
$ ./bin/bayard set 4 '{"text":"Whoosh is a fast, pure Python search engine library."}'
$ ./bin/bayard set 5 '{"text":"Solr is highly reliable, scalable and fault tolerant, providing distributed indexing, replication and load-balanced querying, automated failover and recovery, centralized configuration and more."}'
$ ./bin/bayard set 6 '{"text":"Elasticsearch is a distributed, open source search and analytics engine for all types of data, including textual, numerical, geospatial, structured, and unstructured."}'
$ ./bin/bayard set 7 '{"text":"Riot is Go Open Source, Distributed, Simple and efficient full text search engine."}'
$ ./bin/bayard set 8 '{"text":"Blast is a full text search and indexing server, written in Go, built on top of Bleve."}'
$ ./bin/bayard set 9 '{"text":"Toshi is meant to be a full-text search engine similar to Elasticsearch. Toshi strives to be to Elasticsearch what Tantivy is to Lucene."}'
$ ./bin/bayard set 10 '{"text":"Sonic is a fast, lightweight and schema-less search backend."}'
$ ./bin/bayard set 11 '{"text":"Bayard is a full text search and indexing server, written in Rust, built on top of Tantivy."}'
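When there are many documents, the repeated `set` commands above can be driven from a file. The following is a minimal sketch, not part of Bayard itself: it assumes a hypothetical tab-separated input (the ID, then the JSON document) and a hypothetical `BAYARD_BIN` variable pointing at the binary. Only the `set <id> <json>` form shown above is taken from this README.

```shell
#!/bin/sh
# Path to the bayard binary. BAYARD_BIN is an illustrative variable,
# not an option that Bayard itself reads.
BAYARD_BIN="${BAYARD_BIN:-./bin/bayard}"

# index_all reads "ID<TAB>JSON" lines from stdin and runs one `set` per line.
# It stops at the first failing command and returns a non-zero status.
index_all() {
  while IFS="$(printf '\t')" read -r id doc || [ -n "$id" ]; do
    # Skip blank lines.
    [ -n "$id" ] || continue
    "$BAYARD_BIN" set "$id" "$doc" || return 1
  done
}
```

With a file such as `docs.tsv` (a hypothetical name) holding one document per line, the function would be called as `index_all < docs.tsv`.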
Get a document as follows:
$ ./bin/bayard get 11 | jq .
You can see the result in JSON format. The result of the above command is:
{
"id": [
"11"
],
"text": [
"Bayard is a full text search and indexing server, written in Rust, built on top of Tantivy."
]
}
Search documents as follows:
$ ./bin/bayard search text:"search engine" | jq .
You can see the result in JSON format. The result of the above command is:
[
{
"id": [
"4"
],
"text": [
"Whoosh is a fast, pure Python search engine library."
]
},
{
"id": [
"7"
],
"text": [
"Riot is Go Open Source, Distributed, Simple and efficient full text search engine."
]
},
{
"id": [
"1"
],
"text": [
"Tantivy is a full-text search engine library inspired by Apache Lucene and written in Rust."
]
},
{
"id": [
"2"
],
"text": [
"Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java."
]
},
{
"id": [
"6"
],
"text": [
"Elasticsearch is a distributed, open source search and analytics engine for all types of data, including textual, numerical, geospatial, structured, and unstructured."
]
},
{
"id": [
"9"
],
"text": [
"Toshi is meant to be a full-text search engine similar to Elasticsearch. Toshi strives to be to Elasticsearch what Tantivy is to Lucene."
]
},
{
"id": [
"10"
],
"text": [
"Sonic is a fast, lightweight and schema-less search backend."
]
},
{
"id": [
"11"
],
"text": [
"Bayard is a full text search and indexing server, written in Rust, built on top of Tantivy."
]
},
{
"id": [
"8"
],
"text": [
"Blast is a full text search and indexing server, written in Go, built on top of Bleve."
]
}
]
Delete a document as follows:
$ ./bin/bayard delete 11
Bayard can easily bring up a cluster. A standalone node is not fault tolerant. To improve fault tolerance, bring up a three-node cluster by starting the first node and then two more nodes as follows:
$ ./bin/bayard serve \
--host=0.0.0.0 \
--port=5001 \
--id=1 \
--peers="1=0.0.0.0:5001" \
--data-directory=./data/1 \
--schema-file=./etc/schema.json \
--unique-key-field-name=id
$ ./bin/bayard serve \
--host=0.0.0.0 \
--port=5002 \
--id=2 \
--peers="1=0.0.0.0:5001,2=0.0.0.0:5002" \
--leader-id=1 \
--data-directory=./data/2 \
--schema-file=./etc/schema.json \
--unique-key-field-name=id
$ ./bin/bayard serve \
--host=0.0.0.0 \
--port=5003 \
--id=3 \
--peers="1=0.0.0.0:5001,2=0.0.0.0:5002,3=0.0.0.0:5003" \
--leader-id=1 \
--data-directory=./data/3 \
--schema-file=./etc/schema.json \
--unique-key-field-name=id
The above example shows each Bayard node running on the same host, so each node must listen on a different port. This would not be necessary if each node ran on a different host.
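In the commands above, the `--peers` value for node N lists nodes 1 through N, each as `id=host:port`. As a rough sketch, that string can be generated rather than typed by hand. The `peers_for` helper and its defaults (host `0.0.0.0`, base port `5000`) are assumptions made for this illustration and only mirror the same-host layout used here.

```shell
#!/bin/sh
# peers_for N [HOST] [BASE_PORT] prints a --peers value listing nodes 1..N,
# where node i listens on BASE_PORT + i (5001, 5002, ... by default).
peers_for() {
  n="$1"
  host="${2:-0.0.0.0}"
  base="${3:-5000}"
  peers=""
  i=1
  while [ "$i" -le "$n" ]; do
    # Append a comma only when the list already has an entry.
    peers="${peers:+$peers,}$i=$host:$((base + i))"
    i=$((i + 1))
  done
  printf '%s\n' "$peers"
}

peers_for 3   # prints 1=0.0.0.0:5001,2=0.0.0.0:5002,3=0.0.0.0:5003
```

The result could then be passed as `--peers="$(peers_for 3)"`. For nodes on different hosts the list would need one host per node, which this sketch does not cover.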
An odd number of nodes, three or more, is recommended for the cluster. With a single node, data loss is inevitable in failure scenarios, so avoid deploying single nodes.
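The arithmetic behind this recommendation can be sketched as follows, assuming the usual Raft majority rule (a quorum is more than half of the nodes). The function names `quorum` and `tolerated` are illustrative and not part of Bayard.

```shell
#!/bin/sh
# quorum N: the smallest majority of N nodes.
quorum() { echo $(( $1 / 2 + 1 )); }

# tolerated N: how many nodes can fail while a majority remains.
tolerated() { echo $(( $1 - ($1 / 2 + 1) )); }

for n in 1 2 3 4 5; do
  printf 'nodes=%d quorum=%d tolerated_failures=%d\n' \
    "$n" "$(quorum "$n")" "$(tolerated "$n")"
done
# prints:
# nodes=1 quorum=1 tolerated_failures=0
# nodes=2 quorum=2 tolerated_failures=0
# nodes=3 quorum=2 tolerated_failures=1
# nodes=4 quorum=3 tolerated_failures=1
# nodes=5 quorum=3 tolerated_failures=2
```

A four-node cluster tolerates no more failures than a three-node one, which is why an even node count adds cost without adding fault tolerance.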
If one of the nodes in a cluster goes down due to a hardware failure and its Raft logs and metadata are lost, that node cannot join the cluster again. In that case, remove the node from the cluster as follows:
$ ./bin/bayard leave \
--host=127.0.0.1 \
--port=5001 \
--id=3 \
--peers="1=0.0.0.0:5001,2=0.0.0.0:5002" \
--leader-id=1