log-surgeon is a library for high-performance parsing of unstructured text logs. It allows users to parse and extract information from the vast amount of unstructured logs generated by today's open-source software.
Some of the library's features include:
- Parsing and extracting variable values like the log event's log-level and any other user-specified variables, no matter where they appear in each log event.
- Parsing by using regular expressions for each variable type rather than regular expressions for an entire log event.
- Improved latency and memory efficiency compared to popular regex engines.
- Parsing multi-line log events (delimited by timestamps).
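The following standalone sketch gives a rough illustration of per-variable-type matching. It uses C++'s std::regex, not log-surgeon. It assumes naive whitespace tokenization and a few hypothetical variable types. `VariableType`, `extract_variables`, and `example_types` are illustrative names only and are not part of the library, whose own lexer and delimiter handling differ.

```cpp
#include <cassert>
#include <regex>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical variable type: a name plus a small regex for one kind of
// value, in the spirit of a schema rule (not log-surgeon's API).
struct VariableType {
    std::string name;
    std::regex pattern;
};

// Splits an event on whitespace and tests each token against every variable
// type's regex. The first type whose regex matches the whole token wins, so a
// variable is found no matter where it appears in the event.
std::vector<std::pair<std::string, std::string>>
extract_variables(std::string const& event, std::vector<VariableType> const& types) {
    std::vector<std::pair<std::string, std::string>> found;
    std::istringstream tokens{event};
    std::string token;
    while (tokens >> token) {
        for (auto const& type : types) {
            if (std::regex_match(token, type.pattern)) {
                found.emplace_back(type.name, token);
                break;
            }
        }
    }
    return found;
}

// Hypothetical example types; order matters because the first match wins.
std::vector<VariableType> example_types() {
    return {
            {"loglevel", std::regex{"INFO|DEBUG|WARN|ERROR"}},
            {"hex", std::regex{"0x[0-9a-fA-F]+"}},
            {"int", std::regex{"-?[0-9]+"}},
    };
}
```

Each variable type has its own small pattern. As a result, `extract_variables("task 456 hit ERROR near 0x401129", example_types())` finds the log level even though it is not the first token, and no regex for the whole event is needed.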
Note that log-surgeon is not a generic regex engine and does impose some constraints on how log events can be parsed.
Let's say we want to parse and inspect multi-line log events like this:
2023-02-23T18:10:14-0500 DEBUG task_123 crashed. Dumping stacktrace:
#0 0x000000000040110e in bar () at example.cpp:6
#1 0x000000000040111d in bar () at example.cpp:10
#2 0x0000000000401129 in main () at example.cpp:15
Using the example schema file (examples/schema.txt), which includes these rules:
timestamp:\d{4}\-\d{2}\-\d{2}T\d{2}:\d{2}:\d{2}\-\d{4}
...
loglevel:INFO|DEBUG|WARN|ERROR
We can parse and inspect the events as follows:
// Define a reader to read from your data source
Reader reader{/* <Omitted> */};
// Instantiate the parser
ReaderParser parser{"examples/schema.txt"};
parser.reset_and_set_reader(reader);
// Get the loglevel variable's ID
optional<uint32_t> loglevel_id{parser.get_variable_id("loglevel")};
// <Omitted validation of loglevel_id>
while (false == parser.done()) {
    if (ErrorCode err{parser.parse_next_event()}; ErrorCode::Success != err) {
        throw runtime_error("Parsing Failed");
    }
    // Get a view of the event that was just parsed (must precede any use of `event`)
    LogEventView const& event = parser.get_log_parser().get_log_event_view();
    // Get and print the timestamp
    Token* timestamp{event.get_timestamp()};
    if (nullptr != timestamp) {
        cout << "timestamp: " << timestamp->to_string_view() << endl;
    }
    // Get and print the log-level
    auto const& loglevels = event.get_variables(*loglevel_id);
    if (false == loglevels.empty()) {
        // In case there are multiple matches, just get the first one
        cout << "loglevel: " << loglevels[0]->to_string_view() << endl;
    }
    // Other analysis...
    // Print the entire event (`event` is a reference, so use `.`, not `->`)
    cout << event.to_string() << endl;
}
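log-surgeon determines multi-line event boundaries itself, using timestamps as delimiters. The standalone sketch below shows the underlying idea for the example above without using the library. A line that begins with a match of the schema's timestamp rule starts a new event, and any other line is appended to the current event. `group_events` is a hypothetical helper, and std::regex stands in for log-surgeon's own engine.

```cpp
#include <cassert>
#include <regex>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not part of log-surgeon): groups raw lines into
// multi-line events. A line that *starts* with a timestamp begins a new
// event; every other line is appended to the current event.
std::vector<std::string> group_events(std::string const& text) {
    // Same pattern as the schema's timestamp rule, anchored with `^` so only
    // a timestamp at the start of a line counts ("-" is literal here, so the
    // schema's `\-` escapes are dropped).
    static std::regex const timestamp{R"(^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}-\d{4})"};
    std::vector<std::string> events;
    std::istringstream lines{text};
    std::string line;
    while (std::getline(lines, line)) {
        bool const starts_event = std::regex_search(line, timestamp);
        if (starts_event || events.empty()) {
            events.push_back(line);
        } else {
            events.back() += '\n' + line;
        }
    }
    return events;
}
```

Because the pattern is anchored, a timestamp that appears mid-line does not start a new event. This mirrors the known limitation that timestamps must appear at the start of a message.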
For advanced uses, log-surgeon also has a BufferParser that reads directly from a buffer.
Requirements:
- CMake
- GCC >= 10 or Clang >= 7
- Catch2 >= 3
  - On Ubuntu <= 20.04, you can install it using:
    sudo tools/deps-install/ubuntu/install-catch2.sh 3.6.0
  - On Ubuntu >= 22.04, you can install it using:
    sudo apt-get update
    sudo apt-get install catch2
  - On macOS, you can install it using:
    brew install catch2
From the repo's root, run:
# Generate the CMake project
cmake -S . -B build -DBUILD_TESTING=OFF
# Build the project
cmake --build ./build -j
# Install the project to ~/.local
cmake --install ./build --prefix ~/.local
To build the debug version and the tests, replace the first command with:
cmake -S . -B ./build -DCMAKE_BUILD_TYPE=Debug -DBUILD_TESTING=ON
- docs contains more detailed documentation, including:
  - The schema specification, which describes the syntax for writing your own schema
  - log-surgeon's design objectives
- examples contains programs demonstrating usage of the library.
To run unit tests, run:
cmake --build ./build --target test
Before submitting a PR, ensure you've run our linting tools and either fixed any violations or suppressed the warning.
We currently support running our linting tools on Linux and macOS. If you're developing on another OS, you can submit a feature request. If you can't run the linting workflows locally, you can enable and run the lint workflow in your fork.
To run the linting tools, besides commonly installed tools like tar, you'll need:
- clang-tidy
- md5sum
- Python 3.8 or newer
- python3-venv
- Task
./tools/init.sh
Currently, clang-tidy has to be run manually:
find src tests \
-type f \
\( -iname "*.cpp" -o -iname "*.hpp" \) \
-print0 | \
xargs -0 clang-tidy --config-file .clang-tidy -p build
To report all errors, run:
task lint:check
To fix C++ errors and report YAML errors, run:
task lint:fix
You can use GitHub issues to report a bug or request a feature.
Join us on Zulip to chat with developers and other community members.
The following are issues we're aware of and working on:
- Schema rules must use ASCII characters. We will release UTF-8 support in a future release.
- Timestamps must appear at the start of the message to be handled specially (compared to other variable values) and to support multi-line log events.
- A variable pattern has no way to match text around a variable without that text also becoming part of the variable.
- Support for submatch extraction will be coming in a future release.
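The last two items are easiest to understand side by side. The sketch below uses C++'s std::regex capture groups, not log-surgeon, which does not support submatches yet. It also uses a hypothetical `id=` context and the hypothetical helpers `whole_match` and `submatch`. With a whole-match pattern, the `id=` text becomes part of the extracted value. A capture group, by contrast, still requires the context to match but extracts only the digits.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Whole-match extraction: everything the pattern matches becomes the value,
// so the "id=" context ends up inside the extracted variable.
std::string whole_match(std::string const& text) {
    std::smatch m;
    return std::regex_search(text, m, std::regex{R"(id=\d+)"}) ? m.str(0) : "";
}

// Submatch extraction: the context must still match, but only capture
// group 1 is kept as the variable's value.
std::string submatch(std::string const& text) {
    std::smatch m;
    return std::regex_search(text, m, std::regex{R"(id=(\d+))"}) ? m.str(1) : "";
}
```

For "task id=42 failed", `whole_match` yields "id=42", while `submatch` yields "42".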