KnowLog: Knowledge Enhanced Pre-trained Language Model for Log Understanding

This repository contains the PyTorch implementation of KnowLog: Knowledge Enhanced Pre-trained Language Model for Log Understanding.

As semi-structured text, logs are rich in semantic information, so understanding them comprehensively is crucial for automated log analysis. With the recent success of pre-trained language models in natural language processing, many studies have leveraged these models to understand logs. Despite their successes, existing pre-trained language models still suffer from three weaknesses. First, they fail to understand domain-specific terminology, especially abbreviations. Second, they struggle to capture the complete log context. Third, they have difficulty obtaining universal representations of the same logs written in different styles. To address these challenges, we introduce KnowLog, a knowledge-enhanced pre-trained language model for log understanding. Specifically, to solve the first two challenges, we exploit abbreviations and natural language descriptions of logs from public manuals as local and global knowledge, respectively, and leverage this knowledge by designing novel pre-training tasks that enhance the model. To solve the last challenge, we design a contrastive learning-based pre-training task to obtain universal representations. We evaluate KnowLog by fine-tuning it on six different log understanding tasks. Extensive experiments demonstrate that KnowLog significantly enhances log understanding and achieves state-of-the-art results compared to existing pre-trained language models without knowledge enhancement.

Usage

Requirements

tokenizers==0.12.1
scikit-learn==1.1.2
tqdm==4.64.1
transformers==4.22.2
numpy==1.22.4
torch==1.10.1
huggingface-hub==0.10.0

pip install -r requirements.txt
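
To confirm that the active environment matches the pins above, a minimal sketch along the following lines may help. It assumes Python 3.8+ (for `importlib.metadata`); the helper names `parse_pins` and `check_pins` are illustrative and are not part of this repository.

```python
from importlib import metadata

# Pins copied from the Requirements list above.
PINS = """\
tokenizers==0.12.1
scikit-learn==1.1.2
tqdm==4.64.1
transformers==4.22.2
numpy==1.22.4
torch==1.10.1
huggingface-hub==0.10.0
"""


def parse_pins(text):
    """Return {package: version} for every 'name==version' line (hypothetical helper)."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, version = line.partition("==")
        pins[name.strip()] = version.strip()
    return pins


def check_pins(pins):
    """Return {package: (expected, installed_or_None)} for mismatches only (hypothetical helper)."""
    mismatches = {}
    for name, expected in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    for name, (expected, installed) in check_pins(parse_pins(PINS)).items():
        print(f"{name}: expected {expected}, found {installed or 'not installed'}")
```

The comparison is an exact string match, so a build with a local version suffix (for example a CUDA-specific torch wheel) would be reported as a mismatch even when the base version agrees.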

Structure of Files

KnowLog
 |-- datasets
 |    |-- pre-train # Pre-training datasets
 |    |-- tasks # Downstream task datasets
 |-- sentence_transformers # We modified the code for losses, evaluation and SentenceTransformer to implement KnowLog
 |    |-- cross_encoder
 |    |-- datasets
 |    |-- evaluation
 |    |-- losses
 |    |-- models
 |    |-- readers
 |    |-- __init__.py
 |    |-- LoggingHandler.py
 |    |-- model_card_templates.py
 |    |-- SentenceTransformer.py
 |    |-- util.py
 |-- KnowLog_finetune_pair.py # fine-tune for log-pair tasks
 |-- KnowLog_finetune_single.py # fine-tune for log-single tasks
 |-- KnowLog_pretrain.py # pre-training entry point

Pre-training Dataset

We collected 96,060 log templates and their corresponding natural language descriptions from the public product manuals of Cisco, Huawei and H3C, covering four product types: switches, routers, security and WLAN. The data statistics are as follows:

Vendor	Switches	Routers	Security	WLAN	All
Cisco	41,628	22,479	1,578	6,591	72,276
Huawei	6,418	4,980	3,737	1,001	16,136
H3C	2,171	2,364	1,852	1,261	7,648
All	50,217	29,823	7,167	8,853	96,060
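
As a quick consistency check, the marginal totals can be recomputed from the per-vendor counts; they add up to the 96,060 templates stated in the text. The snippet below is only an arithmetic sketch over the numbers in the table; the names `COUNTS`, `row_totals` and `column_totals` are illustrative.

```python
# Counts copied from the table above: vendor -> (Switches, Routers, Security, WLAN).
COUNTS = {
    "Cisco": (41628, 22479, 1578, 6591),
    "Huawei": (6418, 4980, 3737, 1001),
    "H3C": (2171, 2364, 1852, 1261),
}
PRODUCTS = ("Switches", "Routers", "Security", "WLAN")


def row_totals(counts):
    """Total number of log templates per vendor."""
    return {vendor: sum(values) for vendor, values in counts.items()}


def column_totals(counts):
    """Total number of log templates per product type."""
    return {
        product: sum(values[i] for values in counts.values())
        for i, product in enumerate(PRODUCTS)
    }


if __name__ == "__main__":
    print(row_totals(COUNTS))     # per-vendor totals
    print(column_totals(COUNTS))  # per-product totals
    print(sum(row_totals(COUNTS).values()))  # grand total: 96060
```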

Required pre-trained models

Our code uses 'bert-base-uncased' and 'roberta-base' as the base pre-trained models. You can either pass one of these names directly, or download bert-base-uncased or roberta-base into a local directory and use that copy.
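
The choice between a model name and a local copy can be sketched as follows. This is an illustration rather than code from the repository: `resolve_base_model` and `load_base_model` are hypothetical helpers, the layout `<local_root>/<model name>/config.json` is an assumed download location, and whether `--base_model` accepts a local path is likewise an assumption. Only `AutoTokenizer.from_pretrained` and `AutoModel.from_pretrained` are real `transformers` calls, and both accept either a hub name or a directory.

```python
from pathlib import Path


def resolve_base_model(name, local_root="."):
    """Return a local checkpoint directory if one exists, else the model name.

    Hypothetical helper: assumes a downloaded model lives in <local_root>/<name>.
    """
    local_dir = Path(local_root) / name
    # A Hugging Face checkpoint directory contains a config.json file.
    if (local_dir / "config.json").is_file():
        return str(local_dir)
    return name


def load_base_model(name, local_root="."):
    """Load tokenizer and encoder from the resolved source (hypothetical helper)."""
    # Imported lazily so resolve_base_model works without transformers installed.
    from transformers import AutoModel, AutoTokenizer

    source = resolve_base_model(name, local_root)
    return AutoTokenizer.from_pretrained(source), AutoModel.from_pretrained(source)


if __name__ == "__main__":
    print(resolve_base_model("bert-base-uncased"))
    print(resolve_base_model("roberta-base"))
```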

Training

To train KnowLog from scratch, run:

python KnowLog_pretrain.py --pretrain_data ./datasets/pre-train/all_log.json --abbr ./datasets/pre-train/abbr.json --base_model bert-base-uncased

Evaluation

To evaluate the model on log-single tasks, run:

python KnowLog_finetune_single.py --train_data ./datasets/tasks/MC/hw_switch_train.json --dev_data ./datasets/tasks/MC/hw_switch_dev.json --test_data ./datasets/tasks/MC/hw_switch_test.json

To evaluate the model on log-pair tasks, run:

python KnowLog_finetune_pair.py --train_data ./datasets/tasks/MC/hw_switch_train.json --dev_data ./datasets/tasks/MC/hw_switch_dev.json --test_data ./datasets/tasks/MC/hw_switch_test.json
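
When running several tasks in a row, the command lines above can be generated rather than typed. The sketch below assumes the split files follow the `<prefix>_train.json`, `<prefix>_dev.json`, `<prefix>_test.json` naming seen in the commands above; `build_finetune_command` is a hypothetical helper, not part of this repository, and it uses only the three flags shown above.

```python
import subprocess
import sys


def build_finetune_command(script, task_dir, prefix):
    """Build the argument list for one fine-tuning run (hypothetical helper).

    Assumes the split files are named <prefix>_train.json, <prefix>_dev.json
    and <prefix>_test.json inside task_dir.
    """
    command = [sys.executable, script]
    for split in ("train", "dev", "test"):
        command += [f"--{split}_data", f"{task_dir}/{prefix}_{split}.json"]
    return command


if __name__ == "__main__":
    cmd = build_finetune_command(
        "KnowLog_finetune_single.py", "./datasets/tasks/MC", "hw_switch"
    )
    print(" ".join(cmd))
    # subprocess.run(cmd, check=True)  # uncomment to actually launch the run
```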

License

MIT License

Acknowledgements

Our code is inspired by sentence-transformers and Hugging Face.

Our pre-training datasets and downstream task datasets are collected from the public manuals of Cisco, Huawei and H3C.
