An optimized log compression tool via iterative clustering [ASE'19]

logzip

Logzip is an efficient compression tool designed specifically for log files. It exploits the inherent structure of raw log messages to achieve a high compression ratio.
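To give a rough intuition for why structure helps, the sketch below separates each log line into a shared template and its variable parameters, stores the template once and the parameters column by column, and then applies a general-purpose compressor. This is not logzip's implementation: the iterative clustering described in the paper is replaced here by a single hand-written template, and the sample lines, `TEMPLATE`, and all function names are assumptions made up for illustration.

```python
import re
import zlib

# Hypothetical sample lines (made up for illustration, not taken from a real dataset).
LINES = [
    f"Received block blk_{1000 + i} of size {67108864 - i} from 10.0.0.{i % 250}"
    for i in range(2000)
]

# Assumed template: constant text with <*> marking each variable field.
TEMPLATE = "Received block blk_<*> of size <*> from <*>"


def template_to_regex(template):
    """Turn a <*>-style template into a regex with one group per wildcard."""
    parts = [re.escape(p) for p in template.split("<*>")]
    return re.compile("^" + "(.*?)".join(parts) + "$")


def encode(lines, template):
    """Store the template once and the parameters column by column."""
    pattern = template_to_regex(template)
    rows = [pattern.match(line).groups() for line in lines]
    columns = ["\n".join(col) for col in zip(*rows)]
    payload = "\x00".join([template] + columns)
    return zlib.compress(payload.encode("utf-8"), 9)


def decode(blob):
    """Rebuild the original lines from the template and parameter columns."""
    template, *columns = zlib.decompress(blob).decode("utf-8").split("\x00")
    pieces = template.split("<*>")
    rows = zip(*(col.split("\n") for col in columns))
    out = []
    for row in rows:
        line = pieces[0]
        for value, piece in zip(row, pieces[1:]):
            line += value + piece
        out.append(line)
    return out


# Compare plain compression with the template-plus-columns layout.
raw = zlib.compress("\n".join(LINES).encode("utf-8"), 9)
structured = encode(LINES, TEMPLATE)
assert decode(structured) == LINES  # the transformation is lossless
print(len(raw), len(structured))
```

The two printed numbers are the compressed sizes in bytes of the raw text and of the structured layout. How they compare depends on the data, so treat this only as a way to experiment with the idea.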

This repository contains the source code of logzip. More details can be found in our ASE 2019 paper:

[ASE'19] Jinyang Liu, Jieming Zhu, Shilin He, Pinjia He, Zibin Zheng, Michael R. Lyu. Logzip: Extracting Hidden Structures via Iterative Clustering for Log Compression. International Conference on Automated Software Engineering (ASE), 2019.

Prerequisites

  • python3
  • pandas

Installation

Logzip can be run directly from source.

  1. Download and install Python 3.

  2. Install Pandas.

    $ pip3 install pandas

  3. Clone logzip.

    $ git clone https://github.com/logpai/logzip.git

Data

We've conducted comprehensive experiments to evaluate the efficiency of logzip on five real-world datasets. All the datasets that we use are available at loghub.

Usage

A demo is included in this repo (logzip/demo). It uses an HDFS log file with 2k lines.

For other kinds of logs, please specify the templates (which can be generated by a log parser) and the log format accordingly.
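As a rough illustration of what a log format and a template describe, the sketch below splits one HDFS-style line into header fields and then matches its message against a template. The `<Field>` format notation, the `<*>` wildcard, the sample line, and the helper names are all assumptions for this example; check the demo script for the exact inputs logzip expects.

```python
import re

# Assumed log format: <Field> placeholders separated by literal text.
LOG_FORMAT = "<Date> <Time> <Pid> <Level> <Component>: <Content>"

# Assumed template for the message part, with <*> marking each parameter.
TEMPLATE = "PacketResponder <*> for block <*> terminating"

# Illustrative HDFS-style line.
SAMPLE = (
    "081109 203615 148 INFO dfs.DataNode$PacketResponder: "
    "PacketResponder 1 for block blk_38865049064139660 terminating"
)


def format_to_regex(log_format):
    """Build a regex with one named group per <Field> in the format."""
    pattern = ""
    for token in re.split(r"(<[^<>]+>)", log_format):
        if token.startswith("<") and token.endswith(">"):
            pattern += f"(?P<{token[1:-1]}>.*?)"
        else:
            pattern += re.escape(token)  # literal separators such as ": "
    return re.compile("^" + pattern + "$")


def template_to_regex(template):
    """Build a regex with one group per <*> wildcard in the template."""
    parts = [re.escape(p) for p in template.split("<*>")]
    return re.compile("^" + "(.+?)".join(parts) + "$")


# Step 1: split the raw line into header fields and message content.
fields = format_to_regex(LOG_FORMAT).match(SAMPLE).groupdict()

# Step 2: match the content against the template to pull out parameters.
params = template_to_regex(TEMPLATE).match(fields["Content"]).groups()

print(fields["Level"], fields["Component"])
print(params)
```

A log with a different layout would need its own format string and its own set of templates, which is why both have to be supplied per log type.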

Compression

$ cd logzip/demo/
$ python3 zip_demo.py
