Compute Effort To Compress (ETC) using the NSRPS algorithm in Python


ETCPy

Effort-To-Compress in Python


What is this

A Python implementation of the compression-complexity measure called Effort-To-Compress (ETC). ETC captures the compressibility and complexity of discrete symbolic sequences using lossless compression. It has been shown to estimate complexity robustly, and it compares favorably with entropy and Lempel-Ziv complexity for short and noisy time series.

ETC can also be used to assess causal information flow between multiple discrete symbolic sequences. This use has recently been presented, rigorously proven and demonstrated to be an effective model-free measure of causality. Introduced as Compression-Complexity Causality (CCC), the measure is robust to numerous data contaminants, noise sources and pre-processing artifacts. CCC compares favorably with Granger Causality and Transfer Entropy, outperforming them on synthetic as well as real-world causal interactions. An implementation of CCC is included in this repository.

While any lossless compressor may be used with ETC and subsequently with CCC, a grammar-based lossless compression algorithm called Non-Sequential Recursive Pair Substitution or NSRPS is used presently. NSRPS has been rigorously studied and shown to be an effective tool for data compression and entropy estimation. This repository also contains a fast Cython implementation of NSRPS for use with ETC and CCC.
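To make the idea concrete, here is a minimal pure-Python sketch of pair substitution and the resulting effort count. It is an illustration only, not the code shipped in this repository: the function names (`most_frequent_pair`, `substitute`, `etc_sketch`), the integer-only input, the non-overlapping pair counting and the first-seen tie-breaking rule are all assumptions made for the example. The effort is taken to be the number of substitution passes needed before the sequence becomes constant or has length 1.

```python
from collections import Counter


def most_frequent_pair(seq):
    """Return the most frequent adjacent pair, counting non-overlapping occurrences.

    Ties are broken in favor of the pair seen first (an assumption of this sketch).
    """
    counts = Counter()
    last_counted = {}  # index at which each pair was last counted
    for i in range(len(seq) - 1):
        pair = (seq[i], seq[i + 1])
        # Skip a window that overlaps the previous counted occurrence of the
        # same pair; this only happens for pairs like (a, a) inside a run.
        if last_counted.get(pair) == i - 1:
            continue
        counts[pair] += 1
        last_counted[pair] = i
    return max(counts, key=counts.get)


def substitute(seq, pair, new_symbol):
    """Replace non-overlapping occurrences of pair, scanning left to right."""
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(new_symbol)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def etc_sketch(seq):
    """Count substitution passes until the sequence is constant or of length 1."""
    seq = list(seq)
    if not seq:
        return 0
    steps = 0
    next_symbol = max(seq) + 1  # fresh symbol; assumes integer symbols
    while len(seq) > 1 and len(set(seq)) > 1:
        seq = substitute(seq, most_frequent_pair(seq), next_symbol)
        next_symbol += 1
        steps += 1
    return steps


print(etc_sketch([1, 1, 0, 1, 1, 0, 1, 0]))  # 4 passes
print(etc_sketch([0, 0, 0, 0]))              # 0, already constant
```

In this sketch a constant sequence needs no passes at all, while less regular sequences need more, which is the intuition behind using the pass count as a complexity measure.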

References

What can it do

Study Haemodynamics, Heart-Rate Variability and Cardiac Aging using ECG/EKG

Network Neuroscience, Psychophysics and Scientific Study of Consciousness

Genome Complexity Analysis and Classification of Nucleotide Sequences

Audio Signal Processing and Denoising

How to use it

The simplest way right now is to use pip to fetch this repository and install it locally inside a conda or virtualenv environment. This way, the functions implemented in Cython are automatically compiled natively on the host system. Instructions are given below.

While the repository is called ETCPy, the package namespace available for use is ETC. All functionality is available through the ETC namespace.

For running tests (strongly recommended), additional packages need to be installed.

Operating System Support

  • GNU/Linux-based distributions (tested on Ubuntu 16.04, 18.04, 20.04)
  • Windows: currently does not work out of the box. Cython and the C/C++ build toolchain need to be set up properly for compilation on Windows to work. It may work with some gymnastics using MinGW + Visual Studio Build Tools, but this is currently untested. It does work on WSL, though!

Dependencies

For core functionality:

  • numpy
  • pandas
  • joblib
  • cython
    • Note: Cython needs a working C/C++ compiler such as GCC or Clang and the associated build toolchain. While it should work out of the box on any modern Linux distribution, ensure a proper installation as instructed in the official documentation.

For tests:

  • pytest
  • hypothesis

Installation

Skip the first step if an environment is already available:

  1. Create a fresh conda or pip/virtualenv-based environment with the core dependencies (numpy, pandas, joblib and cython). Choose an appropriate name instead of myenv.

    $ conda create -n myenv python numpy pandas joblib cython
  2. Activate environment using conda activate myenv or virtualenv equivalent.

    If git is not installed, then:

    • either install it at a system level directly from the official website or via your preferred package manager
    • or install it within the newly created conda environment using conda install git
  3. Use pip* to install directly from GitHub using the git VCS backend

    $ python -m pip install git+https://github.com/pranaysy/ETCPy.git
  4. Done! Open a Python shell, execute import ETC and proceed to the demo


*mixing pip and conda is not generally advised, but it can be done by following certain recommendations
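After the steps above, a quick way to confirm that the environment is complete is to check which packages Python can locate. The snippet below is a small sketch using only the standard library; the helper name `is_installed` is made up for this example. It looks up the core dependencies and the ETC namespace without actually importing them.

```python
import importlib.util


def is_installed(package_name):
    """Return True if a top-level package can be located on the import path."""
    return importlib.util.find_spec(package_name) is not None


# Core dependencies plus the ETC namespace itself.
# Note that the import name of the cython package is "Cython".
for name in ("numpy", "pandas", "joblib", "Cython", "ETC"):
    status = "found" if is_installed(name) else "MISSING"
    print(f"{name:8s} {status}")
```

If ETC is reported as missing, the environment that pip installed into is probably not the one currently active.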

Updating

Use the -U flag with pip for updating to the most current version available from this repository:

$ python -m pip install -U git+https://github.com/pranaysy/ETCPy.git

This will rebuild the compiled Cython functions as well.

Usage

Please check out demo.py to see ETC in action. Functions for dealing with NumPy arrays are also available. In addition to the core functionality of ETC, a brief demo of Compression-Complexity Causality (CCC) is also included for uncoupled as well as coupled first-order auto-regressive processes.

The implementations of ETC as well as CCC include multicore parallelization (using joblib) and can benefit from more available CPU cores for multiple sequences.
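For readers who want data to experiment with before opening demo.py, the sketch below generates a pair of first-order auto-regressive processes, either uncoupled or with one driving the other, and bins them into discrete symbols, since ETC operates on symbolic sequences. It uses only the standard library. The function names, the coefficients, the noise level and the equal-width binning are assumptions chosen for illustration and may differ from what demo.py does.

```python
import random


def coupled_ar1(n, a=0.8, b=0.8, coupling=0.0, noise=0.1, seed=0):
    """Simulate two AR(1) processes of length n in which x drives y.

    With coupling=0.0 the two processes are independent.
    """
    rng = random.Random(seed)
    x = [rng.gauss(0.0, noise)]
    y = [rng.gauss(0.0, noise)]
    for _ in range(n - 1):
        x_prev, y_prev = x[-1], y[-1]
        x.append(a * x_prev + rng.gauss(0.0, noise))
        y.append(b * y_prev + coupling * x_prev + rng.gauss(0.0, noise))
    return x, y


def to_symbols(values, n_bins=2):
    """Map real values to integer symbols 0..n_bins-1 using equal-width bins."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)  # a constant signal maps to a single symbol
    width = (hi - lo) / n_bins
    # The maximum value would land in bin n_bins, so clamp it to the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


# One uncoupled and one coupled pair, discretized into binary sequences
x_u, y_u = coupled_ar1(500, coupling=0.0)
x_c, y_c = coupled_ar1(500, coupling=0.5)
print(to_symbols(x_u)[:20], to_symbols(y_u)[:20])
print(to_symbols(x_c)[:20], to_symbols(y_c)[:20])
```

The resulting integer lists are the kind of discrete symbolic sequences that complexity and causality measures of this type take as input.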

Testing

Most of the tests are property-based or behavior-based, and are implemented using the awesome hypothesis framework. Make sure dependencies are satisfied within the working environment:

$ python -m pip install -U pytest hypothesis

Grab a copy of this repository using git and enter the local directory:

$ git clone https://github.com/pranaysy/ETCPy.git
$ cd ETCPy

Run tests:

$ pytest ETC/

MATLAB Implementation

TODO

  • Hyperparameter optimization for CCC
  • Add performance metrics
  • Automated tests with tox
  • Better packaging: pip vs conda
  • Visualizations
  • Improve test coverage
  • Documentation using Sphinx/MkDocs
  • Windows support

License

Copyright 2021 Pranay S. Yadav and Nithin Nagaraj

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

   http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
