Preprocessing

The preprocessing module implements common data preprocessing routines.

  • nlp.py: Routines and objects for handling text data.

    • n-gram generators
    • Word and character tokenization
    • Punctuation and stop-word removal
    • Vocabulary / unigram count objects
    • Huffman tree encoding / decoding
    • Term frequency-inverse document frequency (tf-idf) encoding
  • dsp.py: Routines for handling audio and image data.

    • Signal windowing
    • Signal autocorrelation
    • Discrete Fourier transform
    • Signal resampling via (bi-)linear and nearest-neighbor interpolation
  • general.py: General data preprocessing objects and functions.

    • Feature hashing (Moody, 1989)
    • Mini-batch generators
    • One-hot encoding / decoding
    • Feature standardization
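
To give a feel for the kind of routine listed above, here is a minimal sketch of one item from each file: an n-gram generator (nlp.py), a discrete Fourier transform (dsp.py), and one-hot encoding plus feature standardization (general.py). The function names `ngrams`, `dft`, `one_hot`, and `standardize` are hypothetical and chosen for illustration. They are not the module's actual API, and the sketch uses only the Python standard library, so it should not be read as how the module itself implements these routines.

```python
import cmath


def ngrams(tokens, n):
    """Yield each run of n consecutive tokens as a tuple."""
    for i in range(len(tokens) - n + 1):
        yield tuple(tokens[i : i + n])


def dft(signal):
    """Naive O(N^2) discrete Fourier transform of a real or complex sequence."""
    n = len(signal)
    return [
        sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(signal))
        for k in range(n)
    ]


def one_hot(labels, n_classes):
    """Map integer class labels to rows with a single 1 at the label's index."""
    return [[1 if j == y else 0 for j in range(n_classes)] for y in labels]


def standardize(column):
    """Shift a feature column to zero mean and scale it to unit variance."""
    mean = sum(column) / len(column)
    var = sum((x - mean) ** 2 for x in column) / len(column)
    std = var ** 0.5 or 1.0  # assumption: leave constant columns unscaled
    return [(x - mean) / std for x in column]


# Illustrative inputs only; none of these values come from the module.
tokens = "the quick brown fox".split()
bigrams = list(ngrams(tokens, 2))
# bigrams == [("the", "quick"), ("quick", "brown"), ("brown", "fox")]

spectrum = dft([1.0, 0.0, -1.0, 0.0])  # one cosine cycle over 4 samples
encoded = one_hot([0, 2, 1], n_classes=3)
# encoded == [[1, 0, 0], [0, 0, 1], [0, 1, 0]]
scaled = standardize([1.0, 2.0, 3.0])
```

The naive DFT costs O(N^2) and is written for readability. A fast Fourier transform would be the usual choice for long signals.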