Data Preprocessing PDF
In computer-aided systems, preprocessing images is a valuable step since it improves the quality of
the original images by removing unrelated parts of the image. Preprocessing facilitates the
visibility of regions to be detected, for example through border detection. Image pre-processing is the
term for operations on images at the lowest level of abstraction. The aim of pre-processing is an
improvement of the image data that suppresses unwanted distortions or enhances image
features important for further processing. Geometric transformations of images (e.g.,
rotation, scaling, translation) are also classified among pre-processing methods here, since similar
techniques are used.
The Fourier Transform is an important image processing tool which is used to decompose an image
into its sine and cosine components. The output of the transformation represents the image in the
Fourier or frequency domain, while the input image is the spatial domain equivalent. In the Fourier
domain image, each point represents a particular frequency contained in the spatial domain image.
The Fourier Transform is used in a wide range of applications, such as image analysis, image
filtering, image reconstruction and image compression. The Discrete Fourier Transform is the
sampled Fourier Transform and therefore does not contain all frequencies forming an image, but
only a set of samples which is large enough to fully describe the spatial domain image. The number
of frequencies corresponds to the number of pixels in the spatial domain image, i.e., the images in
the spatial and Fourier domains are of the same size. For a square image of size N×N, the
two-dimensional DFT is given by

F(k, l) = Σ_{a=0}^{N-1} Σ_{b=0}^{N-1} f(a, b) · e^{-i2π(ka/N + lb/N)}

where f(a, b) is the image in the spatial domain, and each point F(k, l) in the Fourier domain is
obtained by multiplying the spatial image with the corresponding basis function and summing the result.
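As an illustration, the 2-D DFT of a small image can be computed with NumPy's FFT routines. This is a sketch with a made-up 8×8 test image, not part of the original text:

```python
import numpy as np

# A small "image": 8x8 array with a bright 4x4 square in the centre.
image = np.zeros((8, 8))
image[2:6, 2:6] = 1.0

# 2-D discrete Fourier transform: spatial domain -> frequency domain.
# The output has exactly as many coefficients as the image has pixels.
F = np.fft.fft2(image)

# Shift the zero-frequency (DC) component to the centre for inspection;
# the DC coefficient F(0, 0) equals the sum of all pixel values.
F_centered = np.fft.fftshift(F)

# Inverting the transform recovers the spatial-domain image.
reconstructed = np.fft.ifft2(F).real
```

Note that `F` is complex-valued: each coefficient carries both the magnitude and the phase of one spatial frequency.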
Audio Pre-processing
Many real-world applications rely on machine learning. One of the important applications of
machine learning is audio processing. Audio processing aims to extract meaningful information
(descriptions or explanations) from audio, such as the type of a sound event, the content of a speech
or the artist of music. Preprocessing audio data includes tasks like resampling audio files to a consistent
sample rate, removing regions of silence, and trimming audio to a consistent duration. Audio is
high-dimensional and contains redundant and often unnecessary information. Historically, mel-frequency
cepstral coefficients and low-level features, such as the zero-crossing rate and spectral shape descriptors,
have been the dominant features derived from audio signals for use in machine learning systems. Machine
learning systems trained on these features are computationally efficient and typically require less training
data.
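The resampling and trimming tasks mentioned above can be sketched with SciPy. The function name, target rate, and duration below are illustrative assumptions, not values from the text:

```python
import numpy as np
from math import gcd
from scipy.signal import resample_poly

def preprocess_audio(signal, orig_sr, target_sr=16000, duration_s=1.0):
    """Resample to a consistent rate, then trim/pad to a fixed duration.

    A minimal sketch of the preprocessing tasks described in the text;
    16 kHz and 1.0 s are assumed defaults, not prescribed values.
    """
    # Rational-rate resampling via polyphase filtering.
    g = gcd(orig_sr, target_sr)
    signal = resample_poly(signal, target_sr // g, orig_sr // g)

    # Trim, or zero-pad, to exactly duration_s seconds.
    n = int(target_sr * duration_s)
    if len(signal) >= n:
        return signal[:n]
    return np.pad(signal, (0, n - len(signal)))

# Example: 0.5 s of a 440 Hz tone recorded at 44.1 kHz.
t = np.arange(0, 0.5, 1 / 44100)
x = np.sin(2 * np.pi * 440 * t)
y = preprocess_audio(x, orig_sr=44100)
```

Silence removal is deliberately left out here; it is usually handled by the energy-based voice activity detection discussed later.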
In the development of an automatic sound recognition system, preprocessing is considered the first
phase of speech recognition: it differentiates voiced from unvoiced signals and prepares the signal
for creating feature vectors. Preprocessing adjusts or modifies the audio signal, x(n), so that it is
more suitable for feature extraction analysis. A major factor to consider in audio signal
processing is whether the speech x(n) is corrupted by some background or ambient noise, d(n),
for example as an additive disturbance:

x(n) = s(n) + d(n)

where s(n) is the clean speech signal. In noise reduction, there are different methods that can be
adopted to perform the task on a noisy speech signal.
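The additive corruption model can be illustrated numerically. The tone frequency, sampling rate, and white Gaussian noise below are assumptions chosen for the sketch:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for clean speech s(n): one second of a 200 Hz tone at 8 kHz.
fs = 8000
n = np.arange(fs)
s = np.sin(2 * np.pi * 200 * n / fs)

# d(n): additive background noise (white Gaussian here, as an assumption).
d = 0.1 * rng.standard_normal(len(s))

# x(n) = s(n) + d(n): the observed, corrupted signal that the
# preprocessing stage receives.
x = s + d
```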
Pre-emphasis is done before starting with feature extraction. We do this by boosting only the
signal's high-frequency components, while leaving the low-frequency components in their original
state. This compensates for the high-frequency section, which is naturally suppressed when
humans make sounds: a spoken audio signal may have frequency components that fall off at high
frequencies. High-frequency components are emphasized and low-frequency components are
attenuated. This is quite a standard preprocessing step. By pre-emphasis, we imply the
application of a high-pass filter, which is usually a first-order FIR of the form

H(z) = 1 − αz⁻¹

Normally, a single-coefficient digital filter, known as the pre-emphasis filter, is used:

y(n) = x(n) − αx(n−1)
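The single-coefficient pre-emphasis filter can be sketched in NumPy. The coefficient α = 0.97 is an assumed typical value, not one specified in the text:

```python
import numpy as np

def pre_emphasis(x, alpha=0.97):
    """First-order FIR pre-emphasis: y(n) = x(n) - alpha * x(n-1).

    alpha = 0.97 is a commonly used value (an assumption here).
    """
    y = np.empty(len(x), dtype=float)
    y[0] = x[0]                      # first sample has no predecessor
    y[1:] = x[1:] - alpha * x[:-1]   # difference with the previous sample
    return y

# A slowly varying (low-frequency) ramp is strongly attenuated...
low = np.linspace(0.0, 1.0, 100)
out_low = pre_emphasis(low)

# ...while a rapidly alternating (high-frequency) signal passes amplified.
high = np.cos(np.pi * np.arange(100))   # alternating +1, -1
out_high = pre_emphasis(high)
```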
The signal-to-noise ratio can be expressed as

SNR = 20 log₁₀(Vsignal / Vnoise) dB

where Vsignal is the voltage of the correct signal and Vnoise is the voltage of the noise. Background or
ambient noise is normally produced by sounds of air-conditioning systems, fans, fluorescent lamps,
typewriters, computer systems, background conversation, footsteps, traffic noise, alarms, bird noise,
and the opening and closing of doors. The quantity adopted to detect the background or ambient noise
is the log energy of each block:

Es = log(ϵ + Σ_{n=1}^{N} s²(n))

where Es is the log energy of a block of N samples, ϵ is a small positive constant added to
prevent computing the log of zero, and s(n) is the nth speech sample in the block of N samples.
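The log-energy measure can be sketched as follows; the block length of 256 samples and the test signals are illustrative assumptions:

```python
import numpy as np

def log_energy(block, eps=1e-10):
    """Log energy of a block of N samples: Es = log(eps + sum of s(n)^2).

    eps prevents taking the log of zero on an all-silent block.
    """
    return np.log(eps + np.sum(block ** 2))

# A silent block gives a very negative log energy,
# while a speech-like block gives a much larger value.
silence = np.zeros(256)
tone = 0.5 * np.sin(2 * np.pi * np.arange(256) / 16)
```

Comparing each block's Es against a threshold is the basis of the energy-based endpoint detection discussed in the next section.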
4) Voice Activity Detection / Speech Word Detection
Locating the endpoints of a signal in audio is a major problem for the speech recognizer:
inaccurate endpoint detection decreases its performance. Although detecting the endpoints of a
speech utterance may seem relatively trivial, it has been found to be very difficult in practice
in speech recognition systems. When a proper SNR is given, the work of developing an automatic
sound recognition system is made easier.
5) Framing
Framing is the process of breaking the continuous stream of audio samples into components of
constant length to facilitate block-wise processing of the signal. Speech can be thought of as a
quasi-stationary signal: it is stationary only for a short period of time. This simply means that
the signal is divided, or blocked, into frames of typically 20-30 ms. Adjacent frames normally
overlap each other by 30-50%, so that no vital information of the speech signal is lost due to
the windowing.
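The framing step can be sketched as follows; 25 ms frames with 50% overlap fall inside the typical ranges given above, and the 16 kHz sampling rate is an assumption:

```python
import numpy as np

def frame_signal(x, fs, frame_ms=25, overlap=0.5):
    """Split a 1-D signal into fixed-length, overlapping frames.

    frame_ms = 25 and 50% overlap are assumed values within the
    typical 20-30 ms / 30-50% ranges described in the text.
    """
    frame_len = int(fs * frame_ms / 1000)       # samples per frame
    hop = int(frame_len * (1 - overlap))        # step between frame starts
    n_frames = 1 + (len(x) - frame_len) // hop
    return np.stack([x[i * hop : i * hop + frame_len]
                     for i in range(n_frames)])

fs = 16000
x = np.arange(fs, dtype=float)   # one second of samples (placeholder signal)
frames = frame_signal(x, fs)     # 25 ms frames -> 400 samples each, hop 200
```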
6) Windowing
At this stage the signal has been framed into segments; each frame is multiplied by a window
function w(n) of length N, where N is the length of the frame. Windowing is the process of
multiplying a speech signal segment by a time window of a given shape, to stress pre-defined
characteristics of the signal. To reduce the discontinuity of the audio signal at the beginning
and end of each frame, the signal should be tapered to zero or close to zero at the frame edges,
minimizing the mismatch. This is achieved by windowing each frame of the signal, which improves
the correlation of the Mel Frequency Cepstrum Coefficient (MFCC) and spectral estimates between
consecutive frames.
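Applying the window is a per-frame multiplication. A Hamming window is a common choice in MFCC pipelines, assumed here since the text does not name a specific window shape:

```python
import numpy as np

frame_len = 400                    # N: e.g. 25 ms at 16 kHz (assumed)
w = np.hamming(frame_len)          # w(n): tapers toward the frame edges

# Placeholder frames, all ones, so the result shows the window shape itself.
frames = np.ones((10, frame_len))
windowed = frames * w              # multiply every frame by w(n)
```

Because the Hamming window is near zero at n = 0 and n = N−1, each windowed frame starts and ends close to zero, which reduces the edge discontinuities described above.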