Our Mini Project Report
Karnataka
R.T.E Society’s
Rural Engineering College Hulkoti – 582205
Project Associates:
2023-2024
CERTIFICATE
ACKNOWLEDGEMENT
The knowledge and satisfaction that accompany the successful completion of Phase 1 of a project
are hard to describe. Behind any successful project there are wise people guiding it throughout. We
thank them for guiding us, correcting our mistakes, and providing valuable feedback. We
consider it our privilege to express our gratitude and respect to all those who guided and
encouraged us in this project.
We extend our heartfelt gratitude to our beloved principal, Dr. V. M. Patil, REC Hulkoti, for
the support that made this project a success.
We are grateful to Dr. S. H. Angadi, Head of the CSE Department, Rural Engineering College
Hulkoti, for providing support and encouragement.
We convey our sincerest regards to our project guide, Prof. [Link], Dept. of CSE, Rural
Engineering College Hulkoti, for providing guidance and encouragement whenever needed.
PROJECT ASSOCIATES
ABSTRACT
CONTENTS
Chapters Page No
ACKNOWLEDGEMENT i
ABSTRACT ii
1. INTRODUCTION 1
5. IMPLEMENTATION 12-17
6. SNAPSHOTS 18-19
7. CONCLUSION 20
8. REFERENCES 21
GENDER RECOGNITION USING VOICE
INTRODUCTION
Key Components
Page | 1
Dept. Of CSE, REC HULKOTI
LITERATURE REVIEW
SYSTEM SPECIFICATION
Hardware Requirements:
1. Processor (CPU):
o Minimum: Multi-core CPU (Intel i5 or AMD Ryzen 5).
o Recommended: Higher performance CPU like Intel i7 or AMD Ryzen 7
for faster processing.
2. Memory (RAM):
o Minimum: 8 GB RAM.
o Recommended: 16 GB or more, especially for handling large datasets and
multiple processes.
3. Graphics Card (GPU):
o Optional: For faster training times, an NVIDIA GPU with CUDA support
(e.g., NVIDIA GTX 1060 or higher, RTX series) is recommended.
o Minimum: If using a GPU, ensure it has adequate VRAM (at least 4 GB).
4. Storage:
o Minimum: 256 GB SSD for faster read/write operations.
o Recommended: Additional HDD for storage of large datasets (at least 1
TB).
• CPU: A powerful CPU with multiple cores is recommended for training the
model. A GPU can significantly speed up the training process.
• RAM: At least 8 GB of RAM is recommended, but 16 GB or more is ideal for
larger datasets and complex models.
• Storage: Enough storage space to store the dataset and the trained model. The
amount of storage space required will vary depending on the size of the
dataset and the complexity of the model.
• Sound card: A good quality sound card is important for capturing high-quality
audio.
The hardware actually needed also depends on the following factors:
• The size and complexity of the dataset: Larger and more complex datasets
require more powerful hardware.
• The type of model being used: Some models are more computationally
expensive than others.
• The desired training time: Training the model quickly requires more
powerful hardware.
Software Requirements
The following software and libraries are essential to implement the system:
Operating System
• Linux (Ubuntu recommended), Windows 10/11, or macOS
Python Environment
• Python Version: 3.7 or higher
• Package Manager: pip3 for installing dependencies
Required Libraries
The dependencies are specified in the [Link] file. Install them using:
pip3 install -r [Link]
Key Libraries:
1. TensorFlow 2.x.x: Core library for building and training the deep learning
model.
2. Scikit-learn: Provides tools for preprocessing, model evaluation, and metrics.
3. NumPy: Numerical computations and handling of feature vectors.
4. Pandas: Data manipulation and processing of CSV files.
5. PyAudio: Capturing live audio input for real-time inference.
6. Librosa: Audio processing and feature extraction (e.g., Mel Spectrogram).
Additional Tools
• Git: For cloning the repository.
• Jupyter Notebook (optional): For exploratory data analysis or testing snippets of
code.
Dataset Specifications
The project uses Mozilla's Common Voice dataset, a large, open-source dataset
containing labeled voice recordings.
Key Points About the Dataset:
• Size: The full dataset is approximately 13 GB.
• Preprocessing:
o Invalid or corrupted samples are filtered out.
o Only labeled samples (male or female) are considered.
o The dataset is balanced (equal male and female samples).
o Mel Spectrogram is used for feature extraction.
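The filtering and balancing steps above can be sketched as follows. This is a minimal illustration, not the project's actual preprocessing script: the `gender` key, the `balance_by_gender` helper, and the row format (dictionaries such as those produced by `csv.DictReader`) are assumptions made for the example.

```python
import random
from collections import defaultdict

def balance_by_gender(rows, seed=0):
    """Keep only rows labeled male/female, then downsample to equal counts.

    `rows` is a list of dicts with a hypothetical "gender" key.
    """
    groups = defaultdict(list)
    for row in rows:
        label = (row.get("gender") or "").strip().lower()
        if label in ("male", "female"):      # drop unlabeled or other samples
            groups[label].append(row)
    # the smaller class decides how many samples each class keeps
    n = min(len(groups["male"]), len(groups["female"]))
    rng = random.Random(seed)                # fixed seed for reproducibility
    balanced = rng.sample(groups["male"], n) + rng.sample(groups["female"], n)
    rng.shuffle(balanced)
    return balanced
```

Downsampling the larger class is only one way to balance a dataset; it discards data but keeps every remaining sample genuine.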
File Types:
• Processed dataset files are saved as NumPy arrays (.npy).
• CSV files contain metadata about the audio files and features.
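As an illustration of the .npy storage mentioned above, the sketch below writes a feature matrix and a label vector with NumPy and reads them back. The file names `features.npy` and `labels.npy`, the helper names, and the array shapes are placeholders chosen for the example, not names taken from the project.

```python
import os
import tempfile
import numpy as np

def save_features(directory, X, y):
    """Write features and labels as .npy files (file names are hypothetical)."""
    np.save(os.path.join(directory, "features.npy"), X)
    np.save(os.path.join(directory, "labels.npy"), y)

def load_features(directory):
    """Read back the arrays written by save_features."""
    X = np.load(os.path.join(directory, "features.npy"))
    y = np.load(os.path.join(directory, "labels.npy"))
    return X, y

# round trip with placeholder data: 4 samples, 128 features each
with tempfile.TemporaryDirectory() as tmp:
    X = np.zeros((4, 128), dtype=np.float32)
    y = np.array([0, 1, 0, 1])
    save_features(tmp, X, y)
    X2, y2 = load_features(tmp)
```

Caching features this way means the audio only has to be decoded and transformed once, however many times the model is retrained.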
Feature Extraction:
• Mel Spectrogram: Converts each audio file into a time-frequency representation
on the mel scale, an image-like input that is well suited to deep learning models.
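To make the idea concrete, here is a NumPy-only sketch of a log-mel spectrogram. The project itself relies on Librosa for this step; the code below is a simplified stand-in, and the parameter values (`n_fft=1024`, `hop=256`, `n_mels=128`) and function names are assumptions chosen for the example.

```python
import numpy as np

def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters whose centers are evenly spaced on the mel scale."""
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    hz_points = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(n_mels):
        left, center, right = hz_points[m], hz_points[m + 1], hz_points[m + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        fb[m] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def mel_spectrogram(signal, sr=16000, n_fft=1024, hop=256, n_mels=128):
    """Return a log-mel spectrogram in dB with shape (n_mels, n_frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2     # per-frame power spectrum
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T    # pool FFT bins into mel bands
    return 10.0 * np.log10(mel + 1e-10).T                # small offset avoids log(0)
```

For a one-second clip at 16 kHz these settings give a 128 x 59 array; clips would need padding or cropping to reach a fixed input size.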
Here are some general factors to consider when choosing a system for gender
recognition:
• CPU: A gender recognition system will typically require a CPU with a decent
amount of processing power. The number of cores and the clock speed of the
CPU will both be important factors.
• Memory: The amount of memory required will depend on the size of the
model and the amount of data that needs to be processed.
• Storage: The system will need enough storage to hold the model and any
training data.
• Audio input: The system will need a way to input audio data. This could be a
microphone, a line-in port, or a file.
For simple gender recognition tasks, a system with a modest CPU, such as a
Raspberry Pi, may be sufficient. However, for more complex tasks, a more powerful
system, such as a laptop or server, will be required.
SYSTEM DESIGN
This chapter describes the system design for the Gender Recognition Using Voice project. It
breaks down the components, data flow, and processing steps involved, giving an
end-to-end overview of how the system operates.
Key Components
4. Model Training:
o Input: Feature vectors
o Process: Neural network training
o Output: Trained model
5. Model Evaluation and Inference:
o Input: New audio samples
o Process: Feature extraction, prediction
o Output: Gender prediction, confidence scores
The system design for the Gender Recognition Using Voice project involves
collecting and preprocessing audio data, extracting meaningful features, training a
neural network model, and then using that model to make real-time gender predictions
on new audio samples. This pipeline is intended to keep the model robust, accurate, and
efficient in its predictions.
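The last step of the pipeline, turning the model output into a gender prediction with a confidence score, can be sketched as below. The class order in `LABELS` and the helper name `interpret_prediction` are assumptions; the report does not state which output index corresponds to which gender.

```python
import numpy as np

LABELS = ("male", "female")  # assumed class order; not stated in the report

def interpret_prediction(probabilities):
    """Map a 2-element softmax output to (label, confidence)."""
    probabilities = np.asarray(probabilities, dtype=float)
    index = int(np.argmax(probabilities))     # most probable class
    return LABELS[index], float(probabilities[index])

# example with a made-up softmax output
label, confidence = interpret_prediction([0.12, 0.88])
print(f"{label} ({confidence:.0%})")  # prints "female (88%)"
```

The confidence here is simply the largest softmax probability, so it reflects how decisively the network chose a class rather than a calibrated likelihood.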
The design aims for accurate gender prediction from voice samples. The process
involves several stages, each contributing to the overall effectiveness of the model:
• Data Collection and Preprocessing: Ensures the dataset is clean, balanced, and
representative, which is crucial for training a reliable model.
• Feature Extraction: Converts raw audio into a format that highlights relevant
characteristics for gender prediction.
• Model Training: Utilizes a well-designed neural network architecture to learn
patterns in the audio data associated with different genders.
• Model Testing and Inference: Provides a robust mechanism for evaluating new
audio samples, whether from files or live recordings.
Following these steps, the project demonstrates how deep learning can be applied to
gender recognition from voice, and how each phase, from data preparation to model
evaluation, contributes to the final system.
IMPLEMENTATION
python [Link]
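from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout

def create_model():
    # small CNN over 128x128 single-channel Mel spectrograms
    model = Sequential([
        Conv2D(32, (3, 3), activation='relu', input_shape=(128, 128, 1)),
        MaxPooling2D((2, 2)),
        Conv2D(64, (3, 3), activation='relu'),
        MaxPooling2D((2, 2)),
        Flatten(),
        Dense(128, activation='relu'),
        Dropout(0.5),  # drops half of the activations during training
        Dense(2, activation='softmax')  # one probability per gender class
    ])
    return model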
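The size of this network can be worked out by hand. The sketch below traces the spatial dimensions through the layers and counts trainable parameters, assuming the Keras defaults that the code above relies on ('valid' padding, stride 1 for convolutions, pooling stride equal to the pool size). The helper names are illustrative.

```python
def conv_out(size, kernel=3):
    # a 'valid' convolution with stride 1 shrinks each side by kernel - 1
    return size - kernel + 1

def pool_out(size, pool=2):
    # non-overlapping max pooling floors the division
    return size // pool

side = 128
side = pool_out(conv_out(side))   # after Conv2D(32) + pooling: 63
side = pool_out(conv_out(side))   # after Conv2D(64) + pooling: 30
flat = side * side * 64           # length of the Flatten output

# weights + biases per layer
params = {
    "conv1": 3 * 3 * 1 * 32 + 32,
    "conv2": 3 * 3 * 32 * 64 + 64,
    "dense1": flat * 128 + 128,
    "dense2": 128 * 2 + 2,
}
total = sum(params.values())
print(flat, total)  # prints "57600 7392002"
```

Almost all of the roughly 7.4 million parameters sit in the first dense layer, which is why the Dropout layer is placed directly after it.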
• Use the [Link] script to train the model. This script loads the preprocessed
data, compiles the model, and trains it:
import os
from tensorflow.keras.callbacks import ModelCheckpoint, TensorBoard, EarlyStopping

# `model`, `data`, `tensorboard`, and `early_stopping` are assumed to be
# created earlier in the script; that part is not shown in this report.
batch_size = 64
epochs = 100
# train the model on the training set, validating on the validation set
model.fit(data["X_train"], data["y_train"], epochs=epochs, batch_size=batch_size,
          validation_data=(data["X_valid"], data["y_valid"]),
          callbacks=[tensorboard, early_stopping])
By following these steps, you can set up, train, and evaluate a deep learning model
for gender recognition using voice. TensorFlow, combined with effective
preprocessing and feature extraction, allows the model to predict the gender of a
speaker from audio input.
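The training call passes an early-stopping callback, so training may end well before the 100 configured epochs. The sketch below is a simplified, framework-free illustration of that rule, not the Keras implementation: training stops once the validation loss has failed to improve for `patience` consecutive epochs. The function name and the value `patience=5` are assumptions.

```python
def epochs_until_stop(val_losses, patience=5):
    """Return the 1-based epoch at which training would stop.

    Stops once the validation loss has not improved for `patience`
    consecutive epochs; patience=5 is an assumed value.
    """
    best = float("inf")
    waited = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, waited = loss, 0    # improvement resets the counter
        else:
            waited += 1
            if waited >= patience:
                return epoch
    return len(val_losses)            # never triggered: all epochs run

# made-up validation losses: the best value is reached at epoch 3
stop_epoch = epochs_until_stop([0.9, 0.7, 0.6, 0.61, 0.62, 0.6, 0.63, 0.64, 0.5])
print(stop_epoch)  # prints 8
```

In this example the lower loss at epoch 9 is never seen, which is the usual trade-off of early stopping: it saves time but can miss a late improvement.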
import pyaudio
import os
import wave
import librosa
import numpy as np
from sys import byteorder
from array import array
from struct import pack
THRESHOLD = 500
CHUNK_SIZE = 1024
FORMAT = pyaudio.paInt16
RATE = 16000
SILENCE = 30
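def is_silent(snd_data):
    "Returns 'True' if below the 'silent' threshold"
    return max(snd_data) < THRESHOLD

def normalize(snd_data):
    "Average the volume out"
    MAXIMUM = 16384
    # scale so that the loudest sample reaches MAXIMUM
    times = float(MAXIMUM)/max(abs(i) for i in snd_data)
    r = array('h')
    for i in snd_data:
        r.append(int(i*times))
    return r

def trim(snd_data):
    "Trim the blank spots at the start and end"
    def _trim(snd_data):
        snd_started = False
        r = array('h')
        for i in snd_data:
            if not snd_started and abs(i)>THRESHOLD:
                snd_started = True
                r.append(i)
            elif snd_started:
                r.append(i)
        return r

    # The rest of trim() is missing from this report; following the docstring,
    # it is assumed to trim the start, then reverse the data to trim the end.
    snd_data = _trim(snd_data)
    snd_data.reverse()
    snd_data = _trim(snd_data)
    snd_data.reverse()
    return snd_data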
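The constants `THRESHOLD` and `SILENCE` suggest how a recording is ended: once speech has started, capture stops after a run of silent chunks. The sketch below illustrates that logic on in-memory chunks instead of a live PyAudio stream. The helper `collect_until_silence` is hypothetical, and resetting the counter whenever sound resumes is a design choice of this example.

```python
from array import array

THRESHOLD = 500   # amplitude below which a chunk counts as silent (from the report)
SILENCE = 30      # silent chunks tolerated after speech before stopping (from the report)

def is_silent(chunk):
    return max(chunk) < THRESHOLD

def collect_until_silence(chunks):
    """Gather samples until more than SILENCE consecutive silent chunks follow speech."""
    recorded = array('h')
    num_silent = 0
    started = False
    for chunk in chunks:
        recorded.extend(chunk)
        if not is_silent(chunk):
            started = True
            num_silent = 0        # sound resumed: restart the silence count
        elif started:
            num_silent += 1
        if started and num_silent > SILENCE:
            break
    return recorded

# simulated input: leading silence, some speech, then a long pause
quiet = array('h', [0] * 4)
loud = array('h', [1000] * 4)
samples = collect_until_silence([quiet] * 5 + [loud] * 3 + [quiet] * 100)
print(len(samples))  # prints 156
```

The leading and trailing silence kept in `samples` is what a trimming step such as `trim` would then remove.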
import argparse
parser = argparse.ArgumentParser(description="""Gender recognition script, this will load the model you trained,
                                and perform inference on a sample you provide (either using your voice or a file)""")
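The parser description mentions two input modes, a live voice recording or a file. One way to expose that choice is sketched below. The `-f/--file` flag, the `build_parser` helper, and the `sample.wav` path are hypothetical, since the report does not show the script's actual arguments.

```python
import argparse

def build_parser():
    parser = argparse.ArgumentParser(
        description="Gender recognition script: loads the trained model and "
                    "runs inference on a voice recording or an audio file.")
    # hypothetical flag; the script's real arguments are not shown in the report
    parser.add_argument("-f", "--file", default=None,
                        help="path to an audio file; omit to record from the microphone")
    return parser

# parse an explicit argument list so the example does not depend on sys.argv
args = build_parser().parse_args(["--file", "sample.wav"])
source = args.file if args.file else "microphone"
print(source)  # prints "sample.wav"
```

Making the file optional keeps a single entry point for both modes: when no file is given, the script would fall back to recording from the microphone.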
SNAPSHOTS
CONCLUSION
The Gender Recognition Using Voice project exemplifies how deep learning models can
be effectively applied to audio data for gender identification. Here's a summary of the
essential aspects:
Key Achievements: