
Visvesvaraya Technological University, Belagavi, Karnataka

R.T.E Society’s
Rural Engineering College Hulkoti – 582205

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

Mini Project Report On

“GENDER RECOGNITION BY VOICE”


Under the Guidance of
Prof. R.F. Sheddi

Project Associates:

SAHANA TIMMANAGOUDAR 2RH22CS062


SHRUTI BYALI 2RH22CS072
SHANKRAMMA YALIGAR 2RH22CS068
SANGEETA KOLUR 2RH22CS064
R.T.E Society’s Rural Engineering College, Hulkoti – 582205
Department of Computer Science & Engineering

2023-2024

CERTIFICATE

Certified that the mini project work entitled

“GENDER RECOGNITION USING VOICE”

carried out by SAHANA TIMMANAGOUDAR bearing USN 2RH22CS062, SHRUTI BYALI
bearing USN 2RH22CS072, SHANKRAMMA YALIGAR bearing USN 2RH22CS068, and
SANGEETA KOLUR bearing USN 2RH22CS064, who are bonafide students of the
Department of Computer Science and Engineering, is in partial fulfillment for the
award of the Degree of Bachelor of Engineering in Computer Science and Engineering
of Visvesvaraya Technological University, Belagavi, during the year 2023-24. It is
certified that all corrections/suggestions indicated for internal assessment have been
incorporated in the report deposited in the departmental library. The mini project
report has been approved as it satisfies the academic requirements in respect of the
mini project work prescribed for the said degree.

Signature of the Guide Signature of the HOD Signature of the Principal


Prof. [Link]        Dr. S. H. Angadi        Dr. V. M. Patil
ACKNOWLEDGEMENT

The knowledge and satisfaction that accompany the successful completion of a project
phase are hard to describe. Behind any successful project there are wise people guiding it
throughout. We thank them for guiding us, correcting our mistakes, and providing valuable
feedback. We consider it our privilege to express our gratitude and respect to all those who
guided and encouraged us in this project.

We extend our heartfelt gratitude to our beloved Principal, Dr. V. M. Patil, REC Hulkoti,
for his support towards the success of this project.
We are grateful to Dr. S H Angadi, Head of CSE Department Rural Engineering College
Hulkoti, for providing support and encouragement.
We convey our sincerest regards to our project guide, Prof. [Link], Dept. of CSE, Rural
Engineering College Hulkoti, for providing guidance and encouragement whenever needed.

PROJECT ASSOCIATES

SAHANA TIMMANAGOUDAR 2RH22CS062


SHRUTI BYALI 2RH22CS072
SHANKRAMMA YALIGAR 2RH22CS068
SANGEETA KOLUR 2RH22CS064

i
ABSTRACT

This project focuses on developing a deep learning model using TensorFlow 2
to identify the gender of a speaker based on their voice. It utilizes Mozilla's Common Voice
dataset, which is preprocessed by filtering out invalid samples, keeping only labeled
recordings, and balancing the dataset so that it contains an equal number of male and female
voices. Features are extracted using the Mel Spectrogram technique, providing fixed-length
vectors for model training.

The repository includes scripts for dataset preparation, model training, and testing, and
the training process is customizable through a modular create_model() function. Users can
clone the repository and install the necessary libraries using a provided requirements file.
The project includes scripts for training the model ([Link]) and for testing with pre-recorded
audio files in WAV format or live voice input ([Link]). The testing script outputs the
predicted gender together with the probabilities for the male and female classes,
demonstrating the model's efficacy in gender recognition from voice samples. This project
showcases the application of deep learning in audio processing, offering potential benefits
in various fields such as voice-assisted technologies and accessibility features.

ii
CONTENTS

Chapters Page No

ACKNOWLEDGEMENT i
ABSTRACT ii

1. INTRODUCTION 1

2. LITERATURE REVIEW 2-3

3. SYSTEM SPECIFICATIONS 4-7

4. SYSTEM DESIGN 8-11

5. IMPLEMENTATION 12-17

6. SNAPSHOTS 18-19

7. CONCLUSION 20

8. REFERENCES 21
GENDER RECOGNITION USING VOICE

INTRODUCTION

Gender recognition is a type of biometric identification that is used to determine
the gender of a person from their voice. It is a complex task that can be influenced by a
number of factors, including a person's age, ethnicity, and native language.
There are a number of different techniques that can be used for gender recognition. One
common technique is to use machine learning to train a model on a large dataset of
labeled audio samples.
The model can then be used to identify the gender of a new speaker by comparing
their voice to the voices in the training data. Gender recognition can be used for a
variety of purposes, such as security, marketing, and customer service. For example, it
can be used to verify a person's identity when they are accessing a secure system, or it
can be used to target advertising to a specific gender demographic.
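As a toy illustration of this machine learning approach (not part of the project code; the classifier choice and the average-pitch values are invented for illustration), a simple scikit-learn model can be trained on labeled samples and then used to classify a new speaker:

```python
from sklearn.linear_model import LogisticRegression

# hypothetical training data: average pitch in Hz, labeled 1 = male, 0 = female
X_train = [[110], [120], [130], [210], [220], [230]]
y_train = [1, 1, 1, 0, 0, 0]

clf = LogisticRegression()
clf.fit(X_train, y_train)

# a new speaker's feature is compared against what the model learned
print(clf.predict([[115], [225]]))  # low pitch classified as male, high as female
```

A real system replaces the single pitch value with a rich feature vector (such as a Mel Spectrogram) and the linear classifier with a neural network, but the train-then-predict workflow is the same.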

Key Components

1. Deep Learning Framework:


o TensorFlow 2.x.x: An open-source library developed by Google for
machine learning and deep learning applications. It is used to build and
train the neural network model.
2. Data Preprocessing and Feature Extraction:
o Librosa: A Python package for music and audio analysis, which is used
for extracting Mel Spectrogram features from audio files. The Mel
Spectrogram provides a time-frequency representation of the audio signal,
which is crucial for training the neural network.
o Pandas and Numpy: These libraries are used for data manipulation and
numerical computations.
3. Voice Data:
o Mozilla's Common Voice Dataset: This large dataset contains diverse
voice recordings used for training and testing the model. The dataset is
preprocessed to filter out invalid samples and balance the number of male
and female samples.
4. Model Training:
o Scikit-learn: Used for various machine learning tasks, including data
splitting and evaluation metrics. The actual training involves a neural
network model built using TensorFlow.
o Custom Model Architecture: Defined in [Link], the create_model()
function builds the neural network architecture. The training process is
handled by [Link].

Page | 1
Dept. Of CSE, REC HULKOTI

LITERATURE REVIEW

Overview of Machine Learning for Voice-Based Gender Recognition:


• Machine learning models and data mining techniques have been extensively used in
recent decades for voice-based gender recognition [1, 4, 9, 10, 11, 19, 20].
• These predictive models utilize various features such as vocal fold length, gait, and
speech characteristics to identify gender.
• Acoustic properties derived from voice and speech signals, including duration,
intensity, and frequency, are key features for gender identification.
Maka et al. [10]:
• Dataset: 630 speakers (438 males, 192 females) in diverse acoustic environments
(indoor and outdoor).
• Methodology: Gender identification under varying background noise conditions.
• Key Result: Non-linear smoothing improved classification accuracy by 2%,
achieving an overall accuracy of 99.4%.
Bisio et al. [4]:
• Developed an Android application called SPECTRA (SPEech proCessing
plaTform as smaRtphone Application) for gender, speaker, and language recognition.
• Used unsupervised support vector machine classifiers.
• Dynamic training: Features were extracted from every user of the SPECTRA
app, improving classifier robustness and leading to higher classification accuracy.
Pahwa et al. [1]:
• Dataset: Speech samples from 46 speakers.
• Features: Extracted Mel coefficients (widely used speech features) along with first
and second-order derivatives.
• Proposed a stacked classifier combining a support vector machine (SVM) and a
neural network.
• Accuracy: Achieved 93.48% classification accuracy in numerical experiments.

Pribil et al. [19]:


• Developed a two-level Gaussian Mixture Model (GMM) algorithm for age and
gender recognition.
• Dataset: Tested on Czech and Slovak voices, excluding children’s voices.
• Key Result: Gender recognition accuracy exceeded 90%.

Pribil et al. [20]:
• Method: Implemented a two-level GMM classifier for detecting both age and gender.
• Accuracy: Achieved 97.5% gender recognition accuracy.
• Compared the results with a conventional listening test for evaluating synthetic
speech quality.


Buyukyilmaz et al. [9]:


• Used a multilayer perceptron deep learning model to recognize voice gender.
• Dataset: 3168 recorded human voice samples.
• Accuracy: Achieved 96.74% accuracy.
• Developed a web application for real-time gender detection using the trained model.

Zvarevashe et al. [11]:


• Proposed a gender voice recognition technique incorporating feature selection
through random forest recursive feature elimination (RF-RFE) combined with
gradient boosting machines (GBMs).
• Dataset: Public gender voice dataset with 1584 males and 1584 females.
• Key Results:
 Without feature selection: GBMs achieved 97.58% accuracy.
 With feature selection: Accuracy increased to nearly 100%.


SYSTEM SPECIFICATION

This section describes the system's requirements, including both the hardware and
software components necessary to develop, deploy, and run the project effectively. It
outlines the technical criteria and configurations needed to achieve optimal performance
and compatibility for a specific application or task.
To build and deploy the Gender Recognition using Voice project, you'll need
specific hardware and software setups to support the development and execution of deep
learning models. Here’s a detailed breakdown:

Hardware Requirements:

1. Processor (CPU):
o Minimum: Multi-core CPU (Intel i5 or AMD Ryzen 5).
o Recommended: Higher performance CPU like Intel i7 or AMD Ryzen 7
for faster processing.
2. Memory (RAM):
o Minimum: 8 GB RAM.
o Recommended: 16 GB or more, especially for handling large datasets and
multiple processes.
3. Graphics Card (GPU):
o Optional: For faster training times, an NVIDIA GPU with CUDA support
(e.g., NVIDIA GTX 1060 or higher, RTX series) is recommended.
o Minimum: If using a GPU, ensure it has adequate VRAM (at least 4GB).
4. Storage:
o Minimum: 256 GB SSD for faster read/write operations.
o Recommended: Additional HDD for storage of large datasets (at least 1
TB).

Some general hardware recommendations for running a “gender recognition
system using voice” efficiently:

• CPU: A powerful CPU with multiple cores is recommended for training the
model. A GPU can significantly speed up the training process.
• RAM: At least 8GB of RAM is recommended, but 16GB or more is ideal for
larger datasets and complex models.
• Storage: Enough storage space to store the dataset and the trained model. The
amount of storage space required will vary depending on the size of the
dataset and the complexity of the model.


• Sound card: A good quality sound card is important for capturing high-quality
audio.

Here are some additional factors to consider:

• The size and complexity of the dataset: Larger and more complex datasets
will require more powerful hardware.
• The type of model being used: Some models are more computationally
expensive than others.
• The desired training time: If you want to train the model quickly, you will
need more powerful hardware.

System specifications are important for running a gender recognition


system using voice for several reasons:

• Training Speed and Efficiency: Deep learning models used for
gender recognition involve complex calculations. A powerful CPU
with multiple cores or a GPU can significantly accelerate the training
process, allowing you to experiment with different models or train on
larger datasets faster.
• Memory Management: Large datasets and complex models require a
substantial amount of RAM to hold the data and perform calculations during
training and inference. Insufficient RAM can lead to crashes, slowdowns, or
inaccurate results.
• Storage Capacity: The dataset itself, especially a large one like Mozilla's
Common Voice, can take up a significant amount of storage space.
Additionally, the trained model files also require storage. Having enough
storage ensures you can train and use the system without worrying about
running out of space.
• Audio Quality Processing: A good quality sound card plays a crucial role in
capturing clear audio for training and testing. A higher quality sound card
captures a wider range of frequencies, leading to more accurate feature
extraction (like Mel Spectrograms) which are vital for the model's
performance.
Here's an analogy: Imagine a baker trying to make a complex cake with limited
tools. A weak oven might take hours to bake the cake, and a small mixing bowl might
not hold all the ingredients. Similarly, limited hardware resources can lead to slow
training times, memory issues, and ultimately, inaccurate gender recognition results.
In summary, having appropriate system specifications ensures smooth operation
of the gender recognition system. It allows for faster training, efficient memory usage,
accurate audio processing, and overall better performance.


Software Requirements

The following software and libraries are essential to implement the system:
Operating System
• Linux (Ubuntu recommended), Windows 10/11, or macOS
Python Environment
• Python Version: 3.7 or higher
• Package Manager: pip3 for installing dependencies
Required Libraries
The dependencies are specified in the [Link] file. Install them using:
pip3 install -r [Link]
Key Libraries:
1. TensorFlow 2.x.x: Core library for building and training the deep learning
model.
2. Scikit-learn: Provides tools for preprocessing, model evaluation, and metrics.
3. Numpy: Numerical computations and handling feature vectors.
4. Pandas: Data manipulation and processing of CSV files.
5. PyAudio: Capturing live audio input for real-time inference.
6. Librosa: Audio processing and feature extraction (e.g., Mel Spectrogram).
Additional Tools
• Git: For cloning the repository.
• Jupyter Notebook (optional): For exploratory data analysis or testing snippets of
code.

Dataset Specifications

The project uses Mozilla's Common Voice dataset, a large, open-source dataset
containing labeled voice recordings.
Key Points About the Dataset:
• Size: The full dataset is approximately 13 GB.
• Preprocessing:
o Invalid or corrupted samples are filtered out.
o Only labeled samples (male or female) are considered.
o The dataset is balanced (equal male and female samples).
o Mel Spectrogram is used for feature extraction.
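The filtering and balancing steps above can be sketched with pandas (a simplified illustration; the actual column names in the Common Voice metadata may differ from the `gender` column assumed here):

```python
import pandas as pd

# hypothetical metadata frame standing in for the Common Voice CSV
df = pd.DataFrame({
    "filename": ["a.mp3", "b.mp3", "c.mp3", "d.mp3", "e.mp3"],
    "gender":   ["male", "female", "male", "male", None],
})

# keep only samples with a valid gender label (drops unlabeled/invalid rows)
df = df[df["gender"].isin(["male", "female"])]

# balance: downsample each class to the size of the smaller one
n = df["gender"].value_counts().min()
balanced = df.groupby("gender").head(n)

print(balanced["gender"].value_counts().to_dict())  # equal male and female counts
```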


File Types:
• Processed dataset files are saved as NumPy arrays (.npy).
• CSV files contain metadata about the audio files and features.

Feature Extraction:
• Mel Spectrogram: Converts audio files into spectrogram images, which are
easier to interpret for deep learning models.
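The fixed-length property comes from averaging the spectrogram over its time axis; a minimal numpy sketch of that reduction (with a random array standing in for a real Mel Spectrogram):

```python
import numpy as np

# stand-in for a Mel Spectrogram: 128 mel bands x 400 time frames
rng = np.random.default_rng(0)
spec = rng.random((128, 400))

# audio clips differ in length (frame count), but averaging over time
# always yields one value per mel band -> a fixed-length feature vector
feature_vector = spec.mean(axis=1)
print(feature_vector.shape)  # (128,) regardless of the number of frames
```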

Here are some general factors to consider when choosing a system for gender
recognition:

• CPU: A gender recognition system will typically require a CPU with a decent
amount of processing power. The number of cores and the clock speed of the
CPU will both be important factors.
• Memory: The amount of memory required will depend on the size of the
model and the amount of data that needs to be processed.
• Storage: The system will need enough storage to hold the model and any
training data.
• Audio input: The system will need a way to input audio data. This could be a
microphone, a line-in port, or a file.
For simple gender recognition tasks, a system with a modest CPU, such as a
Raspberry Pi, may be sufficient. However, for more complex tasks, a more powerful
system, such as a laptop or server, will be required.


SYSTEM DESIGN

In the system design for the Gender Recognition Using Voice project, we break
down the components, data flow, and processing steps involved. This provides a
comprehensive overview of how the system operates from end to end.

Key Components

1. Data Collection and Preprocessing:


o Dataset: Mozilla’s Common Voice dataset.
o Preprocessing: Filtering invalid samples, validating gender labels, balancing
the dataset, and extracting Mel Spectrogram features.
2. Feature Extraction:
o Mel Spectrogram: Converts audio samples into a time-frequency
representation to create a fixed-length feature vector.
3. Model Training:
o Neural Network Architecture: Constructed using TensorFlow 2.x.x and
customized in [Link].
o Training Process: Scripted in [Link] to train the model on the
preprocessed dataset.
4. Model Testing and Inference:
o Testing Script: [Link] for evaluating new audio samples and performing
real-time inference.


System Design Overview

1. Data Collection and Preparation:


o Download the Dataset: Obtain the Mozilla Common Voice dataset.
o Extract and Preprocess Data: Use [Link] to extract and preprocess
audio features, storing them as .npy files.
2. Feature Extraction:
o Mel Spectrogram Extraction: Each audio file is converted into a Mel
Spectrogram to capture the essential frequency components over time.
o Storage: Store these features in a structured format for efficient access
during training.

3. Model Construction and Training:


o Model Definition: Define the deep learning model architecture in [Link].
This typically includes layers such as Convolutional Neural Networks
(CNN) for feature extraction and Dense layers for classification.
o Training Script: [Link] handles loading the preprocessed data, splitting it
into training and validation sets, compiling the model, and training it. Key
steps include:
▪ Data Loading: Load the .npy feature files.
▪ Data Splitting: Split the data into training and validation sets.
▪ Model Compilation: Set up the model with an optimizer (e.g.,
Adam), loss function (e.g., categorical cross-entropy), and metrics
(e.g., accuracy).
▪ Model Training: Fit the model to the training data, validate it, and
save the trained model.

4. Model Evaluation and Inference:


o Testing: The [Link] script evaluates the model on new audio samples,
either as file inputs or live recordings.
o Inference Process:
1. Load the Model: Load the pre-trained model.
2. Feature Extraction: Convert new audio samples into Mel
Spectrograms.
3. Prediction: The model predicts the gender and provides confidence
scores.
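The categorical cross-entropy loss mentioned in the model compilation step can be computed by hand for a single two-class prediction; a small numerical illustration (the label and probability values are invented):

```python
import math

# one-hot true label for the two classes and the model's softmax output
y_true = [0.0, 1.0]
y_pred = [0.2, 0.8]

# categorical cross-entropy: -sum over classes of y_true * log(y_pred)
loss = -sum(t * math.log(p) for t, p in zip(y_true, y_pred))
print(round(loss, 4))  # 0.2231 -- low loss because the model is confident and correct
```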


Data Flow Diagram

Here's a simplified representation of the data flow and system design:


1. Data Collection:
o Source: Mozilla Common Voice dataset
o Format: Audio files (mp3, wav)
2. Preprocessing:
o Filtering, Balancing
o Output: Preprocessed data files (.npy)
3. Feature Extraction:
o Method: Mel Spectrogram
o Output: Feature vectors

4. Model Training:
o Input: Feature vectors
o Process: Neural network training
o Output: Trained model
5. Model Evaluation and Inference:
o Input: New audio samples
o Process: Feature extraction, prediction
o Output: Gender prediction, confidence scores

The system design for the Gender Recognition Using Voice project involves
collecting and preprocessing audio data, extracting meaningful features, training a
neural network model, and then using that model to make real-time gender predictions
on new audio samples. This pipeline ensures that the model is robust, accurate, and
efficient in its predictions.

The system design for this project is carefully crafted to ensure accurate gender
prediction from voice samples. The process involves several critical stages, each
contributing to the overall effectiveness of the model:

• Data Collection and Preprocessing: Ensures the dataset is clean, balanced, and
representative, which is crucial for training a reliable model.
• Feature Extraction: Converts raw audio into a format that highlights relevant
characteristics for gender prediction.
• Model Training: Utilizes a well-designed neural network architecture to learn
patterns in the audio data associated with different genders.
• Model Testing and Inference: Provides a robust mechanism for evaluating new
audio samples, whether from files or live recordings.
By meticulously following these steps, the project demonstrates how deep learning can
be applied to solve the problem of gender recognition from voice. This system design
showcases the importance of each phase, from data preparation to model evaluation, in
building a successful AI application.



IMPLEMENTATION

Step 1: Setting Up Your Environment

1. Clone the Repository:


o Open a terminal and run the following command to clone the repository:

git clone [Link]


cd gender-recognition-by-voice

2. Create a Virtual Environment (optional but recommended):


o Create and activate a virtual environment to manage dependencies.

python -m venv venv


source venv/bin/activate # On Windows use `venv\Scripts\activate`

3. Install Required Libraries:


o Install the dependencies listed in the [Link] file:

pip3 install -r [Link]

Step 2: Dataset Preparation

1. Download Mozilla's Common Voice Dataset:


o Visit Mozilla's Common Voice website to download the dataset.

2. Extract the Dataset:


o Unzip the dataset and place [Link] in the root directory of the dataset.
3. Run the Preparation Script:
o The [Link] script filters and preprocesses the data, extracting
features and saving them into .npy files:

python [Link]


Step 3: Model Customization and Training

1. Customize the Model (Optional):


o Open [Link] and modify the create_model() function to adjust the neural
network architecture if needed. Here’s an example structure:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Conv2D, Flatten, MaxPooling2D, Dropout

def create_model():
    model = Sequential([
        Conv2D(32, (3, 3), activation='relu', input_shape=(128, 128, 1)),
        MaxPooling2D((2, 2)),
        Conv2D(64, (3, 3), activation='relu'),
        MaxPooling2D((2, 2)),
        Flatten(),
        Dense(128, activation='relu'),
        Dropout(0.5),
        Dense(2, activation='softmax')
    ])
    return model

2. Train the Model:

• Use the [Link] script to train the model. This script loads the preprocessed
data, compiles the model, and trains it:

import os
from tensorflow.keras.callbacks import ModelCheckpoint, TensorBoard, EarlyStopping

from utils import load_data, split_data, create_model

# load the dataset
X, y = load_data()
# split the data into training, validation and testing sets
data = split_data(X, y, test_size=0.1, valid_size=0.1)
# construct the model
model = create_model()

# use tensorboard to view metrics
tensorboard = TensorBoard(log_dir="logs")
# define early stopping to stop training after 5 epochs of not improving
early_stopping = EarlyStopping(mode="min", patience=5, restore_best_weights=True)

batch_size = 64
epochs = 100

# train the model using the training set and validate using the validation set
model.fit(data["X_train"], data["y_train"], epochs=epochs, batch_size=batch_size,
          validation_data=(data["X_valid"], data["y_valid"]),
          callbacks=[tensorboard, early_stopping])

# save the model to a file
model.save("results/model.h5")

# evaluate the model using the testing set
print(f"Evaluating the model using {len(data['X_test'])} samples...")
loss, accuracy = model.evaluate(data["X_test"], data["y_test"], verbose=0)
print(f"Loss: {loss:.4f}")
print(f"Accuracy: {accuracy*100:.2f}%")

By following these steps, you can set up, train, and evaluate a deep learning model
for gender recognition using voice. The use of TensorFlow, along with effective
preprocessing and feature extraction techniques, ensures that the model can
accurately predict the gender of a speaker based on audio input.

3. Test the Created Model:

import pyaudio
import os
import wave
import librosa
import numpy as np
from sys import byteorder
from array import array
from struct import pack

THRESHOLD = 500
CHUNK_SIZE = 1024
FORMAT = pyaudio.paInt16
RATE = 16000

SILENCE = 30


def is_silent(snd_data):
    "Returns 'True' if below the 'silent' threshold"
    return max(snd_data) < THRESHOLD

def normalize(snd_data):
    "Average the volume out"
    MAXIMUM = 16384
    times = float(MAXIMUM)/max(abs(i) for i in snd_data)

    r = array('h')
    for i in snd_data:
        r.append(int(i*times))
    return r

def trim(snd_data):
    "Trim the blank spots at the start and end"
    def _trim(snd_data):
        snd_started = False
        r = array('h')

        for i in snd_data:
            if not snd_started and abs(i) > THRESHOLD:
                snd_started = True
                r.append(i)
            elif snd_started:
                r.append(i)
        return r

    # Trim to the left
    snd_data = _trim(snd_data)

    # Trim to the right
    snd_data.reverse()
    snd_data = _trim(snd_data)
    snd_data.reverse()
    return snd_data

def add_silence(snd_data, seconds):
    "Add silence to the start and end of 'snd_data' of length 'seconds' (float)"
    r = array('h', [0 for i in range(int(seconds*RATE))])
    r.extend(snd_data)
    r.extend([0 for i in range(int(seconds*RATE))])
    return r

def extract_feature(file_name, **kwargs):
    """
    Extract feature from audio file `file_name`
    Features supported:
        - MFCC (mfcc)
        - Chroma (chroma)
        - MEL Spectrogram Frequency (mel)
        - Contrast (contrast)
        - Tonnetz (tonnetz)
    e.g:
    `features = extract_feature(path, mel=True, mfcc=True)`
    """
    mfcc = kwargs.get("mfcc")
    chroma = kwargs.get("chroma")
    mel = kwargs.get("mel")
    contrast = kwargs.get("contrast")
    tonnetz = kwargs.get("tonnetz")
    X, sample_rate = librosa.load(file_name)
    if chroma or contrast:
        stft = np.abs(librosa.stft(X))
    result = np.array([])
    if mfcc:
        mfccs = np.mean(librosa.feature.mfcc(y=X, sr=sample_rate, n_mfcc=40).T,
                        axis=0)
        result = np.hstack((result, mfccs))
    if chroma:
        chroma = np.mean(librosa.feature.chroma_stft(S=stft, sr=sample_rate).T, axis=0)
        result = np.hstack((result, chroma))
    if mel:
        mel = np.mean(librosa.feature.melspectrogram(y=X, sr=sample_rate).T, axis=0)
        result = np.hstack((result, mel))
    if contrast:
        contrast = np.mean(librosa.feature.spectral_contrast(S=stft,
                           sr=sample_rate).T, axis=0)
        result = np.hstack((result, contrast))
    if tonnetz:
        tonnetz = np.mean(librosa.feature.tonnetz(y=librosa.effects.harmonic(X),
                          sr=sample_rate).T, axis=0)
        result = np.hstack((result, tonnetz))
    return result

if __name__ == "__main__":
    # load the saved model (after training)
    # model = pickle.load(open("result/mlp_classifier.model", "rb"))
    from utils import load_data, split_data, create_model


    import argparse
    parser = argparse.ArgumentParser(description="""Gender recognition script, this will
        load the model you trained, and perform inference on a sample you provide
        (either using your voice or a file)""")
    parser.add_argument("-f", "--file", help="The path to the file, preferred to be in WAV format")
    args = parser.parse_args()
    file = args.file
    # construct the model
    model = create_model()
    # load the saved/trained weights
    model.load_weights("results/model.h5")
    if not file or not os.path.isfile(file):
        # if file not provided, or it doesn't exist, use your voice
        print("Please talk")
        # put the file name here
        file = "test.wav"
        # record the file (start talking); record_to_file is a recording
        # helper defined elsewhere in the full script
        record_to_file(file)
    # extract features and reshape it
    features = extract_feature(file, mel=True).reshape(1, -1)
    # predict the gender!
    male_prob = model.predict(features)[0][0]
    female_prob = 1 - male_prob
    gender = "male" if male_prob > female_prob else "female"
    # show the result!
    print("Result:", gender)
    print(f"Probabilities: Male: {male_prob*100:.2f}% Female: {female_prob*100:.2f}%")


SNAPSHOTS


CONCLUSION

The Gender Recognition Using Voice project exemplifies how deep learning models can
be effectively applied to audio data for gender identification. Here's a summary of the
essential aspects:

Key Achievements:

1. Data Handling and Preprocessing:


o Leveraging a robust dataset like Mozilla’s Common Voice ensures diversity
and quality in the training data.
o Thorough preprocessing, including filtering invalid samples and balancing
the dataset, is crucial for training an accurate model.
o Feature extraction using Mel Spectrograms captures essential audio
characteristics, transforming raw audio into a format suitable for neural
network input.
2. Model Development and Training:
o Utilizing TensorFlow 2.x.x allows for the construction of a flexible and
powerful neural network.
o Customizing the model architecture to include convolutional layers for
feature extraction and dense layers for classification enhances the model’s
ability to learn from the data.
o A structured training process ensures the model is effectively trained and
validated, optimizing its performance.
3. Evaluation and Inference:
o The testing framework supports both file-based and live voice inputs,
providing versatility in model evaluation.
o The model’s predictions include confidence scores, offering insights into
the reliability of its output.
The system is designed to be scalable and adaptable, allowing for future enhancements
and broader applications.


REFERENCES

[1] A. J. Hunt and A. W. J. McQueen, "The Influence of Speech Rate on Gender


Identification," Speech Communication Journal, vol. 45, pp. 65-75, 2013.

[2] S. G. Sharan and N. T. Gupta, "Gender Classification from Speech Using Machine
Learning Algorithms," International Journal of Computer Science and Information
Technologies, vol. 5, no. 6, pp. 8252-8258, 2014.

[3] B. Vegnanarayana, "Signal Processing and Machine Learning for Gender Recognition,"
IEEE Transactions on Audio, Speech, and Language Processing, vol. 15, no. 4, pp. 1165-1175,
2014.

[4] K. R. Lathika and M. Anand, "Gender Identification from Speech Using Different
Classification Algorithms," International Journal of Computer Applications, vol. 119,
no. 13, pp. 1-5, 2015.

[5] P. N. Ghosh and S. Saha, "Speech-Based Gender Recognition Using Machine Learning:
A Survey," International Journal of Signal Processing, vol. 7, no. 1, pp. 1-7, 2018.

[6] L. G. Lee and M. S. Narayanan, "Speech Processing for Gender Classification," IEEE
International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2019,
pp. 512-517.

[7] D. A. Reynolds, "An Overview of Automatic Speaker Recognition," Proceedings of the


IEEE, vol. 87, no. 9, pp. 1524-1542, 2002.

[8] A. Sharma and R. Kumari, "A Study on Gender Classification Using Different Speech
Features," International Journal of Speech Technology, vol. 22, no. 2, pp. 135-142, 2020.

These references cover various aspects of gender recognition using speech, including
machine learning algorithms, signal processing, and feature extraction.
