Who is Speaking? Male or Female

Master's Thesis, 2013

79 Pages, Grade: A+







1 Introduction

2 Background
2.1 Speech
2.1.1 Speech Signal
2.2 Speech Signal Processing
2.2.1 Fourier Transform
2.2.2 Discrete Cosine Transform
2.2.3 Digital Filters
2.2.4 Nyquist-Shannon Sampling Theorem
2.2.5 Window Functions

3 Speech Enhancement
3.1 Signal to Noise Ratio
3.2 Spectral Subtraction
3.3 Cepstral Mean Normalization
3.4 RASTA Filtering
3.5 Voice Activity Detector
3.5.1 The Empirical Mode Decomposition Method
3.5.2 The Hilbert Spectrum Analysis
3.5.3 Voice Activity Detection

4 Gender Identification Systems
4.1 Acoustic Features
4.1.1 Mel Frequency Cepstral Coefficients (MFCC)
4.1.2 Shifted Delta Cepstral (SDC)
4.1.3 Pitch Extraction Method
4.2 Pitch Based Models
4.3 Models based on Acoustic Features
4.4 Fused Models

5 Learning Techniques for Gender Identification
5.1 Overview
5.2 Adaboost
5.3 Gaussian Mixture Model (GMM)
5.3.1 GMM Training
5.3.2 GMM Testing
5.4 Decision Making
5.5 Likelihood Ratio
5.6 Universal Background Model
5.6.1 UBM Training

6 System Design and Implementation
6.1 Toolboxes
6.1.1 Signal Processing Toolbox
6.1.2 Machine Learning Toolbox
6.2 System Design
6.2.1 Requirement
6.2.2 Initial Approach
6.2.3 Algorithm
6.2.4 Feature Selection
6.3 Experiments and Results
6.3.1 Pitch Based Models
6.3.2 Models Based on Acoustic Features
6.3.3 Fused Model
6.3.4 YouTube Videos

7 Conclusion
7.1 Summary
7.2 Future Recommendation


A Appendix

List of Tables

6.1 Results from pitch based model trained with 1 male and 1 female speaker

6.2 Results from pitch based model trained with 9 male and 1 female speakers

6.3 Results from pitch based model trained with 1 male and 9 female speakers

6.4 Results from pitch based model trained with 8 male and 8 female speakers

6.5 Results from MFCC model trained using 8 GMM components

6.6 Results from MFCC model trained using 16 GMM components

6.7 Results from MFCC model trained using 32 GMM components

6.8 Results from SDC model trained using 8 GMM components

6.9 Results from SDC model trained using 16 GMM components

6.10 Results from SDC model trained using 32 GMM components

6.11 Results from fused model trained using 8 GMM components on SDC features

6.12 Results from fused model trained using 16 GMM components on SDC features

6.13 Results from fused model trained using 32 GMM components on SDC features

6.14 Results from acoustic and fused models tested on a large amount of data

6.15 Accuracy of all the models that were tested on YouTube videos

List of Figures

2.1 Mechanism of the human speech system, representing the underlying phenomenon of speech generation and speech understanding. The grey boxes represent computer systems for natural language processing [HAH01]

2.2 Conversion of analogue signal to digital signal. Red lines show the digital value of the analogue signal

2.3 DFT applied to a speech signal

2.4 DCT applied to a speech signal

2.5 Sampling of a Continuous Signal

2.6 Hamming window effect

3.1 A noisy speech signal [Vat12]

3.2 A clean speech signal [Vat12]

3.3 VAD applied to noisy speech [SZ12]

4.1 A block diagram of gender identification model

4.2 A block diagram of MFCC computation [Vat12]

4.3 Mel frequency scale [Vat12]

4.4 Graph of Mel filterbank of 24 filters [Vat12]

4.5 Computational model of SDC [TcSKD02]

4.6 Plot of male and female pitch [Sun00]

4.7 Block diagram of gender identification model trained using MFCC [Sun00]

4.8 A fused gender identification model [Sun00]

4.9 An AdaBoost score fusion model [IKJGY10]

5.1 Optimal decision boundary between two classes

5.2 One Dimensional Gaussian Mixture Model

6.1 Block diagram of SDC feature extraction

6.2 Block diagram of the final fused model

6.3 Snapshot of the Graphical User Interface of the system


The aim of this project was to create a system that identifies the gender of a speaker from speech. In this dissertation I explain the signal processing background, such as the Fourier transform and the DCT, needed to understand the signal processing that happens inside digital devices. I also investigated different classification techniques, such as AdaBoost and Gaussian mixture models, as well as the different types of methods used in gender identification: fusion methods, acoustic methods and pitch-based methods.

From this perspective I implemented three types of models (four models in total) that are described in the literature, and I introduce a new method for gender recognition that uses SDC features together with pitch to identify the gender. All models were trained and tested on the same amount of speech. The SDC and fused SDC models gave satisfactory results on the VoxForge dataset. Finally, I tested the acoustic and fused models on YouTube videos, which gave almost 90% accuracy. The results of my implementations are shown in chapter 6.


No portion of the work referred to in this dissertation has been submitted in support of an application for another degree or qualification of this or any other university or other institute of learning.


i. The author of this report (including any appendices and/or schedules to this report) owns certain copyright or related rights in it (the “Copyright”) and s/he has given The University of Manchester certain rights to use such Copyright, including for administrative purposes.

ii. Copies of this thesis, either in full or in extracts and whether in hard or electronic copy, may be made only in accordance with the Copyright, Designs and Patents Act 1988 (as amended) and regulations issued under it or, where appropriate, in accordance with licensing agreements which the University has from time to time. This page must form part of any such copies made.

iii. The ownership of certain Copyright, patents, designs, trade marks and other intellectual property (the “Intellectual Property”) and any reproductions of copyright works in the thesis, for example graphs and tables (“Reproductions”), which may be described in this thesis, may not be owned by the author and may be owned by third parties. Such Intellectual Property and Reproductions cannot and must not be made available for use without the prior written permission of the owner(s) of the relevant Intellectual Property and/or Reproductions.

iv. Further information on the conditions under which disclosure, publication and commercialisation of this thesis, the Copyright and any Intellectual Property and/or Reproductions described in it may take place is available in the University IP Policy (see http://documents.manchester.ac.uk/DocuInfo.aspx?DocID=487), in any relevant Thesis restriction declarations deposited in the University Library, the University Library's regulations (see http://www.manchester.ac.uk/library/aboutus/regulations) and in the University's policy on presentation of Theses.


I would like to thank my supervisor Dr Ke Chen for his support and guidance throughout the duration of this project without whom this dissertation would never be a reality. Doing this project was a great experience and I learned a lot.

I would also like to thank my dear friend Mr Manav Priyan Olikara for his help and advice regarding this topic, and Ali Zia, who helped me in generating the graphics for this project. I would finally like to thank my friends and family for their support.

Chapter 1

Introduction


As computers become more significant in our daily lives, the interaction between humans and machines grows more important day by day. The desire of humans to communicate with machines in a natural way has led to the evolution of natural language processing. As advancements in this field continue, it is likely that voice interaction systems will replace the standard keyboard in the near future. Looking at today's technology market, we have some truly state-of-the-art technologies, such as Microsoft® Kinect and Apple® Siri, which perform really well. But every speech system available today has its own drawbacks, and continuous work is being done to increase the performance of such systems. To increase the performance of speech systems, pre-processing such as gender recognition and language identification is required.

This MSc project focuses on automatic gender identification from speech, i.e. detecting whether the spoken speech comes from a male or a female speaker. Automatic Gender Identification (AGI) via speech has several applications in the field of natural language processing. [AH96] showed that gender-dependent speech recognition models are more accurate than gender-independent models. Google's latest speech recognition system, found in Android devices and Google Glass, first identifies the gender of the speaker before performing speech recognition for search; its recognition accuracy is exceptionally high compared to their previous system, which was a single gender-independent model. Recently a company launched its "Kinect"-based online fitting room, which determines the gender of the person using it from their speech in order to suggest clothes. In the context of multimedia indexing, gender recognition can decrease the search space by up to half [HC].

Automatic gender recognition is itself a complex task with its own problems and limitations; to date, no gender recognition system exists that works in a real-time environment with 100% accuracy. In a real-world environment, or in the case of multimedia indexing, many acoustic conditions exist, such as noisy speech, compressed speech, silence, telephone speech and different languages, which significantly reduce the performance of a general gender identification system. Ideally, then, a system is required that gives acceptable performance under these acoustic conditions.

In general, there are three main approaches to building an automatic gender identification system. The first uses pitch as the discriminating factor and labelled data to identify the gender of the speaker. The second uses acoustic features such as MFCCs and unlabelled data: relevant features are extracted and then a model is trained, typically a GMM for each gender, and the score from one model is subtracted from the other to decide the gender. The third approach, commonly used since about 2005, combines pitch models with acoustic models to form a fused model.

The dissertation is organised as follows. The first chapter presents the challenge of gender identification. The second chapter presents the necessary signal processing background. The following chapters each describe a step of a gender identification system, including speech enhancement techniques to reduce background noise, feature extraction, gender modelling methods and decision-making techniques. Finally, the last chapter presents the implementations done for this project and the results obtained from testing the different models on a large set of speakers and on YouTube videos.

Chapter 2

Background


To understand the gender identification process using speech, we first need to understand the structure of speech itself. This chapter covers human speech and the basic difference between male and female voices.

2.1 Speech

Spoken language, or human speech, is the natural form of human communication which requires the use of voice. In linguistic terms, human speech is a sound wave produced by the lungs and given its uniqueness by the tongue, lips and jaw [HAH01].

2.1.1 Speech Signal

Speech is produced when the air pressure generated by the lungs reaches the vocal cords. The speech then resonates in the oral and nasal cavities according to the position of the lips, tongue and other organs in the mouth. In terms of signal processing, the speech signal is an analogue signal which is the convolution of the source

[Figure not included in this excerpt]

Figure 2.1: Mechanism of the human speech system, representing the underlying phenomenon of speech generation and speech understanding. The grey boxes represent computer systems for natural language processing [HAH01]

e[n] and the resonance of speech in the mouth, which can be modelled as the filter h[n]:

x[n] = e[n] * h[n]

where x[n] is the speech signal and * denotes convolution.
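The source-filter relation x[n] = e[n] * h[n] can be sketched numerically. The following Python snippet (the thesis itself uses MATLAB; the excitation and filter values here are hypothetical toys, not a real vocal-tract model) convolves an impulse-train excitation with a short impulse response:

```python
import numpy as np

# Hypothetical excitation: a sparse impulse train standing in for glottal pulses.
e = np.zeros(16)
e[::4] = 1.0

# Hypothetical vocal-tract "filter": a short decaying impulse response.
h = np.array([1.0, 0.5, 0.25])

# The observed speech signal is the convolution of source and filter.
x = np.convolve(e, h)

print(len(x))  # full convolution length: len(e) + len(h) - 1 = 18
```

Each impulse in `e` is replaced by a copy of `h`, which is exactly what the source-filter model of speech production describes.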

Human Speech Frequency

Human speech occupies the 300 Hz to 3400 Hz part of the audio range [Tit00a]. The full audible range, i.e. the range of frequencies humans can hear, is 20 Hz to 20,000 Hz. Beyond 20,000 Hz lies the ultrasonic region, which humans are unable to hear.

[Figure not included in this excerpt]

Figure 2.2: Conversion of analogue signal to digital signal. Red lines show the digital value of the analogue signal

Fundamental Frequency (Pitch)

Generally, the fundamental frequency is defined as the lowest frequency of a periodic waveform. The fundamental frequency, usually known as the pitch in natural language processing, is the biggest discriminating factor between male and female speech. A typical adult male has a fundamental frequency between 85 Hz and 180 Hz, while an adult female has a fundamental frequency in the range of 165 Hz to 225 Hz [BO00].
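These ranges suggest a minimal pitch-based decision rule. The sketch below is illustrative only: the 172.5 Hz threshold is simply a midpoint between the typical male (85-180 Hz) and female (165-225 Hz) ranges quoted above, whereas real systems (including the models in chapter 6) learn the decision boundary from labelled data:

```python
def classify_by_pitch(f0_hz, threshold_hz=172.5):
    """Toy pitch-based gender decision.

    threshold_hz is a hypothetical midpoint between the typical
    male and female fundamental-frequency ranges; it is not a
    value taken from the thesis experiments.
    """
    return "female" if f0_hz > threshold_hz else "male"

print(classify_by_pitch(120))  # a typical male pitch
print(classify_by_pitch(210))  # a typical female pitch
```

Note that the ranges overlap between 165 Hz and 180 Hz, which is precisely why pitch alone is not a perfect discriminator and why fused models are considered later.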

2.2 Speech Signal Processing

From [HAH01] we know that speech is an analogue signal, but today's computers work with digital signals, so speech is stored in digital form. When speech is converted to digital form it loses some information, so an accurate representation of the analogue signal in digital form is required. A conversion of an analogue signal can be seen in figure 2.2.

2.2.1 Fourier Transform

According to Joseph Fourier, any signal can be represented as a linear combination of sinusoids, which means that the Fourier transform can be described as transforming a function of time f(t) into a function of frequency F(ω). This can be shown as

F(ω) = ∫_{-∞}^{+∞} f(t) e^{-jωt} dt

There exist different types of Fourier transforms, but the most famous are

1. Continuous Time Fourier Transform
2. Continuous Fourier Transform
3. Discrete Fourier Transform
4. Discrete Time Fourier Transform

In an automatic gender recognition system only the discrete Fourier transform is required, so only that one will be explained.

Discrete Fourier Transform

For a discrete signal x[n] of length N (treated as periodic), the discrete Fourier transform can be defined as

X[k] = Σ_{n=0}^{N-1} x[n] · e^{-j2πkn/N},  k = 0, 1, …, N-1

A discrete Fourier transform applied to a signal can be seen in figure 2.3
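In practice the DFT is computed with a fast Fourier transform (FFT) routine rather than by evaluating the sum directly. The Python sketch below (the thesis implementation uses MATLAB; the four-sample signal here is an arbitrary toy) evaluates the standard DFT sum X[k] = Σ x[n]·e^{-j2πkn/N} directly and checks it against NumPy's FFT:

```python
import numpy as np

# Toy discrete signal (stand-in for one frame of speech).
x = np.array([1.0, 2.0, 0.0, -1.0])
N = len(x)

# Direct evaluation of the DFT sum X[k] = sum_n x[n] * exp(-j*2*pi*k*n/N).
n = np.arange(N)
k = n.reshape(-1, 1)
X_direct = np.sum(x * np.exp(-2j * np.pi * k * n / N), axis=1)

# NumPy's FFT computes the same transform efficiently.
X_fft = np.fft.fft(x)

print(np.allclose(X_direct, X_fft))  # the two agree
```

The k = 0 coefficient is just the sum of the samples, which gives a quick sanity check on any DFT implementation.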

2.2.2 Discrete Cosine Transform

The Discrete Cosine Transform, commonly known as the DCT, is similar to the DFT. The DCT transforms a finite sequence of data points into a sum of cosines vibrating at different frequencies. The DCT is widely used for compression of images and sound, where the higher-frequency components can be discarded, which means that the transformed signal is mostly comprised of lower frequencies

[Figure not included in this excerpt]

Figure 2.3: DFT applied to a speech signal

thus the majority of the information can be found in the first few coefficients. More information about the use of the DCT in speech processing can be found in [MAMG11]. Mathematically, the DCT (type II) can be defined as

X_T[k] = Σ_{n=0}^{N-1} x[n] · cos[(π/N)(n + 1/2)k],  k = 0, 1, …, N-1

where X_T[k] is the k-th coefficient. The DCT applied to a speech signal can be seen in figure 2.4
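The energy-compaction property that makes the DCT useful for compression (and later for MFCC computation) is easy to demonstrate. The Python sketch below (a toy cosine stands in for a smooth signal; the thesis itself works in MATLAB) shows that almost all of a smooth signal's energy lands in the first few DCT coefficients:

```python
import numpy as np
from scipy.fft import dct

# Smooth toy signal: a single slow cosine sweep over 64 samples.
x = np.cos(np.linspace(0, np.pi, 64))

# Type-II DCT with orthonormal scaling (energy-preserving).
X = dct(x, type=2, norm='ortho')

energy_total = np.sum(X ** 2)
energy_first4 = np.sum(X[:4] ** 2)
print(energy_first4 / energy_total)  # close to 1: energy is compacted
```

Discarding everything after the first few coefficients would therefore lose almost no energy for this signal, which is the basis of DCT compression.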

2.2.3 Digital Filters

Digital filters are mathematical models applied to a signal to remove some of its components or to enhance certain aspects of it. In natural language processing, widely used filters are low-pass, band-pass and high-pass filters [APHA96].

Low Pass Filter

A low pass filter is used to discard the frequencies higher than the cut-off frequency in a speech signal.

[Figure not included in this excerpt]

Figure 2.4: DCT applied to a speech signal

High Pass Filter

A high pass filter is used to discard the frequencies lower than the cut-off frequency in a speech signal.

Band Pass Filter

The band pass filter allows a certain range of frequencies to pass and discards all frequencies that are higher or lower than the cut-off frequencies.
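The three filter types above can be sketched with SciPy. In the following Python example (the sampling rate, tone frequencies and cut-off are all illustrative choices, not values from the thesis), a low-pass Butterworth filter removes a high-frequency tone while keeping a low one:

```python
import numpy as np
from scipy.signal import butter, lfilter

fs = 8000.0                       # illustrative sampling rate in Hz
t = np.arange(0, 1.0, 1.0 / fs)

# Mixture of a 200 Hz tone (to keep) and a 3000 Hz tone (to remove).
sig = np.sin(2 * np.pi * 200 * t) + np.sin(2 * np.pi * 3000 * t)

# 4th-order Butterworth low-pass with a 1000 Hz cut-off
# (cut-off is given normalised to the Nyquist frequency fs/2).
b, a = butter(4, 1000.0 / (fs / 2), btype='low')
filtered = lfilter(b, a, sig)

# The 3000 Hz component should now be strongly attenuated.
spectrum = np.abs(np.fft.rfft(filtered))
freqs = np.fft.rfftfreq(len(filtered), 1.0 / fs)
print(spectrum[freqs == 200][0] / spectrum[freqs == 3000][0])
```

Swapping `btype='low'` for `'high'` or `'bandpass'` gives the other two filter types described above.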


Sampling

A human speech signal is naturally an analogue signal, but to perform any computational task on it, it must be converted to digital form. In signal processing, sampling means converting a continuous-time signal to a discrete-time signal by measuring it at regular intervals of time. This regular interval is called the sampling interval and is denoted T_s. The sampling frequency, generally known as the sampling rate and denoted f_s = 1/T_s, is defined as the number of discrete samples taken from a signal in one second. The higher the sampling frequency, the better the digital signal, as more information is captured and less is lost. In speech processing, 44.1 kHz is usually considered a good sampling rate, which means that 44,100 samples are taken from one second of speech.


Quantization

In digital signal processing, quantization is the process of mapping a continuous range of values to a smaller set of discrete or integer values. The error induced by this loss of information is called the quantization error. Quantization is used in analogue-to-digital converters to convert sampled signals to digital values using a quantization level specified in bits. As the loss of information during quantization is irreversible, it is good practice to use a higher quantization level. A good-quality compact disc is sampled at 44.1 kHz with a quantization level of 16 bits, which gives 65,536 possible values per sample.
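The arithmetic above is easy to check. This Python sketch (the test signal is an arbitrary ramp, used only to exercise the quantizer) computes the number of 16-bit levels and shows that uniform round-to-nearest quantization keeps the error within half a step:

```python
import numpy as np

bits = 16
levels = 2 ** bits            # 65,536 possible values, as for CD audio
print(levels)

# Uniform quantization of a toy signal in [-1, 1).
x = np.linspace(-1, 1, 1001, endpoint=False)
step = 2.0 / levels           # width of one quantization step
xq = np.round(x / step) * step

# Round-to-nearest quantization error is bounded by half a step.
max_err = np.max(np.abs(x - xq))
print(max_err <= step / 2)
```

Each extra bit halves the step size, which is why higher quantization levels preserve the signal better.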

[Figure not included in this excerpt]

Figure 2.5: Sampling of a Continuous Signal

2.2.4 Nyquist-Shannon Sampling Theorem

The Nyquist-Shannon sampling theorem, more commonly known as the Nyquist sampling theorem, states that

"If a function x(t) contains no frequencies higher than B hertz, it is completely determined by giving its ordinates at a series of points spaced 1/(2B) seconds apart [Wik13b]."

This means that to reconstruct a continuous signal from its digital form, the sampling rate f_s must be greater than twice the bandwidth B of the signal:

f_s > 2B
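Violating the condition f_s > 2B causes aliasing: a tone above half the sampling rate reappears at a lower frequency. The Python sketch below (tone and sampling rates are illustrative values chosen so every frequency falls on an FFT bin) samples a 5 Hz sine well above and then below its Nyquist rate of 10 Hz:

```python
import numpy as np

f_signal = 5.0  # Hz, illustrative tone (Nyquist rate = 10 Hz)

# Sampled well above the Nyquist rate: the spectral peak is at 5 Hz.
fs_good = 40.0
t = np.arange(0, 2, 1 / fs_good)
x = np.sin(2 * np.pi * f_signal * t)
freqs = np.fft.rfftfreq(len(x), 1 / fs_good)
peak_good = freqs[np.argmax(np.abs(np.fft.rfft(x)))]
print(peak_good)

# Sampled below the Nyquist rate: the tone aliases to |fs - f| = 3 Hz.
fs_bad = 8.0
t = np.arange(0, 2, 1 / fs_bad)
x = np.sin(2 * np.pi * f_signal * t)
freqs = np.fft.rfftfreq(len(x), 1 / fs_bad)
peak_bad = freqs[np.argmax(np.abs(np.fft.rfft(x)))]
print(peak_bad)
```

Once a signal has aliased, the original frequency cannot be recovered, which is why anti-aliasing low-pass filters are applied before sampling.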

2.2.5 Window Functions

In signal processing, a window function is a mathematical function whose value is zero outside a given interval. As a person talks, the sound produced changes very quickly, so to study every change or segment of the speech, it is divided into many short frames with the help of a window function.

[Figure not included in this excerpt]

Figure 2.6: Hamming window effect

The most common window functions are the rectangular and Hamming windows. The rectangular window is constant inside the given interval and zero outside it, which means the changes around the edges of the frame are abrupt. To reduce this abrupt effect, the Hamming window is used, which is given by the equation

w[n] = 0.54 - 0.46 · cos(2πn / (N - 1)),  0 ≤ n ≤ N - 1

The effect of a Hamming window can be seen in figure 2.6. MATLAB's signal processing toolbox provides a Hamming window function: hamming(N) returns an N-point window, which is then multiplied element-wise by the frame.
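The same window is available outside MATLAB. The Python sketch below checks NumPy's built-in window against the standard Hamming definition w[n] = 0.54 - 0.46·cos(2πn/(N-1)) and applies it to a toy frame (the frame of ones is illustrative; a real system would window a frame of speech samples):

```python
import numpy as np

N = 8
# NumPy's built-in Hamming window, the counterpart of MATLAB's hamming(N).
w = np.hamming(N)

# The same values from the standard definition.
n = np.arange(N)
w_manual = 0.54 - 0.46 * np.cos(2 * np.pi * n / (N - 1))
print(np.allclose(w, w_manual))

# Windowing a frame tapers its edges toward 0.54 - 0.46 = 0.08.
frame = np.ones(N)
windowed = frame * w
print(windowed[0])
```

The tapered edges are what suppress the spectral leakage that an abrupt rectangular cut-off would cause.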

Chapter 3

Speech Enhancement

Automatic gender identification systems trained on clean speech in lab environments degrade in performance under real-world conditions due to additive environmental sounds such as noise and music. In addition, the natural pauses in human speech, known as silence, are also treated as an additive environmental sound. Systems whose performance does not degrade in a real-world environment are called robust systems.

Several researchers have tried to design auditory systems that mimic the human auditory system, since it is robust to environmental changes [Ghi87]. Before training a speech-based system, speech enhancement pre-processing has to be done so that good-quality speech can be extracted from recordings made in a real-world environment. This chapter discusses some speech enhancement techniques.

3.1 Signal to Noise Ratio

The signal-to-noise ratio, usually denoted SNR, is a measure of the quality of the input signal relative to the background noise it contains. Mathematically, the SNR in decibels can be written as

SNR_dB = 10 · log10(P_signal / P_noise)

where P_signal and P_noise are the average powers of the signal and the noise.
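The decibel form SNR_dB = 10·log10(P_signal/P_noise) can be sketched directly. In the Python example below, the tone, noise level and random seed are all illustrative choices constructed so that the noise has roughly one hundredth of the signal's power, i.e. an SNR of about 20 dB:

```python
import numpy as np

def snr_db(signal, noise):
    """SNR in decibels: 10*log10 of the signal-to-noise power ratio."""
    p_signal = np.mean(np.asarray(signal, dtype=float) ** 2)
    p_noise = np.mean(np.asarray(noise, dtype=float) ** 2)
    return 10 * np.log10(p_signal / p_noise)

# Toy example: unit-power tone, noise with ~1/100th of that power.
rng = np.random.default_rng(0)
t = np.arange(8000) / 8000.0
clean = np.sqrt(2) * np.sin(2 * np.pi * 200 * t)   # average power = 1
noise = 0.1 * rng.standard_normal(8000)            # average power ~ 0.01
print(snr_db(clean, noise))  # ~20 dB
```

A higher SNR means the speech dominates the noise; speech enhancement techniques such as spectral subtraction aim to raise the effective SNR before features are extracted.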


University of Manchester

Hassam Sheikh (Author), 2013, Who is Speaking? Male or Female, Munich, GRIN Verlag, https://www.grin.com/document/265700

