Voice recognition is the ability of a software program or hardware device to decode the human voice. It also offers a secure method of authenticating speakers: during the enrollment phase, the system generates a speaker model based on the speaker's characteristics. The testing phase typically involves making a claim about the identity of an unknown speaker using the given speech characteristics and the trained models.
Speaker recognition falls into two categories: speaker identification and speaker verification. Speaker verification checks whether the person speaking is who they claim to be, whereas speaker identification makes multiple comparisons, matching the person speaking against the speakers trained or stored in a database in an attempt to identify them. Speaker identification is the main focus of this study.
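The difference between the two categories can be illustrated with a minimal sketch. It assumes that each enrolled speaker is represented by a single feature vector and that vectors are compared by Euclidean distance; the function names, the toy vectors, and the threshold are illustrative assumptions, not taken from the paper.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def identify(features, enrolled):
    """Identification (1:N): return the enrolled speaker whose model is closest."""
    return min(enrolled, key=lambda name: euclidean(features, enrolled[name]))

def verify(features, claimed, enrolled, threshold):
    """Verification (1:1): accept the claimed identity only if the distance is small."""
    return euclidean(features, enrolled[claimed]) <= threshold

# Hypothetical enrolled models: toy 3-dimensional vectors, not real speech data.
enrolled = {"alice": [1.0, 2.0, 3.0], "bob": [4.0, 0.0, 1.0]}
unknown = [1.1, 2.1, 2.9]

print(identify(unknown, enrolled))                      # closest enrolled speaker
print(verify(unknown, "alice", enrolled, threshold=0.5))
print(verify(unknown, "bob", enrolled, threshold=0.5))
```

Identification always returns some enrolled speaker, while verification can reject a claim, which is why verification needs a threshold and identification, in this closed-set form, does not.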
Table of Contents
- Introduction
- Theoretical Concepts
- Speaker Recognition
- Classification of Automatic Speaker Recognition
- Speech Feature Extraction
- Objectives
- Design implementation
- Vocal Activity Detection (VAD)
- Speaker Identification
- Frame Blocking
- Windowing
- Mel-frequency Wrapping
- Cepstrum and Feature Extraction
- Distance Calculation
- GUI
- Design innovativeness
- Simulation results
- Train/Enrollment Result
- Recognition
- GUI Result
- Euclidean distance between voices
- Discussion
Objectives and Key Themes
This assignment aims to develop a speaker identification system that can accurately identify speakers from a group of people in a recorded audio track. The system utilizes voice activity detection (VAD) to improve speech intelligibility and recognition. Both speaker identification and VAD employ the Mel Frequency Cepstrum Coefficient (MFCC) for voice feature extraction. The main objective is to create a reliable system that allows for speaker identification based on their voice.
- Speaker identification using voice analysis
- Voice activity detection for speech enhancement
- Mel Frequency Cepstrum Coefficient (MFCC) for feature extraction
- Text-independent speaker recognition
- Comparison of speaker characteristics for identification
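To show what a voice activity detector does, here is a minimal sketch. The paper states that its VAD relies on MFCC features; this example swaps in a much simpler short-time energy rule so it stays short. The frame sizes, the threshold ratio, and the function names are assumptions chosen for illustration.

```python
import math

def frame_energies(signal, frame_len, hop):
    """Mean squared amplitude of each (overlapping) frame."""
    return [
        sum(x * x for x in signal[i:i + frame_len]) / frame_len
        for i in range(0, len(signal) - frame_len + 1, hop)
    ]

def energy_vad(signal, frame_len=160, hop=80, ratio=0.1):
    """Flag frames whose energy exceeds `ratio` times the peak frame energy."""
    energies = frame_energies(signal, frame_len, hop)
    if not energies:
        return []
    threshold = ratio * max(energies)
    return [e > threshold for e in energies]

# Synthetic stand-in for a recording: silence, a 200 Hz tone, silence (8 kHz).
rate = 8000
silence = [0.0] * 800
tone = [math.sin(2 * math.pi * 200 * n / rate) for n in range(800)]
signal = silence + tone + silence

flags = energy_vad(signal)
active = [i for i, is_active in enumerate(flags) if is_active]
print(f"{len(flags)} frames, active frames {active[0]}..{active[-1]}")
```

Only the frames flagged as active would be passed on to feature extraction, so silence and pauses do not dilute the speaker model.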
Chapter Summaries
The introduction establishes the significance of voice recognition as a social behavior and a key element in speaker identification systems. It outlines the challenges of identifying individual voices within a group recording and introduces the concept of VAD as a solution for improving speech intelligibility. The chapter further explains how MFCCs are used to extract speech features and quantize them for speaker recognition.
The "Theoretical Concepts" chapter delves into the fundamentals of voice recognition, categorizing it into speaker verification and speaker identification. It also introduces the distinction between text-dependent and text-independent voice recognition, explaining the rationale for focusing on text-independent recognition in this assignment.
The "Design Implementation" chapter describes the VAD algorithm and its role in enhancing speech recognition. It then explores the process of speaker identification, including stages such as frame blocking, windowing, Mel-frequency wrapping, cepstrum and feature extraction, distance calculation, and GUI design.
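The identification stages named in this chapter can be sketched end to end as follows. This is a simplified illustration, not the paper's implementation: it uses a direct DFT instead of an FFT, represents each speaker by the average MFCC vector rather than a quantized codebook, and uses synthetic tones in place of voices. All function names and parameter values are assumptions.

```python
import cmath
import math

def frame_blocking(signal, frame_len, hop):
    """Split the signal into overlapping frames of frame_len samples."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]

def hamming_window(frame):
    """Windowing: taper the frame edges to reduce spectral leakage."""
    n_len = len(frame)
    return [x * (0.54 - 0.46 * math.cos(2 * math.pi * n / (n_len - 1)))
            for n, x in enumerate(frame)]

def power_spectrum(frame):
    """Power spectrum via a direct DFT (O(N^2), acceptable for a short demo)."""
    n_len = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * n / n_len)
                    for n, x in enumerate(frame))) ** 2
            for k in range(n_len // 2 + 1)]

def hz_to_mel(hz):
    return 2595.0 * math.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_bins, rate):
    """Mel-frequency wrapping: triangular filters spaced evenly on the mel scale."""
    top = hz_to_mel(rate / 2.0)
    edges = [mel_to_hz(top * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bin_hz = (rate / 2.0) / (n_bins - 1)
    bank = []
    for m in range(1, n_filters + 1):
        lo, mid, hi = edges[m - 1], edges[m], edges[m + 1]
        row = []
        for k in range(n_bins):
            f = k * bin_hz
            if lo < f <= mid:
                row.append((f - lo) / (mid - lo))
            elif mid < f < hi:
                row.append((hi - f) / (hi - mid))
            else:
                row.append(0.0)
        bank.append(row)
    return bank

def mfcc(frame, bank, n_coeffs):
    """Cepstrum: log mel energies followed by a DCT-II (coefficient 0 is skipped)."""
    spec = power_spectrum(hamming_window(frame))
    log_e = [math.log(sum(w * p for w, p in zip(row, spec)) + 1e-10)
             for row in bank]
    n_f = len(log_e)
    return [sum(e * math.cos(math.pi * c * (j + 0.5) / n_f)
                for j, e in enumerate(log_e))
            for c in range(1, n_coeffs + 1)]

def speaker_vector(signal, rate, frame_len=128, hop=64, n_filters=12, n_coeffs=8):
    """Average MFCC vector over all frames (a deliberately simple speaker model)."""
    bank = mel_filterbank(n_filters, frame_len // 2 + 1, rate)
    rows = [mfcc(f, bank, n_coeffs) for f in frame_blocking(signal, frame_len, hop)]
    return [sum(col) / len(rows) for col in zip(*rows)]

def euclidean(a, b):
    """Distance calculation between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Synthetic "speakers": pure tones stand in for recorded voices.
rate = 8000

def tone(freq, n=512):
    return [math.sin(2 * math.pi * freq * t / rate) for t in range(n)]

enrolled = {"low": speaker_vector(tone(300), rate),
            "high": speaker_vector(tone(1500), rate)}
test_vec = speaker_vector(tone(320), rate)
best = min(enrolled, key=lambda name: euclidean(test_vec, enrolled[name]))
print(best)
```

In a practical system, an FFT routine would replace the direct DFT, and a richer speaker model would usually replace the single averaged vector; the order of the stages stays the same.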
Keywords
The key terms and concepts central to this work include speaker identification, voice activity detection (VAD), Mel Frequency Cepstrum Coefficient (MFCC), speech feature extraction, text-independent speaker recognition, and GUI design. These concepts represent the primary focus areas and research themes explored within the assignment.
- Cite this paper
- Bandar Hezam (Author), 2019, Speaker Recognition, Munich, GRIN Verlag, https://www.grin.com/document/1420967