This study develops a model-based speech synthesis prototype for spoken Tigrinya within an integrated speech synthesis framework, the Festival speech synthesis system. The frontend of the framework is a grapheme-based synthesizer; the backend is the CLUSTERGEN synthesizer, an instance of statistical parametric speech synthesis. This framework was chosen mainly because Tigrinya is an under-resourced language. Each of the 249 Tigrinya graphemes was treated as an independent phoneme, irrespective of the language's 32 phonological phonemes.
The corpus consists of 800 previously prepared sentences that were re-recorded following recommended recording practice. The adopted methodology was amended and extended, and the entire prototype was built automatically. A tenfold threshold method was used to train and test the prototype, and the resulting prototype is deployable on Android. The synthesized speech scored 5.82 on Mel Cepstral Distortion (a built-in objective metric), while subjective evaluation yielded scores of 4.5 and 4.3 out of 5 for naturalness and intelligibility, respectively. Both evaluations were interpreted as indicating that the synthesized speech is almost the same as natural human speech. Finally, directions for future work are indicated.
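For readers unfamiliar with the objective metric, the following is a minimal sketch of the commonly used Mel Cepstral Distortion formula, $\mathrm{MCD} = \frac{10}{\ln 10}\sqrt{2\sum_{d}(c_d - \hat{c}_d)^2}$, averaged over frames. It assumes the natural and synthesized frames are already time-aligned and that the 0th (energy) coefficient is excluded. The function name is hypothetical, and the framework's built-in implementation may differ in these details.

```python
import math

def mel_cepstral_distortion(ref_frames, syn_frames):
    """Mean MCD in dB over time-aligned mel-cepstral frames.

    Assumptions (not taken from the study): frames are already aligned,
    and coefficient 0 (energy) is skipped, as is common practice.
    """
    if not ref_frames or len(ref_frames) != len(syn_frames):
        raise ValueError("need two non-empty, equally long frame sequences")
    scale = (10.0 / math.log(10.0)) * math.sqrt(2.0)
    total = 0.0
    for ref, syn in zip(ref_frames, syn_frames):
        # Squared Euclidean distance over coefficients 1..D
        squared = sum((r - s) ** 2 for r, s in zip(ref[1:], syn[1:]))
        total += scale * math.sqrt(squared)
    return total / len(ref_frames)
```

Lower values indicate that the synthesized spectra are closer to the natural recordings; identical inputs give a distortion of zero.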
Table of Contents
- Abstract
- Introduction
- Tigrinya Language and Its Writing System
- Literature Review
- A. Natural Language Processing (NLP)
- B. Digital Signal Processing (DSP)
- Related Works
- The Proposed Solution
- Discussion
- Conclusion
- References
Objectives and Key Themes
This study focuses on developing a model-based speech synthesis prototype for the Tigrinya language using the Festival speech synthesis system. The goal is to create a system that can accurately convert written Tigrinya text into spoken language. The research prioritizes the under-resourced linguistic nature of Tigrinya, utilizing a grapheme-based synthesizer for the frontend and a statistical parametric speech synthesis method (CLUSTERGEN) for the backend. The prototype aims to achieve high naturalness and intelligibility in the synthesized speech.
- Development of a speech synthesis prototype for Tigrinya
- Addressing the under-resourced nature of the Tigrinya language
- Integration of a grapheme-based synthesizer and a statistical parametric speech synthesis method
- Evaluation of the synthesized speech for naturalness and intelligibility
- Exploration of future directions for Tigrinya speech synthesis research
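As a rough illustration of the grapheme-based frontend idea described above, the sketch below treats every Ethiopic letter in a text as its own synthesis unit, with no pronunciation lexicon. The function name `graphemes_to_units` and the `g<codepoint>` unit naming scheme are hypothetical; they are not taken from Festival or from the study.

```python
import unicodedata

# Unicode "Ethiopic" block, which holds the characters used to write Tigrinya
ETHIOPIC_START, ETHIOPIC_END = 0x1200, 0x137F

def graphemes_to_units(text):
    """Map each Ethiopic letter to one unit name; skip everything else.

    Hypothetical naming: 'g' followed by the lowercase hex code point.
    Punctuation, digits, and whitespace are dropped.
    """
    units = []
    for ch in text:
        in_block = ETHIOPIC_START <= ord(ch) <= ETHIOPIC_END
        is_letter = unicodedata.category(ch).startswith("L")
        if in_block and is_letter:
            units.append(f"g{ord(ch):04x}")
    return units
```

The design choice this illustrates is that each written character is treated as a unit in its own right, so no hand-built phoneme inventory or letter-to-sound rules are required, which suits an under-resourced language.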
Chapter Summaries
- Abstract: This chapter introduces the study's objective, methodology, and key findings. It highlights the use of a grapheme-based synthesizer and statistical parametric speech synthesis for developing a Tigrinya speech synthesis prototype.
- Introduction: This chapter provides an overview of speech synthesis, its applications, and the importance of naturalness and intelligibility. It outlines the two main components of text-to-speech systems: natural language processing (NLP) and digital signal processing (DSP).
- Tigrinya Language and Its Writing System: This chapter delves into the characteristics of the Tigrinya language, its writing system, and its significance as an official language in Ethiopia and Eritrea. It provides details on the phonetic nature of Tigrinya's script and its organization into seven orders.
- Literature Review: This chapter examines existing research on speech synthesis, focusing on natural language processing (NLP) and digital signal processing (DSP) modules. It discusses different speech synthesis methods, including parametric, concatenative, and statistical parametric speech synthesis. The chapter emphasizes the CLUSTERGEN Synthesizer, a model-based statistical parametric speech synthesis method, which plays a key role in the proposed solution.
- Related Works: This chapter reviews previous attempts at synthesizing Tigrinya speech, focusing on the limitations of existing approaches. It highlights the significance of this study in addressing the challenges of previous research, such as reliance on concatenative synthesis and a lack of integration between frontend and backend systems.
Keywords
The primary keywords and focus topics of this research encompass speech synthesis, statistical speech synthesis, grapheme-based synthesis, Tigrinya language, under-resourced language, phonetic writing system, and the CLUSTERGEN Synthesizer. These terms represent the core concepts and research focuses of the study, highlighting the development of a speech synthesis prototype for Tigrinya using a novel combination of techniques.
- Cite this paper
- Luel Negasi Tewelde (Author), 2017, Grapheme Based Tigrinya Speech Synthesis Using Statistical Parametric Speech Synthesis, Munich, GRIN Verlag, https://www.grin.com/document/434779