The NeLRaLEC project has developed a Text-To-Speech (TTS) application for the Nepali language. The software can read aloud Nepali text from documents, files and websites for people who cannot read it themselves, for example because they are visually impaired or non-literate. To support text-to-speech generation, we have recorded normative readings of Unicode Nepali texts that represent the range of Nepali speech. At least two normative speakers, one of each gender, are used, with at least 2 hours of recording for each. This TTS is also part of the NNC.
Development of Nepali TTS (An Introduction)
Nepali TTS is being developed using the framework of the Festival Speech Synthesis System, developed by the University of Edinburgh. Festival is free software that supports multilingual speech synthesis and has an open architecture for research in this field.
For more details you can visit the site:
http://www.cstr.ed.ac.uk/projects/festival/
The Festvox speech software provides a better way of building new synthetic voices. The project is led by Prof. Dr. Alan W. Black and is based at the Language Technologies Institute at Carnegie Mellon University.
For more details you can visit the site: http://www.festvox.org/
Festival and Festvox use the Edinburgh Speech Tools library, which provides the general-purpose routines for speech processing.
For more details, you can visit the link:
http://www.festvox.org/docs/speech_tools-1.2.0/c23.htm
One month of training on building synthetic voices was provided to the developers involved in Nepali TTS. For this purpose, the developers were sent to the Language Research Center (LRC) at IIIT Hyderabad, which is led by Dr. Rajeev Sangal. The training was led by Kishore Prahallad, a research scientist at LRC, IIIT, and a visiting researcher at Carnegie Mellon University, where he works with Dr. Raj Reddy, Dr. James K. Baker and Dr. Alan W. Black. He is also supporting the development of this application under the guidance of Prof. Dr. Alan W. Black.
Phase I-Analysis of Nepali Language:
- Defining Nepali Phoneset
The Nepali phone set contains 11 vowels, 35 consonants and 21 diphthongs. This set is being built in consultation with linguists working at Madan Puraskar Pustakalaya. This is the first version of the Nepali phone set; if our research uncovers further phones, or shows that some are unnecessary, we may add phones to or remove phones from this set.
Please click here to view the phoneset.
- Defining stress, syllabification and schwa (insertion-deletion) rules
- Defining letter to sound rules
- Defining a Lexicon for Nepali Text-to-Speech:
A lexicon of around 6,000 of the most frequently occurring words in Nepali text has to be built explicitly, to serve as training data for the letter-to-sound rules. These words are syllabified and POS tagged, and schwa deletion and insertion are applied as appropriate.
Please click here to view the lexicon.
- Part-of-speech tagging for homograph disambiguation
- Text Normalization
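As an illustration of how a frequency-based word list like the lexicon described above might be drawn from a corpus, here is a minimal Python sketch. It is not the project's actual tooling: the function name, the regular expression and the sample lines are assumptions made for this example. Words are taken to be runs of Devanagari characters (plus zero-width joiners), with the danda and Devanagari digits treated as separators. Syllabification, POS tagging and schwa handling would still have to be done on the resulting list.

```python
import re
from collections import Counter

# Assumption: a "word" is a run of Devanagari letters and signs, optionally
# containing zero-width (non-)joiners. The danda/double danda (U+0964, U+0965)
# and Devanagari digits (U+0966..U+096F) are deliberately left out of the class.
WORD_RE = re.compile(r"[\u0900-\u0963\u0970-\u097F\u200C\u200D]+")


def most_frequent_words(lines, limit=6000):
    """Return up to `limit` distinct words, most frequent first."""
    counts = Counter()
    for line in lines:
        counts.update(WORD_RE.findall(line))
    return [word for word, _ in counts.most_common(limit)]


if __name__ == "__main__":
    # Made-up sample lines, for illustration only.
    sample = ["नेपाल राम्रो छ।", "नेपाल ठूलो छ।", "नेपाल"]
    for word in most_frequent_words(sample, limit=2):
        print(word)
```

In practice the input would be the lines of a large Nepali text collection, and the limit of 6,000 matches the lexicon size mentioned above.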
Phase II-Building Synthetic Voices: (A prototype of this phase has been developed)
- Optimal Text Collection
Around 1,200 sentences capturing all the diphones that occur in the Nepali language are listed in the two files given below. These sentences were collected from the Nepali National Corpus being developed at MPP under the Bhasha Sanchar Project. This text was recorded to generate the Nepali synthetic voice. This time we have used a female voice.
You can view the text from the links below:
sentencesOne
sentencesTwo
- Recording of speech (One male and one female)
- Speech labelling
- Extracting the speech parameters
- Building Utterances and Synthesizing
- Playing the voice
- Tuning the voice
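The optimal text collection step above aims at a compact set of sentences that still covers all diphones. The source does not say how the sentences were chosen; one common approach is a greedy set-cover selection, sketched below in Python. Everything here is an assumption for illustration: the function names, the "pau" silence label used to pad each utterance, and the placeholder phone labels, which are not the project's phoneset. Each candidate sentence is assumed to have already been converted to a phone sequence.

```python
def diphones(phones, silence="pau"):
    """Return the set of adjacent phone pairs in one utterance.

    The utterance is padded with a silence label on both sides so that
    sentence-initial and sentence-final transitions are counted too.
    """
    padded = [silence] + list(phones) + [silence]
    return set(zip(padded, padded[1:]))


def select_sentences(candidates, max_sentences=1200):
    """Greedily pick sentences that add the most not-yet-covered diphones.

    `candidates` maps a sentence id to its phone sequence. Returns the list
    of chosen ids (in selection order) and the set of covered diphones.
    """
    remaining = {sid: diphones(p) for sid, p in candidates.items()}
    covered = set()
    chosen = []
    while remaining and len(chosen) < max_sentences:
        # Pick the sentence contributing the most new diphones.
        best = max(remaining, key=lambda sid: len(remaining[sid] - covered))
        gain = remaining[best] - covered
        if not gain:
            break  # nothing left to gain; coverage cannot improve further
        chosen.append(best)
        covered |= gain
        del remaining[best]
    return chosen, covered


if __name__ == "__main__":
    # Placeholder phone sequences, for illustration only.
    corpus = {"s1": ["k", "a"], "s2": ["k", "a", "m"], "s3": ["m", "a"]}
    chosen, covered = select_sentences(corpus)
    print(chosen, len(covered))
```

Greedy selection does not guarantee the smallest possible sentence set, but it is simple and usually gives a recording script far shorter than the full corpus.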
Download
Full Manual of Nepali TTS
We are grateful to Mr. Kedar Sharma and Ms. Ekta Silwal for voluntarily providing their voices for the speech generation software. Spending long hours in the studio recording almost thirteen hundred sentences was both a learning experience and fun for all of us.