Pronunciation Lexicon Specification

The Pronunciation Lexicon Specification (PLS) is a W3C Recommendation designed to enable interoperable specification of pronunciation information for both speech recognition and speech synthesis engines within voice browsing applications. The language is intended to be easy for developers to use while supporting the accurate specification of pronunciation information for international use.

The language allows one or more pronunciations for a word or phrase to be specified using a standard pronunciation alphabet or, if necessary, using vendor-specific alphabets. Pronunciations are grouped together into a PLS document, which may be referenced from other markup languages such as the Speech Recognition Grammar Specification (SRGS) and the Speech Synthesis Markup Language (SSML).

Contents

  • 1 Usage
  • 2 Common Use Cases
    • 2.1 Multiple pronunciations for the same orthography
    • 2.2 Multiple orthographies
    • 2.3 Homophones
    • 2.4 Homographs
    • 2.5 Pronunciation by Orthography (Acronyms, Abbreviations, etc.)
  • 3 Status and Future
  • 4 See also
  • 5 References
  • 6 External links

Usage

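Here is an example PLS document:

 <?xml version="1.0" encoding="UTF-8"?>
 <lexicon version="1.0"
       xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
       alphabet="ipa" xml:lang="en-US">
   <lexeme>
     <grapheme>judgment</grapheme>
     <grapheme>judgement</grapheme>
     <phoneme>ˈdʒʌdʒ.mənt</phoneme>
   </lexeme>
   <lexeme>
     <grapheme>fiancé</grapheme>
     <grapheme>fiance</grapheme>
     <phoneme>fiˈɒns.eɪ</phoneme>
     <phoneme>ˌfiː.ɑːnˈseɪ</phoneme>
   </lexeme>
 </lexicon>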

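This lexicon could be used to improve text-to-speech (TTS) output, as shown in the following SSML 1.0 document:

 <?xml version="1.0" encoding="UTF-8"?>
 <speak version="1.0"
       xmlns="http://www.w3.org/2001/10/synthesis"
       xml:lang="en-US">
   <lexicon uri="http://www.example.com/lexicon_defined_above.xml"/>
   <p>In the judgement of my fiancé, Las Vegas is the best place for a honeymoon. I replied that I preferred Venice and didn't think the Venetian casino was an acceptable compromise.</p>
 </speak>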

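It could also be used to improve automatic speech recognition (ASR), as in the following SRGS 1.0 grammar:

 <?xml version="1.0" encoding="UTF-8"?>
 <grammar version="1.0"
       xmlns="http://www.w3.org/2001/06/grammar"
       xml:lang="en-US" root="movies" mode="voice">
   <lexicon uri="http://www.example.com/lexicon_defined_above.xml"/>
   <rule id="movies" scope="public">
     <one-of>
       <item>Terminator 2: Judgment Day</item>
       <item>My Big Fat Obnoxious Fiance</item>
       <item>Pluto's Judgement Day</item>
     </one-of>
   </rule>
 </grammar>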
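
As a rough illustration of how a processor might consume such a lexicon, the following Python sketch parses a PLS document with the standard library and builds a lookup table from each grapheme to the pronunciations listed in its lexeme. The function name load_pronunciations and the dictionary layout are assumptions made for this example and are not defined by PLS; a real TTS or ASR engine would do considerably more.

```python
import xml.etree.ElementTree as ET

# Namespace of PLS 1.0 elements.
PLS_NS = {"pls": "http://www.w3.org/2005/01/pronunciation-lexicon"}

PLS_DOC = """\
<lexicon version="1.0"
      xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
      alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>judgment</grapheme>
    <grapheme>judgement</grapheme>
    <phoneme>ˈdʒʌdʒ.mənt</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>fiancé</grapheme>
    <grapheme>fiance</grapheme>
    <phoneme>fiˈɒns.eɪ</phoneme>
    <phoneme>ˌfiː.ɑːnˈseɪ</phoneme>
  </lexeme>
</lexicon>
"""


def load_pronunciations(pls_text):
    """Map each grapheme to the phoneme strings of its lexeme (hypothetical helper)."""
    root = ET.fromstring(pls_text)
    table = {}
    for lexeme in root.findall("pls:lexeme", PLS_NS):
        phonemes = [p.text.strip() for p in lexeme.findall("pls:phoneme", PLS_NS)]
        for grapheme in lexeme.findall("pls:grapheme", PLS_NS):
            table.setdefault(grapheme.text.strip(), []).extend(phonemes)
    return table


pronunciations = load_pronunciations(PLS_DOC)
print(pronunciations["judgement"])
print(pronunciations["fiance"])
```

Both spellings of "judgment" resolve to the same phoneme string, while either spelling of "fiancé" yields two alternatives.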

Common Use Cases

Multiple pronunciations for the same orthography

For ASR systems it is common to rely on multiple pronunciations of the same word or phrase in order to cope with variations of pronunciation within a language. In the Pronunciation Lexicon language, multiple pronunciations are represented by more than one <phoneme> (or <alias>) element within the same <lexeme> element.

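In the following example the word "Newton" has two possible pronunciations.

 <?xml version="1.0" encoding="UTF-8"?>
 <lexicon version="1.0"
       xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
       alphabet="ipa" xml:lang="en-US">
   <lexeme>
     <grapheme>Newton</grapheme>
     <phoneme>ˈnjuːtən</phoneme>
     <phoneme>ˈnuːtən</phoneme>
   </lexeme>
 </lexicon>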
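
The difference between how a recognizer and a synthesizer might treat the two "Newton" pronunciations can be sketched in Python as follows. The helper names (variants, asr_accepts, tts_choice) are hypothetical, and choosing the first pronunciation in document order for TTS is a simplifying assumption of this sketch rather than a full account of how a conforming processor selects a pronunciation.

```python
import xml.etree.ElementTree as ET

NS = {"pls": "http://www.w3.org/2005/01/pronunciation-lexicon"}

NEWTON_LEXICON = """\
<lexicon version="1.0"
      xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
      alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>Newton</grapheme>
    <phoneme>ˈnjuːtən</phoneme>
    <phoneme>ˈnuːtən</phoneme>
  </lexeme>
</lexicon>
"""


def variants(pls_text, word):
    """Return every phoneme string listed for `word`, in document order."""
    root = ET.fromstring(pls_text)
    for lexeme in root.findall("pls:lexeme", NS):
        graphemes = [g.text.strip() for g in lexeme.findall("pls:grapheme", NS)]
        if word in graphemes:
            return [p.text.strip() for p in lexeme.findall("pls:phoneme", NS)]
    return []


def asr_accepts(pls_text, word, heard):
    """A recognizer can accept any of the listed pronunciations."""
    return heard in variants(pls_text, word)


def tts_choice(pls_text, word):
    """Assumption: a synthesizer speaks the first listed pronunciation."""
    found = variants(pls_text, word)
    return found[0] if found else None


print(asr_accepts(NEWTON_LEXICON, "Newton", "ˈnuːtən"))
print(tts_choice(NEWTON_LEXICON, "Newton"))
```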

Multiple orthographies

In some situations there are alternative textual representations for the same word or phrase. This can arise for a number of reasons; see Section 4.5 of PLS 1.0 for details. Because these representations have the same meaning (as opposed to homophones), it is recommended that they be represented using a single <lexeme> element that contains multiple <grapheme> elements.

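Here are two simple examples of multiple orthographies: an alternative spelling of an English word and multiple writings of a Japanese word.

 <?xml version="1.0" encoding="UTF-8"?>
 <lexicon version="1.0"
       xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
       alphabet="ipa" xml:lang="en-US">
   <lexeme>
     <grapheme>colour</grapheme>
     <grapheme>color</grapheme>
     <phoneme>ˈkʌlər</phoneme>
   </lexeme>
 </lexicon>

 <?xml version="1.0" encoding="UTF-8"?>
 <lexicon version="1.0"
       xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
       alphabet="ipa" xml:lang="ja">
   <lexeme>
     <grapheme>nihongo</grapheme>
     <grapheme>日本語</grapheme>
     <grapheme>にほんご</grapheme>
     <phoneme>ɲihoŋŋo</phoneme>
   </lexeme>
 </lexicon>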
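
One consequence of grouping alternative spellings in a single lexeme is that an application can tell that two written forms denote the same entry. The Python sketch below illustrates this under the assumption that entries are identified simply by their position in the document; build_index and same_entry are hypothetical helpers, not part of any PLS API.

```python
import xml.etree.ElementTree as ET

NS = {"pls": "http://www.w3.org/2005/01/pronunciation-lexicon"}

ENGLISH = """\
<lexicon version="1.0"
      xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
      alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>colour</grapheme>
    <grapheme>color</grapheme>
    <phoneme>ˈkʌlər</phoneme>
  </lexeme>
</lexicon>
"""

JAPANESE = """\
<lexicon version="1.0"
      xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
      alphabet="ipa" xml:lang="ja">
  <lexeme>
    <grapheme>nihongo</grapheme>
    <grapheme>日本語</grapheme>
    <grapheme>にほんご</grapheme>
    <phoneme>ɲihoŋŋo</phoneme>
  </lexeme>
</lexicon>
"""


def build_index(pls_text):
    """Map every grapheme to the position of its lexeme in the document."""
    root = ET.fromstring(pls_text)
    index = {}
    for position, lexeme in enumerate(root.findall("pls:lexeme", NS)):
        for grapheme in lexeme.findall("pls:grapheme", NS):
            index[grapheme.text.strip()] = position
    return index


def same_entry(index, first, second):
    """True when both written forms belong to the same lexeme."""
    return first in index and second in index and index[first] == index[second]


english_index = build_index(ENGLISH)
japanese_index = build_index(JAPANESE)
print(same_entry(english_index, "colour", "color"))
print(same_entry(japanese_index, "日本語", "にほんご"))
```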

Homophones

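Most languages have homophones, words with the same pronunciation but different meanings (and possibly different spellings), for instance "seed" and "cede". It is recommended that these be represented as different lexemes.

 <?xml version="1.0" encoding="UTF-8"?>
 <lexicon version="1.0"
       xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
       alphabet="ipa" xml:lang="en-US">
   <lexeme>
     <grapheme>cede</grapheme>
     <phoneme>siːd</phoneme>
   </lexeme>
   <lexeme>
     <grapheme>seed</grapheme>
     <phoneme>siːd</phoneme>
   </lexeme>
 </lexicon>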
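
Keeping homophones in separate lexemes also makes them easy to tell apart from alternative spellings of one word. The following Python sketch groups lexemes by pronunciation and reports only pronunciations shared by more than one lexeme; the function name homophones and the extra "colour"/"color" entry are assumptions added for illustration.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

NS = {"pls": "http://www.w3.org/2005/01/pronunciation-lexicon"}

LEXICON = """\
<lexicon version="1.0"
      xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
      alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>cede</grapheme>
    <phoneme>siːd</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>seed</grapheme>
    <phoneme>siːd</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>colour</grapheme>
    <grapheme>color</grapheme>
    <phoneme>ˈkʌlər</phoneme>
  </lexeme>
</lexicon>
"""


def homophones(pls_text):
    """Return {phoneme: [graphemes of lexeme 1, graphemes of lexeme 2, ...]}
    for every pronunciation that occurs in more than one lexeme."""
    root = ET.fromstring(pls_text)
    by_sound = defaultdict(list)
    for lexeme in root.findall("pls:lexeme", NS):
        graphemes = [g.text.strip() for g in lexeme.findall("pls:grapheme", NS)]
        for phoneme in lexeme.findall("pls:phoneme", NS):
            by_sound[phoneme.text.strip()].append(graphemes)
    return {sound: groups for sound, groups in by_sound.items() if len(groups) > 1}


print(homophones(LEXICON))
```

The two spellings of "colour" share a lexeme, so they are not reported; "cede" and "seed" are, because the same pronunciation appears in two distinct lexemes.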

Homographs

Most languages have words with different meanings but the same spelling (and sometimes different pronunciations), called homographs. For example, in English the word bass (fish) and the word bass (in music) have identical spellings but different meanings and pronunciations. Although it is recommended that these words be represented using separate <lexeme> elements that are distinguished by different values of the role attribute (see Section 4.4 of PLS 1.0), a pronunciation lexicon author who does not want to distinguish between the two words can simply represent them as alternative pronunciations within the same <lexeme> element. In the latter case the TTS processor will not be able to determine when to apply the first or the second transcription.

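In this example the pronunciations of the homograph "bass" are shown.

 <?xml version="1.0" encoding="UTF-8"?>
 <lexicon version="1.0"
       xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
       alphabet="ipa" xml:lang="en-US">
   <lexeme>
     <grapheme>bass</grapheme>
     <phoneme>bæs</phoneme>
     <phoneme>beɪs</phoneme>
   </lexeme>
 </lexicon>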

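Note that English contains numerous examples of noun-verb pairs that can be treated either as homographs or as alternative pronunciations, depending on author preference. Two examples are the noun/verb "refuse" and the noun/verb "address".

 <?xml version="1.0" encoding="UTF-8"?>
 <lexicon version="1.0"
       xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
       xmlns:mypos="http://www.example.com/my_pos_namespace"
       alphabet="ipa" xml:lang="en-US">
   <lexeme role="mypos:verb">
     <grapheme>refuse</grapheme>
     <phoneme>rɪˈfjuːz</phoneme>
   </lexeme>
   <lexeme role="mypos:noun">
     <grapheme>refuse</grapheme>
     <phoneme>ˈrefjuːs</phoneme>
   </lexeme>
 </lexicon>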
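
When homographs are kept in separate lexemes distinguished by the role attribute, a processor that knows the part of speech of a word can pick the matching transcription. The Python sketch below shows the idea for "refuse". The mypos prefix, its namespace URI, and the pronounce helper are placeholders assumed for this example, and comparing role values as plain strings is a simplification: role values are qualified names, so a careful implementation would resolve the prefix to its namespace first.

```python
import xml.etree.ElementTree as ET

NS = {"pls": "http://www.w3.org/2005/01/pronunciation-lexicon"}

# The "mypos" namespace is a made-up placeholder for a part-of-speech vocabulary.
LEXICON = """\
<lexicon version="1.0"
      xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
      xmlns:mypos="http://www.example.com/my_pos_namespace"
      alphabet="ipa" xml:lang="en-US">
  <lexeme role="mypos:verb">
    <grapheme>refuse</grapheme>
    <phoneme>rɪˈfjuːz</phoneme>
  </lexeme>
  <lexeme role="mypos:noun">
    <grapheme>refuse</grapheme>
    <phoneme>ˈrefjuːs</phoneme>
  </lexeme>
</lexicon>
"""


def pronounce(pls_text, word, role=None):
    """Return the phoneme strings for `word`, optionally restricted to one role.

    Simplification: role values are compared as literal strings.
    """
    root = ET.fromstring(pls_text)
    result = []
    for lexeme in root.findall("pls:lexeme", NS):
        graphemes = [g.text.strip() for g in lexeme.findall("pls:grapheme", NS)]
        if word not in graphemes:
            continue
        roles = lexeme.get("role", "").split()  # role holds a space-separated list
        if role is None or role in roles:
            result.extend(p.text.strip() for p in lexeme.findall("pls:phoneme", NS))
    return result


print(pronounce(LEXICON, "refuse", "mypos:verb"))
print(pronounce(LEXICON, "refuse", "mypos:noun"))
print(pronounce(LEXICON, "refuse"))
```

Without a role, both transcriptions are returned and the caller cannot tell which applies, which mirrors the limitation described above for a single lexeme holding both pronunciations.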

Pronunciation by Orthography (Acronyms, Abbreviations, etc.)

For some words and phrases, pronunciation can be expressed quickly and conveniently as a sequence of other orthographies. The developer is not required to have linguistic knowledge, but instead makes use of the pronunciations that are already expected to be available. To express pronunciations using other orthographies, the <alias> element may be used.

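This feature is particularly useful for acronym expansion.

 <?xml version="1.0" encoding="UTF-8"?>
 <lexicon version="1.0"
       xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
       alphabet="ipa" xml:lang="en-US">
   <lexeme>
     <grapheme>W3C</grapheme>
     <alias>World Wide Web Consortium</alias>
   </lexeme>
   <lexeme>
     <grapheme>101</grapheme>
     <alias>one hundred and one</alias>
   </lexeme>
   <lexeme>
     <grapheme>Thailand</grapheme>
     <alias>tie land</alias>
   </lexeme>
   <lexeme>
     <grapheme>BBC 1</grapheme>
     <alias>be be sea one</alias>
   </lexeme>
 </lexicon>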
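
The effect of aliases on input text can be approximated with a simple substitution pass, sketched here in Python. The helpers load_aliases and expand_aliases are hypothetical, and the word-boundary matching is an assumption of this sketch; an actual processor decides for itself how text is tokenized and matched against graphemes.

```python
import re
import xml.etree.ElementTree as ET

NS = {"pls": "http://www.w3.org/2005/01/pronunciation-lexicon"}

LEXICON = """\
<lexicon version="1.0"
      xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
      alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>W3C</grapheme>
    <alias>World Wide Web Consortium</alias>
  </lexeme>
  <lexeme>
    <grapheme>101</grapheme>
    <alias>one hundred and one</alias>
  </lexeme>
  <lexeme>
    <grapheme>Thailand</grapheme>
    <alias>tie land</alias>
  </lexeme>
  <lexeme>
    <grapheme>BBC 1</grapheme>
    <alias>be be sea one</alias>
  </lexeme>
</lexicon>
"""


def load_aliases(pls_text):
    """Map each grapheme to the text of the first alias in its lexeme."""
    root = ET.fromstring(pls_text)
    aliases = {}
    for lexeme in root.findall("pls:lexeme", NS):
        alias = lexeme.find("pls:alias", NS)
        if alias is None:
            continue  # lexeme uses <phoneme> only
        for grapheme in lexeme.findall("pls:grapheme", NS):
            aliases[grapheme.text.strip()] = alias.text.strip()
    return aliases


def expand_aliases(aliases, text):
    """Replace every whole-word occurrence of a grapheme with its alias."""
    if not aliases:
        return text
    # Try longer graphemes first so multi-word entries such as "BBC 1" win.
    keys = sorted(aliases, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(k) for k in keys) + r")\b")
    return pattern.sub(lambda match: aliases[match.group(1)], text)


aliases = load_aliases(LEXICON)
print(expand_aliases(aliases, "W3C is on BBC 1"))
```

The expanded text consists of ordinary words, so the processor can pronounce it using whatever pronunciations it already has for them.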

Status and Future

  • PLS 1.0 reached the status of W3C Recommendation on 14 October 2008.

See also

References

  • PLS Specification (W3C Recommendation)

External links

  • PLS Specification (W3C Recommendation)
  • W3C Press Release
  • SRGS Specification (W3C Recommendation)
  • SSML Specification (W3C Recommendation)
  • VoiceXML Forum
  • France Telecom Orange Labs implementation of PLS 1.0 under the GNU General Public License version 3
  • SourceForge project for Java-based implementation of PLS 1.0