Another Post

Post date: Feb 18, 2010 5:27:20 AM

4.3) Why is lmtool giving me this totally wrong pronunciation?

First, note that in English the relationship between orthography and pronunciation is rather complicated. There are a variety of interesting reasons for this, but a full discussion is beyond the scope of this document.

If you encounter this problem and you know the correct pronunciation, add it to a custom dictionary and lmtool will use it (you can do this on the lmtool-advanced page). Of course, the simplest reason for a bad pronunciation is that the dictionary has an error in it (it was created by humans); in that case we would appreciate it if you could let us know so that we can fix the entry.

4.2) What are some of the technical limitations of pronunciation generation?

The two main limitations to bear in mind are: 1) word tokens are expected to contain fewer than 35 characters; 2) a token may contain only ASCII characters. These limitations are present because lmtool is currently Anglo-centric. Non-English words often result in bad pronunciation guesses. This includes loan words (“entrepreneur”), proper names (“Megher”) and technical terms (such as those found in biology or medicine). We are working to extend the system to handle UTF-8 text and to include other languages, although this will not by itself address the pronunciation issue.
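The two limits above are easy to check before you upload a corpus. The following is a minimal sketch, not part of lmtool; the function name `problem_tokens` and the constant `MAX_TOKEN_CHARS` are made up for this example, and "fewer than 35 characters" is read literally, so a 35-character token is flagged.

```python
MAX_TOKEN_CHARS = 35  # tokens are expected to be shorter than this


def problem_tokens(line):
    """Return (token, reason) pairs for tokens that break either limit."""
    problems = []
    for token in line.split():  # a word is any whitespace-delimited string
        if len(token) >= MAX_TOKEN_CHARS:
            problems.append((token, "too long"))
        elif not token.isascii():
            problems.append((token, "non-ASCII"))
    return problems


if __name__ == "__main__":
    for token, reason in problem_tokens("meet me at the café tomorrow"):
        print(token, reason)
```

Tokens flagged this way are candidates for respelling or for an entry in a custom dictionary.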

4.1) How are lmtool pronunciations generated?

lmtool will try to give you a pronunciation for every word-like string in your corpus (by word we mean any sequence of characters delimited by whitespace). It does this in two ways: by querying a large pronunciation dictionary (cmudict) and by applying letter-to-sound (LtoS) rules to words not found in the dictionary. More precisely, if a word is not found in the dictionary, lmtool first tries a few simple affix rules to see whether a baseform is available that can be deterministically modified (for example, many plurals and possessives are predictable). If no dictionary-based pronunciation is available, lmtool applies the LtoS rules. These rules are based on systematic patterns in the English language (such as they are).
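The three-stage cascade described above (dictionary, then affix rules, then LtoS) can be sketched as follows. This is an illustration only, not lmtool's code: `TOY_DICT` stands in for cmudict, the single plural/possessive rule is a simplification (it ignores bases ending in sibilants, for instance), and `NAIVE_LTOS` is a hypothetical one-phone-per-letter table standing in for real LtoS rules.

```python
# Toy stand-in for cmudict; the real dictionary is far larger.
TOY_DICT = {"CAT": "K AE T", "DOG": "D AO G"}

# Phones after which a plural or possessive "s" is voiceless (simplified).
VOICELESS = {"P", "T", "K", "F", "TH"}

# Hypothetical one-phone-per-letter table standing in for real LtoS rules.
NAIVE_LTOS = dict(zip(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ",
    "AE B K D EH F G HH IH JH K L M N AA P K R S T AH V W K Y Z".split(),
))


def letter_to_sound(word):
    """Last resort: guess one phone per letter, skipping other characters."""
    return " ".join(NAIVE_LTOS[c] for c in word if c in NAIVE_LTOS)


def lookup(word):
    """Return (pronunciation, source) for one whitespace-delimited token."""
    word = word.upper()
    if word in TOY_DICT:                      # 1) dictionary lookup
        return TOY_DICT[word], "dictionary"
    for suffix in ("'S", "S"):                # 2) affix rule on a known base
        base = word[: -len(suffix)]
        if word.endswith(suffix) and base in TOY_DICT:
            phones = TOY_DICT[base]
            ending = "S" if phones.split()[-1] in VOICELESS else "Z"
            return phones + " " + ending, "affix rule"
    return letter_to_sound(word), "LtoS"      # 3) letter-to-sound fallback


if __name__ == "__main__":
    for w in ("cat", "cats", "dog's", "zip"):
        print(w, lookup(w))
```

The point of the ordering is that each later stage is less reliable than the one before it, so it is only consulted when the earlier stages fail.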

3.5) Can I get a copy of quicklm?

Yes. You can download it; it is open source. Once you look at the code you will realize that, while it is quite a terse implementation of a language modeler, it is also quite inefficient. This is another reason to use a standard toolkit if your corpus starts to get big.

3.4) How does the lmtool language modeler work?

lmtool uses the quicklm statistical language modeler. quicklm was designed to create suitable language models from very little data, say corpora smaller than about 50,000 words. If you have a larger corpus available you should be using one of the standard toolkits (see 1.4.1). quicklm assumes that n-gram counts computed from small corpora are inherently unreliable; consequently it uses a discounting scheme that is not tied to counts and that applies a uniform "ratio" discount to all n-grams. The ratio can be varied to control the "looseness" of the model. Experiments indicate that the resulting models produce better recognition performance than standard approaches (but, again, only for very small corpus sizes).
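To make the idea of a count-independent "ratio" discount concrete, here is a minimal sketch for bigrams. It is not quicklm's code, and the exact formula quicklm uses is not given in this document; the sketch simply assumes that every observed bigram keeps a fixed fraction (1 - discount) of its relative frequency, whatever its count, so that the remaining mass is left over for back-off. The function name and the default value of 0.5 are assumptions made for illustration.

```python
from collections import Counter


def discounted_bigrams(sentences, discount=0.5):
    """Relative-frequency bigram estimates scaled by a uniform (1 - discount)."""
    history = Counter()   # how often each word occurs as a bigram history
    bigrams = Counter()   # how often each (w1, w2) pair occurs
    for sentence in sentences:
        words = ["<s>"] + sentence.split() + ["</s>"]
        for w1, w2 in zip(words, words[1:]):
            history[w1] += 1
            bigrams[(w1, w2)] += 1
    # The same ratio is applied to every n-gram, regardless of its count.
    return {
        (w1, w2): (1.0 - discount) * count / history[w1]
        for (w1, w2), count in bigrams.items()
    }


if __name__ == "__main__":
    probs = discounted_bigrams(["open the door", "close the door"])
    for pair, p in sorted(probs.items()):
        print(pair, p)
```

A larger discount reserves more probability for word sequences not seen in the corpus, which is one way to read the "looseness" mentioned above. Count-based schemes, by contrast, discount rare n-grams differently from frequent ones.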

3.3) What is the format of the language model file?

The language model file is plain text, in the "arpa" format commonly used in speech recognition research. It lists 1-, 2- and 3-grams, each with its probability (the first field, stored as a base-10 logarithm) and, where applicable, a back-off factor (the last field, following the n-gram itself).
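As an illustration of the layout, the sketch below reads the n-gram sections of a small arpa-style file. The sample text and all of its numbers are made up for this example and are not lmtool output, and `parse_arpa` is a hypothetical helper, not part of any toolkit. In this sample the highest-order n-grams carry no back-off field, which the parser treats as optional.

```python
# Made-up sample; the values are illustrative only.
SAMPLE = r"""\data\
ngram 1=3
ngram 2=2

\1-grams:
-0.7782 </s> -0.3010
-0.7782 <s> -0.2218
-0.4771 hello -0.3010

\2-grams:
-0.3010 <s> hello
-0.3010 hello </s>

\end\
"""


def parse_arpa(text):
    """Return {order: [(log10_prob, words, backoff_or_None), ...]}."""
    ngrams = {}
    order = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("\\") and line.endswith("-grams:"):
            order = int(line[1:line.index("-")])   # "\2-grams:" -> 2
            ngrams[order] = []
        elif line == "\\end\\":
            break
        elif order is not None and line:
            fields = line.split()
            logprob = float(fields[0])              # first field
            words = tuple(fields[1:1 + order])      # the n-gram itself
            extra = fields[1 + order:]              # optional back-off factor
            backoff = float(extra[0]) if extra else None
            ngrams[order].append((logprob, words, backoff))
    return ngrams


if __name__ == "__main__":
    for order, entries in parse_arpa(SAMPLE).items():
        print(order, entries)
```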

3.2) Why does lmtool produce statistical language models?

Statistical language models are the customary way to configure the Sphinx decoder. Many commercial decoders use finite-state grammars (FSGs) to perform the same function; these are also referred to as language models, but they are very different in conception and in operation. Sphinx can use FSGs; currently you need to figure out the details on your own.

3.1) What is a language model?

The language model used in a decoder captures the constraints inherent in the word sequences found in a corpus. This information is used to constrain search and, as a result, significantly improves recognition accuracy. It follows that using a language model that does not correspond to your target domain will result in very poor recognition performance (not only because the sequence statistics are incorrect but also because the vocabulary will be incomplete). Note that the lmtool model is statistical.

2.4) Why do I need to condition my corpus?

To produce satisfactory models you should observe the following guidelines:

· Avoid the use of mixed-case text, specifically different versions of the same word. "Black" and "black" will be treated as different words. On the other hand, if "Black" is indeed a different word, say a proper noun, you may want to retain the distinction.

· Remove punctuation, since it can introduce irrelevant word variants, e.g., "therefore," and "therefore" will be treated as different words.

· Enter all numbers as the words you expect to be spoken, e.g., "23" should be "twenty three". Otherwise your number will be rendered as a digit sequence, e.g., “two three”.

· Text conditioning is unfortunately a bit of a black art, and correct interpretation often depends on context. For example, "id" could be an abbreviation for "identification"; then again it could be a technical term in a discussion of psychoanalysis. There are at least two problems: the two words will have different pronunciation and the words might have very different distributional characteristics if both occur in the same corpus. You want to avoid such ambiguities.

Please see the page on Conditioning for additional discussion and suggestions. Note that lmtool will not do any conditioning for you, as it doesn’t want to make any assumptions about what you are trying to do.
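Since conditioning is left to you, a small script is often enough for a first pass. The sketch below covers three of the guidelines above: case folding, punctuation removal and spelling out numbers. It is an assumption-laden example rather than a recommended pipeline: it lowercases everything (so it would erase a distinction like "Black" versus "black" that you may want to keep), it only strips punctuation at the edges of a token, and it only spells out the integers 0 to 99. The function names are made up for this example.

```python
import string

ONES = ("zero one two three four five six seven eight nine ten eleven twelve "
        "thirteen fourteen fifteen sixteen seventeen eighteen nineteen").split()
TENS = "twenty thirty forty fifty sixty seventy eighty ninety".split()


def number_to_words(n):
    """Spell out an integer in the range 0..99 the way it would be spoken."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens - 2] + ("" if ones == 0 else " " + ONES[ones])


def condition(line):
    """Lowercase, strip edge punctuation, and spell out small numbers."""
    out = []
    for token in line.lower().split():
        token = token.strip(string.punctuation)   # keeps inner apostrophes
        if token.isdecimal() and int(token) < 100:
            token = number_to_words(int(token))
        if token:
            out.append(token)
    return " ".join(out)


if __name__ == "__main__":
    print(condition("Therefore, call Room 23!"))
```

Ambiguous items such as "id" cannot be resolved by a script like this; they still need a decision based on your domain.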

2.3) What are some technical limitations of lmtool?

lmtool is most appropriate for generating English-language models. There is currently a limit of 6000 words for the corpus. If you need non-English models you may still be able to use the tool, either by downloading quicklm (see 3.5) or by using the handdict option (see 4.4). Otherwise we recommend that you use one of the standard toolkits (see 1.4.1).

2.2) What does the "advanced" version of lmtool do?

As you gain experience with your application you may notice that some of the pronunciations generated by lmtool are not quite right, or are different from how you or your users pronounce the words. The advanced tool also allows you to upload your own custom dictionary, whose pronunciations will override the standard ones. Earlier versions of the advanced tool gave the user control over additional aspects of compilation. This dates from a time when there was less standardization in Sphinx and, generally, in ASR. Many of these options are no longer of use and will be removed in newer versions.

2.1) Where do I start?

You need a corpus, which in this case means a set of sentences (or more precisely, utterances) that you expect your recognition system to be able to handle (i.e., what people might reasonably want to say to your system).

The corpus needs to be in the form of an ASCII text file, with one sentence to a line. Upload this file, click the compile button, and your language model and dictionary will appear. You do need to prepare the text according to certain conventions; see 2.4 on conditioning.

1.5) Where did the lmtool come from?

lmtool was developed in the Speech Group at Carnegie Mellon University in the early 1990s as a convenient way to quickly generate language models for experiments using the Sphinx decoder. It was used to generate language models for very small domains, particularly ones for spoken language interfaces. Since then it has been extended a number of times and has generally been updated to track the needs of its users. Alexander Rudnicky built the initial version and has been maintaining it since then.

1.4.1) What are my alternatives?

Currently the following two toolkits are in widespread use: the CMU-CU slmtk (open source) and the SRILM (free, but under license). There are several other language modeling toolkits available that are specialized for working with very large corpora; however, most of these are not used in speech recognition but in other applications that use language models, for example statistical machine translation.

1.3) What is lmtool good for?

lmtool is best for small domains, application prototyping and small-scale experimentation with ASR systems. It has been developed primarily for use with the Sphinx recognition system and takes into account its characteristics. Nevertheless, the language model and pronunciation formats are the same as those used by many, if not most, other research recognition systems. You should be able to use lmtool to generate knowledge bases for those systems as well.

1.4) What is lmtool not good for?

· Building models from large corpora, whether large in vocabulary size or in the amount of training data available. “Large” is a relative term; you should think of it as some function of vocabulary size and the number of distinct n-grams in your corpus. lmtool will tell you if it believes your submission is too large.

· Cases where you need finer control over model generation, such as using different discounting schemes, or where you need access to intermediate products, such as count tables.
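If you want a rough idea of how "large" your corpus is in the sense used above, you can count its vocabulary and its distinct n-grams yourself. The sketch below is only an aid for getting those numbers; `corpus_stats` is a made-up name, and the threshold lmtool applies is not stated here, so the sketch does not try to guess it.

```python
def corpus_stats(sentences, max_order=3):
    """Return (vocabulary size, {n: number of distinct n-grams})."""
    vocab = set()
    distinct = {n: set() for n in range(1, max_order + 1)}
    for sentence in sentences:
        words = sentence.split()
        vocab.update(words)
        for n in distinct:
            # Slide a window of length n over the sentence.
            for i in range(len(words) - n + 1):
                distinct[n].add(tuple(words[i:i + n]))
    return len(vocab), {n: len(grams) for n, grams in distinct.items()}


if __name__ == "__main__":
    print(corpus_stats(["open the door", "close the door"]))
```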

1.1) What is the lmtool?

lmtool is a web-based tool that allows users to quickly compile two text-based components needed for using an ASR decoder. These are the language model and the lexical model (referred to variously as the ‘pronouncing dictionary’ or simply the ‘dictionary’). The remaining component is the acoustic model. Together these are sometimes referred to as the decoder knowledge base.

Both language modeling and pronunciation modeling are multi-step procedures and are typically embodied in a set of (shell) scripts. The purpose of lmtool is to hide this complexity and to simplify knowledge base generation as much as possible.

Contents

1) General Information

1.1) What is the lmtool?

1.2) Who should use lmtool?

1.3) What is lmtool good for?

1.4) What is lmtool not good for?

1.4.1) What are my alternatives?

1.5) Where did the lmtool come from?

2) Using the lmtool

2.1) Where do I start?

2.2) What does the "advanced" version of lmtool do?

2.3) What are some technical limitations of lmtool?

2.4) Why do I need to condition my corpus?

3) Lmtool language models

3.1) What is a language model?

3.2) Why does lmtool produce statistical language models?

3.3) What is the format of the language model file?

3.4) How does the lmtool language modeler work?

3.5) Can I get a copy of quicklm?

4) Lmtool dictionaries

4.1) How are lmtool pronunciations generated?

4.2) What are some of the technical limitations of pronunciation generation?

4.3) Why is lmtool giving me this totally wrong pronunciation?

4.4) What is the handdict?

And thine ears shall hear a word - A command or admonition. You shall not be left without spiritual guides and directors.

Behind thee - That is, says Vitringa, the voice of conscience, as an "invisible" guide, shall admonish you. The idea, however, seems to be that if they were ignorant of the way, or if they were inclined to err, they should be admonished of the true path which they ought to pursue. The idea is taken either from the practice of teachers who are represented as "following" their pupils and admonishing them if they were in danger of going astray (Grotius), or from shepherds, who are represented as following their flocks, and directing them when they wandered. The Jews understand this voice 'from behind' to be the בת קול bath kol - 'the daughter of the voice;' a divine admonition which they suppose attends the pious. The essential thought is, that they would not be left without a guide and instructor; that, if they were inclined to go astray, they would be recalled to the path of truth and duty. Perhaps there is the idea, also, that the admonition would come from some "invisible" influence, or from some unexpected quarter, as it is often the case that those who are inquiring on the subject of religion receive light from quarters where they least expected, and from sources to which they were not looking. It is also true that the admonitions of Providence, of conscience, and of the Holy Spirit, seem often to come from "behind" us; that is, they "recall" us from the path in which we were going, and restrain us from a course that would be fraught with danger.

When ye turn to the right hand ... - When you shall be in danger of wandering from the direct and straight path. The voice shall recall you, and direct you in the way in which you ought to g

And thine ears shall hear a word behind thee,.... Which may be said in reference to the backsliding and declining state of the people, Isaiah 30:11 and is thought by some to be an allusion to schoolmasters, who stand behind their scholars, or at their backs, to guide, teach, and instruct them; and by others to shepherds following their flocks, who, when they observe any of the sheep going out of the way, call them back; or to travellers, who, coming to a place where are several ways, and being at a loss which way to take, and inclining to turn to the right or left, are called to by persons behind them, and directed in the right way. This "voice behind" is by the Jews interpreted of Bath Kol; and by others of the voice of conscience; but it rather intends the Spirit of God, and his grace; though it seems best to understand it of the Scriptures of truth, the word of God, the only rule of faith and practice; the language of which is,

saying, This is the way, walk ye in it; it directs to Christ the way, and who is the only way of life and salvation to be walked in by faith, and to all the lesser paths of duty and doctrine, which to walk in is both pleasant and profitable, and which is the right way; so the Targum paraphrases it,

"this is the right way;"


The King Clown


1.2) Who should use lmtool?

· Developers who are building systems and need to quickly configure their system for testing or initial deployment.

· Researchers who are concentrating on other aspects of speech recognition or spoken language systems and just need something that works.

Isaiah 30 Interlinear


Isaiah 30:21

Strong's | Transliteration | Hebrew | English | Part of speech

241 | wə·’ā·zə·ne·ḵā | וְאָזְנֶ֙יךָ֙ | And your ears | Noun

8085 | tiš·ma‘·nāh | תִּשְׁמַ֣עְנָה | shall hear | Verb

1697 | ḏā·ḇār, | דָבָ֔ר | a word | Noun

310 | mê·’a·ḥă·re·ḵā | מֵֽאַחֲרֶ֖יךָ | behind | Adv

559 | lê·mōr; | לֵאמֹ֑ר | you saying | Verb

2088 | zeh | זֶ֤ה | This | Pro

1870 | had·de·reḵ | הַדֶּ֙רֶךְ֙ | [is] the way | Noun

1980 | lə·ḵū | לְכ֣וּ | walk | Verb

(none) | ḇōw, | ב֔וֹ | in it | Prep

3588 | kî | כִּ֥י | whenever | Conj

541 | ṯa·’ă·mî·nū | תַאֲמִ֖ינוּ | you turn to the right hand | Verb

3588 | wə·ḵî | וְכִ֥י | when | Conj

8041 | ṯaś·mə·’î·lū. | תַשְׂמְאִֽילוּ׃ | you turn to the left | Verb

4.4) What is the handdict?

This is a custom pronunciation dictionary that users can optionally upload from the lmtool-advanced page; it allows the user to override pronunciations generated by the system. You may have several different reasons for doing this. For example, your corpus may contain foreign loan words that are absent from the dictionary and that are not rendered correctly by the LtoS rules. Or you may have terms that exceed the character-count limit for word tokens (35).
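The override behaviour can be pictured as a simple dictionary merge in which handdict entries win. The sketch below assumes a cmudict-style layout (a word followed by its phones on one line); the exact file format lmtool expects for the handdict is not described here, so treat the layout, the function names and the sample entries as illustrative assumptions.

```python
def parse_dict(text):
    """Parse 'WORD PH1 PH2 ...' lines into {WORD: 'PH1 PH2 ...'}."""
    entries = {}
    for line in text.splitlines():
        parts = line.split()
        if parts:  # skip blank lines
            entries[parts[0].upper()] = " ".join(parts[1:])
    return entries


def merge(system, handdict):
    """Return a new dictionary in which handdict entries override system ones."""
    merged = dict(system)
    merged.update(handdict)
    return merged


if __name__ == "__main__":
    system = parse_dict("HELLO HH AH L OW\nTOMATO T AH M EY T OW")
    custom = parse_dict("TOMATO T AH M AA T OW")
    for word, phones in sorted(merge(system, custom).items()):
        print(word, phones)
```

Words that appear only in the custom dictionary are simply added, which is how a handdict can also supply pronunciations for words the system would otherwise have to guess.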

For more information please contact the maintainer.

1 Woe, O Ariel, Ariel the city where David once camped! Add year to year, observe your feasts on schedule.

1 Woe to Ariel, to Ariel, the city where David dwelt! add ye year to year; let them kill sacrifices.

1 Woe to Ariel, Ariel, the city where David camped! Continue year after year; let the festivals recur.