libak  0.4.0
Language model

Data Structures

struct  akLM
 Language model.

Functions

akLM * ak_lm_new_from_arpa_file (FILE *from, akDict *vocab, const akBool emptyvocab, const char *begin_sym, const char *end_sym, const char *ninf, char **err)
 Reads a language model from an ARPA file.
void ak_lm_free (akLM *lm)
 Frees memory.
void ak_lm_print (const akLM *lm, FILE *to, const akDict *words)
 Prints the language model.
void ak_lm_set_gsf (akLM *lm, const akFloat gsf)
 Sets the grammar scale factor.
void ak_lm_set_wip (akLM *lm, const akProb wip)
 Sets the word insertion penalty.
akLM * ak_lm_new_from_file (FILE *from, akLexicon *lexicon, const akBool emptylex, const akDict *syms, char **err)
 Reads a language model from a file.
akLM * ak_lm_new_from_wordnet_file (FILE *from, akDict *vocab, const akBool emptyvocab, const char *begin_sym, const char *end_sym, char **err)
 Reads a language model from a Wordnet file.

Function Documentation

void ak_lm_free ( akLM *  lm )

Frees memory.

Frees the memory allocated for the language model.

Parameters:
lm          The language model.
akLM* ak_lm_new_from_arpa_file ( FILE *  from,
akDict *  vocab,
const akBool  emptyvocab,
const char *  begin_sym,
const char *  end_sym,
const char *  ninf,
char **  err 
)

Reads a language model from an ARPA file.

This function creates a new language model from a text description in ARPA format stored in the given file. An akDict is provided to register the words. If the vocabulary is empty, new words are registered; otherwise an out-of-vocabulary word is treated as an error. The tokens that represent the special final word and -INF are required. The token for the special initial word is optional. The token used for the final word can also be used for the initial word.

Parameters:
from        File where the text description is stored.
vocab       The dictionary where the words are registered.
emptyvocab  Indicates if the provided dictionary is empty (new words must be registered) or not (new words are treated as errors).
begin_sym   The token for the special initial word. Can be NULL.
end_sym     The token for the special final word.
ninf        The token representing -INF.
err         Pointer to string variable. If not NULL an error message is allocated in the variable in case of error.
Returns:
The language model, or NULL in case of error.
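
The role of the ninf token can be illustrated with a small, self-contained sketch. It is not libak code: parse_logprob is a hypothetical helper, and the choice of "-99" as the token is only an example of a value some toolkits write for log 0. The sketch converts one log-probability field of an ARPA line to a double and maps the ninf token to negative infinity.

```c
#include <assert.h>
#include <math.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of libak: converts one log-probability
 * field of an ARPA line to a double. A field equal to the ninf token
 * (for example "-99") is mapped to negative infinity.
 * Returns 0 on success and -1 if the field is not a number. */
static int parse_logprob(const char *field, const char *ninf, double *out)
{
    char *end;

    assert(field != NULL && ninf != NULL && out != NULL);

    /* The ninf token is compared as text, before any numeric parsing. */
    if (strcmp(field, ninf) == 0) {
        *out = -INFINITY;
        return 0;
    }

    /* Anything else must be a complete floating-point literal. */
    *out = strtod(field, &end);
    if (end == field || *end != '\0')
        return -1;
    return 0;
}
```

Comparing the token as text rather than as a number keeps the check independent of how the file spells the value; how libak performs this comparison internally is not stated in this documentation.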
akLM* ak_lm_new_from_file ( FILE *  from,
akLexicon *  lexicon,
const akBool  emptylex,
const akDict *  syms,
char **  err 
)

Reads a language model from a file.

This function creates a new language model from a text description stored in the given file. The language model is expected to be in the same format as that generated by ak_lm_print. An akLexicon is provided to register the words. If the lexicon is empty, new words are registered in it; otherwise an out-of-vocabulary word is treated as an error. If a dictionary is provided, newly registered words are assumed to be encoded in UTF-8 and are split into characters, and a character missing from the dictionary is an error. Otherwise, new words are registered with an empty sequence of symbols. In that case the resulting akLexicon is malformed, but it can still be used to print the language model.

Parameters:
from        File where the text description is stored.
lexicon     The lexicon where words are registered.
emptylex    Indicates if the provided lexicon is empty (new words must be registered) or not (new words are treated as errors).
syms        A character dictionary used to register new words. If not NULL, characters missing from the dictionary are treated as errors; if NULL, characters are ignored.
err         Pointer to string variable. If not NULL an error message is allocated in the variable in case of error.
Returns:
The language model, or NULL in case of error.
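
Splitting a UTF-8 word into characters, as described for the case where syms is provided, can be sketched as below. This is an illustration only: utf8_len and utf8_split are hypothetical names, the callback interface is an assumption, and the check is deliberately minimal (it validates lead and continuation bytes but does not reject overlong encodings). How libak itself performs the split is not documented here.

```c
#include <assert.h>
#include <stddef.h>

/* Byte length of the UTF-8 character whose lead byte is c,
 * or 0 if c cannot start a character. */
static size_t utf8_len(unsigned char c)
{
    if (c < 0x80) return 1;            /* 0xxxxxxx */
    if ((c & 0xE0) == 0xC0) return 2;  /* 110xxxxx */
    if ((c & 0xF0) == 0xE0) return 3;  /* 1110xxxx */
    if ((c & 0xF8) == 0xF0) return 4;  /* 11110xxx */
    return 0;
}

/* Hypothetical helper, not part of libak: walks a NUL-terminated UTF-8
 * word character by character. If visit is not NULL it is called with
 * the start and byte length of each character. Returns the number of
 * characters, or -1 if the word is not well-formed. */
static int utf8_split(const char *word,
                      void (*visit)(const char *ch, size_t len, void *ctx),
                      void *ctx)
{
    int n = 0;

    assert(word != NULL);
    while (*word != '\0') {
        size_t len = utf8_len((unsigned char)*word);
        size_t i;

        if (len == 0)
            return -1;
        /* Every following byte must be a continuation byte (10xxxxxx);
         * a terminating NUL fails this test, so we never read past it. */
        for (i = 1; i < len; ++i)
            if (((unsigned char)word[i] & 0xC0) != 0x80)
                return -1;
        if (visit != NULL)
            visit(word, len, ctx);
        word += len;
        ++n;
    }
    return n;
}
```

In a reader that follows the description above, the visit callback would be the place to look each character up in the character dictionary and report an error when it is missing.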
akLM* ak_lm_new_from_wordnet_file ( FILE *  from,
akDict *  vocab,
const akBool  emptyvocab,
const char *  begin_sym,
const char *  end_sym,
char **  err 
)

Reads a language model from a Wordnet file.

This function creates a new language model from a text description in Wordnet format stored in the given file. An akDict is provided to register the words. If the vocabulary is empty, new words are registered; otherwise an out-of-vocabulary word is treated as an error. The tokens that represent the special initial and final words are required. The token used for the final word can also be used for the initial word.

Parameters:
from        File where the text description is stored.
vocab       The dictionary where the words are registered.
emptyvocab  Indicates if the provided dictionary is empty (new words must be registered) or not (new words are treated as errors).
begin_sym   The token for the special initial word.
end_sym     The token for the special final word.
err         Pointer to string variable. If not NULL an error message is allocated in the variable in case of error.
Returns:
The language model, or NULL in case of error.
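
The reader functions share two conventions: the emptyvocab (or emptylex) flag decides whether an unseen word is registered or rejected, and err optionally receives an allocated error message. The sketch below imitates both with a toy vocabulary. ToyVocab and toy_lookup are hypothetical stand-ins, not the akDict API, and the assumption that the caller releases the message with free() is not confirmed by this documentation.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define TOY_MAX_WORDS 16

/* Toy stand-in for a word dictionary; NOT the akDict type. */
typedef struct {
    const char *words[TOY_MAX_WORDS];
    int n;
} ToyVocab;

/* Hypothetical helper: returns the index of word, or -1 on error.
 * With emptyvocab set, an unseen word is registered; otherwise it is an
 * error. On error, if err is not NULL, *err receives a malloc'ed message
 * (assumption: the caller releases it with free()). */
static int toy_lookup(ToyVocab *v, const char *word, int emptyvocab,
                      char **err)
{
    const char *reason;
    int i;

    assert(v != NULL && word != NULL);
    for (i = 0; i < v->n; ++i)
        if (strcmp(v->words[i], word) == 0)
            return i;

    if (emptyvocab && v->n < TOY_MAX_WORDS) {
        /* Only the pointer is stored; the caller keeps the string alive. */
        v->words[v->n] = word;
        return v->n++;
    }

    reason = emptyvocab ? "vocabulary full" : "out-of-vocabulary word";
    if (err != NULL) {
        /* reason + ": " + word + terminating NUL */
        size_t len = strlen(reason) + strlen(word) + 3;
        *err = malloc(len);
        if (*err != NULL)
            snprintf(*err, len, "%s: %s", reason, word);
    }
    return -1;
}
```

Passing NULL for err simply suppresses the message, which mirrors the "If not NULL" wording in the parameter descriptions above.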
void ak_lm_print ( const akLM *  lm,
FILE *  to,
const akDict *  words 
)

Prints the language model.

This function writes the content of the language model to the given file, using the provided word dictionary. The dictionary is expected to contain all the words the model needs; otherwise an unexpected error may occur. The content is written in neither ARPA nor Wordnet format; a private text representation is used instead.

Parameters:
lm          The language model.
to          File where the model is written.
words       Dictionary containing the words.
void ak_lm_set_gsf ( akLM *  lm,
const akFloat  gsf 
)

Sets the grammar scale factor.

This function modifies the given language model by applying the given grammar scale factor. Note that if a previous grammar scale factor (gsf0) was applied, the resulting model will have a grammar scale factor of gsf0*gsf.

Parameters:
lm          The language model to modify.
gsf         Grammar scale factor.
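
The multiplicative behaviour noted above (gsf0*gsf) can be checked with a toy model. The sketch assumes that scores are stored as log-probabilities and that the factor multiplies each of them in place; toy_apply_gsf is a hypothetical helper, and libak's internal representation may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of libak: scales n log-probabilities in
 * place by gsf. Because the stored values are overwritten, a second call
 * scales the already-scaled values, so the factors multiply. */
static void toy_apply_gsf(double *logp, size_t n, double gsf)
{
    size_t i;

    assert(logp != NULL || n == 0);
    for (i = 0; i < n; ++i)
        logp[i] *= gsf;
}
```

Under these assumptions, applying a factor of 2 and then a factor of 4 to a score of -1 gives -8, the same as a single factor of 8. To replace a factor rather than compound it, a caller would have to apply gsf_new/gsf0.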
void ak_lm_set_wip ( akLM *  lm,
const akProb  wip 
)

Sets the word insertion penalty.

This function modifies the given language model by applying the word insertion penalty. Note that if a previous word insertion penalty (wip0) was applied, the resulting model will have a word insertion penalty of wip0+wip.

Parameters:
lm          The language model to modify.
wip         Word insertion penalty.
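
The additive behaviour noted above (wip0+wip) can be sketched in the same spirit. The example assumes that the penalty is expressed in the log domain and added to every word score in place; toy_apply_wip is a hypothetical helper, not libak's implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of libak: adds a log-domain word
 * insertion penalty to n word scores in place. Because the stored values
 * are overwritten, successive penalties accumulate by addition. */
static void toy_apply_wip(double *logp, size_t n, double wip)
{
    size_t i;

    assert(logp != NULL || n == 0);
    for (i = 0; i < n; ++i)
        logp[i] += wip;
}
```

Under these assumptions, applying -0.5 and then -0.25 to a score of -1 gives -1.75, the same as a single penalty of -0.75.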