libak
0.4.0
Data Structures | |
struct | akLM |
Language model. | |
Functions | |
akLM * | ak_lm_new_from_arpa_file (FILE *from, akDict *vocab, const akBool emptyvocab, const char *begin_sym, const char *end_sym, const char *ninf, char **err) |
Reads a language model from an ARPA file. | |
void | ak_lm_free (akLM *lm) |
Frees memory. | |
void | ak_lm_print (const akLM *lm, FILE *to, const akDict *words) |
Prints the language model. | |
void | ak_lm_set_gsf (akLM *lm, const akFloat gsf) |
Sets the grammar scale factor. | |
void | ak_lm_set_wip (akLM *lm, const akProb wip) |
Sets the word insertion penalty. | |
akLM * | ak_lm_new_from_file (FILE *from, akLexicon *lexicon, const akBool emptylex, const akDict *syms, char **err) |
Reads a language model from a file. | |
akLM * | ak_lm_new_from_wordnet_file (FILE *from, akDict *vocab, const akBool emptyvocab, const char *begin_sym, const char *end_sym, char **err) |
Reads a language model from a Wordnet file. |
void ak_lm_free (akLM *lm)
Frees memory.
Frees the memory allocated for the language model.
lm | The language model. |
akLM * ak_lm_new_from_arpa_file (FILE *from, akDict *vocab, const akBool emptyvocab, const char *begin_sym, const char *end_sym, const char *ninf, char **err)
Reads a language model from an ARPA file.
This function creates a new language model from a text description in ARPA format stored in the given file. An akDict is provided to register the words. If the vocabulary is empty, new words are registered; otherwise, an out-of-vocabulary word is treated as an error. The tokens that represent the special final word and -INF are required. The token for the special initial word is optional. The token used for the final word can also be used for the initial word.
from | File where the text description is stored. |
vocab | The dictionary where the words are registered. |
emptyvocab | Indicates if the provided dictionary is empty (new words must be registered) or not (new words are treated as errors). |
begin_sym | The token for the special initial word. Can be NULL. |
end_sym | The token for the special final word. |
ninf | The token representing -INF. |
err | Pointer to a string variable. If not NULL, an error message is allocated and stored in the variable when an error occurs. |
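The role of the ninf token can be illustrated with a small standalone sketch. It is not libak code: parse_logprob is a hypothetical helper, and the token "-99" used in the usage note is only an example value. The sketch assumes that a probability field equal to the ninf token stands for negative infinity and that any other field is an ordinary number; how libak parses the field internally is not stated on this page.

```c
#include <assert.h>
#include <math.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of libak: converts one probability field
 * read from an ARPA line into a log-probability.
 * A field equal to the ninf token maps to negative infinity; any other
 * field is parsed as a floating point number.
 * Returns 0 on success and -1 if the field is not a valid number. */
int parse_logprob(const char *field, const char *ninf, double *out)
{
    char *end = NULL;

    assert(field != NULL && ninf != NULL && out != NULL);

    /* The special token is compared as text, before any numeric parsing. */
    if (strcmp(field, ninf) == 0) {
        *out = -INFINITY;
        return 0;
    }

    *out = strtod(field, &end);
    if (end == field || *end != '\0')
        return -1; /* empty field or trailing garbage */
    return 0;
}
```

With ninf set to "-99", the field "-99" yields negative infinity, the field "-0.5" yields -0.5, and the field "abc" is rejected.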
akLM * ak_lm_new_from_file (FILE *from, akLexicon *lexicon, const akBool emptylex, const akDict *syms, char **err)
Reads a language model from a file.
This function creates a new language model from a text description stored in the given file. The language model is expected to be in the same format as the one generated by ak_lm_print. An akLexicon is provided to register the words. If the lexicon is empty, new words are registered in the lexicon; otherwise, an out-of-vocabulary word is treated as an error. If a dictionary is provided, newly registered words are assumed to be encoded in UTF-8 and are split into characters; a character missing from the dictionary is an error. Otherwise, new words are registered with an empty sequence of symbols. In that case the resulting akLexicon is malformed, but it can still be used to print the language model.
from | File where the text description is stored. |
lexicon | The lexicon where words are registered. |
emptylex | Indicates if the provided lexicon is empty (new words must be registered) or not (new words are treated as errors). |
syms | A character dictionary used to register new words. If not NULL, out-of-vocabulary characters are treated as errors; if NULL, characters are ignored. |
err | Pointer to a string variable. If not NULL, an error message is allocated and stored in the variable when an error occurs. |
akLM * ak_lm_new_from_wordnet_file (FILE *from, akDict *vocab, const akBool emptyvocab, const char *begin_sym, const char *end_sym, char **err)
Reads a language model from a Wordnet file.
This function creates a new language model from a text description in Wordnet format stored in the given file. An akDict is provided to register the words. If the vocabulary is empty, new words are registered; otherwise, an out-of-vocabulary word is treated as an error. The tokens that represent the special initial and final words are required. The token used for the final word can also be used for the initial word.
from | File where the text description is stored. |
vocab | The dictionary where the words are registered. |
emptyvocab | Indicates if the provided dictionary is empty (new words must be registered) or not (new words are treated as errors). |
begin_sym | The token for the special initial word. |
end_sym | The token for the special final word. |
err | Pointer to a string variable. If not NULL, an error message is allocated and stored in the variable when an error occurs. |
void ak_lm_print (const akLM *lm, FILE *to, const akDict *words)
Prints the language model.
This function writes the content of the language model to the given file, using the provided word dictionary. The dictionary is expected to contain all the words needed; otherwise, an unexpected error could occur. The content is written in neither ARPA nor Wordnet format; a private text representation is used instead.
lm | The language model. |
to | File where the model is written. |
words | Dictionary containing the words. |
void ak_lm_set_gsf (akLM *lm, const akFloat gsf)
Sets the grammar scale factor.
This function modifies the given language model by applying the given grammar scale factor. Note that if a previous grammar scale factor (gsf0) was applied, the resulting model will have a grammar scale factor of gsf0*gsf.
lm | The language model to modify. |
gsf | Grammar scale factor. |
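The cumulative behaviour described above can be sketched on a toy model. This is not libak code: toy_apply_gsf is a hypothetical stand-in that treats a language model as a plain array of log-probabilities and assumes the scale factor multiplies each of them. That assumption matches the gsf0*gsf composition stated above, but the internal representation of akLM is not documented on this page.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for ak_lm_set_gsf on a toy model.
 * The "model" is an array of n log-probabilities, and applying a grammar
 * scale factor is assumed to multiply every entry by gsf. */
void toy_apply_gsf(double *logp, size_t n, double gsf)
{
    size_t i;

    assert(logp != NULL || n == 0);

    for (i = 0; i < n; ++i)
        logp[i] *= gsf;
}
```

Calling toy_apply_gsf with 2.0 and then with 4.0 leaves each entry multiplied by 8.0, the product of the two factors. Under this assumption, a caller that wants an absolute scale factor should apply it once to a freshly loaded model rather than repeatedly to the same one.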
void ak_lm_set_wip (akLM *lm, const akProb wip)
Sets the word insertion penalty.
This function modifies the given language model by applying the word insertion penalty. Note that if a previous word insertion penalty (wip0) was applied, the resulting model will have a word insertion penalty of wip0+wip.
lm | The language model to modify. |
wip | Word insertion penalty. |
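The additive behaviour described above can be sketched in the same way. This is not libak code: toy_apply_wip is a hypothetical stand-in that treats a language model as a plain array of log-domain scores and assumes the penalty is added to each of them. That assumption matches the wip0+wip composition stated above, but the way akLM stores and applies the penalty is not documented on this page.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for ak_lm_set_wip on a toy model.
 * The "model" is an array of n log-domain scores, and applying a word
 * insertion penalty is assumed to add wip to every entry. */
void toy_apply_wip(double *logp, size_t n, double wip)
{
    size_t i;

    assert(logp != NULL || n == 0);

    for (i = 0; i < n; ++i)
        logp[i] += wip;
}
```

Calling toy_apply_wip with -0.5 and then with -0.25 shifts each entry by -0.75, the sum of the two penalties. As with the grammar scale factor, repeated calls accumulate rather than overwrite.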