java -classpath babbletower.jar Index dictionary_file encoding index_file index_depth
Note: An index file must be placed in the same directory as its accompanying
dictionary file and bear the same name plus the ending .idx.
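The naming rule in the note can be sketched in Java as follows. This is only an illustration: the class and method names are hypothetical and not part of babbletower, and the sketch assumes that "the same name plus the ending .idx" means the ending is appended to the full file name, including any existing extension.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of babbletower.
class IndexFileName {
    // Assumption: ".idx" is appended to the complete dictionary file name,
    // and the index file lives in the same directory as the dictionary.
    static Path indexPathFor(Path dictionaryFile) {
        String indexName = dictionaryFile.getFileName().toString() + ".idx";
        return dictionaryFile.resolveSibling(indexName);
    }

    public static void main(String[] args) {
        // "dicts" and "english.txt" are made-up example names.
        Path dictionary = Paths.get("dicts", "english.txt");
        System.out.println(indexPathFor(dictionary));
    }
}
```

The resulting path would be the value to pass as index_file when running the indexer.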
The meaning of the parameters:
dictionary_file |
Name of the dictionary file to index. |
encoding |
Text encoding of the dictionary file. |
index_file |
Name of the file in which to save the index. If this file already exists, it will be overwritten without warning! |
index_depth |
The depth of the index. This is the maximum number of significant characters that the indexer uses when indexing words. For example, with a depth of 4, the indexer 'looks' only at the first four letters of each word, so the words conference and confederation are put into the same index entry. This does not mean, however, that looking up conference would also return confederation as a search result: lookup results do not depend on the depth of an index. The index depth merely determines the space vs. time tradeoff of index files: a 'shallow' index is smaller but slower, while a 'deep' index is faster but bigger.
For a dictionary that mostly carries words written with Latin letters, i.e. a small alphabet, a depth of 6 or 7 is recommended. For example, quite a few English words start with conf, so a depth of 4 could lead to long lookup times when searching for a word with a very common prefix. For dictionaries that carry only words from a language with a large 'alphabet', on the other hand, a smaller depth may be sufficient; for a monolingual Japanese dictionary, for example, a depth of 4 should be sufficient. However, the impact of the depth also depends on the size of the dictionary: for smaller dictionaries, a shallow index may still be fast enough.
|
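The effect of index_depth described above can be illustrated with a small Java sketch. It is not babbletower's actual indexer or file format: the class, the method names and the in-memory map are assumptions made for illustration only. It groups words by their first `depth` characters and shows that a lookup still compares whole words, so results do not change with the depth.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration of prefix-depth indexing, not babbletower code.
class IndexDepthSketch {
    // Index key: the first `depth` characters of the word (the whole word if shorter).
    // Assumption: "characters" are Java char values (UTF-16 code units).
    static String key(String word, int depth) {
        return word.length() <= depth ? word : word.substring(0, depth);
    }

    // Group all words that share the same key into one index entry.
    static Map<String, List<String>> buildIndex(List<String> words, int depth) {
        Map<String, List<String>> index = new TreeMap<>();
        for (String word : words) {
            index.computeIfAbsent(key(word, depth), k -> new ArrayList<>()).add(word);
        }
        return index;
    }

    // Jump to the matching entry, then compare whole words inside it.
    // A shallow index has fewer, larger entries, so this scan takes longer.
    static List<String> lookup(Map<String, List<String>> index, String word, int depth) {
        List<String> hits = new ArrayList<>();
        for (String candidate : index.getOrDefault(key(word, depth), List.of())) {
            if (candidate.equals(word)) {
                hits.add(candidate);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> words = List.of("conference", "confederation", "confirm", "contact");
        for (int depth : new int[] {4, 6}) {
            Map<String, List<String>> index = buildIndex(words, depth);
            System.out.println("depth " + depth + ": " + index.size()
                    + " entries, lookup -> " + lookup(index, "conference", depth));
        }
        // prints:
        // depth 4: 2 entries, lookup -> [conference]
        // depth 6: 4 entries, lookup -> [conference]
    }
}
```

With a depth of 4, conference, confederation and confirm all land in the conf entry, so a lookup has to scan three candidates. With a depth of 6 each word gets its own entry, at the cost of a larger index.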