A format definition looks as follows:
format.name = /fields/field separators/separator replacements/
fields | List the fields contained in a dictionary entry in the order in which they appear.
The marks for the four fields are:
|
field separators | List the separators that mark the beginnings of the fields defined in fields,
starting with the separator for the second field. (The first field starts at the beginning of
the dictionary entry.)
Note: If you want to specify the slash character |
separator replacements | You may want to replace field separators to improve the appearance of dictionary entries. For example, if the fields in a dictionary are separated by tabs, you could replace them with spaces. Put the replacement characters for the separators defined in field separators here, in the same order, one character per separator. To keep a particular separator, put the @ mark in its place. |
When defining the format for a dictionary, you need to inspect several entries to see which fields they contain and how these fields are separated. Since this may all be a bit confusing, here is an example:
Example
The following three entries are taken from a German-English dictionary:
Freude {f}	enjoyment
Freudenfeuer {n}; Feuer im Freien	bonfire
Freunde {pl}; Bekannte {pl}	friends
An appropriate format definition would be:
format.gereng = /ht/\t/@/
This tells Babbletower that an entry from this dictionary contains a head entry and a translation. The translation is separated from the head entry by a tab. (It is safer to write a tab using the escape sequence \t, as shown in the example.) It also states that the separator should not be replaced.
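To make the structure of such a definition concrete, here is a minimal Python sketch that takes a definition line apart. It is not Babbletower's own parser: the function name `parse_format` is hypothetical, and the sketch assumes that none of the three parts contains a literal slash, that `\t` is the only escape sequence to decode, and that every separator is a single character.

```python
def parse_format(line):
    """Parse 'format.NAME = /FIELDS/SEPARATORS/REPLACEMENTS/' into its parts.

    Hypothetical helper for illustration; not part of Babbletower.
    """
    key, eq, value = line.partition("=")
    key = key.strip()
    if not eq or not key.startswith("format."):
        raise ValueError("expected 'format.NAME = /.../.../.../'")
    name = key[len("format."):]

    # Assumption: the three parts never contain a literal slash, so a plain
    # split yields an empty string, the three parts, and another empty string.
    parts = value.strip().split("/")
    if len(parts) != 5 or parts[0] or parts[-1]:
        raise ValueError("expected three slash-delimited parts")
    fields, separators, replacements = parts[1:4]

    # Assumption: \t is the only escape sequence that needs decoding, and
    # every separator and replacement is a single character.
    separators = list(separators.replace("\\t", "\t"))
    replacements = list(replacements.replace("\\t", "\t"))

    # One separator and one replacement for every field after the first.
    if not (len(fields) - 1 == len(separators) == len(replacements)):
        raise ValueError("field, separator and replacement counts do not match")

    return {"name": name, "fields": list(fields),
            "separators": separators, "replacements": replacements}


if __name__ == "__main__":
    print(parse_format(r"format.gereng = /ht/\t/@/"))
```

The final consistency check mirrors the rule stated above: the separator list starts with the second field, and there is one replacement character per separator.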
To also extract grammar remarks, e.g. the {f} after Freude, indicating that this is a feminine noun, you could extend this format in the following way:
format.gereng = /het/{\t/@@/
This defines that an entry now has three fields (the explanation field was inserted) and that this field starts with an opening curly brace {. However, looking at the second and third sample entries, we see that a grammar remark may be followed by another head entry, and that an entry may also contain more than one grammar remark. These entries would therefore be incorrectly separated into:
head entry | explanation | translation |
---|---|---|
Freudenfeuer | {n}; Feuer im Freien | bonfire |
Freunde | {pl}; Bekannte {pl} | friends |
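The incorrect split shown above can be reproduced with a short Python sketch. It is only an approximation of what Babbletower might do internally: the helper `split_entry` is hypothetical, it assumes that each separator is searched for once, from left to right, and that surrounding whitespace is trimmed for display.

```python
def split_entry(entry, fields, separators, replacements):
    """Split one dictionary entry into its fields.

    Hypothetical helper for illustration; not part of Babbletower.
    fields is a string of field marks such as "het"; separators and
    replacements are lists with one item per field after the first.
    """
    result = {}
    prefix = ""          # separator (or its replacement) carried into the next field
    rest = entry
    current = fields[0]  # the first field starts at the beginning of the entry
    for mark, sep, repl in zip(fields[1:], separators, replacements):
        pos = rest.find(sep)  # assumption: the first occurrence ends the field
        if pos < 0:
            break             # separator missing: the remaining fields stay unset
        # Assumption: whitespace around a field is trimmed for display.
        result[current] = (prefix + rest[:pos]).strip()
        prefix = sep if repl == "@" else repl  # "@" keeps the original separator
        rest = rest[pos + len(sep):]
        current = mark
    result[current] = (prefix + rest).strip()
    return result


if __name__ == "__main__":
    samples = [
        "Freude {f}\tenjoyment",
        "Freudenfeuer {n}; Feuer im Freien\tbonfire",
        "Freunde {pl}; Bekannte {pl}\tfriends",
    ]
    for sample in samples:
        # Corresponds to format.gereng = /het/{\t/@@/
        print(split_entry(sample, "het", ["{", "\t"], ["@", "@"]))
```

Because everything between the first { and the tab ends up in the explanation field, the second head entry "Feuer im Freien" and the second grammar remark are swallowed by it, which is exactly the problem described above.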
default | This is the default dictionary format. Whenever there is a problem with a user-defined format, Babbletower will attempt to use this format instead. The equivalent definition is: |
edict | This is a format for the popular edict Japanese-English dictionary from the Monash Nihongo FTP Archive. Its equivalent definition would be: However, this format comes with some additional on-the-fly transcoding for improved readability. |
edict_pda | Same as edict, but with slightly different on-the-fly transcoding for nicer display on small screens. |
kanjidic | Format for kanjidic, another popular dictionary from the Monash site. This format has no equivalent definition. The specifics of this dictionary required a fully programmatic formatter. |