- Dictionary encodings
To allow Babbletower to read your dictionary files correctly, you need to specify the encoding of
each dictionary in its setup. The documentation of electronic dictionaries often states which
encoding the data is delivered in. If you cannot obtain this information, you will have to
experiment a bit: open the dictionary (or, preferably, a file containing just a few lines of it)
in your web browser, then change the browser's encoding setting until the dictionary entries
are displayed correctly. Below you will find a table with the names of the encodings as used in
Java. These are the names you have to use when setting the encoding for a dictionary. Note
that these names are case sensitive.
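If you would rather not use a browser for this experiment, a small Java program can do the same job. It decodes the first few hundred bytes of a file with each encoding name you pass to it, so you can compare the results side by side. This is a minimal sketch and not part of Babbletower. The class name `EncodingPreview` is hypothetical, and the sketch assumes a desktop Java runtime (Java 9 or later, for `InputStream.readNBytes`).

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper, not part of Babbletower: shows the start of a file
// decoded with several candidate encodings so you can spot the readable one.
public class EncodingPreview {

    // Decodes the given bytes using the named encoding.
    static String decode(byte[] data, String encoding) {
        return new String(data, Charset.forName(encoding));
    }

    // Reads at most maxBytes bytes from the start of the file. Cutting in the
    // middle of a multi-byte character may leave a replacement character at
    // the end of the preview, which does not affect the comparison.
    static byte[] readHead(Path file, int maxBytes) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buf = new byte[maxBytes];
            int n = in.readNBytes(buf, 0, maxBytes);
            return Arrays.copyOf(buf, n);
        }
    }

    // Usage: java EncodingPreview dictionary.txt Cp1252 ISO8859_1 UTF8
    public static void main(String[] args) throws IOException {
        byte[] head = readHead(Paths.get(args[0]), 300);
        for (int i = 1; i < args.length; i++) {
            String enc = args[i];
            if (!Charset.isSupported(enc)) {
                System.out.println(enc + ": not supported by this runtime");
                continue;
            }
            System.out.println("=== " + enc + " ===");
            System.out.println(decode(head, enc));
        }
    }
}
```

A typical symptom of a wrong guess: UTF8 data decoded as ISO8859_1 or Cp1252 shows accented letters as two-character sequences, such as `Ã©` instead of `é`.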
- Encodings supported by your Java runtime
You will only be able to use a particular encoding if your Java runtime supports it.
While Java runtimes on desktop computers usually support all or most of the encodings listed below,
runtimes for PDAs in particular often support only a small number of encodings. To find out which
encodings your runtime supports, start Babbletower and open the Import or Export file dialog in
the Wordbox screen. At the bottom of this dialog is a drop-down list of all supported encodings.
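On a desktop computer you can also ask the Java runtime directly. The sketch below assumes Java 8 or later. It uses `java.nio.charset`, which older PDA runtimes may lack, so on such devices the Babbletower dialog remains the way to check. Note that `Charset.availableCharsets()` reports canonical names such as `windows-1252`; the names from the table below, such as `Cp1252`, usually appear among the aliases, possibly in different letter case. The class name `ListEncodings` is hypothetical.

```java
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Babbletower: lists the encodings the
// current Java runtime supports, together with their aliases.
public class ListEncodings {

    // One line per charset: canonical name, two spaces, then its aliases.
    static List<String> describeAll() {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, Charset> e : Charset.availableCharsets().entrySet()) {
            lines.add(e.getKey() + "  " + e.getValue().aliases());
        }
        return lines;
    }

    public static void main(String[] args) {
        describeAll().forEach(System.out::println);
    }
}
```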
If the encoding of your dictionary data is not among them, you will have to convert the
dictionaries into an encoding that is supported. To be on the safe side, it is recommended to
convert your dictionaries into UTF8. Not only is this encoding guaranteed to be supported on all
Java platforms, it can also encode all Unicode characters. Whatever language your dictionary files
are in, if you convert them into UTF8 there is no danger of ending up with mangled text.
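The difference is easy to demonstrate in Java. Text that is encoded and decoded again with UTF8 comes back unchanged, whereas an encoding with a smaller character repertoire such as ISO8859_1 silently replaces every character it cannot represent with `?`. The sample entry and the class name `RoundTrip` are made up for illustration.

```java
import java.nio.charset.Charset;

// Illustration only: encodes a string with the named encoding and decodes it
// again, which is what happens when text is stored in a file and read back.
public class RoundTrip {

    static String roundTrip(String text, String encoding) {
        Charset cs = Charset.forName(encoding);
        // getBytes(Charset) replaces unmappable characters with the charset's
        // replacement bytes ('?' for ISO8859_1) instead of failing.
        return new String(text.getBytes(cs), cs);
    }

    public static void main(String[] args) {
        // German, Russian and Chinese words for "street", as Unicode escapes.
        String entry = "Stra\u00dfe / \u0443\u043b\u0438\u0446\u0430 / \u8857\u9053";
        System.out.println("UTF8 preserves text: " + entry.equals(roundTrip(entry, "UTF8")));           // true
        System.out.println("ISO8859_1 preserves text: " + entry.equals(roundTrip(entry, "ISO8859_1"))); // false
    }
}
```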
Babbletower comes with a tool for converting text files between encodings. With this tool, you
can convert your dictionary data in a Java runtime that supports the original encoding (e.g. your
desktop computer).
Usage:
java -classpath babbletower.jar Converter in_file in_encoding out_file out_encoding
Note that you need to recompute the index for a dictionary after
you change its encoding.
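To give an idea of what such a conversion involves, here is a minimal Java sketch of a converter that takes the same four arguments. It is not Babbletower's own `Converter` class, whose source is not shown here; the class name `EncodingConverter` is hypothetical. Unlike a plain `InputStreamReader`, it uses decoders and encoders that report errors. A byte sequence that is invalid in the input encoding, or a character that the output encoding cannot represent, therefore stops the conversion with an exception instead of silently producing mangled text.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.Charset;

// Hypothetical converter for illustration; Babbletower's Converter may differ.
public class EncodingConverter {

    static void convert(File in, String inEncoding, File out, String outEncoding) throws IOException {
        // Resolve both encodings first, so an unknown name fails before any file is touched.
        Charset inCs = Charset.forName(inEncoding);
        Charset outCs = Charset.forName(outEncoding);
        // newDecoder()/newEncoder() report malformed and unmappable input by
        // default, so problems raise a CharacterCodingException.
        try (Reader reader = new BufferedReader(new InputStreamReader(
                     new FileInputStream(in), inCs.newDecoder()));
             Writer writer = new BufferedWriter(new OutputStreamWriter(
                     new FileOutputStream(out), outCs.newEncoder()))) {
            char[] buf = new char[8192];
            int n;
            while ((n = reader.read(buf)) != -1) {
                writer.write(buf, 0, n);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 4) {
            System.err.println("Usage: java EncodingConverter in_file in_encoding out_file out_encoding");
            System.exit(1);
        }
        convert(new File(args[0]), args[1], new File(args[2]), args[3]);
    }
}
```

For example, `java EncodingConverter dict.txt Cp1251 dict-utf8.txt UTF8` would turn a Windows Cyrillic file into UTF8. As with Babbletower's tool, recompute the dictionary index afterwards.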
Character Encoding | Description
ASCII | ASCII
ISO8859_1 | ISO 8859-1
ISO8859_2 | ISO 8859-2
ISO8859_3 | ISO 8859-3
ISO8859_4 | ISO 8859-4
ISO8859_5 | ISO 8859-5
ISO8859_6 | ISO 8859-6
ISO8859_7 | ISO 8859-7
ISO8859_8 | ISO 8859-8
ISO8859_9 | ISO 8859-9
ISO8859_15_FDIS | ISO 8859-15 (Final Draft International Standard, based on ISO 8859-1)
Big5 | Big5, Traditional Chinese
Cp037 | USA, Canada (Bilingual, French), Netherlands, Portugal, Brazil, Australia
Cp1006 | IBM AIX Pakistan (Urdu)
Cp1025 | IBM Multilingual Cyrillic: Bulgaria, Bosnia, Herzegovina, Macedonia (FYR)
Cp1026 | IBM Latin-5, Turkey
Cp1046 | IBM Open Edition US EBCDIC
Cp1097 | IBM Iran (Farsi)/Persian
Cp1098 | IBM Iran (Farsi)/Persian (PC)
Cp1112 | IBM Latvia, Lithuania
Cp1122 | IBM Estonia
Cp1123 | IBM Ukraine
Cp1124 | IBM AIX Ukraine
Cp1140 | Cp037 with the euro
Cp1141 | Cp273 with the euro
Cp1142 | Cp277 with the euro
Cp1143 | Cp278 with the euro
Cp1144 | Cp280 with the euro
Cp1145 | Cp284 with the euro
Cp1146 | Cp285 with the euro
Cp1147 | Cp297 with the euro
Cp1148 | Cp500 with the euro
Cp1149 | Cp871 with the euro
Cp1250 | Windows Eastern European
Cp1251 | Windows Cyrillic
Cp1252 | Windows Latin-1
Cp1253 | Windows Greek
Cp1254 | Windows Turkish
Cp1255 | Windows Hebrew
Cp1256 | Windows Arabic
Cp1257 | Windows Baltic
Cp1258 | Windows Vietnamese
Cp1381 | IBM OS/2, DOS People's Republic of China (PRC)
Cp1383 | IBM AIX People's Republic of China (PRC)
Cp273 | IBM Austria, Germany
Cp277 | IBM Denmark, Norway
Cp278 | IBM Finland, Sweden
Cp280 | IBM Italy
Cp284 | IBM Catalan/Spain, Spanish Latin America
Cp285 | IBM United Kingdom, Ireland
Cp297 | IBM France
Cp33722 | IBM-eucJP - Japanese (superset of 5050)
Cp420 | IBM Arabic
Cp424 | IBM Hebrew
Cp437 | MS-DOS United States, Australia, New Zealand, South Africa
Cp500 | EBCDIC 500V1
Cp737 | PC Greek
Cp775 | PC Baltic
Cp838 | IBM Thailand extended SBCS
Cp850 | MS-DOS Latin-1
Cp852 | MS-DOS Latin-2
Cp855 | IBM Cyrillic
Cp857 | IBM Turkish
Cp858 | Cp850 with the euro
Cp860 | MS-DOS Portuguese
Cp861 | MS-DOS Icelandic
Cp862 | PC Hebrew
Cp863 | MS-DOS Canadian French
Cp864 | PC Arabic
Cp865 | MS-DOS Nordic
Cp866 | MS-DOS Russian
Cp868 | MS-DOS Pakistan
Cp869 | IBM Modern Greek
Cp870 | IBM Multilingual Latin-2
Cp871 | IBM Iceland
Cp874 | IBM Thai
Cp875 | IBM Greek
Cp918 | IBM Pakistan (Urdu)
Cp921 | IBM Latvia, Lithuania (AIX, DOS)
Cp922 | IBM Estonia (AIX, DOS)
Cp930 | Japanese Katakana-Kanji mixed with 4370 UDC, superset of 5026
Cp933 | Korean mixed with 1880 UDC, superset of 5029
Cp935 | Simplified Chinese Host mixed with 1880 UDC, superset of 5031
Cp937 | Traditional Chinese Host mixed with 6204 UDC, superset of 5033
Cp939 | Japanese Latin Kanji mixed with 4370 UDC, superset of 5035
Cp942 | Japanese (OS/2), superset of 932
Cp948 | OS/2 Chinese (Taiwan), superset of 938
Cp949 | PC Korean
Cp950 | PC Chinese (Hong Kong, Taiwan)
Cp964 | AIX Chinese (Taiwan)
Cp970 | AIX Korean
EUC_CN | GB2312, EUC encoding, Simplified Chinese
EUC_JP | JIS0201, 0208, 0212, EUC encoding, Japanese
EUC_KR | KS C 5601, EUC encoding, Korean
EUC_TW | CNS11643 (Plane 1-3), Traditional Chinese, EUC encoding
GBK | GBK, Simplified Chinese
ISO2022CN | ISO 2022 CN, Chinese
ISO2022CN_CNS | CNS 11643 in ISO-2022-CN form, Traditional Chinese
ISO2022CN_GB | GB 2312 in ISO-2022-CN form, Simplified Chinese
ISO2022JP | JIS0201, 0208, 0212, ISO 2022 encoding, Japanese
ISO2022KR | ISO 2022 KR, Korean
JIS0201 | JIS 0201, Japanese
JIS0208 | JIS 0208, Japanese
JIS0212 | JIS 0212, Japanese
KOI8_R | KOI8-R, Russian
MS874 | Windows Thai
MacArabic | Macintosh Arabic
MacCentralEurope | Macintosh Latin-2
MacCroatian | Macintosh Croatian
MacCyrillic | Macintosh Cyrillic
MacDingbat | Macintosh Dingbat
MacGreek | Macintosh Greek
MacHebrew | Macintosh Hebrew
MacIceland | Macintosh Iceland
MacRoman | Macintosh Roman
MacRomania | Macintosh Romania
MacSymbol | Macintosh Symbol
MacThai | Macintosh Thai
MacTurkish | Macintosh Turkish
MacUkraine | Macintosh Ukraine
SJIS | Shift-JIS, Japanese
UTF8 | UTF-8
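Current desktop Java runtimes typically still accept many of these names, often as aliases of a newer canonical name (for example, `Cp1252` resolves to `windows-1252`). To check particular names from the table against the runtime you will use for conversion, a sketch like the following can help. The class name `CheckEncodingNames` is hypothetical. Java's `Charset` lookup ignores letter case, so this check cannot confirm that Babbletower will accept a differently capitalized name; use the names exactly as listed.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;

// Hypothetical helper, not part of Babbletower: reports whether the current
// runtime knows each encoding name and which canonical name it maps to.
public class CheckEncodingNames {

    // Returns the canonical charset name, or null if the name is unsupported
    // or not even a syntactically legal charset name.
    static String canonicalName(String name) {
        try {
            return Charset.isSupported(name) ? Charset.forName(name).name() : null;
        } catch (IllegalCharsetNameException e) {
            return null;
        }
    }

    // Usage: java CheckEncodingNames Cp1252 MacRoman SJIS
    public static void main(String[] args) {
        String[] names = args.length > 0 ? args : new String[] {"ASCII", "ISO8859_1", "Cp1252", "UTF8"};
        for (String n : names) {
            String canonical = canonicalName(n);
            System.out.println(n + " -> " + (canonical == null ? "not supported" : canonical));
        }
    }
}
```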