Ex-word Format Description from brijohn


===== Dictionary File Formats (WIP) =====

This file is my current collection of notes on the various file formats
used in ex-word add-on dictionaries.

==== BMP Files ====

Standard bitmap image files. There is nothing special about them; they can be
edited in any program capable of manipulating BMPs.

==== guide*.htm ====
These are normal html files used to display the guide/help information
for an add-on dictionary.

==== diction.htm ====
Small html file containing the title of the add-on dictionary.

** diction.htm template **
<code html>
<html>
<head>
<title>Add-on Dictionary Title Here</title>
<meta name="soft" content="OFF">
</head>
<body>
</body>
</html>
</code>

==== info*.htm ====
Configuration files that specify various options for the dictionary. Despite
the extension, these are not actual HTML files.

==== CJF Files ====
CJF files are simple bitmap font files that can contain one or more fonts
within a single file. They contain a 0x40 byte main header, followed by one
or more 0xC0 byte headers describing each font (so a single font file has
0x100, i.e. 256, bytes of headers in total). Following the headers is the
actual bitmap data for the fonts contained in the CJF file.


=== CJF Header (0x40 bytes) ===
0x00-0x02: CJF (magic value)
0x03 : type (0x81 = single font, 0xC1 = multiple fonts)
0x04-0x07: unknown (so far only seen 0x40)
0x08-0x09: unknown
0x0A-0x1F: Filler (0xFF)
0x20-0x3F: font title - unused bytes padded with 0x20 (SJIS encoding used)
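
A minimal Python sketch of reading the main header laid out above. The names ''CjfHeader'' and ''parse_cjf_header'' are made up for illustration. The two unknown fields are kept as raw bytes, since their meaning and byte order are not known.

```python
from typing import NamedTuple

CJF_HEADER_SIZE = 0x40


class CjfHeader(NamedTuple):
    font_type: int     # 0x81 = single font, 0xC1 = multiple fonts
    unknown_04: bytes  # raw bytes 0x04-0x07 (so far only seen 0x40)
    unknown_08: bytes  # raw bytes 0x08-0x09, meaning unknown
    title: str         # font title with the 0x20 padding stripped


def parse_cjf_header(data: bytes) -> CjfHeader:
    """Parse the 0x40 byte main header at the start of a CJF file."""
    if len(data) < CJF_HEADER_SIZE:
        raise ValueError("file is shorter than the 0x40 byte CJF header")
    if data[0x00:0x03] != b"CJF":
        raise ValueError("missing CJF magic value")
    # The title is SJIS encoded and padded with 0x20 (space) bytes.
    title = data[0x20:0x40].decode("shift_jis", errors="replace").rstrip(" ")
    return CjfHeader(data[0x03], data[0x04:0x08], data[0x08:0x0A], title)
```

Calling ''parse_cjf_header(open(path, "rb").read())'' on a font file should return the type byte and the decoded title, or raise ''ValueError'' if the magic value is missing.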

=== Font header (0xC0 bytes) ===

0x00-0x03: unknown (always seems to be 0x00 in single font files, and 0xFFFFFFFF in multi font files)
0x04-0x07: next font header offset (0xFFFFFFFF if no more headers)
0x08-0x09: start unicode code point
0x0A-0x0B: zero padding
0x0C-0x0D: end unicode code point
0x0E-0x0F: zero padding
0x10-0x17: unknown
0x18-0x19: height
0x1A-0x1B: width
0x1C-0x1D: bytes per glyph
0x1E-0x1F: flags (bitwise: 0x00 = normal, 0x01 = bold, 0x02 = italic)
0x20-0x27: Filler (0xFF)
0x28-0x33: unknown
0x34-0x37: offset where font data starts
0x38-0xBF: Filler (0xFF)
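
A rough Python sketch of decoding the known fields of one font header. Several things here are assumptions rather than confirmed facts: the notes do not state the byte order (little-endian is assumed and can be overridden), the first font header is assumed to sit directly after the main header at 0x40, and ''glyph_offset'' assumes glyphs are stored back to back in code point order starting at the font data offset. All names are hypothetical.

```python
import struct
from typing import NamedTuple, Optional

FONT_HEADER_SIZE = 0xC0
NO_MORE_HEADERS = 0xFFFFFFFF


class FontHeader(NamedTuple):
    next_header: Optional[int]  # None when the field is 0xFFFFFFFF
    first_cp: int               # start unicode code point
    last_cp: int                # end unicode code point
    height: int
    width: int
    bytes_per_glyph: int
    bold: bool                  # flags bit 0x01
    italic: bool                # flags bit 0x02
    data_offset: int            # offset where font data starts


def parse_font_header(data: bytes, offset: int = 0x40, endian: str = "<") -> FontHeader:
    """Decode the documented fields of the 0xC0 byte font header at `offset`.

    ASSUMPTION: `endian` defaults to little-endian; pass ">" to try big-endian.
    """
    hdr = data[offset:offset + FONT_HEADER_SIZE]
    if len(hdr) < FONT_HEADER_SIZE:
        raise ValueError("truncated font header")
    (next_off,) = struct.unpack_from(endian + "I", hdr, 0x04)
    (first_cp,) = struct.unpack_from(endian + "H", hdr, 0x08)
    (last_cp,) = struct.unpack_from(endian + "H", hdr, 0x0C)
    height, width, bpg, flags = struct.unpack_from(endian + "4H", hdr, 0x18)
    (data_off,) = struct.unpack_from(endian + "I", hdr, 0x34)
    return FontHeader(
        None if next_off == NO_MORE_HEADERS else next_off,
        first_cp, last_cp, height, width, bpg,
        bool(flags & 0x01), bool(flags & 0x02), data_off,
    )


def glyph_offset(font: FontHeader, code_point: int) -> int:
    """ASSUMPTION: glyphs are contiguous and ordered by code point."""
    if not font.first_cp <= code_point <= font.last_cp:
        raise KeyError(f"U+{code_point:04X} is outside this font's range")
    return font.data_offset + (code_point - font.first_cp) * font.bytes_per_glyph
```

If the decoded height, width and bytes per glyph look implausible for a known font, the byte order assumption is the first thing to revisit.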

==== fileinfo.cji ====
This is a simple text file that lists the CJT and CJD files used by the
add-on dictionary.

=== CJD Files ===
These files are listed as <filename>,<filesize>,<unknown number>

I'm not currently sure what the third field represents.

These files generally seem to be the data files used by the add-on, though
the format used in a given file may depend on the APL version being used by
the add-on.

=== CJT Files ===
These files are listed as <filename>,0,<bytes per entry>,<number of entries>

These files are usually lists of <number of entries> numbers, each of which is
<bytes per entry> bytes long. These are typically lists of indexes and offsets.

If <bytes per entry> is -1, the file in question seems to be a string table
with each string separated by a 0xFF character.

The second field always seems to be zero.
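
A small Python sketch of reading fileinfo.cji based on the two line layouts above. It assumes one comma-separated entry per line, that every numeric field is a decimal integer, and that a CJD line can be told apart from a CJT line by its field count (three versus four). None of this is confirmed beyond the files looked at so far, and the class and function names are invented for the example.

```python
from typing import List, NamedTuple, Union


class CjdEntry(NamedTuple):
    filename: str
    filesize: int
    unknown: int          # third field, meaning not yet known


class CjtEntry(NamedTuple):
    filename: str
    reserved: int         # second field, always seems to be zero
    bytes_per_entry: int  # -1 seems to mark a 0xFF separated string table
    entry_count: int


def parse_fileinfo(text: str) -> List[Union[CjdEntry, CjtEntry]]:
    """Parse the text of fileinfo.cji into CJD and CJT entries."""
    entries: List[Union[CjdEntry, CjtEntry]] = []
    for line in text.splitlines():
        fields = [field.strip() for field in line.split(",")]
        if len(fields) == 3:    # <filename>,<filesize>,<unknown number>
            entries.append(CjdEntry(fields[0], int(fields[1]), int(fields[2])))
        elif len(fields) == 4:  # <filename>,0,<bytes per entry>,<number of entries>
            entries.append(CjtEntry(fields[0], int(fields[1]),
                                    int(fields[2]), int(fields[3])))
        # Blank lines and lines with any other field count are skipped.
    return entries
```

Checking ''entry.bytes_per_entry == -1'' on the CJT entries is then enough to pick out the string tables.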

=== String tables ===

These tables are defined by a set of three files: an index, an offset and a string file.

These string tables are not actually used for display purposes; instead, it
seems they are used to match strings when you are searching for an entry.

From the Kenkyusha dictionary we have the following 3 files:
* sjstr.cjt
* sjoff.cjt
* sjidx.cjt

The string file contains a series of non-null terminated strings separated by
0xFF characters. The offsets file contains a list of offsets which tell the
offset of each string in the strings file. I'm not a hundred percent sure on
how the index file is used but I suspect it is used to map each string to an
entry in the dictionary.
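
A Python sketch of pulling strings out of a strings file using an offsets file, as described above. The entry width of the offsets file would come from its <bytes per entry> value in fileinfo.cji. The byte order is an assumption (little-endian by default), and the helper names are hypothetical. Strings are returned as raw bytes because the text encoding is not confirmed.

```python
from typing import List

SEPARATOR = b"\xff"  # strings in the strings file are separated by 0xFF


def read_cjt_numbers(data: bytes, bytes_per_entry: int,
                     byteorder: str = "little") -> List[int]:
    """Split a numeric CJT file into fixed-width unsigned integers.

    ASSUMPTION: the byte order is little-endian unless told otherwise.
    """
    if bytes_per_entry <= 0:
        raise ValueError("bytes_per_entry must be positive for numeric tables")
    usable = len(data) - len(data) % bytes_per_entry  # ignore a ragged tail
    return [int.from_bytes(data[i:i + bytes_per_entry], byteorder)
            for i in range(0, usable, bytes_per_entry)]


def string_at(strings: bytes, offset: int) -> bytes:
    """Return the string starting at `offset`, up to the next 0xFF separator."""
    end = strings.find(SEPARATOR, offset)
    if end == -1:  # last string may have no trailing separator
        end = len(strings)
    return strings[offset:end]
```

One way to sanity check the byte order guess is to confirm that every decoded offset is either 0 or lands directly after a 0xFF byte in the strings file.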

==== comp.cjd ====

This appears to be the main dictionary file that contains all the actual entries.

The format seems to be a null-terminated string followed by entry data.
This is repeated for each entry in the dictionary. The null-terminated string
seems to be used for lookup rather than display.

The entry data for each entry does not seem to contain any valid SJIS or ASCII
strings, so I believe it is using some form of text compression/encoding. If I
modify byte values in the entry data, it definitely affects the text displayed
for that entry.

Also, dbadd.cjt seems to be a list of offsets to each entry in the comp.cjd file.
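
A tentative Python sketch of splitting one comp.cjd entry into its lookup string and its still-undecoded entry data. It assumes that each offset points at the first byte of the null-terminated string, and that an entry runs up to the start of the next entry (or the end of the file). Neither assumption has been verified, and ''split_entry'' is a made-up name.

```python
from typing import Tuple


def split_entry(comp: bytes, start: int, end: int) -> Tuple[bytes, bytes]:
    """Split the entry in comp[start:end] into (lookup string, entry data).

    ASSUMPTION: `start` is the offset of the lookup string and `end` is the
    offset of the next entry, or len(comp) for the last entry.
    """
    nul = comp.find(b"\x00", start, end)
    if nul == -1:
        raise ValueError("no null terminator found inside the entry")
    # The entry data is returned untouched; its encoding is still unknown.
    return comp[start:nul], comp[nul + 1:end]
```

Dumping the lookup strings for the first few offsets is a quick way to check whether the offsets really do point at entry starts.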


==== wct.cjd, head.cjd, tree.cjd ====

All dictionaries have these three files which are never actually listed in the
fileinfo.cji file. I believe these files are used in the text encoding/compression.

Casio Ex-word Test Menu

Here is the method to enter the CASIO dictionary TEST MENU, which can be seen as a diagnostics mode.
1. Power off the dictionary.
2. Hold the go-back key (on newer models it is near the four navigation keys; on older models it is on the left), the page up key and the power key for 5 seconds, until it beeps, the screen lights up, and a pop-up window shows the Model and the BIOS Version.
3. Release the three keys and press the right navigation key two times, then press the enter key. It will beep two times and enter the hidden TEST MENU.
