SIL Language Software Development
Materials and information for software developers and testers who might be interested in Alpha and Beta versions and open-source development.
SIL FieldWorks Script Support
From release 2.0 onward, FieldWorks stores all strings in the database as Unicode, so it can potentially support almost any language or script.
FieldWorks (2.0 and beyond) supports complex script keyboarding and rendering at the highest level in all its applications. It also supports several different approaches for converting legacy 8-bit data to Unicode. Support for Private Use Characters is built in; FieldWorks 3.0 provides a way to define these characters within FieldWorks.
Script support specifics
- Unicode encoding
- UTF-16 (a single 16-bit code unit allows for around 64,000 characters and covers the vast majority of writing systems in current use)
- Supplementary-plane characters that have been defined by the Unicode Consortium (Supplementary-plane characters are supported using surrogate pairs.)
- End-user modification of code points in the Private Use Area (PUA) of the Unicode standard (FieldWorks provides this support via ICU. SIL corporate PUA characters are supported by default.)
- Smart rendering using Uniscribe and user-definable complex rendering via Graphite
- Keyboard input via standard input method editors and Keyman
- Several different approaches for converting legacy 8-bit data to Unicode (Each writing system can be assigned a different encoding converter, which can be defined or replaced at any time.)
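As an illustration of the encoding points above, the following minimal Python sketch shows how a supplementary-plane character is represented in UTF-16 as a surrogate pair, and how a code point can be tested for membership in the Private Use Area. It is not FieldWorks code; the function names are hypothetical, and the ranges are the standard Unicode private-use ranges.

```python
def utf16_code_units(ch: str) -> list[int]:
    """Return the 16-bit UTF-16 code units used to encode one character."""
    data = ch.encode("utf-16-be")  # big-endian, no byte-order mark
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


def is_private_use(ch: str) -> bool:
    """True if the character lies in one of the Unicode private-use ranges."""
    cp = ord(ch)
    return (
        0xE000 <= cp <= 0xF8FF          # PUA in the Basic Multilingual Plane
        or 0xF0000 <= cp <= 0xFFFFD     # Supplementary Private Use Area-A
        or 0x100000 <= cp <= 0x10FFFD   # Supplementary Private Use Area-B
    )


# A BMP character needs one code unit; U+10400 needs a surrogate pair.
print([hex(u) for u in utf16_code_units("A")])           # ['0x41']
print([hex(u) for u in utf16_code_units("\U00010400")])  # ['0xd801', '0xdc00']
print(is_private_use("\uE000"), is_private_use("A"))     # True False
```

The surrogate pair is what lets a 16-bit encoding reach characters beyond the first 65,536 code points.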
Notes
- Do not expect a single font to cover all of Unicode or even all of UTF-16, or the UCS-2 subset. Fonts usually support only some portion of Unicode characters (glyphs).
- SIL fieldworkers typically worked with customized variations of the Windows ANSI character set. They redefined some or many ANSI characters and made custom fonts to match these definitions, which were still limited to 222 characters. A significant drawback is that most software assumes standardized character definitions. In a Unicode environment, this approach amounts to redefining Unicode characters within fonts, which is highly discouraged in software such as FieldWorks that is designed to work with Unicode. Data encoded in this manner should therefore be converted to Unicode during import into FieldWorks.
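The kind of conversion described in the last note can be sketched in Python as a table-driven mapping. This is only an illustration and not the FieldWorks encoding converter mechanism: the override table and the function name are hypothetical, and the sketch assumes the legacy data is Windows code page 1252 with a few byte values redefined by a custom font.

```python
# Hypothetical redefinitions made by a custom legacy font:
# byte value -> the Unicode character the font actually displayed.
LEGACY_OVERRIDES = {
    0xE9: "\u0254",  # byte for 'é' in cp1252, redefined as open o
    0xF1: "\u014B",  # byte for 'ñ' in cp1252, redefined as eng
}


def legacy_to_unicode(data: bytes, overrides: dict = LEGACY_OVERRIDES) -> str:
    """Convert custom-encoded 8-bit text to Unicode, one byte at a time."""
    out = []
    for b in data:
        if b in overrides:
            # Redefined byte: use the character the custom font showed.
            out.append(overrides[b])
        else:
            # Unchanged byte: fall back to standard cp1252; bytes that
            # cp1252 leaves undefined become U+FFFD.
            out.append(bytes([b]).decode("cp1252", errors="replace"))
    return "".join(out)


print(legacy_to_unicode(b"k\xe9"))  # 'k' followed by U+0254
```

Keeping the overrides in a separate table per writing system mirrors the idea that each writing system can have its own converter, which can be swapped without touching the conversion logic.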
Limitations to complex script support in FieldWorks 2.0 and 3.0
- FieldWorks 2.0 only supports left-to-right and right-to-left scripts. There is currently no support for vertical scripts.
- Single-reference fields (e.g., Status, Confidence, Event Type, Place of Birth) do not support Graphite, but they do support Uniscribe rendering.
- Drop-down lists (e.g., Status, Confidence fields, and List Item and Keyword in the List Chooser) do not support Graphite, but do support Uniscribe rendering. However, they show only a single font, so items that mix more than one language may not be readable.
- The List Chooser and Tree View in the Topics List Editor support Graphite, but do not allow multiple heights for items. Consequently, if there is a mixture of items of different heights, some may be clipped.
