TNG charset

From TNG_Wiki


TNG version: 11.0.0

Starting with TNG V11 a full install of TNG now defaults to charset UTF-8. If you are installing TNG for the first time, you should strongly consider keeping the default charset.

Two mods are known not to render accented characters correctly in an ISO-8859-? environment:

We may have reached the point where it makes more sense to convert latin1-encoded databases to UTF-8-encoded data, especially since:

  • PHP 5.6 changed the default of the default_charset setting to UTF-8, so output appears to be treated as UTF-8 by default. See the User Contributed Note at the bottom of Migration to PHP 5.6
  • jQuery internally uses encodeURIComponent, which transfers all data in UTF-8, so a site that is not UTF-8 requires a charset conversion
  • MySQL performs another conversion on all passed data when the database collation is not utf8_xxxxxxx_ci
  • some TNG language files are only available in UTF-8 (Arabic, Greek and Russian)
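To make the jQuery and language-file points concrete, here is a minimal Python sketch (illustrative only, not part of TNG). It uses Python's urllib.parse.quote, with the same set of unescaped characters as JavaScript's encodeURIComponent, to show that an accented character is sent as two UTF-8 bytes, whereas the same character in ISO-8859-1 is a single byte. It also checks whether text can be stored in latin1 at all; Greek, Russian and Arabic text cannot, which is why those language files need UTF-8. The helper names percent_encode and fits_in_latin1 are hypothetical.

```python
from urllib.parse import quote

# encodeURIComponent leaves A-Z a-z 0-9 - _ . ! ~ * ' ( ) unescaped. quote() always
# keeps letters, digits and _.-~, so adding !*'() as "safe" gives the same set.
JS_SAFE = "!*'()"


def percent_encode(text: str, charset: str) -> str:
    """Percent-encode text after converting it to bytes in the given charset."""
    return quote(text, safe=JS_SAFE, encoding=charset)


def fits_in_latin1(text: str) -> bool:
    """True if text can be stored in ISO-8859-1 (latin1) without loss."""
    try:
        text.encode("iso-8859-1")
    except UnicodeEncodeError:
        return False
    return True


if __name__ == "__main__":
    print(percent_encode("Müller", "utf-8"))       # M%C3%BCller (what jQuery sends)
    print(percent_encode("Müller", "iso-8859-1"))  # M%FCller (what a latin1 page expects)
    print(fits_in_latin1("Ζωή"))                   # False: Greek is not in latin1
```

The server side of an ISO-8859-1 site has to turn %C3%BC back into the single byte FC, which is the conversion the list above refers to.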


TNG version: 10.1.3
TNG version: 8.0

The TNG default charset, as defined in the readme.html for TNG V8, is still ISO-8859-1; however, TNG now allows you to change it to UTF-8 by changing the English language to English-UTF8.

Should you use UTF-8 or ISO-8859-1?

In deciding which character encoding to use, consider that three components must be in sync for it to work correctly:

  • database
  • TNG charset
  • gedcom
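As a rough illustration of what "in sync" means, the Python sketch below maps the name each component uses to one canonical encoding, and shows the typical garbling (mojibake) you see when UTF-8 bytes are read as ISO-8859-1. The label map and helper names are assumptions for illustration: MySQL's latin1 and Windows "ANSI" are really closer to Windows-1252, and ANSEL is left unmapped on purpose because Python has no ANSEL codec.

```python
from typing import Optional

# Illustrative mapping of each component's label to one canonical encoding.
# Simplification: MySQL latin1 and Windows "ANSI" are really Windows-1252.
# ANSEL is intentionally absent, so it never counts as "in sync".
CANONICAL = {
    "latin1": "iso-8859-1",      # MySQL character set
    "utf8": "utf-8",             # MySQL character set
    "utf8mb4": "utf-8",          # MySQL character set
    "iso-8859-1": "iso-8859-1",  # TNG $charset
    "utf-8": "utf-8",            # TNG $charset and GEDCOM 1 CHAR value
    "ansi": "iso-8859-1",        # GEDCOM 1 CHAR value written by some programs
}


def canonical(label: str) -> Optional[str]:
    """Return the canonical encoding for a label, or None if it is unknown."""
    return CANONICAL.get(label.strip().lower())


def in_sync(database: str, tng_charset: str, gedcom_char: str) -> bool:
    """True only if all three components name the same known encoding."""
    names = {canonical(x) for x in (database, tng_charset, gedcom_char)}
    return len(names) == 1 and None not in names


def misread(text: str, written_as: str, read_as: str) -> str:
    """Show how text looks when bytes written in one encoding are read in another."""
    return text.encode(written_as).decode(read_as, errors="replace")
```

For example, in_sync("latin1", "UTF-8", "UTF-8") is False, and misread("José", "utf-8", "iso-8859-1") produces "JosÃ©", the kind of garbled name that appears when the database and TNG charset disagree.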

See Considerations for using UTF-8

ISO-8859-1

The TNG default charset is defined in Setup > General Settings > Language, which generates the config.php entry for the $charset variable. In TNG V11 and later, the config.php provided for the readme.html install sets $charset = "UTF-8", whereas earlier TNG versions set $charset = "ISO-8859-1".


UTF-8

TNG version: 11.0.0

Starting with TNG V11, the default charset on a full install is set to UTF-8.


TNG version: 10.1.3
TNG version: 8.0

In TNG V8, the readme.html setup was enhanced to let you choose UTF-8 or ISO-8859-1, so that config.php is updated before you first open the Admin Menu (admin/index.php).


TNG version: 7.1.3
TNG version: 7.0

In TNG V7, if you want to set up UTF-8, you must manually edit your config.php and change the value of $charset to $charset = "UTF-8"; before you access the Admin Menu.
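If you prefer to script the V7 edit, a minimal Python sketch could look like the following. set_charset is a hypothetical helper, not part of TNG, and it assumes the assignment sits on a single line in the form shown above. Back up config.php before writing the result back.

```python
import re

# Matches an assignment such as: $charset = "ISO-8859-1";
# (assumes a single-line assignment with single or double quotes)
CHARSET_LINE = re.compile(r"""^(\s*\$charset\s*=\s*)["'][^"']*["']\s*;""", re.MULTILINE)


def set_charset(config_text: str, charset: str = "UTF-8") -> str:
    """Return config_text with the first $charset assignment set to charset."""
    # A function replacement avoids regex escaping issues in the new value.
    new_text, count = CHARSET_LINE.subn(
        lambda m: f'{m.group(1)}"{charset}";', config_text, count=1
    )
    if count == 0:
        raise ValueError("no $charset assignment found in config text")
    return new_text
```

To apply it, read config.php with Path("config.php").read_text(encoding="latin-1"), pass the text through set_charset, and write it back with the same encoding. Latin-1 maps every byte to one character and back, so the rest of the file is left byte-for-byte unchanged and no BOM is added.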

If you access the Admin Menu with the TNG-provided config file before changing to UTF-8, the default $charset = "ISO-8859-1"; will be in effect. When you then change from ISO-8859-1 to UTF-8, you will need to clear your cookies and cache, or try switching languages a couple of times, and then view the source of your pages to check whether the charset= value has changed.
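To check the result without reading the page source by eye, a small Python sketch can report both the charset sent in the HTTP Content-Type header and the first charset= value in the page source. The function names are hypothetical, and the URL you pass is whichever page of your own site you want to test. A cached page may still show the old value until the cache is cleared.

```python
import re
from typing import Optional, Tuple
from urllib.request import urlopen

# Finds charset=... in <meta charset="..."> or in a Content-Type meta tag.
CHARSET_IN_SOURCE = re.compile(rb"""charset\s*=\s*["']?([A-Za-z0-9_\-]+)""", re.IGNORECASE)


def charset_in_source(html: bytes) -> Optional[str]:
    """Return the first charset= value found in the page source, or None."""
    match = CHARSET_IN_SOURCE.search(html)
    return match.group(1).decode("ascii") if match else None


def charset_of_page(url: str) -> Tuple[Optional[str], Optional[str]]:
    """Fetch url; return (charset from the HTTP header, charset from the source)."""
    with urlopen(url) as response:
        # get_content_charset() returns the header value lower-cased, or None.
        header_charset = response.headers.get_content_charset()
        body = response.read()
    return header_charset, charset_in_source(body)
```

After a successful switch, both values should name UTF-8; if either still says ISO-8859-1, the old setting is still in effect somewhere.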

Pros

There are some pros to setting up UTF-8 when you initially install TNG:

  • it is easier to do it at initial setup time
  • MySQL 5 supports UTF-8 database encoding, although most ISPs still create new databases with latin1 as the default
  • charset conversions between MySQL, AJAX, and other technical components are avoided

Cons

There are some cons to currently setting up UTF-8:


  • not all users, especially Windows users, know that they need to save their cust_text.php file as UTF-8 without a byte order mark (BOM)
  • Windows does not include a good text editor, comparable to the built-in Macintosh editor or TextWrangler, that lets the user key accented characters directly in the editor.

If you are using the English version of Windows, you can select the US-International keyboard, which allows keying accented characters.

  • not all desktop genealogy software allows a UTF-8 encoded export (we still need to identify which programs do and which do not), for example:
      • RootsMagic 3 exports in ANSI only
      • RootsMagic 4+ exports in UTF-8 only
      • Legacy Family Tree exports in ANSEL, ANSI and UTF-8
      • PAF 5 exports in ANSEL, ANSI and UTF-8
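For the cust_text.php item in the list above, a short Python sketch can detect and remove a UTF-8 byte order mark and confirm that the file really is UTF-8 rather than ANSI. The helper names are hypothetical. In a PHP file, a leading BOM is typically sent to the browser before the page itself, which can show up as stray characters or "headers already sent" warnings.

```python
import codecs
from pathlib import Path


def has_bom(data: bytes) -> bool:
    """True if data starts with the UTF-8 byte order mark (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


def strip_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM; unchanged if there is none."""
    return data[len(codecs.BOM_UTF8):] if has_bom(data) else data


def is_utf8(data: bytes) -> bool:
    """True if data decodes as UTF-8 (an ANSI file with accents usually will not)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def clean_file(path: str) -> bool:
    """Remove a BOM from the file in place; return True if the file changed."""
    file = Path(path)
    data = file.read_bytes()
    cleaned = strip_bom(data)
    if cleaned == data:
        return False
    file.write_bytes(cleaned)
    return True
```

If is_utf8 returns False for cust_text.php, the file was most likely saved as ANSI and needs to be re-saved as UTF-8 before TNG can display its accented characters correctly.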

Note that an ANSEL export containing accented characters will not render correctly when imported into TNG.
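To see which encoding an export actually used before importing it, you can read the 1 CHAR line in the GEDCOM header (for example 1 CHAR UTF-8 or 1 CHAR ANSEL). A minimal Python sketch, assuming the file starts with a standard 0 HEAD record; gedcom_char is a hypothetical helper:

```python
from typing import Optional


def gedcom_char(data: bytes) -> Optional[str]:
    """Return the value of the '1 CHAR' line in a GEDCOM header, or None."""
    # Header tags are plain ASCII; ignoring other bytes also drops a UTF-8 BOM.
    text = data.decode("ascii", errors="ignore")
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("0 ") and line != "0 HEAD":
            break  # past the HEAD record without finding a CHAR line
        parts = line.split(maxsplit=2)
        if len(parts) == 3 and parts[0] == "1" and parts[1] == "CHAR":
            return parts[2].upper()
    return None
```

Passing the first few kilobytes of the file is enough, since the CHAR line sits in the header. If it reports ANSEL, re-export from your genealogy program as UTF-8 if the program offers that option.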

However, in the long term, UTF-8 is certainly the way to go. In TNG V11, it should be the only way to go if you use accented characters in person or place names.

MySQL charset and collating sequence

If you are deciding which collation to use for your database, please read Selecting your Database Collation for TNG.

The MySQL Character Set Support article from the MySQL Reference Manual explains what character sets and collations are, the multiple-level default system, and the meaning of each individual character set and collation.

Related Links