UTF-8 Readme, from Online Manual



Revision as of 12:20, 28 August 2012

UTF-8 is an encoding standard that can represent all Unicode characters. This allows it to show almost any writing system in the world.

What's new in SMF 2.0.x with regard to UTF-8?

With version 2.0, SMF introduced full UTF-8 character set support. SMF 1.1.x supported ISO-8859 character sets, with limited support for non-ISO character sets for some languages. SMF 2.0 can be run either with or without UTF-8. If you install a new SMF 2.0.x forum with UTF-8, or if you convert an existing SMF 2.0.x forum to UTF-8, then all posts will be stored in the database using the UTF-8 character set, and every web page will tell the browser that it uses the UTF-8 character set. For each language pack you install on such a forum, you must choose the UTF-8 version of that pack.

If you choose not to use UTF-8 for your SMF 2.0.x forum, then you must choose the non-UTF-8 versions of all language packs you install with your forum.

If you choose the wrong character set for any of your language packs, you will see some of what users often describe as "garbage characters" on the screen. The solution is to use only the correct character-set version of the language packs you have chosen for your forum.
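As an illustration of where such "garbage characters" come from, here is a minimal Python sketch. It is not part of SMF; the sample word and the function names are made up for this example. It shows the same bytes being read with the wrong character set in both directions.

```python
# Illustration only: not SMF code. The sample text and function names
# are hypothetical and chosen just to show a character-set mismatch.

def utf8_read_as_latin1(text):
    """Encode text as UTF-8, then misread the bytes as ISO-8859-1."""
    return text.encode("utf-8").decode("iso-8859-1")

def latin1_read_as_utf8(text):
    """Encode text as ISO-8859-1, then misread the bytes as UTF-8.

    Bytes that are not valid UTF-8 become the replacement character U+FFFD.
    """
    return text.encode("iso-8859-1").decode("utf-8", errors="replace")

sample = "café"
print(utf8_read_as_latin1(sample))  # two wrong characters appear in place of the accented letter
print(latin1_read_as_utf8(sample))  # the accented letter becomes a replacement character
```

The text itself is not damaged in either case; only the interpretation of the bytes is wrong, which is why installing the matching language pack version fixes the display.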

Character sets available for 1.1.x

These character sets are also available for SMF 2.0.x.

Character set    Language
big5             Chinese (traditional)
gbk              Chinese (simplified)
ISO-8859-1       Albanian, Brazilian, Catalan, Danish, Dutch, English, Finnish, French, German, Portuguese, Norwegian, Spanish, Swedish, Italian, Indonesian, Malay, Galician
ISO-8859-2       Croatian, Hungarian, Polish, Romanian, Serbian (Latin), Slovak, Czech
ISO-8859-3       Esperanto
ISO-8859-5       Serbian (Cyrillic)
ISO-8859-9       Turkish
tis-620          Thai
UTF-8            Chinese (simplified), Chinese (traditional), Japanese, Persian, Vietnamese, Urdu, Macedonian, Lithuanian
windows-1256     Arabic
windows-1251     Bulgarian, Russian, Ukrainian
windows-1253     Greek
windows-1255     Hebrew

As of SMF 1.1 RC3, each of these language packs can also be downloaded in the UTF-8 character set (see Language packs).

Why would I need UTF-8?

There are a few reasons you might need UTF-8:

  • If you want to support multiple languages that use different character sets on your forum. For instance, if you want to support both Russian and Turkish, you will need a character set that covers both. UTF-8 is then a logical choice.
  • If the software integrating with SMF uses UTF-8. In some cases such an integration requires the character sets to match.
  • If you need better search results or improved sorting. In some cases searching and sorting by the database can be improved by choosing UTF-8 as your character set.
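The first reason can be checked with a short Python sketch. This is an illustration only, not SMF code; the helper name and sample words are hypothetical. It tests whether a piece of text can be stored in a given character set from the table above.

```python
# Illustration only: can_encode and the sample strings are hypothetical.

def can_encode(text, charset):
    """Return True if every character of text exists in charset."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False

russian = "Привет"    # Cyrillic sample
turkish = "günaydın"  # contains the Turkish dotless i
mixed = russian + " " + turkish

for charset in ("windows-1251", "iso-8859-9", "utf-8"):
    print(charset, can_encode(russian, charset),
          can_encode(turkish, charset), can_encode(mixed, charset))
```

Each legacy character set handles its own language but not the other, so a post mixing both can only be stored as UTF-8.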

Why would I NOT need UTF-8?

If none of the above reasons apply to your forum, UTF-8 would probably not be very useful. Besides, it is a bit slower too.

Also keep in mind that, if you are using MySQL as your database system, you need at least MySQL 4.1 and SMF 1.1 RC3 to be able to use UTF-8 as the default character set.

How to convert to UTF-8?

  • Start with a backup of your database(!) Character set conversions are by no means guaranteed to go right.
  • Go to Forum Maintenance > Convert the database and data to UTF-8
  • Select the character set your current data is in. The default setting for this is based on the character set of your default language file.
  • After pressing proceed, your database will be converted. Depending on the size of your database, the conversion process might stop temporarily from time to time to avoid overloading the server. If that was successful, your forum should be converted to UTF-8.
  • You will need a new set of language files. All language files need to be UTF-8 compatible. Luckily all language packs for 1.1 RC3 are available for both the original character set and UTF-8, so simply download them and you should be ready to go.
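  • Once all the UTF-8 language packs have been installed, convert the language setting of each user by running the following query. Replace smf_ with your own table prefix if it differs, and run the query only once, since every run appends the suffix again:
UPDATE smf_members
SET lngfile = CONCAT(lngfile, '-utf8')
WHERE lngfile != '';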
  • Also, change the default language in your admin center - Admin > Server Settings
  • Check whether all your data was properly converted
  • If any of your posts contain HTML entities, you will want to convert those to UTF-8 as well...
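To show what the last step means in practice, here is a rough Python sketch. It is not SMF's own conversion routine; the function name and the sample post are hypothetical. It assumes the entities in question are decimal numeric entities such as &#1055;, and it deliberately leaves named entities like &lt; untouched so that escaped markup stays escaped.

```python
import re

# Illustration only: not SMF's converter. Handles decimal numeric
# entities (&#NNNN;) and leaves everything else as it is.
_NUMERIC_ENTITY = re.compile(r"&#(\d+);")

def numeric_entities_to_text(stored):
    """Replace decimal numeric HTML entities with the characters they name."""
    def replace(match):
        code = int(match.group(1))
        # Leave out-of-range values alone instead of raising an error.
        return chr(code) if code <= 0x10FFFF else match.group(0)
    return _NUMERIC_ENTITY.sub(replace, stored)

# Hypothetical post body as it might have been stored before conversion.
old_post = "&#1055;&#1088;&#1080;&#1074;&#1077;&#1090; &lt;b&gt;"
new_post = numeric_entities_to_text(old_post)
print(new_post)                  # real characters, with &lt;b&gt; unchanged
print(new_post.encode("utf-8"))  # the bytes a UTF-8 database would store
```

After such a conversion the post holds real characters rather than entity codes, which is what allows the database to search and sort them as text.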

