How do I get rid of a Byte Order Mark: Difference between revisions From Online Manual

m (fix broken characters by copying UTF-8 from an older version.)
(minor corrections, and elaborations)

Revision as of 19:14, 26 March 2012

On some web pages, you may see the character sequence ï»¿ somewhere in your forum, often in the upper left (the very first thing output to the browser). Sometimes you may even see multiple instances of it. This is a UTF-8 "Byte Order Mark" (BOM): the three bytes EF BB BF, shown here as they appear when interpreted in a single-byte encoding. It got into one or more of your forum files when somebody edited and saved the file in UTF-8 mode. UTF-8 is a "character encoding" in which a huge number of accented and non-Latin characters (e.g., Greek, Cyrillic, CJK, Arabic) are represented with multibyte sequences. Contrast this with a single-byte encoding such as Latin-1 (ISO-8859-1), which has room for at most 128 characters beyond basic ASCII.
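
To illustrate why the mark shows up as three odd characters, here is a minimal Python sketch. It assumes the page is being read as CP1252 (a common "ANSI" encoding); the helper names `as_single_byte` and `as_utf8` and the sample bytes are made up for this illustration and are not part of SMF.

```python
import codecs

# The UTF-8 BOM is the three bytes EF BB BF.
BOM = codecs.BOM_UTF8


def as_single_byte(data: bytes, encoding: str = "cp1252") -> str:
    """Decode bytes the way a single-byte (ANSI) viewer would: one byte, one character."""
    return data.decode(encoding)


def as_utf8(data: bytes) -> str:
    """Decode as UTF-8, silently dropping one leading BOM (Python's 'utf-8-sig' codec)."""
    return data.decode("utf-8-sig")


# Hypothetical start of a forum file that was saved with a BOM.
sample = BOM + b"<?php"

print(BOM.hex())               # the raw bytes in hex
print(as_single_byte(sample))  # the BOM becomes three visible characters
print(as_utf8(sample))         # the BOM is hidden
```

The last two lines show the same bytes two ways: visible in single-byte mode, invisible in UTF-8 mode, which is why the mark can be hidden when a file is opened as UTF-8.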

Microsoft editors are often blamed for adding this mark at the beginning of files. While a BOM is legal in UTF-8 text files, it often causes problems on the Web: the server-side software does not know to skip the marker and passes it on as ordinary output, and browsers may then display it. Outputting a BOM may cause further problems, as HTTP headers are flushed with the first output to the browser; if SMF hasn't finished setting up its desired headers by then, you will get "cannot send headers" errors.

To solve this problem:

  • Figure out which files have recently been edited.
  • One by one, open them in ANSI (single-byte, e.g., Latin-1 or CP1252) encoding mode, not in UTF-8 mode (the mark may be hidden if you open the file in UTF-8 mode).
  • Each time you find a file that begins with this character sequence:
    1. If your editor has an option to save the file without the Byte Order Mark, use that option to save the file. In many cases, this will fix the problem.
    2. If the above option does not work, edit the file in ANSI-only mode, remove the mark (delete the three characters), and save.
    3. If you need to edit the file in UTF-8 mode, use an editor that allows you to "save without Byte Order Mark".
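
If you would rather check the files with a script than open them one by one, the steps above could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the function names, the `*.php` file pattern, and the 30-day window are all hypothetical, and the script is not an SMF tool. Back up your files before letting anything rewrite them.

```python
import codecs
import sys
import time
from pathlib import Path

BOM = codecs.BOM_UTF8  # the three bytes EF BB BF


def recently_edited(root: Path, days: float, pattern: str = "*.php") -> list:
    """Return files under root matching pattern that were modified in the last `days` days."""
    cutoff = time.time() - days * 86400
    return [p for p in sorted(root.rglob(pattern))
            if p.is_file() and p.stat().st_mtime >= cutoff]


def has_bom(path: Path) -> bool:
    """Read the file as raw bytes (no decoding) and check for a leading BOM."""
    with path.open("rb") as fh:
        return fh.read(len(BOM)) == BOM


def strip_bom(path: Path) -> bool:
    """Remove every leading BOM from the file. Returns True if the file was changed."""
    data = path.read_bytes()
    if not data.startswith(BOM):
        return False
    while data.startswith(BOM):
        data = data[len(BOM):]
    path.write_bytes(data)  # the rest of the file is written back byte for byte
    return True


if __name__ == "__main__":
    # Assumed usage: pass the forum directory as the first argument.
    root = Path(sys.argv[1]) if len(sys.argv) > 1 else Path(".")
    for path in recently_edited(root, days=30):
        if has_bom(path):
            print("BOM found:", path)
            # Uncomment once you have a backup:
            # strip_bom(path)
```

Because the file is read and written as raw bytes, only the three BOM bytes are removed and any UTF-8 text later in the file is left untouched.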

