How do I get rid of a Byte Order Mark

Revision as of 12:26, 29 March 2012 by Illori

On some web pages, you may see the character sequence ï»¿ somewhere in your forum, often in the upper left (the very first thing output to the browser). Sometimes you may even see several instances of it. This is a UTF-8 "Byte Order Mark" (BOM). It got into one or more of your forum files when somebody edited a file in UTF-8 mode and saved it with the mark included. "UTF-8" is a "character encoding" in which a huge number of accented and non-Latin characters (Greek, Cyrillic, CJK, Arabic, etc.) can be represented using multibyte sequences. Contrast this with a single-byte encoding such as Latin-1 (ISO-8859-1), which has room for at most 128 characters beyond plain ASCII.
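
To make the difference between multibyte and single-byte encodings concrete, here is a minimal Python sketch. The helper name encoded_length and the sample characters are chosen for illustration only; they do not come from SMF.

```python
# Compare how many bytes a character needs in UTF-8 versus Latin-1.
# encoded_length is a hypothetical helper used only for this illustration.

def encoded_length(ch, encoding):
    """Return the byte length of ch in the given encoding, or None if it cannot be represented."""
    try:
        return len(ch.encode(encoding))
    except UnicodeEncodeError:
        return None

for ch in ["A", "é", "Ω", "中"]:
    print(ch, encoded_length(ch, "utf-8"), encoded_length(ch, "latin-1"))
```

Plain ASCII letters take one byte in both encodings, accented Latin letters take more than one byte in UTF-8, and characters such as Greek or CJK cannot be encoded in Latin-1 at all.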

Microsoft editors are often blamed for adding this mark at the beginning of files. While a BOM is legal in UTF-8 text files, it often causes problems for Web servers and/or browsers that do not know to ignore the marker. Outputting a BOM may cause further problems: HTTP headers are flushed with the first output to the browser, so if SMF has not finished setting up its desired headers by then, you will get "cannot send headers" errors.

To solve this problem:

  • Figure out which files have recently been edited
  • One by one, open them in ANSI (single byte, e.g., Latin-1 or CP1252) encoding mode, not in UTF-8 mode (the mark may be hidden if you open the file in UTF-8 mode)
  • Each time you find a file that begins with this character sequence,
    1. If your editor has an option to save the file without the Byte Order Mark, use this option to save the file. In many cases, this will fix the problem
    2. If the above option does not work, edit in ANSI-only mode, remove the mark (delete the three characters), and save
    3. If you need to edit the file in UTF-8 mode, use an editor which will allow you to "save without Byte Order Mark"
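
If no suitable editor is at hand, the removal step above can also be scripted. The following is a minimal Python sketch, not an SMF tool: the function name strip_bom is hypothetical, and it overwrites the file in place, so back up your files first.

```python
import codecs
from pathlib import Path

def strip_bom(path):
    """Remove one leading UTF-8 BOM from the file at path.

    Returns True if a BOM was found and removed, False if the file was left untouched.
    """
    p = Path(path)
    data = p.read_bytes()  # read raw bytes so no encoding guesswork is involved
    if not data.startswith(codecs.BOM_UTF8):  # codecs.BOM_UTF8 is b"\xef\xbb\xbf"
        return False
    p.write_bytes(data[len(codecs.BOM_UTF8):])  # rewrite the file without the first three bytes
    return True
```

Calling strip_bom on a file such as index.php returns True when the mark was removed. Because it works on raw bytes, the rest of the file is written back unchanged.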

You can also use File Check.php to see which files have a BOM.
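
As an alternative way to locate affected files, here is a rough Python sketch that walks a directory tree and reports files starting with the three BOM bytes. It is independent of File Check.php and does not reproduce how that tool works. The function name find_bom_files and the default list of extensions are assumptions made for this example.

```python
import codecs
import os

def find_bom_files(root, extensions=(".php", ".css", ".js", ".html")):
    """Yield paths under root whose first three bytes are the UTF-8 BOM.

    The extensions tuple is an assumed default; adjust it to the file types you edit.
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.lower().endswith(extensions):
                continue
            path = os.path.join(dirpath, name)
            with open(path, "rb") as fh:  # binary mode: compare bytes, not characters
                if fh.read(3) == codecs.BOM_UTF8:
                    yield path
```

Looping over find_bom_files with your forum directory as root and printing each result gives a list of files to fix.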

N.B.: the BOM is three specific bytes (0xEF 0xBB 0xBF). If your system should somehow change the bit ordering or byte ordering, different characters may appear, possibly in a different order. Also, the specific characters displayed in a browser or editor depend on the encoding in use: ï»¿ is what you'll see for Latin-1 or CP1252, but other character sets may show something different. It all depends on which glyphs are assigned to those three byte codes.
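
The dependence on the encoding can be seen by decoding the same three bytes in several ways. This is a small Python sketch. The helper name render_bom is hypothetical, and the encodings listed are just examples.

```python
import codecs

def render_bom(encoding):
    """Decode the three UTF-8 BOM bytes (0xEF 0xBB 0xBF) using the given encoding."""
    return codecs.BOM_UTF8.decode(encoding)

# Single-byte encodings show three separate glyphs, which differ per character set.
for encoding in ("latin-1", "cp1252", "cp437", "koi8-r"):
    print(encoding, render_bom(encoding))

# Decoded as UTF-8, the bytes form one invisible character (U+FEFF),
# which is why the mark may be hidden when a file is opened in UTF-8 mode.
print(repr(render_bom("utf-8")))
```

Python's "utf-8-sig" codec discards a leading BOM while decoding, which is one way for a script to read such files safely.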
