How do I get rid of a Byte Order Mark
From Online Manual


Latest revision as of 12:37, 2 September 2015

On some pages of your forum, you may see the character sequence ï»¿. When it is visible, it usually appears in the upper left of a page. Sometimes several instances appear on the same page.

This is a UTF-8 "Byte Order Mark" (BOM), the Unicode character U+FEFF written at the very start of a file. It got into one or more of your forum files when somebody edited a file and saved it in UTF-8 mode with an editor that writes this mark. "UTF-8" is a specific character encoding in which a large number of accented and non-Latin (for example, Greek, Cyrillic, CJK, or Arabic) characters are represented with multibyte sequences. Contrast this with a single-byte encoding such as Latin-1 (ISO-8859-1), which has room for at most 128 characters beyond basic ASCII.
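
To make the single-byte versus multibyte difference concrete, here is a small Python sketch (the helper name encoded_bytes is purely illustrative) that prints how a few characters are stored in UTF-8 and in Latin-1. It also shows that the BOM character U+FEFF becomes the three bytes discussed at the end of this page.

```python
def encoded_bytes(ch):
    """Return (UTF-8 bytes, Latin-1 bytes or None) for one character, as hex strings."""
    utf8 = ch.encode("utf-8").hex(" ")
    try:
        latin1 = ch.encode("latin-1").hex(" ")
    except UnicodeEncodeError:
        latin1 = None  # Latin-1 has no byte value for this character
    return utf8, latin1


# ASCII letter, accented Latin letter, Cyrillic letter, and the BOM character (U+FEFF)
for ch in ["A", "é", "Ж", "\ufeff"]:
    utf8, latin1 = encoded_bytes(ch)
    print(f"U+{ord(ch):04X}  UTF-8: {utf8:<9} Latin-1: {latin1 or 'n/a'}")
```

"é" takes two bytes in UTF-8 (c3 a9) but one in Latin-1 (e9). "Ж" cannot be stored in Latin-1 at all, and U+FEFF encodes to ef bb bf in UTF-8.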

Some editors are often blamed for adding this mark at the beginning of files. While a BOM is legal in UTF-8 text files, it often causes problems on web pages, because many web servers and browsers do not ignore it. In a PHP file, the BOM sits before the opening <?php tag, so PHP sends it to the browser as ordinary output. This causes a further problem: HTTP headers are sent along with the first output to the browser, so if SMF has not finished setting up its headers by then, you will get "cannot send headers" errors.

To solve this problem, take the following steps:

  • Figure out which files have been edited recently.
  • One by one, open them in an ANSI (single-byte, for example, Latin-1 or CP1252) encoding mode. The mark may be hidden if you open the file in UTF-8 mode.
  • Each time you find a file that begins with these characters, try the following:
    1. If your editor has an option to save the file without the Byte Order Mark, use it. In many cases, this alone fixes the problem.
    2. If that option does not work, edit the file in ANSI-only mode, delete the three BOM characters at the very start, and save.
    3. If you need to edit the file in UTF-8 mode, use an editor that lets you "save without Byte Order Mark".
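
If you have shell access and would rather script step 2 than do it by hand, the following Python sketch does the same thing on raw bytes: it checks whether the file starts with 0xEF 0xBB 0xBF and, if so, rewrites the file without them. The function name strip_bom is hypothetical, not part of SMF. Because it works on bytes, it never re-encodes the rest of the file. Keep a backup copy of each file before running it.

```python
BOM = b"\xef\xbb\xbf"  # UTF-8 encoding of U+FEFF


def strip_bom(path):
    """Remove any leading UTF-8 BOM(s) from the file at `path`.

    Returns True if the file was changed, False if it had no leading BOM.
    """
    with open(path, "rb") as f:
        data = f.read()

    stripped = data
    # Also handles a file that somehow ended up with more than one leading BOM.
    while stripped.startswith(BOM):
        stripped = stripped[len(BOM):]

    if stripped == data:
        return False  # nothing to do; leave the file untouched

    with open(path, "wb") as f:
        f.write(stripped)
    return True
```

For example, strip_bom("path/to/edited_file.php") (a placeholder path) returns True the first time it removes a mark and False if run again on the same file.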

You can also use File Check.php to list which files have a BOM.
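
If you cannot run File Check.php, a rough equivalent can be sketched in Python. This is not File Check.php itself. It walks a directory tree and reports every file whose first three bytes are the BOM. The function name find_bom_files and the default of checking only .php files are assumptions. Widen the extensions tuple if you also want to check, for example, .css or .js files.

```python
import os

BOM = b"\xef\xbb\xbf"  # UTF-8 encoding of U+FEFF


def find_bom_files(root, extensions=(".php",)):
    """Yield paths under `root` (with a matching extension) that start with a UTF-8 BOM."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.lower().endswith(extensions):
                continue
            path = os.path.join(dirpath, name)
            with open(path, "rb") as f:
                # Only the first three bytes matter: a BOM is only meaningful at the start.
                if f.read(len(BOM)) == BOM:
                    yield path
```

Pointing it at your forum directory, for example `for p in find_bom_files("/path/to/forum"): print(p)` (a placeholder path), gives you the list of files to fix with the steps above.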

Note that the BOM is three specific bytes (0xEF 0xBB 0xBF). If something along the way changes the bit or byte ordering, you may see different characters, or the same characters in a different order. The characters displayed in the browser or editor also depend on the encoding in use: ï»¿ is what you will see with Latin-1 or CP1252, but other character sets may show something different. It all depends on which glyphs are assigned to those three byte values.
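
The following Python snippet illustrates this. It decodes the same three bytes with several encodings. With single-byte encodings, each byte becomes its own visible character. Decoded as UTF-8, the three bytes form one invisible character.

```python
BOM = b"\xef\xbb\xbf"

# Single-byte encodings: each byte maps to its own glyph.
print(BOM.decode("cp1252"))   # Western Windows code page
print(BOM.decode("latin-1"))  # ISO-8859-1, same glyphs as CP1252 for these bytes
print(BOM.decode("cp1251"))   # Cyrillic Windows code page: different glyphs

# Decoded as UTF-8, the three bytes form the single character U+FEFF.
print(repr(BOM.decode("utf-8")))

# Python's "utf-8-sig" codec drops a leading BOM while decoding.
print((BOM + b"<?php").decode("utf-8-sig"))
```

The first two lines print ï»¿, the Cyrillic code page shows п»ї instead, and the UTF-8 decode yields '\ufeff'.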
