langref: document UTF-8 BOM handling

The compiler currently ignores a UTF-8 BOM at the start of a source file and rejects it anywhere else. Document this behavior in the Source Encoding section.
2026-02-21 16:54:52 +00:00 · 2023-01-16 19:14:43 +01:00 · 06e9b2c4e3
commit 06e9b2c4e3
parent 7b68b76326
1 changed file with 4 additions and 0 deletions
--- a/doc/langref.html.in
+++ b/doc/langref.html.in
@@ -11480,6 +11480,10 @@ fn readU32Be() u32 {}
      but use of hard tabs is discouraged. See {#link|Grammar#}.
      </p>
      <p>
+      For compatibility with other tools, the compiler ignores a UTF-8-encoded byte order mark (U+FEFF)
+      if it is the first Unicode code point in the source text. A byte order mark is not allowed anywhere else in the source.
+      </p>
+      <p>
      Note that running <kbd>zig fmt</kbd> on a source file will implement all recommendations mentioned here.
      Note also that the stage1 compiler does <a href="https://github.com/ziglang/zig/wiki/FAQ#why-does-zig-force-me-to-use-spaces-instead-of-tabs">not yet support CR or HT</a> control characters.
      </p>
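
The rule documented in this patch can be illustrated with a small sketch. This is not the compiler's implementation: it is written in Python for brevity, the function name `strip_leading_bom` is hypothetical, and it assumes the input is already valid UTF-8 (in which case the byte sequence EF BB BF can only be the encoding of U+FEFF).

```python
# UTF-8 encoding of U+FEFF, the byte order mark: EF BB BF.
BOM = "\ufeff".encode("utf-8")


def strip_leading_bom(source: bytes) -> bytes:
    """Drop one leading BOM; reject a BOM anywhere else.

    Hypothetical helper for illustration only. Assumes `source`
    is valid UTF-8.
    """
    # A BOM is ignored only when it is the very first code point.
    skipped = len(BOM) if source.startswith(BOM) else 0
    body = source[skipped:]

    # Any remaining BOM, including a second one right after the
    # first, is an error.
    offset = body.find(BOM)
    if offset != -1:
        raise ValueError(
            f"byte order mark not allowed at byte offset {skipped + offset}"
        )
    return body
```

For example, `strip_leading_bom(b"\xef\xbb\xbfconst x = 1;")` returns the source without the three BOM bytes, while a BOM in the middle of the input raises `ValueError`.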