Ticket #58 (closed defect: fixed)

Opened 2 years ago

Last modified 2 years ago

UTF-16 buffering problem

Reported by: rtoy
Owned by:
Priority: major
Milestone:
Component: Unicode
Version: 2012-04
Keywords:
Cc:


The following code should not cause errors:

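;; Write the characters with codes 0 through 299 as UTF-16.
(with-open-file (s "test.txt" :direction :output :external-format :utf-16)
  (dotimes (i 300)
    (write-char (code-char i) s)))

;; Read them back and report any character whose code differs from its
;; position, including a premature end of file (READ-CHAR returns NIL).
(with-open-file (s "test.txt" :direction :input :external-format :utf-16)
  (dotimes (i 300)
    (let ((ch (read-char s nil nil)))
      (unless (and ch (= i (char-code ch)))
        (format t "Error at ~D: ~S, ~4X~%" i ch (and ch (char-code ch)))))))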

Change History

Changed 2 years ago by rtoy

The issue is caused by the BOM (byte-order mark) that is written at the start of the test file. Writing the BOM is fine, but when the file is read back in, the fast stream buffering code gets confused because, for all intents and purposes, the BOM doesn't exist as far as it can tell. The buffering code needs to know that the BOM was there so that the internal buffers can be updated correctly.

The easiest solution is to disable the fast buffering code for utf16 and utf32. The BOM is not used for other encodings.
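
To illustrate the mismatch, here is a minimal sketch in portable Common Lisp. It is not the CMUCL stream code: decode-utf16-octets is a hypothetical helper that handles only BMP characters (no surrogate pairs) and assumes big-endian order when no BOM is present. It decodes a vector of octets and also reports how many octets the BOM took up, which is the number the buffering code would have to account for.

```lisp
;; Hypothetical helper for illustration only; not part of CMUCL.
;; Decodes UTF-16 octets (BMP only, no surrogate pairs) and returns
;; two values: the decoded string and the number of BOM octets skipped.
(defun decode-utf16-octets (octets)
  (let ((len (length octets))
        (big-endian t)   ; assumption: big-endian when there is no BOM
        (start 0))
    ;; A BOM is the code unit #xFEFF; its byte order reveals the endianness.
    (when (>= len 2)
      (let ((b0 (aref octets 0))
            (b1 (aref octets 1)))
        (cond ((and (= b0 #xFE) (= b1 #xFF))
               (setf big-endian t start 2))
              ((and (= b0 #xFF) (= b1 #xFE))
               (setf big-endian nil start 2)))))
    (values
     (with-output-to-string (out)
       ;; Combine each remaining pair of octets into one character.
       (loop for i from start below (1- len) by 2
             do (let ((hi (aref octets (if big-endian i (1+ i))))
                      (lo (aref octets (if big-endian (1+ i) i))))
                  (write-char (code-char (+ (* 256 hi) lo)) out))))
     start)))

;; Six octets go in, but the two decoded characters account for only
;; four of them; the other two are the BOM.
(multiple-value-bind (text bom-octets)
    (decode-utf16-octets #(#xFE #xFF #x00 #x41 #x00 #x42))
  (list text bom-octets (* 2 (length text))))
;; => ("AB" 2 4)
```

Any code that computes a buffer position as two octets per character read would be off by the second value unless it is told that a BOM was consumed.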

Changed 2 years ago by toy.raymond@…

  • status changed from new to closed
  • resolution set to fixed

commit f3db74d49bf24c108053873f06905dbb2ed3cebd
Author: Raymond Toy <toy.raymond@…>
Date: Wed Apr 18 23:53:31 2012 -0700

Fix ticket:58. Handle the BOM character for utf-16 and utf-32. This is a bit of a hack.

  • src/code/stream.lisp:
    • Check the state to see if a BOM was read. This critically depends on knowing the format of the state variable for utf16 and utf32 formats, but the stream code shouldn't have to know the state internals.

  • src/general-info/release-20d.txt
    • Update.