Opened 2 years ago

Closed 2 years ago

#58 closed defect (fixed)

UTF-16 buffering problem

Reported by: rtoy
Owned by:
Priority: major
Milestone:
Component: Unicode
Version: 2012-04
Keywords:
Cc:

Description

The following code should not cause errors:

(with-open-file (s "test.txt" :direction :output :external-format :utf-16)
  (dotimes (i 300)
    (write-char (code-char i) s)))

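(with-open-file (s "test.txt" :direction :input :external-format :utf-16)
  (dotimes (i 300)
    (let ((ch (read-char s nil nil)))
      ;; CH is NIL on a premature end of file; report that as an error too
      ;; instead of calling CHAR-CODE on NIL.
      (unless (and ch (= i (char-code ch)))
        (format t "Error at ~D: ~S, ~@[~4X~]~%" i ch (and ch (char-code ch)))))))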

Change History (2)

comment:1 Changed 2 years ago by rtoy

The issue is caused by the BOM (byte-order mark) that is written at the start of the test file. Writing the BOM is fine, but when the file is read back in, the fast stream buffering code gets confused, because for all intents and purposes the BOM doesn't exist: it occupies octets in the file but never shows up as a character. The buffering code still needs to know that the BOM was there so that the internal buffers can be updated correctly.

The easiest solution is to disable the fast buffering code for utf16 and utf32. The BOM is not used for other encodings.
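
To illustrate the mismatch described above, here is a minimal sketch of a UTF-16 decoder that honors a leading BOM. This is not the CMUCL stream code; DECODE-UTF16-OCTETS is a hypothetical helper written only for this illustration. It assumes big-endian order when no BOM is present and handles BMP characters only (no surrogate pairs). The second return value is the number of octets the BOM used up, which is exactly the amount that any octet-based buffer bookkeeping has to account for.

```lisp
;; Hypothetical helper for illustration only; not part of CMUCL.
;; Decodes a vector of octets as UTF-16 (BMP only, no surrogate pairs).
;; Returns two values: the decoded string and the number of octets
;; consumed by a leading BOM (0 or 2).
(defun decode-utf16-octets (octets)
  (let ((len (length octets))
        (big-endian t)                  ; assumed default when there is no BOM
        (bom-octets 0))
    (when (>= len 2)
      (let ((b0 (aref octets 0))
            (b1 (aref octets 1)))
        (cond ((and (= b0 #xFE) (= b1 #xFF)) ; FE FF: big-endian BOM
               (setf bom-octets 2))
              ((and (= b0 #xFF) (= b1 #xFE)) ; FF FE: little-endian BOM
               (setf big-endian nil
                     bom-octets 2)))))
    (values
     (with-output-to-string (out)
       ;; Start after the BOM and combine each pair of octets into a code unit.
       (loop for i from bom-octets below (1- len) by 2
             do (let ((first-octet (aref octets i))
                      (second-octet (aref octets (1+ i))))
                  (write-char (code-char (if big-endian
                                             (+ (* 256 first-octet) second-octet)
                                             (+ (* 256 second-octet) first-octet)))
                              out))))
     bom-octets)))
```

For example, (decode-utf16-octets #(#xFE #xFF 0 65 0 66)) returns "AB" and 2: six octets were read but only two characters came out. Code that assumes two octets per character will be off by the size of the BOM unless it is told the BOM was there.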

comment:2 Changed 2 years ago by toy.raymond@…

  • Resolution set to fixed
  • Status changed from new to closed

commit f3db74d49bf24c108053873f06905dbb2ed3cebd
Author: Raymond Toy <toy.raymond@…>
Date: Wed Apr 18 23:53:31 2012 -0700

Fix ticket:58. Handle the BOM character for utf-16 and utf-32. This is a bit of a hack.

  • src/code/stream.lisp:
    • Check the state to see if a BOM was read. This critically depends on knowing the format of the state variable for utf16 and utf32 formats, but the stream code shouldn't have to know the state internals.

  • src/general-info/release-20d.txt
    • Update.