
Opened 12 years ago

Closed 12 years ago

#58 closed defect (fixed)

UTF-16 buffering problem

Reported by: Raymond Toy
Priority: major
Component: Unicode
Version: 2012-04


The following code writes code points 0 through 299 to a UTF-16 file and reads them back. It should run without reporting any errors:

(with-open-file (s "test.txt" :direction :output :if-exists :supersede
                   :external-format :utf-16)
  (dotimes (i 300)
    (write-char (code-char i) s)))

(with-open-file (s "test.txt" :direction :input :external-format :utf-16)
  (dotimes (i 300)
    (let ((ch (read-char s nil nil)))
      ;; READ-CHAR returns NIL at end of file, so guard before calling CHAR-CODE.
      (unless (and ch (= i (char-code ch)))
        (format t "Error at ~D: ~S, ~@[~4X~]~%" i ch (and ch (char-code ch)))))))

Change History (2)

comment:1 Changed 12 years ago by Raymond Toy

The problem is caused by the byte-order mark (BOM) that is written at the start of the test file. Writing the BOM is correct, but when the file is read back in, the fast stream-buffering code gets confused: to the reader the BOM does not exist as a character, yet the buffering code needs to know that it was there so that the internal buffers can be updated correctly.

The easiest solution is to disable the fast buffering code for UTF-16 and UTF-32, since no other encoding uses a BOM.
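As a rough illustration of the offset mismatch, here is a portable Common Lisp sketch that encodes the 300 test characters as UTF-16 with a BOM and decodes them again. It is not CMUCL's stream code: `encode-utf-16-with-bom`, `utf-16-bom-length`, and `decode-utf-16` are hypothetical helpers, and they handle only BMP code points (no surrogate pairs). It shows that 300 characters occupy 602 octets, so a buffer position cannot be derived as twice the number of characters read unless the two BOM octets are accounted for.

```lisp
;;; Hypothetical sketch (not CMUCL's implementation): UTF-16 with a BOM.
;;; Only BMP code points are handled; surrogate pairs are ignored.

(defun encode-utf-16-with-bom (string)
  "Encode STRING as big-endian UTF-16 preceded by the BOM octets FE FF."
  (let ((octets (make-array (+ 2 (* 2 (length string)))
                            :element-type '(unsigned-byte 8))))
    (setf (aref octets 0) #xFE
          (aref octets 1) #xFF)
    (loop for ch across string
          for pos from 2 by 2
          do (setf (aref octets pos) (ldb (byte 8 8) (char-code ch))
                   (aref octets (1+ pos)) (ldb (byte 8 0) (char-code ch))))
    octets))

(defun utf-16-bom-length (octets)
  "Return the byte order implied by a leading BOM and the number of octets
the BOM occupies (0 if there is none; big-endian is then assumed)."
  (if (< (length octets) 2)
      (values :big-endian 0)
      (let ((b0 (aref octets 0))
            (b1 (aref octets 1)))
        (cond ((and (= b0 #xFE) (= b1 #xFF)) (values :big-endian 2))
              ((and (= b0 #xFF) (= b1 #xFE)) (values :little-endian 2))
              (t (values :big-endian 0))))))

(defun decode-utf-16 (octets)
  "Decode OCTETS into a string.  The second value is the number of octets
consumed, including any BOM, which never appears as a character."
  (multiple-value-bind (order start) (utf-16-bom-length octets)
    (let* ((nchars (floor (- (length octets) start) 2))
           (result (make-string nchars)))
      (dotimes (i nchars)
        (let* ((pos (+ start (* 2 i)))
               (hi (aref octets (if (eq order :big-endian) pos (1+ pos))))
               (lo (aref octets (if (eq order :big-endian) (1+ pos) pos))))
          (setf (char result i) (code-char (+ (* hi 256) lo)))))
      (values result (+ start (* 2 nchars))))))

;; Usage: the same 300 characters as in the test case.
(let* ((text (coerce (loop for i below 300 collect (code-char i)) 'string))
       (octets (encode-utf-16-with-bom text)))
  (multiple-value-bind (decoded consumed) (decode-utf-16 octets)
    (list (length decoded) consumed (string= decoded text))))
;; → (300 602 T)
```

The decoded string has 300 characters, but 602 octets were consumed; the missing two are the BOM that the reader never sees.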

comment:2 Changed 12 years ago by toy.raymond@…

Resolution: fixed
Status: new → closed

commit f3db74d49bf24c108053873f06905dbb2ed3cebd
Author: Raymond Toy <toy.raymond@…>
Date: Wed Apr 18 23:53:31 2012 -0700

Fix ticket:58. Handle the BOM character for utf-16 and utf-32. This is a bit of a hack.

  • src/code/stream.lisp:
    • Check the state to see if a BOM was read. This critically depends on knowing the format of the state variable for utf16 and utf32 formats, but the stream code shouldn't have to know the state internals.

  • src/general-info/release-20d.txt
    • Update.
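The approach in the commit, checking decoder state to learn whether a BOM was consumed, might look roughly like the following. This is a hypothetical sketch: the `utf-16-state` structure, its `bom-read-p` slot, and the functions `read-utf-16-char` and `octets-for-chars` are invented for illustration and do not reflect CMUCL's actual state format. It handles only BMP code points.

```lisp
;;; Hypothetical sketch of the fix's idea: the decoder records in its state
;;; whether it consumed a BOM, and the buffering code consults that flag.
;;; The struct and function names are invented; CMUCL's state format differs.

(defstruct utf-16-state
  (byte-order :big-endian)   ; assumed when no BOM is present
  (bom-read-p nil))          ; set to T once a BOM has been consumed

(defun read-utf-16-char (octets pos state)
  "Decode one BMP character starting at octet POS.  Return the character
and the next octet position, or NIL and POS if too few octets remain.
A BOM at position 0 is consumed silently and recorded in STATE."
  (when (and (zerop pos) (>= (length octets) 2))
    (let ((b0 (aref octets 0))
          (b1 (aref octets 1)))
      (cond ((and (= b0 #xFE) (= b1 #xFF))
             (setf (utf-16-state-byte-order state) :big-endian
                   (utf-16-state-bom-read-p state) t
                   pos 2))
            ((and (= b0 #xFF) (= b1 #xFE))
             (setf (utf-16-state-byte-order state) :little-endian
                   (utf-16-state-bom-read-p state) t
                   pos 2)))))
  (if (> (+ pos 2) (length octets))
      (values nil pos)
      (let* ((big (eq (utf-16-state-byte-order state) :big-endian))
             (hi (aref octets (if big pos (1+ pos))))
             (lo (aref octets (if big (1+ pos) pos))))
        (values (code-char (+ (* hi 256) lo)) (+ pos 2)))))

(defun octets-for-chars (nchars state)
  "Octets occupied by NCHARS decoded characters, plus the BOM if one was read.
Without the BOM-READ-P check this would be off by two."
  (+ (* 2 nchars) (if (utf-16-state-bom-read-p state) 2 0)))

;; Usage: a little-endian BOM followed by "AB".
(let* ((octets (coerce '(#xFF #xFE #x41 #x00 #x42 #x00)
                       '(vector (unsigned-byte 8))))
       (state (make-utf-16-state))
       (pos 0)
       (chars '()))
  (loop for (ch next) = (multiple-value-list (read-utf-16-char octets pos state))
        while ch
        do (push ch chars)
           (setf pos next))
  (let ((text (coerce (reverse chars) 'string)))
    (list text pos (octets-for-chars (length text) state))))
;; → ("AB" 6 6)
```

Keeping the BOM flag in the decoder state lets the buffering code ask a single question; the hack noted in the commit is that the stream code must know how that state is laid out in order to ask it.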