Strings in CMUCL are UTF-16 strings: Unicode code points greater than 65535 (U+FFFF) are represented by surrogate pairs. We refer the reader to the Unicode standard for more information about surrogate pairs. Note that, because of the UTF-16 encoding of strings, there is a distinction between Lisp characters and Unicode code points: a code point above U+FFFF occupies two Lisp characters. The standard string operations know about this encoding and handle surrogate pairs correctly.
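The surrogate-pair arithmetic can be sketched in portable Common Lisp. The helper names below (utf16-surrogates, surrogates-to-codepoint, make-surrogate-string) are hypothetical and are not CMUCL functions; the sketch only assumes that CHAR-CODE-LIMIT is at least 65536, so that surrogate code units are valid character codes.

```lisp
;; Hypothetical helpers for illustration; they are not part of CMUCL.

(defun utf16-surrogates (codepoint)
  "Split a code point above #xFFFF into its high and low surrogate code units."
  (let ((offset (- codepoint #x10000)))
    (values (+ #xD800 (ash offset -10))          ; high (leading) surrogate
            (+ #xDC00 (logand offset #x3FF)))))  ; low (trailing) surrogate

(defun surrogates-to-codepoint (high low)
  "Combine a high and a low surrogate code unit into one code point."
  (+ #x10000 (ash (- high #xD800) 10) (- low #xDC00)))

(defun make-surrogate-string (codepoint)
  "Build a string holding CODEPOINT as two Lisp characters."
  (multiple-value-bind (high low) (utf16-surrogates codepoint)
    (coerce (list (code-char high) (code-char low)) 'string)))

;; U+1F600 is one code point but two Lisp characters.
(length (make-surrogate-string #x1F600))   ; => 2
```

This is why the length of a string and the number of code points it contains can differ.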
string-upcase string &key :start :end :casing
string-downcase string &key :start :end :casing
The case of the string is changed appropriately. Surrogate pairs are handled correctly. The conversion to the appropriate case is done based on the Unicode conversion. The additional argument :casing controls how case conversion is done. The default value is :simple, which uses simple Unicode case conversion and is equivalent to the same function in the COMMON-LISP package. If :casing is :full, then full Unicode case conversion is done, in which case the string may actually increase in length.
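To see why full case conversion can lengthen a string, consider the German sharp s (U+00DF), whose full uppercase form is the two letters "SS". The function below is a hypothetical sketch in portable Common Lisp that hard-codes this single special case; it is not how CMUCL implements :casing :full, which would rely on the Unicode casing data.

```lisp
;; Hypothetical sketch: handles only U+00DF. Real full casing uses the
;; Unicode special-casing tables.
(defun full-upcase-sketch (string)
  (with-output-to-string (out)
    (loop for ch across string
          do (if (char= ch (code-char #xDF))     ; LATIN SMALL LETTER SHARP S
                 (write-string "SS" out)         ; one character becomes two
                 (write-char (char-upcase ch) out)))))

;; The input has 6 characters; the result has 7.
(let ((s (coerce (list #\s #\t #\r #\a (code-char #xDF) #\e) 'string)))
  (list (length s) (full-upcase-sketch s)))
;; => (6 "STRASSE")
```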
string-capitalize string &key :start :end :casing :unicode-word-break
Given a string, returns a copy of the string with the first character of each “word” converted to upper case and the remaining characters in the word converted to lower case. The value of :casing is :simple, :full or :title for simple, full or title case conversion, respectively. The default value for :casing is :title. If :unicode-word-break is non-NIL, then the Unicode word-breaking algorithm is used to determine the word boundaries. Otherwise, a “word” is defined to be a string of case-modifiable characters delimited by non-case-modifiable characters. The default for :unicode-word-break is T.
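The effect of the word definition can be seen with the standard capitalization function from the COMMON-LISP package, which delimits words by characters rather than by the Unicode word-breaking algorithm. The example below is a sketch that uses only the ANSI function; the helper name capitalize-without-word-break is hypothetical. An apostrophe ends a word here, whereas the Unicode word-breaking rules would be expected to keep "don't" together.

```lisp
;; Uses only the ANSI Common Lisp function; the wrapper name is hypothetical.
(defun capitalize-without-word-break (string)
  (cl:string-capitalize string))

;; The apostrophe delimits a word, so the following "t" starts a new one.
(capitalize-without-word-break "don't stop")
;; => "Don'T Stop"
```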
nstring-upcase string &key :start :end
nstring-downcase string &key :start :end
nstring-capitalize string &key :start :end
The case of the string is changed appropriately. Surrogate pairs are handled correctly. The conversion to the appropriate case is done based on the Unicode conversion. (Full casing is not available because the string length cannot be increased when needed.)
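A minimal sketch of in-place case conversion, assuming the functions described here behave like the standard destructive string functions: the string object is modified directly, so its length stays fixed. The wrapper name upcase-prefix-in-place is hypothetical.

```lisp
;; Hypothetical wrapper around the standard destructive function.
(defun upcase-prefix-in-place (string end)
  "Destructively upcase the first END characters of STRING and return it."
  (nstring-upcase string :start 0 :end end))

;; COPY-SEQ gives a fresh string, since literal strings must not be modified.
(let ((s (copy-seq "hello world")))
  (upcase-prefix-in-place s 5)
  s)
;; => "HELLO world"
```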
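string= string1 string2 &key :start1 :end1 :start2 :end2
string/= string1 string2 &key :start1 :end1 :start2 :end2
string< string1 string2 &key :start1 :end1 :start2 :end2
string> string1 string2 &key :start1 :end1 :start2 :end2
string<= string1 string2 &key :start1 :end1 :start2 :end2
string>= string1 string2 &key :start1 :end1 :start2 :end2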
The string comparison is done in codepoint order. (This is different from just comparing the order of the individual characters due to surrogate pairs.) Unicode collation is not done.
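The difference between codepoint order and character (code unit) order shows up when one string contains U+FFFF and the other U+10000. The decoder below is a hypothetical helper written in portable Common Lisp, not a CMUCL function; it assumes strings hold UTF-16 code units as described above.

```lisp
;; Hypothetical helper: decode UTF-16 code units into a list of code points.
(defun string-codepoints (string)
  (loop with i = 0
        while (< i (length string))
        collect (let ((hi (char-code (char string i))))
                  (incf i)
                  (if (and (<= #xD800 hi #xDBFF)
                           (< i (length string))
                           (<= #xDC00 (char-code (char string i)) #xDFFF))
                      ;; A valid surrogate pair: combine both code units.
                      (let ((lo (char-code (char string i))))
                        (incf i)
                        (+ #x10000 (ash (- hi #xD800) 10) (- lo #xDC00)))
                      hi))))

(defun compare-both-ways ()
  (let ((a (string (code-char #xFFFF)))                       ; U+FFFF
        (b (coerce (list (code-char #xD800) (code-char #xDC00))
                   'string)))                                 ; U+10000
    (list
     ;; Code unit order: #xFFFF is not less than #xD800.
     (< (char-code (char a 0)) (char-code (char b 0)))
     ;; Codepoint order: #xFFFF is less than #x10000.
     (< (first (string-codepoints a)) (first (string-codepoints b))))))

(compare-both-ways)
;; => (NIL T)
```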
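string-equal string1 string2 &key :start1 :end1 :start2 :end2
string-not-equal string1 string2 &key :start1 :end1 :start2 :end2
string-lessp string1 string2 &key :start1 :end1 :start2 :end2
string-greaterp string1 string2 &key :start1 :end1 :start2 :end2
string-not-greaterp string1 string2 &key :start1 :end1 :start2 :end2
string-not-lessp string1 string2 &key :start1 :end1 :start2 :end2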
Each codepoint in each string is converted to lowercase and the appropriate comparison of the codepoint values is done. Unicode collation is not done.
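As a small illustration with the standard case-insensitive comparison functions on ASCII input, the sketch below contrasts them with their case-sensitive counterparts. The wrapper name compare-ignoring-case is hypothetical; note that the ordering predicates return a mismatch index rather than T.

```lisp
;; Hypothetical wrapper collecting a few standard comparisons.
(defun compare-ignoring-case ()
  (list (string-equal "Lisp" "LISP")      ; equal once case is ignored
        (string= "Lisp" "LISP")           ; case-sensitive: not equal
        (string-lessp "apple" "BANANA")   ; "a" sorts before "b": index 0
        (string< "apple" "BANANA")))      ; code 97 (a) is above code 66 (B)

(compare-ignoring-case)
;; => (T NIL 0 NIL)
```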
Removes any characters in bag from the left, right, or both ends of the string string, respectively. This has potential problems if you want to remove a character encoded as a surrogate pair from the string, since a single Lisp character cannot represent the pair. As an extension, if bag is a string, we properly handle surrogate pairs in the bag.
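A minimal sketch of trimming a character that needs a surrogate pair, passing the bag as a string so that both halves of the pair travel together. The helper name trim-codepoint is hypothetical, and U+1F600 is used only as an example code point.

```lisp
;; Hypothetical helper: trim one code point above #xFFFF from both ends
;; of STRING by passing its surrogate pair as a string bag.
(defun trim-codepoint (codepoint string)
  (let* ((offset (- codepoint #x10000))
         (bag (coerce (list (code-char (+ #xD800 (ash offset -10)))
                            (code-char (+ #xDC00 (logand offset #x3FF))))
                      'string)))
    (string-trim bag string)))

;; Build <U+1F600>abc<U+1F600> and strip the pair from both ends.
(let* ((pair (coerce (list (code-char #xD83D) (code-char #xDE00)) 'string))
       (s (concatenate 'string pair "abc" pair)))
  (trim-codepoint #x1F600 s))
;; => "abc"
```

By contrast, a bag given as a list containing only one half of a pair could leave a dangling surrogate at the end of the result, which is the problem the string-bag extension avoids.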