

13.3.3 Strings

Strings in CMUCL are encoded in UTF-16: a Unicode codepoint greater than 65535 (#xFFFF) is represented by a surrogate pair, that is, by two 16-bit Lisp characters. We refer the reader to the Unicode standard for more information about surrogate pairs. Because of this encoding, a Lisp character is not the same thing as a Unicode codepoint. The standard string operations are aware of the encoding and handle surrogate pairs correctly.
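To make the surrogate-pair arithmetic concrete, here is a minimal sketch in portable Common Lisp. The function name utf16-surrogates is hypothetical and is not part of CMUCL; it only illustrates how the UTF-16 encoding splits a codepoint above #xFFFF into two 16-bit code units.

```lisp
;; UTF16-SURROGATES is a hypothetical helper for illustration only;
;; it is not a CMUCL function.
(defun utf16-surrogates (codepoint)
  "Return the high and low UTF-16 surrogate code units for CODEPOINT,
which must lie in the range #x10000 to #x10FFFF."
  (assert (<= #x10000 codepoint #x10FFFF))
  (let ((offset (- codepoint #x10000)))
    (values (+ #xD800 (ash offset -10))          ; top 10 bits of the offset
            (+ #xDC00 (logand offset #x3FF)))))  ; bottom 10 bits of the offset

;; U+1F600 is encoded as the two code units #xD83D and #xDE00.
(multiple-value-list (utf16-surrogates #x1F600)) ; => (55357 56832)
```

A string holding this single codepoint therefore occupies two Lisp characters, which is why character indices and codepoint positions can differ.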

Function: string-upcase string &key :start :end :casing
Function: string-downcase string &key :start :end :casing
Function: string-capitalize string &key :start :end :casing

These functions return a string with the case changed as indicated, using the Unicode case mappings. Surrogate pairs are handled correctly. The additional argument :casing controls how case conversion is done. The default value is :simple, which uses simple Unicode case conversion. If :casing is :full, full Unicode case conversion is used, and the result may be longer than the original string.
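The following sketch shows the two casing modes. The first three forms are standard Common Lisp. The last form uses the CMUCL-specific :casing keyword and is guarded with #+cmu so that other implementations skip it. The variable name *eszett-word* is ours, and the sample word is built with code-char only to keep the source file ASCII.

```lisp
;; Simple casing (the default): one character maps to one character.
(string-upcase "hello world")      ; => "HELLO WORLD"
(string-downcase "Hello World")    ; => "hello world"
(string-capitalize "hello world")  ; => "Hello World"

;; *ESZETT-WORD* is a hypothetical example variable: "stra", U+00DF, "e".
(defparameter *eszett-word*
  (concatenate 'string "stra" (string (code-char #xDF)) "e"))

;; CMUCL only. Unicode's full case mapping turns U+00DF into "SS",
;; so the result here is expected to be one character longer than
;; the input.
#+cmu
(string-upcase *eszett-word* :casing :full)
```

Code that relies on the result having the same length as the input should stay with the default :simple casing.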

Function: nstring-upcase string &key :start :end
Function: nstring-downcase string &key :start :end
Function: nstring-capitalize string &key :start :end

These functions destructively change the case of string, using the Unicode case mappings. Surrogate pairs are handled correctly. Full casing is not available, because full case conversion may need to lengthen the string, which cannot be done in place.
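As a brief illustration of the destructive variants, the sketch below upcases part of a string in place. It is standard Common Lisp. The call to copy-seq is there because modifying a literal string has undefined consequences.

```lisp
;; Work on a fresh copy; literal strings must not be modified.
(let ((s (copy-seq "hello world")))
  (nstring-upcase s :start 0 :end 5)  ; modifies S in place
  s)                                  ; => "HELLO world"
```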

Function: string= s1 s2 &key :start1 :end1 :start2 :end2
Function: string/= s1 s2 &key :start1 :end1 :start2 :end2
Function: string< s1 s2 &key :start1 :end1 :start2 :end2
Function: string> s1 s2 &key :start1 :end1 :start2 :end2
Function: string<= s1 s2 &key :start1 :end1 :start2 :end2
Function: string>= s1 s2 &key :start1 :end1 :start2 :end2

Strings are compared in codepoint order. Because of surrogate pairs, this can differ from comparing the individual Lisp characters one at a time. Unicode collation is not performed.
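The difference between codepoint order and character (code unit) order can be seen with plain integers. The sketch below is portable Common Lisp and does not call any CMUCL internals; utf16-units and list< are hypothetical helpers written only for this illustration.

```lisp
;; Hypothetical helpers for illustration; not part of CMUCL.
(defun utf16-units (codepoints)
  "Encode a list of codepoints as a list of UTF-16 code units."
  (loop for cp in codepoints
        if (< cp #x10000)
          collect cp
        else
          collect (+ #xD800 (ash (- cp #x10000) -10))
          and collect (+ #xDC00 (logand (- cp #x10000) #x3FF))))

(defun list< (a b)
  "True if the integer list A sorts strictly before B lexicographically."
  (loop for x in a
        for y in b
        do (cond ((< x y) (return t))
                 ((> x y) (return nil)))
        finally (return (< (length a) (length b)))))

;; Codepoint order: U+FFFD sorts before U+1F600.
(list< '(#xFFFD) '(#x1F600))                              ; => T

;; Code unit order: #xFFFD is greater than the high surrogate #xD83D,
;; so the order is reversed.
(list< (utf16-units '(#xFFFD)) (utf16-units '(#x1F600)))  ; => NIL
```

According to the description above, the CMUCL comparison functions follow the first ordering, not the second.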

Function: string-equal s1 s2 &key :start1 :end1 :start2 :end2
Function: string-not-equal s1 s2 &key :start1 :end1 :start2 :end2
Function: string-lessp s1 s2 &key :start1 :end1 :start2 :end2
Function: string-greaterp s1 s2 &key :start1 :end1 :start2 :end2
Function: string-not-greaterp s1 s2 &key :start1 :end1 :start2 :end2
Function: string-not-lessp s1 s2 &key :start1 :end1 :start2 :end2

Each codepoint in each string is converted to lowercase, and the resulting codepoint values are then compared. Unicode collation is not performed.
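A short comparison of the case-sensitive and case-insensitive functions, using only ASCII strings and standard Common Lisp behavior:

```lisp
(string-equal "Hello" "hELLO")   ; => T
(string= "Hello" "hELLO")        ; => NIL

;; The ordering predicates return the index of the first mismatch
;; when the relation holds, and NIL otherwise.
(string-lessp "apple" "BANANA")  ; => 0
(string< "apple" "BANANA")       ; => NIL, since #\a is greater than #\B
```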

Function: string-left-trim bag string
Function: string-right-trim bag string
Function: string-trim bag string

Removes any characters in bag from the left end, the right end, or both ends of string, respectively. This is problematic if you want to remove a codepoint that is encoded as a surrogate pair, since a single Lisp character cannot represent the pair. As an extension, if bag is a string, surrogate pairs in bag are handled properly.
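The sketch below shows the three trimming functions. The first three forms are standard Common Lisp. The last form is guarded with #+cmu and builds a bag containing one surrogate pair (U+1F600) from its two code units; it assumes that code-char accepts surrogate code values on CMUCL. The variable name face is ours.

```lisp
(string-trim " " "  hello  ")                ; => "hello"
(string-left-trim "xy" "xyxhelloyx")         ; => "helloyx"
(string-right-trim '(#\x #\y) "xyxhelloyx")  ; => "xyxhello"

;; CMUCL only: with a string bag, the surrogate pair is treated as a
;; single codepoint, so this is expected to strip the codepoint from
;; both ends and leave "hello".
#+cmu
(let ((face (coerce (list (code-char #xD83D) (code-char #xDE00)) 'string)))
  (string-trim face (concatenate 'string face "hello" face)))
```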

