NLS: update handling of Unicode
This patch (as1239) updates the kernel's treatment of Unicode. The
character-set conversion routines are well behind the current state of
the Unicode specification: They don't recognize the existence of code
points beyond plane 0 or of surrogate pairs in the UTF-16 encoding.
The old wchar_t 16-bit type is retained because it's still used in
lots of places. This shouldn't cause any new problems; if a
conversion now results in an invalid 16-bit code then before it must
have yielded an undefined code.
Difficult-to-read names like "utf_mbstowcs" are replaced with more
transparent names like "utf8s_to_utf16s" and the ordering of the
parameters is rationalized (buffer lengths come immediate after the
pointers they refer to, and the inputs precede the outputs).
Fortunately the low-level conversion routines are used in only a few
places; the interfaces to the higher-level uni2char and char2uni
methods have been left unchanged.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Acked-by: Clemens Ladisch <clemens@ladisch.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
diff --git a/fs/ncpfs/ncplib_kernel.c b/fs/ncpfs/ncplib_kernel.c
index 97645f1..0ec6237 100644
--- a/fs/ncpfs/ncplib_kernel.c
+++ b/fs/ncpfs/ncplib_kernel.c
@@ -1113,11 +1113,13 @@
if (NCP_IS_FLAG(server, NCP_FLAG_UTF8)) {
int k;
+ unicode_t u;
- k = utf8_mbtowc(&ec, iname, iname_end - iname);
- if (k < 0)
+ k = utf8_to_utf32(iname, iname_end - iname, &u);
+ if (k < 0 || u > MAX_WCHAR_T)
return -EINVAL;
iname += k;
+ ec = u;
} else {
if (*iname == NCP_ESC) {
int k;
@@ -1214,7 +1216,7 @@
if (NCP_IS_FLAG(server, NCP_FLAG_UTF8)) {
int k;
- k = utf8_wctomb(iname, ec, iname_end - iname);
+ k = utf32_to_utf8(ec, iname, iname_end - iname);
if (k < 0) {
err = -ENAMETOOLONG;
goto quit;