I have input data from a flat file in which one column contains English, Japanese and Chinese characters. I stage these values in a table column defined as VARCHAR2(250 CHAR); the corresponding column in the main table is defined as VARCHAR2(250), which I cannot change. Therefore, I am doing a SUBSTR on this column. After loading the table, when I ran a SELECT * on it, I got this error:
ORA-29275: Partial multibyte character
If I select only the other columns, there is no problem.
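If you want to fit the data into a 250-byte column, you should use SUBSTRB. SUBSTR counts characters, and 250 Japanese or Chinese characters take far more than 250 bytes (three bytes each in AL32UTF8). SUBSTRB counts bytes instead, and it outputs only complete characters (you will not get partial Unicode characters):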
SQL> select substrb('中华人', 1, 9) ch9,
  2         substrb('中华人', 1, 8) ch8,
  3         substrb('中华人', 1, 7) ch7,
  4         substrb('中华人', 1, 6) ch6,
  5         substrb('中华人', 1, 5) ch5
  6    from dual;

CH9       CH8      CH7     CH6    CH5
--------- -------- ------- ------ -----
中华人    中华     中华    中华   中
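Applied to the load described in the question, a minimal sketch could look like the following. The names staging_table, main_table and text_col are hypothetical placeholders, and the sketch assumes the main table's VARCHAR2(250) column uses byte length semantics (the default when NLS_LENGTH_SEMANTICS is BYTE). The first query lists the staged values that are longer than 250 bytes, i.e. the ones a character-based SUBSTR cannot fit.

```sql
-- Hypothetical names: staging_table.text_col is VARCHAR2(250 CHAR),
-- main_table.text_col is VARCHAR2(250) with byte length semantics.

-- Staged values whose byte length exceeds the 250-byte target column.
select length(text_col)  as char_len,
       lengthb(text_col) as byte_len
  from staging_table
 where lengthb(text_col) > 250;

-- Load at most 250 bytes per value; SUBSTRB never emits a partial
-- multibyte character.
insert into main_table (text_col)
select substrb(text_col, 1, 250)
  from staging_table;
```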
Edit:
A commenter raised an interesting point about the actual length of the resulting string and whether it could end up containing an invalid sequence of bytes. Consider the following on an AL32UTF8 database:
SQL> select lengthb('ÏÏÏ'),
  2         lengthb(substrb('ÏÏÏÏÏÏ', 1, 5)),
  3         dump('ÏÏÏ'),
  4         dump(substrb('ÏÏÏÏÏÏ', 1, 5))
  5    from dual;

LE LE DUMP('ÏÏÏ')                            DUMP(SUBSTRB('ÏÏÏÏÏÏ',1,5))
-- -- ------------------------------------- --------------------------------
 6  5 Typ=96 Len=6: 195,143,195,143,195,143 Typ=1 Len=5: 195,143,195,143,32
As you can see, the last byte of the SUBSTRB string is not a truncated special character but a perfectly valid character: 32, the space (in this character set the first 128 characters are identical to the ASCII character set). The RTRIM recommended in another answer would strip that trailing space.
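If the padding space is unwanted, one option is to wrap the call in RTRIM. This sketch reuses the string from the AL32UTF8 example; keep in mind that RTRIM also removes any trailing spaces that were genuinely part of the data.

```sql
-- SUBSTRB replaced the cut-off multibyte character with a space (byte 32);
-- RTRIM removes trailing spaces, so the second dump should no longer end in 32.
select dump(substrb('ÏÏÏÏÏÏ', 1, 5))        as dump_padded,
       dump(rtrim(substrb('ÏÏÏÏÏÏ', 1, 5))) as dump_trimmed
  from dual;
```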
In addition, I found this interesting case with AL16UTF16:
SQL> select lengthb(n'ĈĈ'),
  2         dump(n'ĈĈ') dump,
  3         lengthb(substrb(n'ĈĈ', 1, 3)) length_substr,
  4         dump(substrb(n'ĈĈ', 1, 3)) dump_substr
  5    from dual;

LE DUMP                    LENGTH_SUBSTR DUMP_SUBSTR
-- ----------------------- ------------- -----------------
 4 Typ=96 Len=4: 1,8,1,8               2 Typ=1 Len=2: 1,8
In this case Oracle has chosen to cut the string after the second byte because there is no legal one-byte character in the AL16UTF16 character set. As a result, the string is only 2 bytes long instead of 3.
This would need more testing and is by no means rigorous, but I still stand by my first hunch that SUBSTRB returns a sequence of bytes that is valid in the string's character set.