Murphy's Law and the deadly unicode  U+A0 character

16th May 2023

We were trying to recover a table in a database using table point in time recovery and unfortunately there was a hidden unicode character in the command string. The character was u+A0 which is in effect a white box. the command was of the type RECOVER TABLE SCHEMA.TABLE_NAMEU+AO until TIME . As per the oracle documentation "https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/Database-Object-Names-and-Qualifiers.html#GUID-75337742-67F[…]-985F-741C93D918DA"  Nonquoted identifiers can only contain alphanumeric characters from your database character set and the underscore (_), dollar sign ($), and pound sign (#). So according to me this should have been flagged as an error during parsing and the recover should not have kicked off. But it did , ran for 8 hours restored the auxiliary instance and then failed spectacularly. Oracle support says that the unicode character is a valid character here but according to what I read in the docs it isn't and hence this should be a bug. The database character set is UTF-8 and this character which is non breaking space falls under latin-1 supplement which hold a lot of non alphanumeric stuff including punctuation .https://en.wikipedia.org/wiki/Latin-1_Supplement (edited) 

Wikipedia

Latin-1 Supplement

The Latin-1 Supplement (also called C1 Controls and Latin-1 Supplement) is the second Unicode block in the Unicode standard. It encodes the upper range of ISO 8859-1: 80 (U+0080) - FF (U+00FF). C1 Controls (0080–009F) are not graphic. This block ranges from U+0080 to U+00FF, contains 128 characters and includes the C1 controls, Latin-1 punctuation and symbols, 30 pairs of majuscule and minuscule accented Latin characters and 2 mathematical operators.

The C1 controls and Latin-1 Supplement block has been included in its present form, with the same character repertoire since version 1.0 of the Unicode Standard. Its block name in Unicode 1.0 was simply Latin1.


https://www.soscisurvey.de/tools/view-chars.php this site can be used to see unicode characters and was used by Oracle support to show that the unicode non breaking space was present in the rman command string.

As  per the oracle documentation unless you use quotation marks the unicode character U+AO should not be a valid character within a table name.

Nonquoted identifiers can only contain alphanumeric characters from your database character set and the underscore (_), dollar sign ($), and pound sign (#).