Saturday, February 21, 2015

NVARCHAR2, UTL-16 and Emails

Development is often the case of trying several paths through the forest, hoping to find one that leads out the other end. That was the start of my week.

Until we get our shiny new 12c database running on its shiny new box (and all the data shifted to it), we are living with a mix of databases. To begin with, the data we managed was mostly AU/NZ and Europeans stuff, and the character set is set accordingly. By which I mean one of those Eurocentric things and not UTF-8. We also have a bunch of columns in NVARCHAR2 with AL16UTF16 as the alternative character set.

I'm pretty sure the new database will start with UTF-8. But in the mean time I was responsible for trying to get emails out of the current database with data in various European and non-European character sets.  My paths through that forest went as follows...

  • It should just work. Let me test it.....Oh bugger.
  • Okay, maybe if I put "utf-8" in various bits of the message.
  • And switch the code so it uses NVARCHAR2 rather than defaulting to VARCHAR2.
  • Oh....UTF-16 isn't the same as UTF-8. I need to convert it somehow
  • So I can't put UTF-8 values in either my Eurocentric VARCHAR2 or UTF-16 NVARCHAR2.
  • And I have to get this through SMTP, where you can still see the exposed bones of 7-bit ASCII, 


AHA ! HTML Entities. That means I can get away with using ASCIISTR to convert the UTF-16 strings into a sequence of Hex values for each two-byte character. Then I stick a &#x in front of each character, and I have an HTML representation of the string !

It stinks of an ugly solution.
I think there should be a way of sending utf-16 in the content, but I couldn't get to it.

It doesn't help that email HTML is less capable than browser HTML, and has to support a variety of older clients (plus presenting an HTML email body inside of the HTML of a webmail client is always going to be awkward).