Saturday, April 04, 2015

A world of confusion

It has got to the stage where I often don't even know what day it is. No, not premature senility (although some may disagree). But time zones.

Mostly I've had it fairly easy in my career. When I worked in the UK, I just had the one time zone to work with. The only time things got complicated was when I was working at one of the power generation companies, and we had to make provision for the 23-hour and 25-hour days that go with Daylight Savings.

And in Australia we only have a handful of timezones, and when I start and finish work, it is the same day for any part of Australia. I did work on one system where the database clock was set to UTC, but dates weren't important on that application.

Now it is different. I'm dealing with events that happen all over the world. Again the database clock is UTC, with the odd effect that TRUNC(SYSDATE) 'flips over' around lunchtime. Now when I want to look at 'recent' entries (eg a log table) I've got into the habit of asking WHERE LOG_DATE > SYSDATE - INTERVAL '9' HOUR

And we also have columns that are TIMESTAMP WITH TIMEZONE. So I'm getting into the habit of selecting COL_TS AT TIME ZONE DBTIMEZONE . I could use sessiontimezone, but then the time component of DATE columns would be inconsistent.  This becomes just a little more confusing this time of year as various places slip in and out of Daylight Savings.

Now things are getting even more complicated for me.

Again, during my career, I've been lucky enough to be pretty oblivious to character set issues. Most things have squeezed in to my databases without any significant trouble. Occasionally I've had to look for some accented characters in people's names, but that's been it.

In the past few months, I've been working with some European data where the issues have been more pronounced. Aside from a few issues in emails, I've been coping quite well (with a lot of help from Google Translate). 

Now I get to work with some Japanese data. And things get complicated.

"The modern Japanese writing system is a combination of two character types: logographic kanji, which are adopted Chinese characters, and syllabic kana. Kana itself consists of a pair of syllabarieshiragana, used for native or naturalised Japanese words and grammatical elements, and katakana, used for foreign words and names, loanwordsonomatopoeia, scientific names, and sometimes for emphasis. Almost all Japanese sentences contain a mixture of kanji and kana. Because of this mixture of scripts, in addition to a large inventory of kanji characters, the Japanese writing system is often considered to be the most complicated in use anywhere in the world.[1][2]"Japanese writing system

Firstly I hit katakana. With some tables, I can get syllables corresponding to the characters and work out something that I can eyeball and match up to some English data. As an extra complication, there are also half-width characters which are semantically equivalent but occupy different codepoints in Unicode. That has parallels to upper/lower case in English, but is a modern development that came about from trying to fit the previously squarish forms into print, typewriters and computer screens.

Kanji is a different order of shock. Primary school children in Japan learn the first 1000 or so characters. Another thousand plus get taught in high school. The character set is significantly larger in total.

I will have to see if the next few months cause my head to explode. In the mean time, I can recommend reading this article about the politics involved in getting characters (glyphs ? letters ?) into Unicode.  I Can Text You A Pile of Poo, But I Can’t Write My Name

Oh, and I'm still trying to find the most useful character/font set I can have on my PC and  use practically in SQL Developer. My current choice shows the Japanese characters when I click in the field in the dataset, but only little rectangles when I'm not in the field. The only one I've found that does show up all the time is really UGLY. 

No comments: