Saturday, April 04, 2015

A world of confusion

It has got to the stage where I often don't even know what day it is. No, not premature senility (although some may disagree). But time zones.

Mostly I've had it fairly easy in my career. When I worked in the UK, I just had the one time zone to work with. The only time things got complicated was when I was working at one of the power generation companies, and we had to make provision for the 23-hour and 25-hour days that go with Daylight Savings.

And in Australia we only have a handful of timezones, and when I start and finish work, it is the same day for any part of Australia. I did work on one system where the database clock was set to UTC, but dates weren't important on that application.

Now it is different. I'm dealing with events that happen all over the world. Again the database clock is UTC, with the odd effect that TRUNC(SYSDATE) 'flips over' around lunchtime. Now when I want to look at 'recent' entries (eg a log table) I've got into the habit of asking WHERE LOG_DATE > SYSDATE - INTERVAL '9' HOUR

And we also have columns that are TIMESTAMP WITH TIMEZONE. So I'm getting into the habit of selecting COL_TS AT TIME ZONE DBTIMEZONE . I could use sessiontimezone, but then the time component of DATE columns would be inconsistent.  This becomes just a little more confusing this time of year as various places slip in and out of Daylight Savings.

Now things are getting even more complicated for me.

Again, during my career, I've been lucky enough to be pretty oblivious to character set issues. Most things have squeezed in to my databases without any significant trouble. Occasionally I've had to look for some accented characters in people's names, but that's been it.

In the past few months, I've been working with some European data where the issues have been more pronounced. Aside from a few issues in emails, I've been coping quite well (with a lot of help from Google Translate). 

Now I get to work with some Japanese data. And things get complicated.

"The modern Japanese writing system is a combination of two character types: logographic kanji, which are adopted Chinese characters, and syllabic kana. Kana itself consists of a pair of syllabarieshiragana, used for native or naturalised Japanese words and grammatical elements, and katakana, used for foreign words and names, loanwordsonomatopoeia, scientific names, and sometimes for emphasis. Almost all Japanese sentences contain a mixture of kanji and kana. Because of this mixture of scripts, in addition to a large inventory of kanji characters, the Japanese writing system is often considered to be the most complicated in use anywhere in the world.[1][2]"Japanese writing system

Firstly I hit katakana. With some tables, I can get syllables corresponding to the characters and work out something that I can eyeball and match up to some English data. As an extra complication, there are also half-width characters which are semantically equivalent but occupy different codepoints in Unicode. That has parallels to upper/lower case in English, but is a modern development that came about from trying to fit the previously squarish forms into print, typewriters and computer screens.

Kanji is a different order of shock. Primary school children in Japan learn the first 1000 or so characters. Another thousand plus get taught in high school. The character set is significantly larger in total.

I will have to see if the next few months cause my head to explode. In the mean time, I can recommend reading this article about the politics involved in getting characters (glyphs ? letters ?) into Unicode.  I Can Text You A Pile of Poo, But I Can’t Write My Name

Oh, and I'm still trying to find the most useful character/font set I can have on my PC and  use practically in SQL Developer. My current choice shows the Japanese characters when I click in the field in the dataset, but only little rectangles when I'm not in the field. The only one I've found that does show up all the time is really UGLY. 

Saturday, February 21, 2015

NVARCHAR2, UTL-16 and Emails

Development is often the case of trying several paths through the forest, hoping to find one that leads out the other end. That was the start of my week.

Until we get our shiny new 12c database running on its shiny new box (and all the data shifted to it), we are living with a mix of databases. To begin with, the data we managed was mostly AU/NZ and Europeans stuff, and the character set is set accordingly. By which I mean one of those Eurocentric things and not UTF-8. We also have a bunch of columns in NVARCHAR2 with AL16UTF16 as the alternative character set.

I'm pretty sure the new database will start with UTF-8. But in the mean time I was responsible for trying to get emails out of the current database with data in various European and non-European character sets.  My paths through that forest went as follows...

  • It should just work. Let me test it.....Oh bugger.
  • Okay, maybe if I put "utf-8" in various bits of the message.
  • And switch the code so it uses NVARCHAR2 rather than defaulting to VARCHAR2.
  • Oh....UTF-16 isn't the same as UTF-8. I need to convert it somehow
  • So I can't put UTF-8 values in either my Eurocentric VARCHAR2 or UTF-16 NVARCHAR2.
  • And I have to get this through SMTP, where you can still see the exposed bones of 7-bit ASCII, 


AHA ! HTML Entities. That means I can get away with using ASCIISTR to convert the UTF-16 strings into a sequence of Hex values for each two-byte character. Then I stick a &#x in front of each character, and I have an HTML representation of the string !

It stinks of an ugly solution.
I think there should be a way of sending utf-16 in the content, but I couldn't get to it.

It doesn't help that email HTML is less capable than browser HTML, and has to support a variety of older clients (plus presenting an HTML email body inside of the HTML of a webmail client is always going to be awkward).

Saturday, January 03, 2015

SQL with Friends ?

I'm a regular player of the WordsWithFriends game from Zynga. With some of my regular opponents, we have some side chat. That might be something as simple as letting them know you won't be playing for a few days, or a joke arising from an odd sequence of words.

Recently I'd been sent an URL as a chat message, with a picture from a holiday. It was quite a long URL, with a dubious few characters that may have been the number zero or the letter "O" etc. The chat doesn't allow copying, so rather than trying the variations manually, I took the geek road.

Starting with a Cheeky Monkey post, I learnt that the chat messages were probably in an SQLLite file for the application in a relatively inaccessible 'data/data' location on my Android phone or tablet. 

Stackoverflow told me that I can pull the information from there using the Android debugger's backup command (adb). You may need to install a bunch of stuff, such as an up-to-date Java JDK, to get that running. I'd done that before so it was pretty painless. You also need to enable USB debugging on your device.

Those backups are almost, but not quite, a TAR file. I grabbed a Java tool to convert my backup file into a regular TAR, and then unzipped them with 7-Zip.

It wasn't too hard to find the relevant db file that contained the chat messages. I've got a newer version of the game than the one Cheeky Monkey used, so I had to dig a bit more. My package was called "com.zynga.wwf2.free" rather than the older "com.zynga.words" (but I still had the data from the older version on my phone).

Once I found the right package, the db file was in the db directory as "wf_database.sqlite". sqlite3 was conveniently in the same toolkit as the android debugger.

Back to Stackoverflow for some quick sqlite info and I had a set of CREATE TABLE and INSERT statements.

I could have simply grepped for the URL, but being a database person I couldn't resist a final stage.

A few find/replaces were need to switch the DDL to Oracle syntax (different data type names and Oracle is constrained by the 30 character column names). I then imported the users (players), games, moves and chat messages into my XE database and came up with a query to extract the chat messages and the player who posted it.

I think the chat for a deleted game would be a lot harder to recover. While you don't need to root your device, you will need to enable debugging and authorise the backup and so you need regular access to the device. If you can't get past the lock screen, this won't help.

I mentioned that I still had the data files for the older version of the game. I mistakenly opened these first, and was surprised to find that the user data included email addresses for many entries. None were for my regular opponents, but some were for people I recall playing once or twice. I don't recall many of the users, who may have been people I played as a random pick, or may have been on a 'leaderboard'. The data for the newer version of the game only had the email address for my user. 

My player name (sydoracle) is pretty easy to track back to the "real" me, and I use a unique email address when I sign up to most services. But others might have been more concerned to find the email addresses were being shared, even in a concealed manner. 

Sunday, October 12, 2014

Latest Oracle allows SELECT without SELECT...FOR UPDATE

Digging through a backlog of Oracle blogs, I came across an gem in a presentation from AMIS (on Slideshare). Got to bullet point 5 on slide 63 and boom !

You all know that when you grant SELECT on a table to a user, they can do a SELECT FOR UPDATE, locking records in the table and preventing other updates or deletes. [Some client tools may do that in the background. ]

Well finally Oracle have cottoned on to that too, and there's a lighter-weight "READ" privilege in 12.1.0.2 which won't allow SELECT FOR UPDATE.

This will make DBAs very happy. Actually it won't. The natural state of a DBA is grumpy, at least when in the vicinity of a developer or salesman.


PS. Why would SELECT FOR UPDATE ever be a good idea for a user with no UPDATE privilege ?
If I had to guess, I'd say it went back to a 'pre-read consistency' model when you might use a SELECT FOR UPDATE to try to select data that wasn't being updated.

Sunday, July 20, 2014

Putting my DB / Apex install through the wringer

I was mucking around trying to get APEX on one of my PCs to be visible on the internet.

This was just a proof-of-concept, not something I intend to actually leave running.

EPG on Port 8080

I do other testing on the home network too, so I already had my router configured to forward port 80 to another environment. That meant the router's web admin had been shifted to port 8080, and it wouldn't let me use that. Yes, I should find a open source firmware, but OpenWRT says it is unsupported and will "brick the router" and I can't see anything for Tomato.

So I figured I'd just use any incoming router port and forward it to the PC's 8080. I chose 6000. This was not a good choice. Looks like Chrome comes with a list of ports which it thinks shouldn't be talking http. 6000 is one of them, since it is supposed to be used for X11 traffic so Chrome told me it was unsafe and refused to co-operate.

Since it is a black-list of ports to avoid, I just happened to be unlucky (or stupid) in picking a bad one. Once I selected another, I got past that issue.

My task list was:

Server
  1. Install Oracle XE 11gR2 (Windows 64-bit)
  2. Configure the EPG for Apex. I ran apex_epg_config.sql as, I had switched straight from the pre-installed Apex 4.0 to 4.2.5 rather than upgrading a version I had actively used. 
  3. Unlocked the ANONYMOUS database account
  4. Checked DBMS_XDB.GETHTTPPORT returned 8080 
(At this point, you can test that you have connectivity to apex on the machine on which XE / Apex is installed, through 127.0.0.1 and localhost).

Local Network
  1. Enabled external access by setting DBMS_XDB.SETLISTENERLOCALACCESS(false); 
(Now you can test connectivity from another machine on the same local network through whatever hostname and/or IP address is assigned to that machine, such as 10.x.x.x or 192.168.x.x)

Remote Network
  • I got a handy Dynamic DNS via NoIP because my home IP can potentially change (though it is very rare). [Yes, there was a whole mess about Microsoft temporarily hijackinging some noip domains, but I'm not using this for anything important.] This was an option in my router setup.
  • The machine that runs XE / Apex should be assigned a specific 192.168.1.nnn IP address by the router (based on it's MAC address). This configuration is specific to the router hardware, so I won't go into my details here. But it is essential for the next step.
  • Configure the port forwarding on the router to push incoming traffic on the router's port 8088 off to port 8080 for the IP address of the machine running XE / Apex. This is also router specific. 
When everything is switched on, I can get to my Apex install from outside the local network based on the hostname set up with noip, and the port configured in the router. I used my phone's 3G internet connection to test this. 

Apex Listener

My next step was to use the Apex Listener rather than the EPG. Oracle have actually retagged the Apex Listener as RDS (Restful Data Services) so that search engines can confuse it with Amazon RDS (Relational Database Service).

This one is relatively easy to set up, especially since I stuck with "standalone" mode for this test. 

A colleague had pointed me to this OBE walkthrough on Apex PDF reports via RDS, so I took a spin through that and it all worked seamlessly.

My next step would be a regular web server/container for RDS rather than standalone. I'm tempted to give Jetty a try as the web server and container for the listener rather than Tomcat etc, but the Jetty documentation seems pretty sketchy. I'm used to the thoroughness of the documentation for Apache (as well as Oracle).


Saturday, June 21, 2014

Literally speaking

Reading Scott Wesley's blog from a days ago, and he made a remark about being unable to concatenate strings when using the ANSI date construct.

The construct date '1900-01-01' is an example of a literal, in the same way as '01-01' is string literal and 1900 is a numeric literal. We even have use some more exotic numeric literals such as 1e3 and 3d .

Oracle is pretty generous with implicit conversions from strings to numbers and vice versa, so it doesn't object when we assign a numeric literal to a CHAR or VARCHAR2 variable, or a string to a NUMBER variable (as long as the content is appropriate). We are allowed to assign the string literal '1e3' to a number since the content is numeric, albeit in scientific notation.

So there are no problems with executing the following:
declare
  v number := '1e3';
begin
  dbms_output.put_line(v);
end;
/

However while 3d and 4.5f can be used as numeric literals, Oracle will object to converting the strings '3d' or '4.5f' into a number because the 'f' and 'd' relate to the data type (Binary Float and Binary Double) and not to the content.

Similarly, we're not allowed to try to use string expressions (or varchar2/char variables) within a date literal, or the related timestamp literal. It must be the correct sequence of numbers and separators enclosed by single quotes. It doesn't complain if you use the alternative quoting mechanism, such as date q'[1902-05-01]' but I'd recommend against it as being undocumented and superfluous.

Going further, we have interval literals such as interval '15' minute .In these constructs we are not allowed to omit the quotes around the numeric component. And we're not allowed to use scientific notation for the 'number' either (but again the alternative quoting mechanism is permitted). 

I've built an affection for interval literals, which are well suited to flashback queries.

select versions_operation, a.* 
from test versions between timestamp sysdate - interval '1' minute and sysdate a;

Confusingly the TIMESTAMP keyword in the query above is part of the flashback syntax, and you have to repeat the word if you are using a timestamp literal in a flashback query. 

select versions_operation, a.*

from test versions between timestamp timestamp '2014-06-21 12:50:00' 
                   and sysdate a


Saturday, June 07, 2014

Apex theme fun


Sometimes you are working with an off-the-shelf product and find something odd, and you're not quite sure whether it is a bug, a feature or whether you've lost the plot.

I use Oracle's Application Express, and was digging into the included theme_18. The templates refer to classes "t18success" and "t18notification"



And then I go looking into the CSS and see hash / ID selectors.

#t18Success{margin:5px auto;font-size:12px;color:#333;background:#DAEED2;width:600px;background-repeat:no-repeat;padding:5px;border:1px #95C682 solid;border-right:none;border-left:none;}

#t18Notification{margin:5px auto;padding:5px;font-size:12px;color:#333;text-align:center;vertical-align:top;border:1px #ffd700 solid;border-right:none;border-left:none;background-color:#ffffcc;width:600px;}

For added confusion, HTML class names are case-sensitive, but CSS selectors are case-insensitive, so the case differences may or may not be relevant.

The application looks nicer if I change the CSS to class selectors, and then I get coloured, dismissable boxes rather than hard to read, unstyled messages. I could probably get the same effect by changing the id="MESSAGE" in the templates, but that seems riskier. At least with the CSS, I am confident that I am just changing the appearance and it shouldn't affect the logic.

Digging deeper, the CSS for more than a dozen of the built-in themes utilise the ID selector "#notification-message" in the CSS. About half a dozen have only a class selector, and another three have both (with the prefix of t followed by the theme number). Finally three just have the ID selector with the theme prefix.

My gut feel is that they switched from the ID to the class selectors stopping in various places on the way. And some of those places aren't very pretty.

I'm not sure I see the benefit in having the theme number embedded in templates and selectors. The template tells it which theme CSS file to get, and as long as the template and CSS are consistent, the use of the theme number just seems to add more place you have to edit when you want to customise a theme.

This was all checked on a fresh Apex 4.0 instance because I just installed the new Windows 64-bit version of Oracle Express Edition. I'll do an upgrade of that default to the latest 4.2 this weekend too.