Saturday, April 12, 2014

Unique identifiers - but what do they identify

Most of the readers of this blog will be developers, or DBAs, who got the rules of Normalisation drummed into them during some phase of the education or training. But often we get to work with people who don't have that grounding. This post is for them. Feel free to point them at it.

Through normalisation, the tendency is to start with a data set, and by a methodical process extract candidate keys and their dependent attributes. In many cases there isn't a genuine or usable candidate key and artificial / surrogate keys need to be generated. While your bank can generally work out who you are based on your name and address, either of those could change and so they assign you a more permanent customer or account number.

The difficulty comes when those identifiers take on a life of their own. 

Consider the phone number. When I dial my wife's phone number, out of all the phones in Australia (or the world), it is her's alone that will ring. Why that one ? 

In the dark ages, the phone number would indicate a particular exchange and a copper wire leading out of that exchange hard wired to a receiver (or a set of receivers in the case of Party Lines).  Now all the routing is electronic, telephones can be mobile and the routing for calls to a particular number can be changed in an instant. A phone number no longer identifies a device, but a service, and a new collection of other identifiers have risen up to support the implementation of that service. An IMEI can identify a mobile handset and the IMSI indicates a SIM card from a network provider, and we can change the SIM card / IMSI that corresponds to a phone number, or swap SIM cards between handsets. Outside the cellular world, VOIP can shunt 'phone number' calls around innumerable devices using IP addresses. 

Time is another factor. While I may 'own' a given phone number at a particular time, I may give that up and someone else might take it over. That may get represented by adding dates, or date ranges to the key, or it can be looked at as a sequence. For example, Elizabeth Taylor's husband may indicate one of seven men depending on context. The "fourth husband" or "her husband on 1st Jan 1960" would be Eddie Fisher.

Those without a data modelling background that includes normalisation may flinch at the proliferation of entities and tables in a relational environment. As developers and architects look at newer technologies some of the discipline of the relational model will be passed over. Ephemeral transactions can cluster the attributes together in XML or JSON formats with no need for consistency of data definitions beyond the period of processing. Data warehousing quickly discarded relational formats in favour of 'facts' and 'dimensions'. 

The burden of managing a continuous and connected set of data extending over a long period of time, during which the identifiers and attributes morph, is an ongoing challenge in database design.

Sunday, March 09, 2014

Pre-digested authentication

A bit of a follow-up to my previous post on Digest authentication.

The fun thing about doing the hard yards to code up the algorithm is that you get a deeper level of understanding about what's going on. Take these lines:

    v_in_str := utl_raw.cast_to_raw(i_username||':'||i_realm||':'||i_password);
    v_ha1 := lower(DBMS_OBFUSCATION_TOOLKIT.md5(input => v_in_raw));

Every time we build the "who we are" component for this site, we start with exactly the same hash made up of the username, realm (site) and password. This is a batch routine, which means somewhere we would store the username and password for the site - whether that is a parameter in a scheduling tool, coded into a shell script or OS file, or somewhere in the database. If you've got the security option for Oracle, you can use the Wallet, with its own security layers.

But digest authentication gives us another option. Since we actually use the hashed value of the user/site/password, we can store that instead. The receiving site has no idea the code doesn't actually know the REAL password.

Now turn that over in your head. We can call the web service as this user WITHOUT knowing the password, just by knowing the hash. I don't know about you, but it makes me a little bit more worried when I hear of user details being leaked or hacked from sites. It's all very well reassuring us the passwords are hashed and can't be reverse engineered (assuming your own password can't be brute-forced). But depending on the security mechanism, a leak of those hashes can be dangerous. If a hacked provider advises people to change their passwords, take their advice. 

'Basic' authentication doesn't have the same weakness. In that environment the provider can store the password hash after applying their own 'secret sauce' mechanism (mostly a salt). When you authenticate, you send the password, they apply the secret sauce and compare the result. You can't get away without knowing the password, because all the work is done at their end.

There's no secret sauce for digest authentication, and there can't be. Even if the provider had the password in the clear, there's no way they can be sure the client has the password since all the client needs is the result of the hash. The provider must store, or be able to work out, the result of that hash because they need to replicate the final hash result using both the client and server nonces. They can store either that same user/realm/password hash as is, or they can encrypt it in a reversible manner, but a one-way hash wouldn't be usable.

In short, digest authentication means that our batch routine doesn't need to 'know' the actual password, just a hash. But it also makes those hashes a lot more dangerous.

I'm an amateur in this field. I checked around and it does seem this is a recognized limitation of digest authentication. EG: This Q&A and this comparison of Digest and Basic.

Saturday, March 08, 2014

PL/SQL, UTL_HTTP and Digest Authentication

For the first time in what seems like ages, I've actually put together a piece of code worth sharing. It's not that I haven't been working, but just that it has all been very 'in-house' specific.

However I had a recent requirement to use a web service that makes use of Digest Authentication. If you have look at the UTL_HTTP SET_AUTHENTICATION subprogram, it only addresses Basic authentication (and, apparently, Amazon S3 which looks intriguing).

In Basic authentication, the username and password get sent across as part of the request. Going through SSL, that doesn't seem too bad, as it is encrypted over the transfer and the certificates should ensure you are talking to the legitimate destination. However if that destination has been compromised, you've handed over your username and password. In an ideal world, the server shouldn't need to know your password, which is why database should only have hashed versions of passwords. 

Outside of SSL, you might as well just print the username and password on the back of a postcard.

In Digest authentication, you get a more complex interaction that keeps the password secret. You ask for a page, the server responds with an "Authentication Required" plus some bits of information including a nonce. You come up with a hashed value based on the server nonce, your own nonce and a hash of your username and password and send it back with the next request. The server has its own record of your username/password hash and can duplicate the calculations. If everyone is happy, the server can fulfill your request and nobody ever actually needs to know the password.

Our server used SSL, and thanks to Tim's article on SSL and UTL_HTTP, it was a simple set up. I've done it before, but that was in the days when it seemed a lot hard to get certificates OUT of a browser to put them in your Oracle Wallet.

The Interwebs were a lot less forthcoming on a PL/SQL implementation of Digest authentication though. The closest I got was this discussion, which can be summed up as "This may be complex, but I do not see these offhand as being impossible to do in PL/SQL....No Digest configured web server nearby or I would definitely have had a bash at this"

A read through the Wikipedia article, and I came up with the code below:

Firstly, after the initial request, go through the header to get the 'WWW-Authenticate' item. Take the value associated with that header, and pass it to the "auth_digest" procedure. 

    l_max := UTL_HTTP.GET_HEADER_COUNT(l_http_response);
    l_ind := 1;
    l_name := '-';
    while l_ind <= l_max AND l_name != 'WWW-Authenticate' LOOP
      UTL_HTTP.GET_HEADER(l_http_response, l_ind, l_name, l_value);
      IF  l_name = 'WWW-Authenticate'
      AND l_http_response.status_code = UTL_HTTP.HTTP_UNAUTHORIZED THEN
        -- Unauthorized. Using the Authorization response header, we can come up with the
        -- required values to allow a re-request with the authentication/authorisation details
        l_http_request := UTL_HTTP.BEGIN_REQUEST(l_server||l_method);
        auth_digest (io_http_request => l_http_request, i_auth_value => l_value,
          i_username => nvl(i_username,'xxxx'), i_password => nvl(i_password,'xxxx'), 
          i_req_path => l_method, i_client_nonce => null);
        dbms_output.put_line($$PLSQL_LINE||':Get Response from authenticated request');
        l_http_response := UTL_HTTP.GET_RESPONSE(l_http_request);
        dump_resp (l_http_response);
        dump_hdr (l_http_response);
      END IF;
      l_ind := l_ind + 1;


The auth_digest starts with an extraction of the 'valuables' from that value string. I've used regular expressions here. I spent time working with grep, awk and perl, and regexes are habit forming.

  procedure extract_auth_items
    (i_text in varchar2,
    o_realm out varchar2, o_qop out varchar2, o_nonce out varchar2, o_opaque out varchar2) is
    o_realm   := substr(regexp_substr(i_text, 'realm="[^"]+' ),8);
    o_qop     := substr(regexp_substr(i_text, 'qop="[^"]+'   ),6);
    o_nonce   := substr(regexp_substr(i_text, 'nonce="[^"]+' ),8);
    o_opaque  := substr(regexp_substr(i_text, 'opaque="[^"]+'),9);

  end extract_auth_items;

Next is the 'meat' where the values are combined in the various hashes. Yes, there's a hard-coded default client nonce in there that, by a strange coincidence, matches on in the wikipedia article. That's how this stuff gets developed, by following through a worked example. Just like school.

  function digest_auth_md5_calcs
      (i_username     in varchar2, i_password     in varchar2, i_req_path      in varchar2,
      i_realm         in varchar2, i_server_nonce in varchar2,
      i_qop           in varchar2 default 'auth',
      i_client_nonce  in varchar2 default '0a4f113b',
      i_req_type      in varchar2 default 'GET',  i_req_cnt IN NUMBER default 1)
  return varchar2 is
    v_in_str    varchar2(2000);
    v_in_raw    raw(2000);
    v_out       varchar2(60);
    v_ha1       varchar2(40);
    v_ha2       varchar2(40);
    v_response  varchar2(40);
    v_in_str := i_username||':'||i_realm||':'||i_password;
    v_in_raw := utl_raw.cast_to_raw(v_in_str);
    v_out := DBMS_OBFUSCATION_TOOLKIT.md5(input => v_in_raw);
    v_ha1 := lower(v_out);
    v_in_str := i_req_type||':'||i_req_path;
    v_in_raw := utl_raw.cast_to_raw(v_in_str);
    v_out := DBMS_OBFUSCATION_TOOLKIT.md5(input => v_in_raw);
    v_ha2 := lower(v_out);
    v_in_str := v_ha1||':'||i_server_nonce||':'||lpad(i_req_cnt,8,0)||':'||
    v_in_raw := utl_raw.cast_to_raw(v_in_str);
    v_out := DBMS_OBFUSCATION_TOOLKIT.md5(input => v_in_raw);
    v_response := lower(v_out);
    return v_response;
  end digest_auth_md5_calcs;

And this is the full auth_digest bit

  procedure auth_digest
    (io_http_request  in out UTL_HTTP.REQ,  i_auth_value    in varchar2,
    i_username        in varchar2,          i_password      in varchar2,
    i_req_path        in varchar2,          i_qop           in varchar2 default 'auth',
    i_req_cnt         in number default 1,  i_client_nonce  in varchar2 default null)
    l_realm         varchar2(400);
    l_qop           varchar2(30);
    l_server_nonce  VARCHAR2(400);
    l_opaque        varchar2(100);
    l_response      varchar2(40);
    l_value         VARCHAR2(1024);
    l_client_nonce  varchar2(30);
    -- Apply the username / password for Digest authentication
    extract_auth_items (i_auth_value,
                    l_realm, l_qop, l_server_nonce, l_opaque);
    IF i_client_nonce is not null then
      l_client_nonce := i_client_nonce;
      l_client_nonce := lower(utl_raw.cast_to_raw(DBMS_OBFUSCATION_TOOLKIT.md5(
    END IF;
    l_response := digest_auth_md5_calcs
      (i_username => i_username, i_password    => i_password,     i_req_path => i_req_path,
      i_realm     => l_realm,    i_server_nonce => l_server_nonce,
      i_client_nonce => l_client_nonce);
    --i_qop default to auth, i_req_type default to GET and i_req_cnt default to 1
    l_value := 'Digest username="' ||i_username          ||'",'||
               ' realm="'          ||l_realm             ||'",'||
               ' nonce="'          ||l_server_nonce      ||'",'||
               ' uri="'            ||i_req_path          ||'",'||
               ' response="'       ||l_response          ||'",'||
               ' qop='             ||i_qop               ||',' ||
               ' nc='              ||lpad(i_req_cnt,8,0) ||',' ||
               ' cnonce="'         ||i_client_nonce      ||'"'
    IF l_opaque is not null then
      l_value := l_value||',opaque="'||l_opaque||'"';
    END IF;
    UTL_HTTP.SET_HEADER(io_http_request, 'Authorization', l_value);

  end auth_digest;

A package with the code is available from my CodeSpace page, or directly here. There's a lot of debug 'stuff' in there. The code I'm using is still tailored to my single specific need, and I've stripped specific values from this published variant. You'll need to hard-code or parameterize it for any real use. I may be able to do a 'cleaned-up' version in the future, but don't hold your breath.

Sunday, February 09, 2014

Chrome's HOST-RULE flag

I'm currently working in an Apex environment, and we will soon be running some of our apex applications off a second domain name mapped to the same host.

For me this has mean a little playing around with VirtualHost in the httpd.conf which is something I don't get to do very often. But I am having to wait on others to actually set up the new DNS entry to map the domain name to the server.

If this was a brand new server with its own IP, then I could simply use the IP address. But since it is sharing a server and request using the IP goes straight to the old 'default' VirtualHost.

On the server itself I can touch up the hosts file to hard-code the entry. But the only browser I can use on the server is very dated and nearly unusable. And on my PC, I don't have admin rights to amend the PC hosts file.

Fortunately we do get chrome on our machines (tucked away as chrome-frame, but chrome nonetheless). And chrome has a handy little command line flag called "--host-rules". 

Using: --host-rules="MAP" I can map the new hostname to its planned IP address and bypass the DNS lookup on the PC. Very handy when you are waiting for the DNS entry or if you want to override an existing name to point to a test or dev environment.

Wednesday, October 30, 2013

Oracle 11g XE installed on Windows 8.1

I count myself lucky if a post here gets a few hundred views. I have two posts that have hit 4000+ views, and a third edging towards 3000.

In third place is a reference to foreign keys referencing unique constraints rather than a primary key.

In second place, and heading towards its second birthday, was my desire for SQL Developer to have a "keep awake" function that stops sessions being killed. It still doesn't but will restore connections. 

In first place, was an almost throwaway article from when I migrated my netbook from XP to Windows 8. I installed Oracle 11g XE on it, and apparently a lot of people are (or were) interested in that. I guess people get Windows 8 on a machine and wonder if it is worth trying an XE install. The XE install on Windows is trivially simple:

  1. Download
  2. Unzip
  3. Run the setup program
  4. Choose a directory
  5. Enjoy a cup of coffee while it churns through. You have time for a biscuit too.

That article is nearing its first anniversary. In honour of that occasion, I upgraded to Windows 8.1. Oracle XE broke. Specifically I couldn't find the services that ran the listener or the database or anything Oracle related. A repair install of XE didn't fix things. Disclaimer: I don't use Oracle a lot on that machine, and it possible that something other than the 8.1 upgrade broke it.

I uninstalled it (that would have been the 'repaired' install), blew away all the database files as I had nothing I wanted to keep, and re-installed (instructions above - my biscuit was a Tim Tam, but people in the UK may substitute a p-p-p-penguin). It seems to work fine now.

So Oracle 11g XE works on Windows 8.1, as far as I can see. Bear in mind that XE doesn't have any real support anyway, so the difference between a 'supported' and 'unsupported' configuration is purely imaginary. I don't do anything like RMAN backups and restores, let alone between OS upgrades. I can't see why they wouldn't work, but I'm not a DBA.

Sunday, October 13, 2013

'Medalling' in Humour

Yesterday I competed, and came third, in the Humorous Speech contest of the Hawkesbury Division of Toastmasters. I'm pretty chuffed with that.

Partly, I'll blame Yuri for this. He recommended Toastmasters as a way of getting some extra public speaking skills. Luckily for me, there's a group that meets one lunchtime a week at work, which makes attendance relatively easy. Much easier than trying to fit in an evening on top of my kids' busy social and sporting calendars. I started going just before Christmas and became a paid up member in January this year.

Next, I'll blame Neil Sequeira who prompted me to enter the club contest a month ago on the basis that attendees regularly laughed at my a nice way. I won that, and then it dawned on me that I'd be going into the Area level contest. Our club was hosting that event, so I had a slight 'home ground' advantage, but I was still in front of a bunch of people I didn't know, most of whom have been honing their speaking skills for YEARS.

I won that, which meant going up to the Division level contest last night. That was in a church hall, unfamiliar territory for me. We were speaking from an elevated stage, and with a headset microphone. Getting into the big leagues.

I was a bit ruffled because my trip there was delayed with my phone unaccountably deciingd it couldn't find any GPS signal, and refusing to tell me where I was or how to get where I was supposed to be. My destination was the other side of Middle Harbour so my regular tactic of pointing the car vaguely in the right direction and hoping for the best was foiled by its inability to fly across the water. Resorting to my trusty and dusty Gregory's Street Directory I made the 40 minute journey in a mere 80 minutes.

My speech starts with the other Gary Myers, multi-time champion of Summernats (which I've mentioned before in my blog ) and land speed record holder. Oh, and candidate for one of the NSW Senate spots for Federal parliament. He didn't win, but a candidate for the same party did get a spot for Victoria. I suspect this came as somewhat a surprise to him, as the Senate voting system is complex and highly unpredictable. An unemployed ex-sawmill worker who didn't own a suit and has kangaroo poo chucking as a hobby will be an interesting addition to the chamber.

I was more than  happy with my third place finish in the contest. The winner, who also took the Table Topics contest top prize, was excellent. And second place went to a Dave the Happy Singer who took the mick out of homoeopathy and similar rubbish, so I won't criticise that. I get a small trophy and a certificate. And an update to my LinkedIn profile.

And, for posterity, my certificates:

Sunday, September 22, 2013

Tracking email receipts through images

I pinched a technique from EMail marketeers last week. It goes by the cool name of a web beacon or web bug and is used to detect when someone reads (or at least opens) an email.

Our application is like a work flow system and it sends out a whole bunch of emails. I'm a little sceptical about who reads them, especially since I've got a bunch of rules that automatically ignore about 20-30 emails a day.

For a new email alert to a distribution list, I decided to code in a beacon. 

First for each email sent, it grabs a unique identifier. You can use a plain sequence, but I opted for a GUID. It is harder for anyone to guess other values for GUIDs and it is easy for me to tell which ones come from the test environment and which ones from production.

      v_guid_vc varchar2(32);
      v_guid_vc := sys_guid();

Then, the email HTML body included an image link

v_body := v_body || 
  'img src="http://'||v_host||'/PKG_MAIL.image?i_val='||v_guid_vc||''' ';

These mails go out to internal users, and Outlook is the primary email tool. The technique doesn't work for users reading their mail through Outlook webmail or the iOS email. The host is behind our firewall and will be unreachable from a home email account. But our company standard Outlook config will happily try to render that image when the user opens the email at their desk. And when that happens, it tries to get the image content by calling PKG_MAIL.image and telling us the GUID we associated with that email.

When we get that image request, we record the GUID, timestamp, IP address, user agent and cookie data. If the user happens to be logged into our system at that time, the cookie will tell us who the reader is. If not, the IP address will give us a clue. [When someone logs into the application, we record the userid and IP address, so we can see if someone logged in from that address recently.] Once we've recorded that data, we return a standard image content, irrespective of the GUID passed in.

No one has to click any link for the picture rendering / email read to be logged.

Rather than sending one email to ten recipients (with the same GUID), you will be better off sending ten variations with their own GUIDs. If you are sending the email to a specific individual, you may not need that excess logging (unless you want to track forwarded emails too). 

Personal email clients, such as Thunderbird and GMail, tend not to open remote images by default, offering a vague "privacy" warning. This is what they are warning you about. It means that Dominos probably know that I read their email vouchers about 5 minutes before ordering a pizza. And they will know whether I am using Thunderbird or Chome, and a bunch of other stuff about me. But, cheaper pizza !