Friday, April 18, 2014

I Love Logs

It occurred to me a few days ago, as I was reading this article on DevOps, that I might actually be a DevOps.

I think of myself as a developer, but my current role is in a small team running a small system. And by running, I mean that we are 

  • 'root' and 'Administrator' on our Linux and Windows servers
  • 'oracle / sysdba' on the database side, 
  • the apex administrator account and the apex workspace administrators,
  • the developers and testers, 
  • the people who set up (and revoke) application users and 
  • the people on the receiving end of the support email
Flashbacked to Jeff Smith's article on Developers in Prod. But the truth is that there's a lot of people wearing multiple hats out there, and the job titles of old are getting a bit thin. 

The advantage of having all those hats, or at least all those passwords, is that when I'm looking at issues, I get to look pretty much EVERYWHERE. 

I look at the SSH, FTP and mailserver logs owned by root. The SSH logs generally tell me who logged on where and from where. Some of that is for file transfers (some are SFTP, some are still FTP), some of it is the other members of the team logging on to run jobs. The system sends out lots of mail notifications, and occasionally they don't arrive so I check that log to see that it was sent (and if it may have been too big, or rejected by the gateway).

Also on the server are the Apache logs. We've got these on daily rotate going back a couple of years because it is a small enough system that the logs sizes don't matter. But Apex stuffs most of those field values into the URL as a GET, so they all get logged by Apache. I can get a good idea of what IP address was inquiring about a particular location or order by grepping the logs for the period in question.

I haven't often had the need to look in the Oracle alert logs or dump directories, but they are there if I want to run a trace on some code. 

In contracts, I'm often looking at the V$ (and DBA_) views and tables. The database has some audit trail settings so we can track DDL and (some) logons. Most of the database access is via the Apex component, so there's only a connection pool there.

The SELECT ANY TABLE also gives us access to the underlying Apex tables that tell us the 'private' session state of variables, collections etc. (Scott Wesley blogged on this a while back). Oh, and it amazing how many people DON'T log out of an application, but just shut their browser (or computer) down. At least it amazed me. 

The apex workspace logs stick around for a couple of weeks too, so they can be handy to see who was looking at which pages (because sometimes email us a screenshot of an error message without telling us how or where it popped up). Luckily error messages are logged in that workspace log. 

We have internal application logs too. Emails sent, batch jobs run, people logging on, navigation menu items clicked. And some of our tables include columns with a DEFAULT from SYS_CONTEXT/USERENV (Module, Action, Client Identifier/Info) so we can automatically pick up details when a row is inserted.

All this metadata makes it a lot easier to find the cause of problems. It isn't voyeurism or spying. Honest. 

Saturday, April 12, 2014

Unique identifiers - but what do they identify

Most of the readers of this blog will be developers, or DBAs, who got the rules of Normalisation drummed into them during some phase of the education or training. But often we get to work with people who don't have that grounding. This post is for them. Feel free to point them at it.

Through normalisation, the tendency is to start with a data set, and by a methodical process extract candidate keys and their dependent attributes. In many cases there isn't a genuine or usable candidate key and artificial / surrogate keys need to be generated. While your bank can generally work out who you are based on your name and address, either of those could change and so they assign you a more permanent customer or account number.

The difficulty comes when those identifiers take on a life of their own. 

Consider the phone number. When I dial my wife's phone number, out of all the phones in Australia (or the world), it is her's alone that will ring. Why that one ? 

In the dark ages, the phone number would indicate a particular exchange and a copper wire leading out of that exchange hard wired to a receiver (or a set of receivers in the case of Party Lines).  Now all the routing is electronic, telephones can be mobile and the routing for calls to a particular number can be changed in an instant. A phone number no longer identifies a device, but a service, and a new collection of other identifiers have risen up to support the implementation of that service. An IMEI can identify a mobile handset and the IMSI indicates a SIM card from a network provider, and we can change the SIM card / IMSI that corresponds to a phone number, or swap SIM cards between handsets. Outside the cellular world, VOIP can shunt 'phone number' calls around innumerable devices using IP addresses. 

Time is another factor. While I may 'own' a given phone number at a particular time, I may give that up and someone else might take it over. That may get represented by adding dates, or date ranges to the key, or it can be looked at as a sequence. For example, Elizabeth Taylor's husband may indicate one of seven men depending on context. The "fourth husband" or "her husband on 1st Jan 1960" would be Eddie Fisher.

Those without a data modelling background that includes normalisation may flinch at the proliferation of entities and tables in a relational environment. As developers and architects look at newer technologies some of the discipline of the relational model will be passed over. Ephemeral transactions can cluster the attributes together in XML or JSON formats with no need for consistency of data definitions beyond the period of processing. Data warehousing quickly discarded relational formats in favour of 'facts' and 'dimensions'. 

The burden of managing a continuous and connected set of data extending over a long period of time, during which the identifiers and attributes morph, is an ongoing challenge in database design.