Wednesday, December 22, 2010

Is it really 10 inches ?

Actually the measurements in question are 9.7 and 10.1, which are the respective screen sizes, in inches, of an iPad and my Asus EeePC. And it is screen size, or how screen sizes are measured, that I'm talking about.

If you remember the rules about squaws on hippopotamuses, we can work out the actual width and height of the screens, but crucially only if we know their aspect ratio. The iPad has a 9.7 inch diagonal screen at a 1024 x 768 resolution (an aspect ratio often stated as 4:3, but I'll go with the numerical value of 1.33). My EeePC has a 10.1 inch diagonal screen with an odd 1024 x 600 resolution. That equates to a 1.7 ratio (somewhere between a 16:10 and a 16:9).

This gives the EeePC a size of 8.72 x 5.1 and the iPad 7.75 x 5.83. The squarer shape means the iPad is actually about 1.5% larger in area than the EeePC despite the iPad having the smaller diagonal measure.

Thanks to silisoftware for supporting those calculations.
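If you'd rather check the arithmetic yourself, a quick query against DUAL does the same job (a sketch; the column aliases are mine):

select 10.1 * 1024 / sqrt(1024*1024 + 600*600) as eee_width,
       10.1 *  600 / sqrt(1024*1024 + 600*600) as eee_height,
        9.7 * 1024 / sqrt(1024*1024 + 768*768) as ipad_width,
        9.7 *  768 / sqrt(1024*1024 + 768*768) as ipad_height
from dual;

The 1024 x 768 case works out neatly because that hypotenuse is exactly 1280 pixels.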

So a couple of observations.

Firstly, as a media consumption device, the iPad has not been optimized for widescreen movies but more for 'page' sized content. Steve Jobs is a smart guy, especially on user experience. He didn't make that choice by flipping a coin, and my bet is he wasn't surveying what people would use the device for either. He decides what he is going to sell, and people decide whether to buy it or not, and he wanted a portable device, not an encumbrance.

Secondly the size of the iPad screen includes the 'keyboard'. Googling "ipad virtual keyboard" images, it looks like the keyboard takes up most of the bottom half of the screen, at least in landscape mode. Where an application requires text entry (name, email...) a 1024x768 screen on a netbook is very different to the same size screen on a tablet. And re-jigging things when it is switched between landscape and portrait is another can of worms.

The smaller the screen, the larger the impact of any virtual keyboard. You can read characters that are a lot smaller than a 'key' you press for typing. 

When Cary Millsap was presenting to the Sydney Oracle Meetup last week, one of the things he said was that if the same value for a measure can result from two completely different user experiences, you are measuring the wrong thing. He was speaking about how an average may mask extreme variances, but the same applies to this situation. While the screen sizes of a netbook and an iPad are similar, the experiences can differ considerably.

So can people stop talking about 10 inches (and that includes those people spruiking herbal products).

Thursday, December 09, 2010

Globally and Locally Structured Databases

I've been reading another article on NoSQL.

This one focuses on the 'No Join' aspect which the author relates to schema-less data.

Normalisation and joins don't go hand-in-hand though. If you are old enough, you might have had an address book (or a filofax). In there, you would write addresses organized by name. Under "F" you'd write "FLINTSTONE, FRED and WILMA" followed by an address in Bedrock. That's pretty normalised. You haven't got one entry for Fred and another for Wilma both with the same address. You might write Fred's birthday next to his name, and Wilma's next to hers, and have another date for their wedding anniversary.

Implicitly you've recognised that some items belong to the 'family' entity and others to the 'family member' entity. But you've made them part of a single object that provides the joining mechanism.
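In table form, the same split might look something like this (the names are mine, purely for illustration):

create table family
  (family_name  varchar2(30) primary key,
   address      varchar2(100),
   anniversary  date);

create table family_member
  (family_name  varchar2(30) references family,
   member_name  varchar2(30),
   birthday     date,
   primary key (family_name, member_name));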

SQL databases use tables to represent entities, requiring joins to pull back related entities (which leads to concepts like primary keys). I started out on an IDMS network database which didn't have tables, but rather explicit relationships between entities. You would start with a parent, follow a relationship down to a child and then to siblings (either all of them, or as many as you wanted to find). You might then follow a different relationship from that child to another kind of parent. Still normalised, but a different linking mechanism.

There is a difference between what I'd term 'globally structured data' and 'locally structured data'. SQL databases have a global structure in that a consistent structure is forced on the entire database. On the other hand, document databases allow each document to follow its own structure. Key-value stores don't have any explicit structure, leaving it up to the application to interpret.

Where you have orders consisting of order lines, a SQL database will have one Orders table and one Order Lines table, and all the Orders would have the same attributes (for example Customer Name), and all the Order Lines would have the same attributes. A document database may have some orders that have a "Customer Name", but others that have "Customer Surname" and "Customer First Name" and yet others that have "Customer Corporate Name". This puts more emphasis on the application to work with the potential varieties.

So what is the difference when it comes to the 'agility' of the implementations ?

In the run up to Christmas, your firm decides to change your business model. Rather than forcing people to make separate orders for each delivery address, you now allow each item to be delivered to a different location. In a normalised database you need to add the DELIVERY_LOCATION to the ORDER_LINE entity. You then update all existing orders so that the DELIVERY_LOCATION from the ORDER is copied down to the individual child ORDER_LINE records and finally drop the DELIVERY_LOCATION from the ORDER table.
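In concrete terms that migration is only a handful of statements; a sketch, assuming ORDERS and ORDER_LINES tables keyed on ORDER_ID:

alter table order_lines add (delivery_location varchar2(100));

update order_lines ol
set    ol.delivery_location = (select o.delivery_location
                               from   orders o
                               where  o.order_id = ol.order_id);

alter table orders drop column delivery_location;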

In a document database you can change the application and start recording your new delivery location on your Order Lines. At the database layer, you don't have to worry that the structures are not consistent. The problem is you still have to write code that copes with delivering items to the right place, and that handles orders that just have a delivery location at the order level as well as orders that have it at the order item level. Your application logic now has additional complexity, plus you need test cases that cope with both sets of conditions. To make it easy, you may have template versions for the documents, so that 'v1.1' Orders have the single delivery location but the newer 'v2.0' Orders have it at the lower level. 

Worse yet, your application will have bugs in it. Maybe not today, maybe not tomorrow but one day, and you'll regret the consequences for the rest of your life. Because you'll be stuck with some 'v1.1' Orders that have a delivery location on the Order Lines, or perhaps a 'v2.0' with no delivery location at all, or something that claims to be a 'v1.3' style which isn't documented anywhere but is something Barney had to write at five o'clock on Christmas Eve.

It is perfectly possible for you to run a 'migration' on your document database so that all your data is at the same version. This basically enforces the same situation as a 'globally structured database' like SQL....except you need to get all the code right because you won't get enforcement support from the database layer itself.

Another solution might be to delete orders once they are delivered. After a few months, those 'v1.1' documents won't be an issue any more. At least in five years time you won't have code that is built to deal with 20 different versions.

Generally, the problems of being Agile in an RDBMS are not to do with SQL or relational concepts. They are a consequence of being unable to restructure existing data (perhaps you can't work out how, or don't have the data you need). But they are ultimately problems in migration from the old business model to the new model, not database problems.

To be even handed, schema changes are still not seamless, especially with more primitive database engines, or when dealing with high-availability / multi-server / replicated databases. But that's a 'state of the art' issue, not a fundamental conceptual issue.

Monday, December 06, 2010

Congratulations, Jeff (and Neils, Peter and Elic)

Kellyn previously confessed to her DBA crushes.
I'm going to confess to being a PL/SQL Stalker. It's okay, I've confessed before and he didn't object :)

This time, Jeff got one step beyond in the Q3 playoffs for the PL/SQL Challenge. Congratulations (but I'll get you next time, and your little dog too). I'll take comfort in the thought that the top ten only had one player from England, slightly mitigating Australia's performance at the cricket in Adelaide. Distracting the poms with the UKOUG may have helped. [Yes, I did migrate from England to Australia. I'm allowed to make cheap remarks either way.]


In this current quarter's rankings, he's at 20th, while I'm a bit lower down at 27th. Okay, there are four from the UK in the top 30, but I'll gloss over that fact.

One reason for my stalking is that Jeff is a blogger working with PL/SQL in Australia, albeit a couple of thousand miles away (about 4000 kilometers by road). We did meet at an AUSOUG conference a few years ago.

The other place we cross paths is StackOverflow. As shown here, we are both members of that select group of people with the PL/SQL 'bronze' badge, and the Oracle 'silver' badge. Actually Tony Andrews, Vincent Malgrat and Andrew Clarke (APC) have also got their Oracle 'gold' badge. I might start stalking them too, and maybe Justin Cave as well. Be warned, if you stalk users in StackOverflow and find yourself agreeing with them, you'll probably vote them even higher.

An interesting feature of StackOverflow is that they do regular data dumps of all the questions, answers and comments. They've also got that data installed on SQL Azure, so you can actually run your own queries against it. I've written one to keep track of the exact scores (as of the dump date) on a specific tag. There I can see Jeff has 74 upvotes for the tag 'oracle10g' while I've only got 47.
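My actual query is a little longer, but the shape of it is roughly this (the dump schema details are from memory, so treat the table and column names as approximate):

-- answers are PostTypeId = 2; the tags are stored on the parent question
select u.DisplayName, sum(a.Score) as total_score
from   Posts a
join   Posts q on q.Id = a.ParentId
join   Users u on u.Id = a.OwnerUserId
where  a.PostTypeId = 2
and    q.Tags like '%<oracle10g>%'
group by u.DisplayName
order by total_score desc;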

But I am beating him in the 'oracle' and 'plsql' tags. So there !

Wednesday, November 24, 2010

Warning about whitespace and to_timestamp_tz

Quick post prompted by a stackoverflow question.


select to_timestamp_tz('11-OCT-10 04.59.59.999999999 AM -05:00',
                        'DD-MON-RR HH.MI.SSXFF AM TZR') ws,
       to_timestamp_tz('11-OCT-10 04.59.59.999999999 AM  -05:00',
                        'DD-MON-RR HH.MI.SSXFF AM TZR') no_ws
from dual

Do you notice the extra space in the second column, preceding the -05:00 timezone ?
Oracle notices it too, but isn't very intelligent in accommodating it.

WS
-------------------------------------------
NO_WS
-------------------------------------------
11-OCT-10 04.59.59.999999999 AM -05:00
11-OCT-10 04.59.59.999999999 AM +05:00


That is the result on both XE and the 11.2.0.2 edition on apex.oracle.com.
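If you can't trust the whitespace in your source strings, one defensive option is to collapse repeated spaces before the conversion; a sketch:

select to_timestamp_tz(
         regexp_replace('11-OCT-10 04.59.59.999999999 AM  -05:00', ' +', ' '),
         'DD-MON-RR HH.MI.SSXFF AM TZR') fixed
from dual;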

Thursday, November 18, 2010

If an article is posted on the Internet and Google doesn't index it, would it have an impact ?

I've been following posts about copyright in the last few weeks.

Tim Hall and Brent Ozar both take issue with plagiarists stealing their content.
This is an expansion of a comment I made on a post by Jake at AppsLab.

Firstly, I'm personally not worried about theft of my content. But then I'm not in the league of Tim or Brent. When I get around to it, I'll rejig the blog layout so that each page has this Creative Commons licence.

But the wider issue is a technical one. The internet is a lot about digitizing content, duplicating it and distributing it. The music and movie industries have battled this for a while, and not done too well. But, through the CodingHorror blog, I see that YouTube is applying technical solutions.

Basically, when material is uploaded, they run it through some algorithm to try to determine whether it infringes copyright. So I figured: how could this apply to text-based content ? The key is Google, who happen to own YouTube. Yes, there is Bing and Blekko and Wolfram Alpha. There was Cuil, which died quietly a few months back. But Google is search.

It would be quite feasible for Google to implement some plagiarism block. As I indicate in the title, if internet content isn't indexed by Google, it comes pretty darn close to not existing in any practical terms.

How I'd envisage it working is this:

I write a blog piece and run it through a Google Signature Recorder. It picks out, or has my assistance in picking out, some key phrases. Then, as they index the internet, they check for other sites with those key phrases.

If they find one, they report it to me. I can either mark it as 'OK, licenced, whatever' or I hit the veto button. If I do that, they pull the offending pages from their search index. Worst case, if they find 80% of a site is offending content, they could decide to remove the entire site from the index.

Removing the offending site from their index doesn't hurt the search facility. The original content is still there. They can even do 'magic rewrites' to count pointers and links to offending content as pointers to the original for PageRank purposes.

What might their motivation be for this ?

With YouTube, where they are hosting the service, it may be partly a legal necessity. But they host Blogger and Google Docs too. More than that though, this could be a service they could sell. If my content is valuable enough for me to want to protect it, I can afford to pay Google $50 a year to monitor it. The advantage of making this a paid service is that, if there is a challenge (maybe Fred says I've registered his content), there is a financial paper trail to follow.

Ultimately, as YouTube isn't obliged to host your videos, Google isn't obliged to index your content.

Wednesday, November 10, 2010

Application Express: Workspaces and Schemas

Time for another chapter, or at least verse, in the Apex security model.

With Apex, the relationship between Workspace and Schema is flexible. A single schema can have multiple workspaces against it, and a single workspace may be assigned multiple schemas. I’ve only worked on a couple of small Apex jobs, and have kept a one-to-one mapping between workspace and schema. That seems simpler to me.

In this example, the GARY workspace has two schemas, GARY and GROWL.

Going down a level, when you create an application under a workspace with multiple schemas, you pick one of them as the base schema for the application. That schema will be the ‘effective database user’ in terms of privileges.

Things are a bit different in the Workshop. The workshop belongs to the workspace and doesn’t have a single schema. Instead you get the drop-down list near the top-right corner. You choose the schema (out of the allocated schemas) and it runs as that user. Except, perhaps, in Nuno’s installation where he is seeing some odd results. 

Note that in both the above cases the CURRENT_SCHEMA and CURRENT_USER are the same. When I tried an ALTER SESSION SET CURRENT_SCHEMA=…, it wouldn’t ‘stick’ for the next query. I could only get it to work if I included it in the same call. I assume the mechanism that sets the parsing schema also re-sets the current schema on every call, overriding my ALTER SESSION.
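For the record, the probe is just a pair of SYS_CONTEXT calls:

select sys_context('USERENV','CURRENT_SCHEMA') cur_schema,
       sys_context('USERENV','CURRENT_USER')   cur_user
from   dual;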

As an amusement, I tried fiddling with the HTML to see if I could run SQL as a schema not linked to the workspace. It didn’t work and just raised an application error (ORA-20000, rather than a built-in error) stating that I didn’t have the authority on that schema.

Finally, I did some playing with DBMS_TRACE.TRACE_ALL_CALLS and identified WWV_DBMS_SQL as the ‘missing link’ between Apex and DBMS_SYS_SQL. The system catalog shows that APEX_040000 has execute permission on WWV_DBMS_SQL, and that package has a dependency on DBMS_SYS_SQL.
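If you want to verify that on your own installation, the catalog queries are simple enough (APEX_040000 being the Apex 4.0 schema):

select grantee, privilege
from   dba_tab_privs
where  table_name = 'WWV_DBMS_SQL';

select owner, name, referenced_name
from   dba_dependencies
where  name = 'WWV_DBMS_SQL'
and    referenced_name = 'DBMS_SYS_SQL';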

Thursday, November 04, 2010

The classic Oracle user security model

Following from comments on my last post, I thought I'd put into words my thoughts on the Oracle security model.

Oracle has database users with passwords. Each database user is, at least nominally, a schema with its own objects. Each user can be granted system privileges (such as CREATE SESSION), object privileges (such as INSERT into a particular table) and roles. Roles are a bunch of system and object privileges that come into effect for that user's sessions but which can't be leveraged for objects such as stored procedures in the user's schema.

I think that there are problems with this.

Firstly, having the database being the 'point of authentication' really only works if one single database underlies all the applications in the enterprise. In practice there's a bunch of other applications too, down to email and calendars, and you really want a unified authentication mechanism. Single sign-on or LDAP or something dedicated to the job. And of course Oracle provides for an array of external authentications (discussed here) and the simple 'password in the database' will one day be totally redundant.

Secondly, the one-to-one relationship between users and schemas doesn't quite work. If I have two end users, Fred and Barney, doing the same jobs with all the same rights and privileges, then I will never want FRED owning some tables in his schema and BARNEY having others in his schema. I don't even want the possibility of FRED having access to an object that BARNEY doesn't. Because of that potential, every SQL statement you look at in V$SQL has a parsing schema id. Because there is the chance that the grants authorizing FRED can be revoked, requiring the query to be invalidated and re-parsed, FRED's SQL has to be kept separate from BARNEY's identical SQL on the same tables.

If you worked in the client/server days, you can probably remember seeing dozens of identical SQLs differentiated only by the logged in user. A pain.
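The effect is easy to reproduce: run an identical statement as FRED and then as BARNEY against a shared table, and look at the child cursors (SHARED.EMP is a made-up name here):

-- run as FRED, then as BARNEY:  select count(*) from shared.emp
select sql_id, child_number, parsing_schema_name
from   v$sql
where  sql_text = 'select count(*) from shared.emp';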

Thirdly, the client/server days also had this idea of having a database session for every logged in user. It seems a bit scary these days, and architects will come in and shout about how you can't scale to a million users in that sort of model. It's not that big a deal though, at least for most business apps. The number of users for your business application hasn't changed that much since the 80s. Smaller businesses may have ten or twenty, large businesses maybe a few hundred or thousands. Where an application is 'front and centre' of someone's job, they will be using it steadily. If there is some sort of 'state', it needs preserving, whether that is in a database tier or application tier.

The problem with this model is if you extend your application outside your business, to your customers or the world in general. Then you might get thousands of users, perhaps only using the application for a few seconds, perhaps not logging on and off properly so resources can't be released immediately.

Some form of connection pooling with generic users is generally considered the way to go. Looking down a step, this is pretty much what a database does anyway, with all the writes to data and log files typically being done by a few processes under an abstracted 'oracle' user.

I understand why the Apex security model is separate from the classic Oracle model. It generally follows the typical model for an application technology. Indeed, my opinion is that Application Express is an 'applications' technology or framework, not a database one. It just happens to be tied closer to the database.

What I don't like is the use of DBMS_SYS_SQL. This is a powerful, and undocumented, package. There's bits of code in Apex that aim to prevent the creation of workspaces on a number of standard schemas. I'd prefer it the other way: schemas should need to be explicitly permitted to be used as Apex schemas. Network ACLs came into Oracle with 11g so that the packages that communicate over the internet/intranet (UTL_HTTP etc) could be more tightly controlled, down to individual sites if necessary. A similar setup could be established to control access through DBMS_SYS_SQL.
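For comparison, this is roughly what the 11g network ACL setup looks like for UTL_HTTP; something analogous gating DBMS_SYS_SQL is what I have in mind (the calls below are the real DBMS_NETWORK_ACL_ADMIN ones, the idea of extending the approach is mine):

begin
  dbms_network_acl_admin.create_acl(
    acl         => 'http_access.xml',
    description => 'Allow FRED to use UTL_HTTP',
    principal   => 'FRED',
    is_grant    => true,
    privilege   => 'connect');
  dbms_network_acl_admin.assign_acl(
    acl  => 'http_access.xml',
    host => 'www.example.com');
  commit;
end;
/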

Monday, November 01, 2010

EXPIRED AND LOCKED...but still DANGEROUS

Prompted by James Koopmann's post about default Oracle accounts.

"The good thing here is that many of these accounts, after a normal installation are expired and locked"

Being blunt, "EXPIRED & LOCKED" does not mean safe, unbreakable, unusable or ignorable.

EXPIRED means that you cannot use the account password to access the account. My trick from last week would still allow access through a proxy account.

LOCKED is a bit better, as is lacking a 'CREATE SESSION' privilege. But even if FRED is locked, nothing stops user BARNEY using any available grant to run a procedure owned by FRED, and if that procedure has the default DEFINER rights, then it runs with any privilege granted to FRED. This is why SQL injection vulnerabilities in code owned by these LOCKED schemas are still dangerous.
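A minimal sketch of that scenario, with DO_STUFF being a hypothetical definer-rights procedure owned by FRED:

alter user fred account lock;

-- BARNEY only needs the execute grant; FRED never logs in
grant execute on fred.do_stuff to barney;

-- as BARNEY: this runs with whatever privileges FRED holds
begin
  fred.do_stuff;
end;
/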

But there's more...
Let's have some fun with APEX.
Brand new user, nothing up my sleeves.
  create user exp_test identified by exp_test default tablespace users quota unlimited on users;
  grant create table to exp_test;
  grant create trigger to exp_test;
  grant create procedure to exp_test;
  grant create sequence to exp_test;
  grant create view to exp_test;

Five basic privileges and not even as many as recommended in the documentation.

Those privileges are sufficient to create an Apex workspace for the schema though. A Workspace has an ADMIN user and you can log into that workspace as the ADMIN user. You can have other users (developers or plain, apex-authenticated users) on the workspace too.

You can then take your apex-authenticated admin or developer and go into APEX's SQL Worksheet and execute SQL as the schema user. Try it. No CREATE SESSION required. Now try

  alter user exp_test account lock;
  alter user exp_test password expire ;


Still accessible through Apex. No worries about the account being locked or passwords being expired.

I've done that using XE with Apex 4.0 and the embedded PL/SQL Gateway. Obviously the account you use to connect to the database needs to be unlocked, with CREATE SESSION and an unexpired password.

But beyond that Apex uses magic (or rather, I believe, DBMS_SYS_SQL). You can log out of one apex application running in a workspace on schema FRED, log into another running against BARNEY and it's all using the same set of database sessions.  The apex builder/administration application is all running in Apex as well, on the same connections, with all its majestic power.

So an expired and locked account still has enough oomph to bite your legs off.

Friday, October 29, 2010

Temporary access to database accounts and password reuse

Eddie Awad recently shared the "Factsheet about Oracle database passwords" from Red Database Security.

One of the items mentioned in the fact sheet is how to temporarily change a user's password, by getting the hash value from DBA_USERS (or SYS.USER$ in 11g) and using the IDENTIFIED BY VALUES clause.
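The trick itself is only a few lines; a sketch (the saved hash is obviously a placeholder):

-- save the current hash
select password from sys.user$ where name = 'TARGET_USR';

-- set a known password, do the work, then restore the original
alter user target_usr identified by temp_password;
-- ... connect as TARGET_USR and do whatever was urgent ...
alter user target_usr identified by values '<saved hash>';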

It doesn't cover the situation where your policy prohibits password reuse. If you try to reset it back, then you'll get the "ORA-28007: the password cannot be reused" exception. You can set password_reuse_max to unlimited, then set the password and then reset the policy. But that's all a bit of a hack and it is something you shouldn't be doing anyway.

If a DBA has an urgent need to log in as someone else, then the DBA should give himself proxy rights:


ALTER USER target_usr GRANT CONNECT THROUGH dba_user;

then connect as


CONNECT dba_user[target_user]/dba_password@conn


which is much tidier. Though the DBA should revoke that grant afterwards:
ALTER USER target_usr REVOKE CONNECT THROUGH dba_user;


And one more thing missing from the fact sheet is that, if you have enabled the functionality to prevent password reuse, obviously Oracle has to store the old password hashes to validate that they are not reused. Those old hashes can be seen in SYS.USER_HISTORY$. While they are not current passwords, if someone does get hold of them, they may provide a clue as to what passwords someone is using.

Sunday, October 24, 2010

Fun with GROUP BY

What's wrong with this ?

CREATE TABLE dog 
(id number, name varchar2(10), breed varchar2(10), birth_date date);

select name, breed, count(*)
from dog
group by name;

Yup, you get an ORA-00979 "not a GROUP BY expression" triggered by the "breed" identifier in line 1.

If you execute it through DBMS_SQL...well, you can't. It actually fails at the parse phase.

declare
  v_num integer;
  v_ret number;
begin
  v_num := dbms_sql.open_cursor;
  dbms_sql.parse(v_num,
     'select name, breed, count(*) from dog group by name',dbms_sql.native);
  v_ret := dbms_sql.execute(v_num);
  dbms_sql.close_cursor (v_num);
end;
/

But you can embed the SQL in a procedure and it compiles successfully.

create or replace procedure get_dog is
begin
  for c_rec in (select name, breed, count(*) from dog group by name) loop
    null;
  end loop;
end;
/


Although it fails on execution, the compilation of the procedure has succeeded. It has worked out that the table and columns exist with appropriate privileges. With PLSQL_WARNINGS it will even do some datatype checking. But this demonstrates that compiling a procedure does NOT do a full parse of the SQL statement. One reason, perhaps, is that parsing the SQL statement would mean it ending up in shared memory (eg visible through v$sql), and the compilation of a large package might have a big impact on shared resources.

The 'lite parse' performed by PL/SQL can be done using less shared resources and with a reduced impact on other sessions. The drawback is that you may occasionally get an SQL statement in your PL/SQL code that can never be successfully executed. However you should discover that quickly enough in your testing phase....

PS. Can anyone come up with other SQL 'parse' errors that are not caught when the statement is statically compiled into PL/SQL ? My 'test' would be that the statement fails during a DBMS_SQL.PARSE, but can be included (without recourse to dynamic SQL) in a procedure that can be compiled without error.

Friday, October 22, 2010

Advice from the Ghost-who-walks

On my journey in to work, I read a newspaper. A little old-fashioned perhaps, but both my parents worked as journalists and newspapers have been a 'given' as far back as I can recall.

I also read the comic bits, including Dilbert and the venerable Phantom, the ghost-who-walks. He is currently investigating whether an innocent person might have been locked in a prison. His search was cut short, and in today's piece, on his way out, he thinks to himself "I can't write that in the chronicles, that I checked every cell but one."

This is where quality, and expectations of quality, play an important role in what we produce. If we are working on a project or system that is known to be "trouble", there's the temptation to skip checking that last prison cell, to skimp on design or testing, to re-use/misuse an existing column or variable rather than create a new one, to not tidy away deprecated columns.

But when the standards are high, even or especially when self-imposed, you'll go to the extra effort to complete the job. When you've written your test cases BEFORE writing the code, you will test every one and you'll make sure they pass. If you write your test cases after writing the code, there'll be the temptation to test only what you know will pass. There's the temptation not to test a couple of edge cases you're not convinced will work, justifying it by the time saved and the consoling thought that "they probably won't crop up anyway".

So aspire to be like the Phantom, and hold yourself to higher standards.

Wednesday, October 06, 2010

Fun with ROWNUM

Conventional wisdom has it that the only predicate that is effective with ROWNUM is in the form of ROWNUM < (positive integer). Okay, a <= works too. Or a "BETWEEN 1 AND ...", but you get the idea.

In practical terms, that is true. But in totally impractical terms, you can get some interesting effects with a predicate like "rownum in (3,4)" if you use it in conjunction with an OR. Here is a simple demo. Firstly, load up a table with some data.


create table test
  (id number not null, val varchar2(30) not null);
  
create index text_id_ix on test (id, val);
create index text_val_ix on test (val, id);


insert into test (id, val) values (1,'Zebra');
insert into test (id, val) values (2,'Toucan');
insert into test (id, val) values (3,'Rhino');
insert into test (id, val) values (4,'Monkey');
insert into test (id, val) values (5,'Llama');
insert into test (id, val) values (6,'Frog');
insert into test (id, val) values (7,'Emu');
insert into test (id, val) values (8,'Dog');
insert into test (id, val) values (9,'Cat');


Next query it through a simple table scan.


select /*+ NO_INDEX (t) */               * from test t 
where rownum in (3,4) or val between 'Cat' and 'Emu';

        ID VAL
---------- -----
         7 Emu
         8 Dog
         9 Cat

Exactly what you'd expect without the ROWNUM predicate. But now tell it to use the index on the value column...


select /*+ INDEX_FFS (t text_val_ix)*/   * from test t 
where rownum in (3,4) or val between 'Cat' and 'Emu';


        ID VAL
---------- --------
         9 Cat
         8 Dog
         7 Emu
         6 Frog

So, CAT, DOG and EMU match the BETWEEN predicate. However, the scan continues onto the remaining rows in the index and, because CAT, DOG and EMU have been allocated ROWNUMs one, two and three, then the next candidate (FROG) can be assigned ROWNUM four which is passed by the ROWNUM predicate. Since it is an OR, then the row is passed into the eventual result set even though it doesn't satisfy the other criteria.

So why didn't we get four rows on the FULL SCAN ? Well, that is because the EMU, DOG and CAT rows were the last rows inserted into the table. Once the full scan has picked up those three, there aren't any more rows left in the table to be assigned ROWNUM four.

With ROWNUM, the physical ordering of rows in a table and the choice of query path can affect which rows are returned. However by using a really DAFT predicate, you can actually affect the NUMBER of rows returned too.

It's not the only way to have the same query on the same data set return different numbers of rows though. A predicate using DBMS_RANDOM can do it, as can the use of the SAMPLE clause. Both are intentionally random though. Function-based indexes on non-deterministic functions can have similar effects, as I blogged previously.

Tuesday, October 05, 2010

Ex-this, Ex-that and Ex-the other

No, I didn't go to OpenWorld. It's just too far away from me, and registration/flights/accommodation and all that just make it all too expensive.

While I would have enjoyed meeting people, I'm not sure I missed much else. All this Exadata and Exalogic is very nice if you've got a few million lying around, but my set of Tarot cards isn't seeing any of that in my near future.

But in the 'Ex-other' category, Bradley D. Brown revealed that there is some life in the Express Edition, with the 11g version in the works. Looks like they'll leave ApEx as a separate installable. Wonder if they will reconsider including the JVM in the database. By not removing it, they'll keep the core software closer to the mainstream code-line.

Incidentally, my Oracle Express install is the only item that consistently shows up in my Secunia PSI as being insecure. There's plenty of stuff (mostly .Net framework and Flash) that pops in and out as it becomes insecure and is then patched. When I first installed it, it pointed out a whole bunch of stuff that was dead, dying or lying in wait, prompting a fairly major cleanout and upgrade. This is the 'family' machine, so it has all manner of stuff installed and Secunia makes it a lot simpler to keep the bits up-to-date.

My little netbook got hit by a drive-by-download a couple of weeks back. One web-site I visited had briefly got itself exploited. Not vandalised (which implies something visual), or hacked (which implies information leakage) but probably some XSS attack which threw some nasty trojan through my Chrome browser but which (fortunately for me) got caught by Microsoft Security Essentials.

I did just have a week up at Port Stephens though. Away from Oracle, computers, twitter and all that. I did drop out of the plsqlchallenge rankings so haven't made the playoff this quarter. Maybe next time. With eight players from Australia in the playoff against just five from the UK, can I just say, "Aussie, Aussie, Aussie...Oi, Oi, Oi". Maybe we'll have to ceremoniously burn an Oracle manual, and use the ashes as a trophy.
Sandboarding on Stockton Dunes

I do have a few more days off, and I've got a couple more posts planned for this week.

Wednesday, August 18, 2010

Honey, I swallowed the webcam.

One of the 'features' of the election campaign here has been the National Broadband network. Apparently this will give us fibre directly to the home for over 90% of premises, with speeds of up to 1Gbps. Actually most households would still only get the 100Mbps previously offered. I'm still on 512kbps. Hardly the fast and the furious, but it suffices.

Sucking on a 1 Gbps pipe would allow an average download of a couple of terabytes a month. My current usage is about 5 Gig per month, just a fraction of a percent of that. I have occasionally watched a Doctor Who on ABC's iView, which seems to be about 250-300MB for an hour of TV. Three hours per Gig makes it about 4 months of content in a terabyte, so even if my whole family each watched a different channel 24 hours a day, seven days a week, we'd still be under a terabyte. Not seeing the need yet.

Anyway, giving the consumers such a massive pipe to guzzle from means moving the bottleneck upstream. Partly to the pipes from the US (and to a lesser extent, Europe and Asia), and partly to the content hosters here. The latter will mostly be the Television companies, since video is the bandwidth hog and they've got the biggest store of 'watchable' video. I say 'watchable' advisedly. A lot of it is completely dire, but at least you would be able to watch "Hey, Hey it's Saturday" rather than on Wednesday when it is broadcast. Why it is broadcast on Wednesday, or why you'd want to watch it any day of the week is a different question.

If you've read up on Exadata, the key there was removing bottlenecks. All of them. Everywhere. The only way that works with the NBN is really peer-to-peer, which is often gaming and file-sharing. But one application for the NBN will be doctor consultation. Fire up the web-cam and get your rash looked at by a professional rather than just anyone on ChatRoulette. Though I don't go to the doctor much, the kids do, and often with throat or ear infections. I don't think they'd show up too well on a webcam, but maybe we will get those tiny fibre-optic cameras I see on the medical dramas. Should make Skype a bit more intimate. Actually my Dad got a shock when the Skype on his computer fired up when he wasn't expecting it, and I don't think he'd want it any more intimate.

The old lady who lives next to us doesn't have a computer. We could let her use ours, but since I've got kids, I'm quite adamant that the computer (and especially anything with a webcam) is out in the open, and I don't think she'd be comfortable with that. Nor would we to be frank.

Still the only other use for that sort of bandwidth that I could think of was a big network of CCTV cameras on each corner. But you couldn't imagine a Labo(u)r government installing anything like that, could you ?

Monday, August 16, 2010

Fun with passwords

Back in December, a site called RockYou was hacked and the passwords of 30+ million accounts were extracted (apparently by an SQL injection attack ). A file was released containing the passwords, without the related account identifiers (though the full content has been witnessed).  If you wander round the Torrent sites, you should be able to find the password file. Given that it was extracted by hacking, the legality of downloading it and possessing it should be considered in light of whatever jurisdiction you happen to live in or pass through.

Firstly, let's get the "Don't store passwords, store hashes" out of the way. Good policy, but not applicable here. Apparently the site connected to MySpace and similar sites and needed the passwords for that. OpenId and Twitter authentication are better solutions to that problem in general. Of course Oracle has a similar situation for Database Links requiring passwords. They are stored encrypted (not hashed) because Oracle actually needs to use the password to make the connection.

Onto the file itself. The most common passwords have been reported on here. The Washington Post reports that the most common passwords are, well, what you'd expect common passwords to be. That's the reason they are common. Let's face it, a lot of people will sign up to web accounts with no intention of going back to the site, or of using it for anything important, and will use a simple throwaway password.

Just because they use a simple password on one site, doesn't mean they use it for their banking. In fact I'd expect the ones using the more complex passwords are the ones who re-use the same one for their banks and, given this exposure, are likely to be regretting it.

Some aspects actually appear quite heartening. Out of the 32 million records, there were nearly 8 million unique passwords. So for most people their password uniqueness would be better than a "1 in a million". Sure there were some more common than others. The Premier League was well represented with liverpool, chelsea and arsenal all pretty popular, though manchesterunited was less so, presumably because of the time it takes to type...especially for Man U supporters  [who are known to move their lips while they read :)].

The real problem comes because those 8 million passwords are now much less effective. Consider: running orabf on my handy little netbook (built for low power consumption rather than for CPU intensive tasks), it can chug through 125,000 password attempts a second. That would be around a minute to test all 8 million potential passwords against a username/hash value from DBA_USERS. The word you are looking for is INSIGNIFICANT.

So if I've got your username and hashed password, and it is one of those 8 million, then consider it cracked. Which means you are not only trying to avoid a few dozen 'obvious' passwords, but a list of 8 million including ones with alphanumerics plus special characters.

In theory you can put that list in a dictionary and use Oracle to forbid passwords on the list. But that is going to make it pretty hard for your users to find valid passwords they can remember. ["Frenchtoast...taken...JackDaniels...taken...DamnComputer...taken...DBA_I_know_where_u_live....Bingo!] Then they'll start with passwords that are personal, so won't be on your list. Their car registration number perhaps, or mobile number. Maybe with a suffix or prefix to make it long enough to be valid. While it will be hard for you to lock out, it will actually be easier for anyone who knows them, especially someone they work with.
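For completeness, the hook for enforcing it is a password verify function attached to a profile (the function has to be created in the SYS schema). A skeleton, with COMMON_PASSWORDS being a hypothetical table loaded from the list:

create or replace function block_common_pw
  (username varchar2, password varchar2, old_password varchar2)
return boolean is
  v_hits number;
begin
  select count(*) into v_hits
  from   common_passwords
  where  pw = lower(password);
  if v_hits > 0 then
    raise_application_error(-20001, 'Password is too common');
  end if;
  return true;
end;
/

alter profile default limit password_verify_function block_common_pw;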

Friday, August 13, 2010

Buy your DBA a beer

Last week, Lisa Dobson remarked on the essential skill of knowing when to call Oracle Support.

A question on StackOverflow here prompts me to say, there is equally a time when you should talk to the people you work with FIRST.

Look, there is a time to call support. There are times to post something to a forum site. But to say "Our application is running slow. Have the DBAs flipped the GO SLOWER switch on us ?" suggests that either some places have some seriously scary DBAs, or some serious barriers between the DBAs and developers.

It is Friday (here, at least). If you are a developer, go buy a DBA a beer. If you are a DBA, buy a developer a beer. In England, this is called buying a round. In Australia, it is called a Shout. And in Sydney, I can recommend the beers at the Harts Pub.

Oh, and if you a developer or DBA in Sydney, come to the Sydney Oracle Meetup and we can all share a beer.

PS. There may be DBAs who haven't yet been driven to drink, or at least don't like beer. I can also recommend the Chocolate Room (cnr sussex & bathurst streets) as a treat.

Wednesday, July 28, 2010

Methods, questions and answers.

Today is one of those days when you've got a couple of recent blog posts which intersect beautifully.

Cary Millsap is here discussing why the right method is more important than the right answer.
Steven Feuerstein inadvertently opened up a can of worms regarding the seemingly simple task of extracting a list of a user's tables from the data dictionary.

The fallout from Steven's question in the PL/SQL Challenge is that the term 'table' is fuzzy and has become fuzzier over the years. Views, object tables, XML tables, materialized views, tables that have been dropped but are sitting in the recyclebin. The various answers (except for one) offered for that question in the PL/SQL Challenge all worked in SOME circumstances, but not necessarily all circumstances.

You can appreciate the confusion if you have had to explain to someone that, if you grant privileges on a procedure, the data dictionary shows that in USER_TAB_PRIVS.
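Try it and watch where the grant lands:

grant execute on my_proc to barney;   -- MY_PROC: any procedure you own

select grantee, table_name, privilege
from   user_tab_privs;                -- the procedure shows up under TABLE_NAME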

Personally, I'm more attracted to DBMS_METADATA.GET_DDL these days. I just wish there was a (built-in) function that took the object name (and optionally schema) and worked out the object type for itself.
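Something along these lines would do me; a naive sketch (the blunt REPLACE mapping of the object type covers common cases, like PACKAGE BODY to PACKAGE_BODY, but not every one):

create or replace function my_get_ddl
  (p_name in varchar2, p_schema in varchar2 default user)
return clob is
  v_type varchar2(30);
begin
  -- naive: ignores the possibility of several object types sharing a name
  select replace(object_type, ' ', '_')
  into   v_type
  from   all_objects
  where  object_name = upper(p_name)
  and    owner       = upper(p_schema)
  and    rownum      = 1;
  return dbms_metadata.get_ddl(v_type, upper(p_name), upper(p_schema));
end my_get_ddl;
/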

Anyway, the only thing I can add to the discussion is that testing is the process of proving that your method is applicable to a defined set of situations. The wider the testing, the more situations you can be confident of being addressed successfully.

In pure mathematics and some scientific disciplines, you can devise theoretical proofs that establish the wider applicability of a method without individual testing. That's harder in implementation. Maths can prove that if you multiply X by two, then divide by two, you get X back as the result. In a physical implementation, you might have to deal with rounding imprecision or maximum permitted values or buffer overflows....

Friday, July 16, 2010

Creating objects in tablespaces without any quota

Say you've got a schema with privileges on multiple tablespaces and you want to make sure that objects get created in the appropriate one. One way to do that is make the user's default tablespace one that they don't have a quota on.

That way, if they try to create an object without specifying the tablespace, it picks the default one and errors because there is no quota. Or at least it used to.

Apparently in 11gR2 it doesn't work this way any more. With deferred segment creation, you can create objects in tablespaces where you don't have any quota, and it won't fail until a row gets inserted.
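A quick demo of the changed behaviour, assuming an 11.2 database with DEFERRED_SEGMENT_CREATION at its default of TRUE:

create user noquota identified by noquota
  default tablespace users;     -- note: no quota granted anywhere
grant create session, create table to noquota;

-- connected as NOQUOTA:
create table t (id number);     -- succeeds on 11gR2: no segment created yet
insert into t values (1);       -- fails: ORA-01950 no privileges on tablespace 'USERS'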

I can see another 'gotcha' in this if you are progressing code from DEV/TEST. The creation DDL may succeed by sticking the table in a USERS tablespace (which isn't expected to be used, except maybe for some trivial temporary tables), whereas before it would fail and red-flag the table for re-assignment to the appropriate tablespace.

Wednesday, July 07, 2010

Puzzle me this

Like Jeff Kemp, I got yesterday's PL/SQL challenge wrong too.

The question rested on what identifier/alias Oracle would assign to a column derived from an expression if no explicit alias is supplied.

If you have a cursor with such a column, and you base a variable on a ROWTYPE of that cursor, you really want to know the identifier to use. Steve says "The problem is that this expression has not been assigned an alias. As a result, there is no way in my PL/SQL code to refer to that expression". Jeff says "I had this stupid idea that Oracle would step in and magically create an alias for the column".

Actually, Oracle does assign an alias, or at least an identifier. It is just a little more complicated than any of the offered choices (though in this example I've switched the table used so that I don't need to create any special objects).

DECLARE
   CURSOR profitable_internet_stocks
   IS
      SELECT SUM (num_rows) FROM user_tables;
   few_and_far_in_between   profitable_internet_stocks%ROWTYPE;
BEGIN
   OPEN profitable_internet_stocks;
   FETCH profitable_internet_stocks
   INTO few_and_far_in_between;
   CLOSE profitable_internet_stocks;
   dbms_output.put_line(few_and_far_in_between."SUM(NUM_ROWS)");
END;  

The following block shows how to use DBMS_SQL.DESCRIBE_COLUMNS to determine the column names assigned by Oracle.

DECLARE
  PROCEDURE temp (p_query IN VARCHAR2) IS
    col_cnt   INTEGER;
    rec_tab   dbms_sql.desc_tab2;
    v_cursor  NUMBER;
  BEGIN
    v_cursor := dbms_sql.open_cursor;
    dbms_sql.parse(v_cursor, p_query, dbms_sql.native);
    dbms_sql.describe_columns2(v_cursor, col_cnt, rec_tab);
    dbms_sql.close_cursor(v_cursor);
    FOR v_pos in 1..rec_tab.LAST LOOP
      dbms_output.put_line(rpad(rec_tab(v_pos).col_name,32));
    END LOOP;
  END;
BEGIN
  temp('SELECT SUM (num_rows) FROM user_tables');
END;
/

Of course it is generally better to use an alias when you have the choice.


Could Oracle be 'smarter' in allocating an identifier ? Probably not. There's nothing wrong with the identifier it came up with except that it has to be expressed as a quoted identifier. If the expression had been a simple numeric value (eg SELECT 1 FROM user_tables) then there isn't an appropriate identifier that it can come up with that doesn't require quoting.

Incidentally, it is the SQL engine, rather than the PL/SQL engine, that assigns these identifiers.
The following shows that the "SUM(NUM_ROWS)" identifier is automatically assigned to the first column as well as being used as the alias for the second. As such, Oracle can't work out what column is being referred to in the outer select.

select "SUM(NUM_ROWS)" from
(SELECT sum(num_rows), 1 as "SUM(NUM_ROWS)" FROM user_tables)

Saturday, July 03, 2010

Extreme Querying Presentation

I gave a presentation to the Sydney Oracle meetup last week. It was primarily focussed on Oracle's Analytic functions, but also mentioned the MAX () KEEP DENSE_RANK functionality (which I was reminded about a few months back by a performance comparison here) and the ROLLUP part of GROUP BY. Morgan's Library has a good demonstration.
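For anyone who wasn't there, here's a flavour of those two features in a single query (against the classic EMP demo table, so a sketch rather than anything from the slides):

select deptno,
       max(sal) keep (dense_rank first order by hiredate) as first_hire_sal,
       sum(sal)                                           as dept_sal
from   emp
group by rollup (deptno);    -- ROLLUP adds the grand-total row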

That's the tricky bit about presenting SQL stuff. You don't want slides full of queries that end up being too small to read, and no-one would remember them anyway. So you talk more about what they can and can't do (and a few gotchas). I'm not sure I got the ideal compromise, but I stick more details in the Notes part of the powerpoint presentations.

The presentation is available in the Files area of the Sydney Oracle Meetup website as both Powerpoint 2007 and a PDF of the Notes pages. And on my slideshare account as a Powerpoint 2003 (the upload of the pptx failed for some obscure reason). And through the Presentations link on my homepage. Feel free to post critiques too.

In the end I rushed a bit and finished a bit earlier than I thought. I need to get some more presentation practice in. The company I work for used to do a couple of presentations a week internally, partly to spread knowledge but also a good way of gaining presentation experience. Unfortunately those stopped about a year back, and I'm waiting to see if the new boss (who came in a couple of months ago) shows any interest in putting them back on.

I'm on the committee for the Meetup and am on the lookout for meeting / presentation ideas. Feel free to add suggestions. I know most of you aren't in the Sydney area, but remember that I'll post any presentations I give publicly, and I'm happy for anyone else to take the material (and build on it) for their own presentations (unless in a blatantly commercially exploitative way). So let rip into any topics which you think could do with some presentation material.

Thursday, June 24, 2010

Interesting Times

On the day before the Socceroos faced their final match in the World Cup (they won, but still didn't make the next round), interesting times were occurring in the political arena back home.
Australia's Prime Minister Kevin Rudd (seen here, eating his own earwax) is about to be deposed. Australia follows the Westminster system of government. Local areas elect a member of parliament, and the parliament chooses the Prime Minister from within its ranks. Party politics inevitably gets involved, so the Prime Minister is invariably the leader of the party with the most seats in the House.

The Labor party currently has the most seats, at least in the main House of Representatives. We will ignore the Senate, elected by a peculiar form of proportional representation and described by one former Prime Minister as "unrepresentative swill".

Having decided that the Prime Minister, elected less than three years ago after overturning a long running Liberal party administration, isn't the best choice to lead the party to the next election, the deputy leader is challenging for the job. Kevin Rudd isn't going quietly. He'd like at least a chance at the election he would have called within the next six months.

In an hour or so, we will find out whether Australia will get its first woman PM in Julia Gillard.

We've had women as party leaders before. The most recent example has been the Democrats party, now practically dead after committing the unmentionable act of actually keeping an election promise. Back in 1998, the Liberals proposed a wide-ranging Goods and Services Tax (GST). The Democrats, under Meg Lees, campaigned on a GST that excluded food. When they got the balance of power (ie their votes were needed to secure passage of the Bill), they forced that amendment. The shock of a party actually following through on an election promise was such that practically no-one voted for them again. After a brief sojourn under a different woman leader (Natasha Stott Despoja), they vanished from parliament.

So the topic of conversation this morning will be either the Socceroos or politics. As a Pom (with dual citizenship) I may stick with the former. At least then I can encourage the Aussies to support the team playing Germany, which flogged the Aussies 4-0 and was responsible for the poor goal difference that kept them from second spot. Oh, and the team playing the Germans next will be England.

Well, it's either that or support the Kiwis, who still have a chance of qualifying if they can beat Paraguay. Though Nuno can support Portugal, I guess.

Friday, June 18, 2010

Stackoverflow and StackExchange

Like Tony Andrews, I am a big fan of the StackOverflow Q&A engine. I'm just not as good as him :)

If you haven't tried it, go ahead. Here are the questions tagged Oracle.

  • I like that it uses OpenId so I don't have another username/password to remember.
  • I like that code formatting is so simple. 
  • I like the simple numeric 'reputation' and the badges you can collect.
  • I like the fact that all the questions, answers and comments are Creative Commons licensed and you can download the whole lot.
  • I like the RSS feeds based on tags.

The only drawback is that the site is aimed at programming. It has a sister site, serverfault.com, which is more for admins, though mostly system admins. There is a third site in the trilogy called superuser.com aimed at, well, superusers. I guess that Apex websheets might end up fitting in there.

In database work there is a lot that can fall in either the developer or DBA camp. Performance tuning is the prime example. There is a limit to what can be achieved by tweaking SQL, and at some point it becomes beneficial to address more fundamental issues through indexes or storage options. Some places have DBAs or 'the Oracle guy' who does everything; at the other end of the spectrum there may be someone whose sole domain is backup/recovery, with a separate 'architect' who has visions dancing in his head (maybe of data models, but possibly of sugar plums).

So is there room for a similar site dedicated to Oracle or dedicated to databases in general ? One site to rule them all ?

The people behind stackoverflow are opening up stackexchange 2.0. People get to vote for potential new sites. And there are two proposals up and running, one for Oracle and one for Databases. If you are interested in either of these, follow the link and click the FOLLOW button.

Then, if you want, come up with some sample questions. You can get some ideas from other forums or what the people around you always end up asking.

Once a proposal is sufficiently defined, and has enough committed supporters, it can be launched, and then it's just a matter of driving questioners to it.

I should point out that OracleOverflow started up under the StackExchange 1.0 model (ie paid for, in this case by RedGate, who are better known in the SQL Server field) but didn't really get the momentum. I don't believe this site meets the conditions for migration. If some of the premium members of the database community (Oak Table members, ACEs, high-profile bloggers, oracle-l members etc) can voice their support for this, then it shouldn't have any problem getting off the ground.

Spread the word please.

Wednesday, June 02, 2010

Long intervals, PL/SQL Functions and subtypes.

I was doing some work with the interval day to second datatype recently. I hit a problem and thought it worth documenting my solution.

Generally, when you create a PL/SQL function or procedure, the scale/precision/size of the parameters (including the return value of a function) is undefined.

For example, you say FUNCTION show_price RETURN NUMBER or FUNCTION show_name RETURN VARCHAR2. You are not allowed to have FUNCTION show_price RETURN NUMBER(10,2) or FUNCTION show_name RETURN VARCHAR2(20), and the length/precision/scale is supposedly derived from the calling environment.

This new-fangled INTERVAL DAY TO SECOND datatype is a bit odd, in that it has a default precision of 2 for the days. That is, it can only store a number of days up to two digits (99 days). So if you have a function with an INTERVAL DAY TO SECOND return type, it will balk at a value of 9999 hours (over 400 days) and throw an ORA-01873 precision error, even if the precision of the variable in the calling environment is sufficient.

DECLARE
  v_int INTERVAL DAY (4) TO SECOND(0);
  FUNCTION hhmm_to_interval return INTERVAL DAY TO SECOND IS
    v_hhmm INTERVAL DAY (4) TO SECOND(0);
  BEGIN
    v_hhmm := to_dsinterval('PT9999H');
    RETURN v_hhmm;
    --
  END hhmm_to_interval;
BEGIN
  v_int := hhmm_to_interval;
end;
/
ERROR at line 1:
ORA-01873: the leading precision of the interval is too small
ORA-06512: at line 7
ORA-06512: at line 11

You are not allowed to specify the precision directly as part of the datatype returned by the function.

DECLARE
  v_int INTERVAL DAY (4) TO SECOND(0);
  FUNCTION hhmm_to_interval return INTERVAL DAY (4) TO SECOND IS
    v_hhmm INTERVAL DAY (4) TO SECOND(0);
  BEGIN
    v_hhmm := to_dsinterval('PT9999H');
    RETURN v_hhmm;
    --
  END hhmm_to_interval;
BEGIN
  v_int := hhmm_to_interval;
end;
/
ORA-06550: line 3, column 49:
PLS-00103: Encountered the symbol "(" when expecting one of the following:
to

You can use a PL/SQL SUBTYPE however.

DECLARE
  subtype t_int is INTERVAL DAY (4) TO SECOND(0);
  v_int t_int;
  FUNCTION hhmm_to_interval return t_int IS
    v_hhmm t_int;
  BEGIN
    v_hhmm := to_dsinterval('PT9999H');
    RETURN v_hhmm;
    --
  END hhmm_to_interval;
BEGIN
  v_int := hhmm_to_interval;
  dbms_output.put_line(v_int);
end;
/
+0416 15:00:00

PL/SQL procedure successfully completed.

The documentation is a bit skimpy here though. It states that where a parameter is based on a subtype, a NOT NULL constraint on the subtype is applied to the parameter. It also states that for numbers and VARCHAR2 it does NOT apply the size/scale/precision constraints (though it will apply a range constraint for a numeric type).

I added a comment for the documentation with the hope that the INTERVAL datatypes get a specific mention.

A comment by stezgr on the same issue raised on the stubbisms blog pointed out a built-in DSINTERVAL_UNCONSTRAINED subtype for this. When searching on that, Google brings up Steven Feuerstein's Oracle PL/SQL Programming book which describes using that built-in as the solution.
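With that built-in subtype, the original example needs no subtype of its own:

DECLARE
  v_int INTERVAL DAY (4) TO SECOND(0);
  FUNCTION hhmm_to_interval RETURN DSINTERVAL_UNCONSTRAINED IS
    v_hhmm INTERVAL DAY (4) TO SECOND(0);
  BEGIN
    v_hhmm := to_dsinterval('PT9999H');
    RETURN v_hhmm;
  END hhmm_to_interval;
BEGIN
  v_int := hhmm_to_interval;
  dbms_output.put_line(v_int);
END;
/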

Friday, May 28, 2010

A quick quiz for a Friday.

What is the largest number you can get from a three character string ?

999 ?
Not even close.

How about 9 billion ?

select to_number('9e9') from dual;

TO_NUMBER('9E9')
--------------------
9,000,000,000

Any advance on that ?

Ambiguity resolved.

Another case where the order of the tables within the FROM clause has an impact.

Two tables, with the same column names (and data types).

> desc t2
   Name      Null?    Type
   --------- -------- ---------------
1  ID                 NUMBER
2  VAL                VARCHAR2(10)

> desc t1
   Name      Null?    Type
   --------- -------- ---------------
1  ID                 NUMBER
2  VAL                VARCHAR2(10)

Put together a SELECT * query. Now, which table has the value 'Blue' and which has 'Red' ?
 
> select * from (select * from t2 join t1 on t1.id = t2.id);
QCSJ_C000000000400000 QCSJ_C0000 QCSJ_C000000000400001 QCSJ_C0000
--------------------- ---------- --------------------- ----------
                 1.00 Blue                        1.00 Red


It appears to be determined by the order of the tables in the FROM clause (so the first two columns are from T2 and the latter two from T1). Sensibly, that happens irrespective of the path the optimizer takes, so a change in plans won't suddenly change the results.

Yes, it is another reason NOT to use a SELECT *.
That said, do you notice the automatic column aliasing ?
SQL*Plus has truncated some of the column headers, and the actual column names (and datatypes) are :
QCSJ_C000000000400000           NUMBER
QCSJ_C000000000400002           VARCHAR2(10)
QCSJ_C000000000400001           NUMBER
QCSJ_C000000000400003           VARCHAR2(10)

It only happens with the ANSI join, with the old-style join throwing an error.

select * from (select * from t2 , t1 where t1.id = t2.id)
       *
ERROR at line 1:
ORA-00918: column ambiguously defined