Saturday, January 30, 2010

To foreign realms and back again

No travel involved, but I had a few exotic characters in a column which needed resolution.

Firstly, I had a look for the affected rows, excluding any entries that only contain characters I knew were acceptable.

select dump(a,1016), a, b
from
 (select regexp_replace(COLUMN,'[[:alnum:]/''%()> -.:=;[]','') a,
         COLUMN b
  from TABLE)
where a is not null
order by a;

The DUMP told me what the characters were: the '16' option shows them in hex rather than decimal, and adding 1000 includes the character set (not strictly necessary here). I can then feed those hex values to utl_raw.cast_to_varchar2 to build them into a string that I can use in a REPLACE or TRANSLATE.
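As a quick sanity check, you can turn those hex codes straight back into characters. For example, assuming a single-byte character set such as WE8MSWIN1252, where 91 and 92 are two of the Word 'smart quote' characters:

-- Hex 91 and 92 are the Windows-1252 smart quote characters
select utl_raw.cast_to_varchar2('9192') as smart_quotes from dual;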

In this demonstration I'll use those 'Smart Quote' characters that come from Word etc.

update TABLE
set COLUMN =
   translate(COLUMN,utl_raw.cast_to_varchar2('9192'),'""')
where COLUMN like '%'|| utl_raw.cast_to_varchar2('9192')||'%';


Alternatively, you can use DUMP(col,1001) to get the decimal codes and build the string with CHR(145)||CHR(146) [or whatever your characters are]. The advantage of UTL_RAW.CAST_TO_VARCHAR2 is that multiple characters can be packed into a single value, which is handy for scripts or dynamic SQL and more compact when dealing with longer strings.
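For comparison, the same clean-up with decimal codes would look something like this (a sketch reusing the placeholder TABLE and COLUMN names from above):

-- Same replacement using decimal character codes instead of hex
update TABLE
set COLUMN = translate(COLUMN, chr(145)||chr(146), '""')
where COLUMN like '%'||chr(145)||'%'
   or COLUMN like '%'||chr(146)||'%';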

Thursday, January 21, 2010

Book review - SQL Developer 2.1

Those of us who are old enough will remember a time when you paid for a software package and it came in a box containing a handful of floppy disks and a user manual. Not some web-link to a bunch of HTML pages, but an actual book, made from real paper that had previously been the home of a Norwegian Blue.

Those days are long gone. There was a tendency for people not to read them (hence the RTFM phrase, which preceded JFGI) and it obviously saved the publishers money if they didn't print them. It also made the package a lot lighter. So the book gradually migrated to a PDF on the installation CD-ROM. But if you wanted the documentation to actually match the product, you couldn't finalize the manual until after coding had finished, and adding a delay there impacts your release date. So the PDF got reduced to a 'readme.txt' which pointed you to a web-site. Some software houses provide a decent set of documentation, often available for free download. Others have a forum for asking questions which may or may not get answered one day.

So, without a manual, we just jump in, hammer away at the keyboard and mouse until it breaks and then Google for a way to fix it. Sometimes it helps to take a step back.

I've spent a few days reviewing Packt Publishing's SQL Developer book, courtesy of the publishers. This book very much fits in the mould of the manuals that used to come with software. While it is not an Oracle Press book, it is by Sue Harper who is in charge of the product at Oracle, so it feels like the 'official line'. I remember attending a workshop she gave several years ago when she was spruiking ADF and J2EE to Forms Developers.

Overall the book would be excellent for an individual or team new to SQL Developer, especially those moving from good old SQL*Plus. The bit on the PL/SQL debugger would be especially useful to developers. [New year's resolution - DBMS_OUTPUT is not the only way to debug.] It struck me that there's room for more information on migrating scripts from SQL*Plus to SQL Developer, but maybe there's a blog post or two in that. If you are moving from TOAD or similar, then you may not get as much out of it, but it will lower the frustration levels in locating comparable functions.

I've been using the tool since it came out in beta as Raptor (and still prefer that name). I still got a few nuggets out of it, and it's pushing me to do more with customised reports and other simple extensions (i.e. those that don't involve Java). There's not as much in it for the experienced user, but maybe get your boss to buy a copy for the team and pass it around. At the least, when you employ the next newbie, they can read that and not interrupt you to ask how to generate a table script!

I agree with Lewis's review that the omission of details on the Unit Testing component is unfortunate. I also skipped the Data Modelling chapter as, at $3000 for an individual license, it is not something I expect to come across any time soon.

You can read a sample chapter here. It will give you a feel for the writing, but it is too limited to be of help to a total newbie and will not tell anyone experienced anything new, so it isn't the best taster. It does include a precis paragraph on each chapter though, which is worth reading through.
 

Tuesday, January 19, 2010

Still Reading

I'm working my way through Oracle PL/SQL Programming, Fifth Edition. I'm sort of on the last chapter of the book, except I'm not.

If you check the preview on Google Books, you'll see it as "Edition: 5 - 2009 - 1186 pages" with Chapter 26 (on Object Orientation) finishing at page 1130 followed by the Appendix. That's what is in my hardcopy version.

The O'Reilly catalog has it at 1232 pages with chapters 27 (Calling Java from PL/SQL) and 28 (External Procedures) included.

A mention in the book says that those chapters are available at the book's website, but I haven't located them yet. The book also includes an offer of 45 days' access on Safari online, which may be the intention. I'll look at that later this week.

Also in my intray is Packtpub's SQL Developer 2.1 book which they have kindly sent me for review. I'm part way through that and will post my thoughts next week.

Lite serialisation in 11gR2 with WAIT_ON_PENDING_DML

Back in September I posted about some new 11gR2 features.
One I fancied was WAIT_ON_PENDING_DML. Chet has already posted about one aspect of it on Oraclenerd, but this walkthrough will be a bit different.

It caters for the scenario where, every hour (or day), you want to look at rows inserted or updated since you last checked. The trick is to avoid missing entries that were inserted before you start but not yet committed.

Firstly, create the necessary table and sequence:
drop table test purge;
drop sequence test_seq;
create table test
  (id number, val varchar2(40), created_on date, insert_sid number);
create sequence test_seq;

Now the first procedure. This simulates a user starting a transaction, waiting a few seconds (e.g. trying to upsell the customer with a 'Do you want fries with that?') before committing.

create or replace procedure test_users (p_num in number) is
begin
  for i in 1..20 loop
    insert into test values
      (test_seq.nextval, to_char(sysdate,'DD/Mon/YYYY hh24:mi:ss'),
       sysdate, sys_context('USERENV','SID'));
    dbms_lock.sleep(seconds => p_num);
    commit;
  end loop;
end test_users;
/

Now the more complex demonstration procedure.
This starts up and reports on the pending transactions on the table. Then it waits until all those transactions have committed (using the new routine). It finishes by reporting what it saw in the table and the outstanding transactions.


create or replace procedure test_sum is
  v_date date;
  v_flag boolean;
  v_num  number;
  v_fm   varchar2(30) := 'DD/Mon/YYYY hh24:mi:ss';

  cursor c_1 is
    select val, insert_sid from test 
    where created_on <= v_date 
    order by insert_sid, created_on ;

begin
  v_date := sysdate;
  dbms_output.put_line('Transactions at start');
  for i in (select start_time from v$transaction) loop
      dbms_output.put_line(i.start_time);
  end loop;
  dbms_output.put_line(rpad('Wait on pending at',32)||to_char(sysdate,v_fm));
  v_flag := DBMS_UTILITY.wait_on_pending_dml(
          tables => 'TEST', timeout => 6000, scn => v_num);
  dbms_output.put_line(rpad('Wait completed on pending at',32)
                                 ||to_char(sysdate,v_fm));
  IF v_flag THEN
    dbms_output.put_line('Transactions after wait');
    for i in (select start_time from v$transaction) loop
      dbms_output.put_line(i.start_time);
    end loop;
    dbms_output.put_line('Inserts after wait');
    for i in c_1 loop
      dbms_output.put_line(rpad(i.insert_sid,5)||i.val);
    end loop;
  else
    dbms_output.put_line('Timeout');
  end if;
  dbms_output.put_line('Finished at '||to_char(sysdate,v_fm));
end test_sum;
/


I set off a simulation of three users hitting the table. They all work at different speeds, just so they are not always in sync when committing (to be more realistic).

declare
  v_num number;
begin
  dbms_job.submit(v_num, 'begin test_users(7); end;');
  dbms_job.submit(v_num, 'begin test_users(11); end;');
  dbms_job.submit(v_num, 'begin test_users(13); end;');
  commit;
end;
/

I then run the test_sum procedure and see the results:

Transactions at start
01/11/10 17:51:10
01/11/10 17:51:09
01/11/10 17:51:14
Wait on pending at              11/Jan/2010 17:51:15
Wait completed on pending at    11/Jan/2010 17:51:27
Transactions after wait
01/11/10 17:51:21
01/11/10 17:51:23
Inserts after wait
36   11/Jan/2010 17:50:48
36   11/Jan/2010 17:51:01
36   11/Jan/2010 17:51:14
52   11/Jan/2010 17:50:48
52   11/Jan/2010 17:50:59
52   11/Jan/2010 17:51:10
53   11/Jan/2010 17:50:48
53   11/Jan/2010 17:50:55
53   11/Jan/2010 17:51:02
53   11/Jan/2010 17:51:09
Finished at 11/Jan/2010 17:51:27

When it started waiting at 17:51:15 there were three transactions in progress, the most recent having started just a second earlier. Twelve seconds later that transaction committed and the wait ended. We see that by then there were two new transactions, so no exclusive lock has been preventing our other users from working. Session 53, firing every seven seconds, would have started a transaction at 17:51:16, committed it at 17:51:23, and then immediately started the transaction we see beginning at 17:51:23.

In my example I set the 'date/time' of interest before the wait_on_pending_dml call. Then, after it has returned, I only pick out transactions dated on or prior to that date. Since, in this example, the created date/time is sysdate I could store this date in a table and run the job on a schedule to pick out records added between the two runs. The date logic would ensure I only see each record once and the wait_on_pending_dml means that I would never miss a record because it has been inserted but not committed.
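As a sketch of that scheduled approach (the control table and procedure name are invented, and error handling is omitted):

-- Hypothetical control table holding the boundary of the previous run
-- create table extract_control (last_extract_date date);
create or replace procedure incremental_extract is
  v_last   date;
  v_cutoff date := sysdate;
  v_scn    number;
  v_ok     boolean;
begin
  select last_extract_date into v_last from extract_control;
  -- Wait for any in-flight transactions on TEST to commit or roll back
  v_ok := dbms_utility.wait_on_pending_dml(
            tables => 'TEST', timeout => 600, scn => v_scn);
  if v_ok then
    -- Process only the window between the previous run and this run's cutoff
    for r in (select id, val from test
              where created_on >  v_last
              and   created_on <= v_cutoff
              order by created_on) loop
      dbms_output.put_line(r.id||' '||r.val);  -- stand-in for the real processing
    end loop;
    update extract_control set last_extract_date = v_cutoff;
    commit;
  end if;
end incremental_extract;
/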

I can do all that without taking a lock on the table or forcing any serialization or additional waiting onto my user sessions. This was all possible before 11gR2, with repeated checks on v$transaction. This just makes things a lot simpler.

It should also be possible with sequence numbers, rather than date/timestamps, but you'd have to be careful to avoid caching of the sequence number in either RAC (where nodes may have their own cache and so use sequences out of order) or in the application layer.

Wednesday, January 13, 2010

Disabling specific PL/SQL Warning messages

I've been reading through Oracle PL/SQL Programming, Fifth Edition, which I won when Steven Feuerstein came to the Sydney Oracle Meetup back in August. Lucky me.

One useful nugget I picked out was to do with PL/SQL Warnings. These were a great innovation in 10g where the compiler threw out messages when you were using dodgy code.

I've tried to be good and have all the warnings enabled. One thing bugged me though, and that was the irritating messages warning "PLW-07203: parameter 'P_VAL' may benefit from use of the NOCOPY compiler hint". I know that, unless there's a large volume of data involved, the memory benefits of the hint aren't that great. There's probably a bit of a performance benefit if we don't need to duplicate the variable, but again so small that I've never needed to worry about it.
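For reference, the hint the warning is nagging about just goes on the parameter declaration. A minimal sketch (the procedure is invented):

-- NOCOPY asks the compiler to pass the OUT/IN OUT parameter by reference
-- rather than copying it in and out; it is a hint, not a guarantee.
create or replace procedure pad_value (p_val in out nocopy varchar2) is
begin
  p_val := rpad(p_val, 20);
end pad_value;
/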

Reading the book, I found you could actually disable the messages for individual errors. My login.sql has now been amended to
  ALTER SESSION SET PLSQL_WARNINGS='ENABLE:ALL','DISABLE:07203';

It is documented, but it isn't particularly clear. Quick demo

create or replace package test_warn is
    procedure caller;
end;
/

create or replace package body test_warn is
  procedure called (p_val in out varchar2) is
  begin
      if to_number(p_val) < 5 then
          select to_char(sysdate) into p_val from dual;
      end if;
  end called;
  --
    procedure caller is
        v_val varchar2(200) := '0';
    begin
        for i in 1..20 loop
            called (v_val);
            dbms_output.put_line(v_val);
            v_val := to_char(i);
        end loop;
    end caller;
    --
end;
/
 
ALTER SESSION SET PLSQL_WARNINGS='ENABLE:ALL';
ALTER PACKAGE test_warn compile;
show errors package body test_warn
Errors for PACKAGE BODY TEST_WARN:
LINE/COL ERROR
-------- -----------------------------------------------------------------
2/21     PLW-07203: parameter 'P_VAL' may benefit from use of the NOCOPY
         compiler hint

ALTER SESSION SET PLSQL_WARNINGS='ENABLE:ALL','DISABLE:07203';
ALTER PACKAGE test_warn compile;
show errors package body test_warn

No errors.

Thursday, January 07, 2010

Cardinality Estimates are not Probabilities

Reading a post by Charles Hooper, I had to express misgivings about the following paragraph. Discussions are continuing as comments on that post.

"One of the problems with putting date values in number columns is this – if you select the range from 200810 to 200903, the optimizer will likely make the assumption that 200810 is just as likely of a number as 200808, 200812, 200814, 200816, 200820, 200890, 200900, etc. Some of those year/month combinations are simply not possible. In such a case, the optimizer should over-estimate the number of rows returned from that range when the column data type is NUMBER, and should be reasonably close when the column data type is DATE, since the optimizer knows that 200814 (2008-14-01), 200816 (2008-16-01), 200820 (2008-20-01), 200890 (2008-90-01), 200900 (2009-00-01), etc. could never be dates (and would be completely out of the serial sequence of dates). By putting the date type data into a DATE column, you have essentially added a constraint to the database to prevent invalid dates from being added. Additionally, date math, such as finding the number of days between 200802 and 200803 (compared to 200702 and 200703) is very simple – the answer is not 1 in both cases, but rather 29 and 28, respectively."

On the last point, while there may only be 28 or 29 days, that is still over 2.4 million seconds and there may be an entry for any second in that date range.

My understanding is that the optimizer makes the assumption, which is generally valid, that the database actually contains the data you are looking for. It is not worried about possible (or even impossible) values, just recorded values. Anything you want to know is in the database.

When you make a query, the optimizer is not determining the probability of a match, but an estimate of the number of rows returned ASSUMING there is a match. For that it divides the number of rows by the number of distinct values.

If I have a date column in a table and query that table for a specific value of that column, the optimizer will never assume that it will return zero rows, or even less than one row. It doesn't matter if there are three or three million rows, or if there are two or two million distinct values. In fact, if I apply ten different predicates to that select, it will still always assume at least one match. Even if the predicates are mutually exclusive, it makes the same assumption.

Demo Time

drop table test_150 purge;
create table test_150 (id number(6,3));

insert into test_150 values (1);
insert into test_150 values (150);
insert into test_150 select dbms_random.value(1,150) from dual ;
insert into test_150 select dbms_random.value(1,150) from dual ;
insert into test_150 select dbms_random.value(1,150) from dual ;

insert into test_150 
select a.* from test_150 a, 
   (select 1 from dual connect by level < 30);
exec dbms_stats.gather_table_stats(user, 'TEST_150',
      estimate_percent=>100, 
      method_opt=> 'FOR ALL COLUMNS SIZE 1');

select to_char(id,'999.999'), count(*) 
from test_150 group by id order by id;

TO_CHAR(    COUNT(*)
-------- -----------
   1.000       30.00
  60.781       30.00
  84.342       30.00
 111.778       30.00
 150.000       30.00

The optimizer knows there are five distinct values, ranging from 1 to 150, and 150 rows in total. With no histograms it will assume an even distribution and that every value occurs 30 times. That's true in this case (but that shouldn't make any difference).
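You can see the figures driving that arithmetic in the data dictionary (just a quick check):

-- num_rows / num_distinct gives the 30 rows per value assumed below
select t.num_rows, c.num_distinct,
       round(t.num_rows / c.num_distinct) as rows_per_value
from   user_tables t
       join user_tab_col_statistics c on c.table_name = t.table_name
where  t.table_name  = 'TEST_150'
and    c.column_name = 'ID';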

set autotrace on
select count(*) from test_150 where id = 17.17;

Execution Plan
-------------------------------------------------------------------------------
| Id  | Operation          | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |          |     1 |     4 |     3   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE    |          |     1 |     4 |            |          |
|*  2 |   TABLE ACCESS FULL| TEST_150 |    30 |   120 |     3   (0)| 00:00:01 |
-------------------------------------------------------------------------------

Between 1 and 150, there are another 148 integers. Going down to three decimal places, that is around 150,000 potential values. Ignoring all that, it says that, in seeming defiance of the rules of probability, we will have 30 rows with that precise value. It is ignoring probability on the assumption that if you are asking about a particular value, it is because it is in there.

select count(*) from test_150 where id in (2,3,4,5);

Execution Plan
-------------------------------------------------------------------------------
| Id  | Operation          | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |          |     1 |     4 |     3   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE    |          |     1 |     4 |            |          |
|*  2 |   TABLE ACCESS FULL| TEST_150 |   120 |   480 |     3   (0)| 00:00:01 |
-------------------------------------------------------------------------------

In this case it assumes that all four values are in there, so 120 rows will be returned.
We can try something silly.

select count(*) from test_150 where id in (2,3,4,5,6,7);

Execution Plan
-------------------------------------------------------------------------------
| Id  | Operation          | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |          |     1 |     4 |     3   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE    |          |     1 |     4 |            |          |
|*  2 |   TABLE ACCESS FULL| TEST_150 |   150 |   600 |     3   (0)| 00:00:01 |
-------------------------------------------------------------------------------

It knows that there are 150 rows in the table, so obviously no more than that can be returned (at least as far as estimations are concerned).

Check constraints still obey the 'there must be one' rule. Rather than override the estimate, it simply puts in a filter that stops that path being executed (and incidentally allowing the cost to be ignored).

alter table test_150 add constraint tst_150_pos check (id > 0);

select count(*) from test_150 where id < -5;

Execution Plan
--------------------------------------------------------------------------------
| Id  | Operation           | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |          |     1 |     4 |     0   (0)|          |
|   1 |  SORT AGGREGATE     |          |     1 |     4 |            |          |
|*  2 |   FILTER            |          |       |       |            |          |
|*  3 |    TABLE ACCESS FULL| TEST_150 |    29 |   116 |     3   (0)| 00:00:01 |
--------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(NULL IS NOT NULL)
   3 - filter("ID"<(-5))
  
An extension to this occurs with joins. Try this query

select to_char(count(*)) from c, test_150 t where t.id = 295 ;
Execution Plan
----------------------------------------------------------------------------------
| Id  | Operation             | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |          |     1 |     4 |  1718   (6)| 00:00:21 |
|   1 |  SORT AGGREGATE       |          |     1 |     4 |            |          |
|   2 |   MERGE JOIN CARTESIAN|          |  1906K|  7448K|  1718   (6)| 00:00:21 |
|*  3 |    TABLE ACCESS FULL  | TEST_150 |     1 |     4 |     3   (0)| 00:00:01 |
|   4 |    BUFFER SORT        |          |  2367K|       |  1715   (6)| 00:00:21 |
|   5 |     TABLE ACCESS FULL | C        |  2367K|       |  1715   (6)| 00:00:21 |
----------------------------------------------------------------------------------

The id value is out of bounds, but TEST_150 still shows up as 1 row. However, the 2.3 million rows in table "C" are reduced by about 20%, so in effect the optimizer appears to think that 0.8 rows will be returned from TEST_150. These optimizer calculations probably vary between versions (this was XE).


Between


Earlier it thought that the following would give 120 rows:
select count(*) from test_150 where id in (2,3,4,5);
However, if you try the following:
select count(*) from test_150 where id between 2 and 5;
-------------------------------------------------------------------------------
| Id  | Operation          | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |          |     1 |     4 |     3   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE    |          |     1 |     4 |            |          |
|*  2 |   TABLE ACCESS FULL| TEST_150 |    34 |   136 |     3   (0)| 00:00:01 |
-------------------------------------------------------------------------------


A different answer, and totally inconsistent with the estimate for the earlier IN query, which could only ever return fewer rows than the BETWEEN yet was given the higher estimate. Firstly, it has assumed that 30 entries will match the value '2'. Then, with 150 entries spread between 1 and 150, it assumes one entry per unit, so it adds a potential match each for 2, 2+1 (3), 2+2 (4) and 2+3 (5). Adding those 4 potential matches to the 30 gives 34.
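A back-of-envelope restatement of that reading (my interpretation of this XE behaviour, not an official CBO formula):

-- 30 rows assumed for the 'known' low value, plus one potential match
-- per unit step across the closed range
select 150/5                 as rows_per_value,        -- 30
       150/5 + (5 - 2) + 1   as est_between_2_and_5,   -- 30 + 4  = 34
       150/5 + (16 - 2) + 1  as est_between_2_and_16   -- 30 + 15 = 45 (the next plan)
from dual;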


select distinct id from test_150 where id between 2 and 16;
-------------------------------------------------------------------------------
| Id  | Operation          | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |          |     2 |     8 |     4  (25)| 00:00:01 |
|   1 |  HASH UNIQUE       |          |     2 |     8 |     4  (25)| 00:00:01 |
|*  2 |   TABLE ACCESS FULL| TEST_150 |    45 |   180 |     3   (0)| 00:00:01 |
-------------------------------------------------------------------------------


You can see it now expects 45 matching rows, with 2 distinct values of id (whereas with an upper bound of 15 it would only expect 44 matches and 1 distinct value).

PS. Depending on your database version, the optimizer may be completely different. I've used XE for this.

PPS. If you've read this far, go and get Jonathan Lewis' optimizer book.

Wednesday, January 06, 2010

A twitter data structure using linked lists


I recently did a guest post regarding data modelling on Chet's ORACLENERD blog. My original question was how you would normalize a tweet. Following the comments from there, I'm going to explore an implementation in a Navigational Database rather than an SQL/RDBMS. Irrespective of whether or not you use an RDBMS for persistence, if you want to rapidly search a large amount of data you need to think about data structures.

Access Paths

A tweet requires different 'access paths' for several use cases. Firstly, I might want to see all the tweets made by an individual (such as oraclenerd). Secondly, I might want to see all the tweets that are replies to my user (syd_oracle) or that mention me. Finally, I may want tweets that relate to a particular subject (such as #oracle). I'll assume we want to access the most recent tweets first.

Thinking data structures, all of these start off with knowing a key, or a starting point for the query. Therefore they can simply be managed as a linked list.


Access by tweeting user

For the first scenario, the sending user is the starting key. The user oraclenerd has 3 tweets [this is just an example :) ]. For argument's sake, we will give them numbers 123, 179 and 201. The 'oraclenerd' user entry can have a 'sent' pointer set to tweet 201, and tweet 201 points to the sender's earlier tweet 179 which, in turn, points to 123 where the list ends.

When he tweets "Hey, I've got EBS running", we create tweet 207 and link that to the earlier sent tweet, number 201, then repoint the 'oraclenerd' sent list pointer away from 201 to 207. In the case of a critical failure between the creation of 207 and the repointing of 'oraclenerd', upon restart we will have 'oraclenerd' still pointing at 201. While 207 may exist in the persistence layer, nothing points to it and it will never get accessed, so it is effectively dead. In an ACID database, if we applied the change as a transaction, the creation of 207 would have been rolled back. If we don't really care about consistency we can forget it, and perhaps have a weekly cleanup job.
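Sketched as tables, purely to make the pointers concrete (the names are invented; the point is the structure, not the storage engine):

-- Head pointer per user, plus a backward-linked list of tweets
create table users  (username     varchar2(30) primary key,
                     last_sent_id number);          -- head of the 'sent' list
create table tweets (tweet_id     number primary key,
                     sender       varchar2(30),
                     tweet_text   varchar2(140),
                     prev_sent_id number);          -- next-older tweet by the same sender

-- Existing state: oraclenerd's 'sent' list currently heads at tweet 201
insert into users values ('oraclenerd', 201);

-- Adding tweet 207: link it back to the old head (201), then repoint the head
insert into tweets (tweet_id, sender, tweet_text, prev_sent_id)
  values (207, 'oraclenerd', 'Hey, I''ve got EBS running',
          (select last_sent_id from users where username = 'oraclenerd'));
update users set last_sent_id = 207 where username = 'oraclenerd';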

That all works easily because oraclenerd will only make one tweet at a time. We can scale by spreading our load, maybe alphabetically with node 1 having all the tweeters 'a-m' and node 2 having 'n-z'. More tweeters and we set up a new node and split 'a-m' into 'a-g' and 'h-m' (though realistically it could be some hash function for balance, or perhaps new tweeters on new nodes). We haven't had to worry about inter-node or concurrency issues...yet.


Replies and Mentions

The 'replies' are a bit more complicated. One approach is this:

'oraclenerd' tweets "@syd_oracle, what is the weather like ?". 

Firstly we create the tweet (number 211). Remember in the first example, I had tweet '207' point to 'oraclenerd's earlier tweet of '201'? Now '211' has to link to the previous tweet number '207', but it also has to have another pointer to the previous reply to syd_oracle, because I want to see all the replies to me. The code needs to look up the latest reply to 'syd_oracle' (e.g. tweet number 150), and have the 'reply' pointer of 211 link to 150. Then we update my 'replies' linked list to include the tweet 211.

If we get a failure before that last step, then it can show up in his list but not mine. Untidy, but not a big issue. A worse scenario is that we are on different nodes and we get into a problem about having a single point of truth, and failure, for the tweet. If the node storing oraclenerd's tweets fails, then when I look at my replies, my user entry points to an unavailable tweet and, because that is the only point I have for my linked list, I can't see any of my earlier replies.

When we get into 'mentions' we see that the solution design for 'replies' was all wrong, or at least it can't simply be extended to 'mentions'. While a tweet can only come from a single sender, and be a reply to one other user, it can have many other users mentioned (up to a maximum of about 40, given the length of a tweet and the need for the separators, flags for indicating the mention and the username). Add in multiple nodes and we'd be lucky if anything works.

Rise of the Clones

So we throw away that 'replies' design. This time, when we create tweet 211 for oraclenerd, we also create a clone of that tweet for syd_oracle (say 215). syd_oracle's 'replies' list points to 215 and 215 points to its previous reply, 150. If oraclenerd's node goes down, my '215' tweet clone is still there on my node so I'm not affected. What's more, this extends easily to 'mentions'. Each user record simply has pointers to their last sent message, their last reply received and their last mention. It is pretty similar for topics too, though the 'topic' record only has one linked list.
 

Once you get into the design, you can see that sending a single tweet might create a number of tweet clone records. This report indicates that the count of 7 billion plus tweets is based on a tweet duplicated for each follower.

The concepts for modelling followers would be similar to those for mentions. Each user would have a linked list of followers. Every time they tweet, the tweet would be cloned for each follower in a similar manner to the other clones for mentions. The clone would be linked to the person following the tweeter so they would have a list of all tweets for users they are following.


[Note: You don't have to clone the full 140 character tweet. You can have a single version of the text and clone pointers to it. That just makes the example a bit more complicated though. You might have some duplicates of the text for node resilience and load balancing purposes.]


Locking

New tweets update the records of the sender, the users mentioned, topics and followers. These updates, rather than the inserts, are where we get into concurrency issues. In the example, I had a reply of '150' and then the following happened.

  1. Process reads syd_oracle's latest reply (150).
  2. Process creates the tweet '215', pointing to 150.
  3. Process updates syd_oracle's latest reply to point to '215'.
What happens if I'm dealing with two replies at the same time? Pretty unlikely for me as I'm not that busy and computers are pretty fast, but possible for a hot topic like a musician's demise.

What can happen is

  1. Process 1 reads #elvis_dead's latest tweet (150).
  2. Process 1 creates the tweet '215', pointing to 150.
  3. Process 2 reads #elvis_dead's latest tweet (150).
  4. Process 2 creates the tweet '218', pointing to 150.
  5. Process 1 updates #elvis_dead's latest tweet to '215'
  6. Process 2 updates #elvis_dead's latest tweet to '218'
Without some form of locking, tweet 215 has got lost in the rush. Is that important? That's up to you, but that sort of race condition is near impossible to achieve repeatedly in a test run, probably won't show up in production until you've got a heavy workload, and will be tricky to reproduce.
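One common guard against that lost update, where the store supports an atomic compare-and-swap, is to repoint the head only if it still holds the value you read. In SQL terms the idea looks something like this (table and column names are invented):

-- Move the head pointer only if nobody else has moved it since we read it;
-- zero rows updated means we lost the race and must re-read and retry.
update topics
set    last_tweet_id = 215            -- the tweet we just created
where  topic_name    = '#elvis_dead'
and    last_tweet_id = 150;           -- the head we saw when we started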


For followers, there is also scope for lost tweets if you follow multiple people who tweet at the same time, as they both try to insert their tweet into your list. There are also complications when you 'unfollow' someone, as you need to walk their list of followers (on their node) to get yourself deleted, plus delete their tweets from your clone army. Deletes in linked lists are fairly simple: if 207 points to 201 which points to 179, you can point 207 straight to 179 and delete 201. But again, without locking, you can get trapped if someone deletes 207 as you are trying to repoint it to 179.

Consistency

If one node has to be shut down, or crashes, and is restored from a backup with some loss of information, then you will get inconsistencies. If you are using a persistence layer that doesn't have a locking mechanism, you will get inconsistencies. If you don't have a robust queuing mechanism to keep things running when nodes go down, you'll get inconsistencies.

An interesting overview of the tradeoff between consistency, availability and partitioning (or sharding/replication) is here (once you get past the first few paragraphs about punk rock), and it even links to a blog post about Twitter and SQL.