Comments on Sydney Oracle Lab: Why we learn maths

Chen: I did not say that p-values are calculated o...

2011-05-11T15:15:05.579+10:00

Chen:
I did not say that p-values are calculated on the sample as a whole.
Like I said: google the term. It's worth it.

@Gwen, Thanks. I've done a follow up about th...

2011-05-08T16:53:04.983+10:00

@Gwen,

Thanks. I've done a follow-up about the death of delete. I'll have to stretch my maths to come up with a more extensive description of where update concurrency starts to break things.

@Hemant Happy for the UI (or a mid-tier layer) to...

2011-05-08T16:51:25.476+10:00

@Hemant

Happy for the UI (or a mid-tier layer) to calc an average as TOTAL/COUNT and maybe show "AVG 6.5 from 12 reviews" or whatever.
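As a minimal sketch of that display logic, assuming the stored summary is just a running total and a review count (the function name and the "No reviews yet" fallback are illustrative, not from the original post):

```python
def format_average(total, count):
    """Build a display string from a stored total and review count."""
    if count == 0:
        # Avoid dividing by zero when nothing has been rated yet.
        return "No reviews yet"
    average = total / count
    return f"AVG {average:.1f} from {count} reviews"


print(format_average(78, 12))  # AVG 6.5 from 12 reviews
```

Showing the count next to the average lets the reader judge how much weight the figure deserves.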

Agreed that a small number of samples has little meaning, even more so if people are volunteering opinions rather than forming a truly random sample.

DBMS_STATS determining averages/min/max from sampling would make an interesting post. But I don't think my maths is up to it. Perhaps Craig Shallahamer might take it up.

@Chet. Performance should ALWAYS be a consideratio...

2011-05-08T16:45:57.812+10:00

@Chet. Performance should ALWAYS be a consideration. A general rule of thumb is to store the summary when the work involved in maintaining it is outweighed by the work saved by not recalculating it for each query.

But that is pretty vague as the former includes work in development / maintenance rather than just work by the DB engine.

Volume of data, rapidity of change, consistency all come into play.

Excellent post. Especially the point about the que...

2011-05-07T09:12:52.711+10:00

Excellent post, especially the point about the queues. I've been trying to get people to do this forever.

Since it's a bit of a non-DBA or noob-friendly post, I think more of an explanation of why inserts are not a concurrency issue but updates are could help some readers. I know why, but I'm not sure it's completely intuitive.

And noons:

The p-value is the probability that, if what you want to say about your data is wrong, you would still get the results you did in your experiment by pure chance.
The lower the probability of getting your results by chance even if you are wrong, the more significant your results are.

And as far as I know, p-values are calculated on analyzed data (averages and variance), not on the sample as a whole.
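To make that definition concrete, here is a small sketch using a made-up coin-flip experiment (the scenario, the function name and the one-sided test are assumptions for illustration, not something from the post). Suppose you want to claim a coin is biased towards heads after seeing 9 heads in 10 flips. The p-value is the chance of seeing a result at least that extreme if the coin were actually fair:

```python
from math import comb


def binomial_p_value(successes, trials, p_null=0.5):
    """One-sided p-value: P(X >= successes) when the null hypothesis holds."""
    return sum(
        comb(trials, k) * p_null**k * (1 - p_null) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Hypothetical experiment: 9 heads out of 10 flips of a supposedly fair coin.
p = binomial_p_value(9, 10)
print(f"{p:.4f}")  # 0.0107
```

Note that the calculation only needs the summary (9 heads out of 10 flips), not the individual flips in order.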

Good points, folks. One of the things I had to ...

2011-05-06T18:57:59.196+10:00

Good points, folks.

One of the things I had to get my head around during the Statistics and Probability semester in uni was the notion of "p-values". Google it.

In a nutshell: they show us the degree of probability that a given sample is significant before we publish any ratios based on that sample.

They are rarely calculated nowadays, when newspapers consider a simple ratio to be a "statistic".

"I've got 10 reviews with a total rating ...

2011-05-06T12:02:32.419+10:00

"I've got 10 reviews with a total rating of 70" (for "Thor")

But what if "Fast and Furious" has a total rating of 40 but only 5 reviewers?

Would people look at "70 and 10" versus "40 and 5" (i.e. a pair of figures each) when comparing the ratings for two movies? Or would it be easier to look at "7" and "8"? If you present two figures ("70" and "10") for the same movie, most people would read only 1 figure ("70").

Then again, if only 1 person has given a rating of "10" to "Source Code", should we accept that this film has an average rating of "10"?

So, an average makes sense only after a minimum (threshold) number of ratings have been entered.

If a movie has not been rated by at least 10 people, I would not present the average (or total) rating score at all.
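A rough sketch of that threshold rule, using the figures from this comment (the function name and the choice to return `None` for "do not display" are assumptions for illustration):

```python
MIN_RATINGS = 10  # threshold suggested in the comment above


def displayed_average(total, count, min_ratings=MIN_RATINGS):
    """Return the average rating, or None if too few ratings exist to show one."""
    if count < min_ratings:
        return None
    return total / count


# (total rating, number of reviewers) taken from the examples above
movies = {
    "Thor": (70, 10),
    "Fast and Furious": (40, 5),
    "Source Code": (10, 1),
}

for title, (total, count) in movies.items():
    average = displayed_average(total, count)
    shown = "not enough ratings" if average is None else f"{average:.1f}"
    print(f"{title}: {shown}")
```

With these numbers only "Thor" gets a displayed average (7.0); the other two stay hidden until they reach 10 ratings.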

I've done something similar to the movie ratin...

2011-05-06T01:33:03.305+10:00

I've done something similar to the movie ratings, only it's financial transactions.

On the account line, store Current Balance and any other things pertinent.

From a purist perspective, what if performance were never a consideration? Maybe you have a small data set, or something like Exadata. Would you then go the Parent/Child way and roll it up each time it was looked up?
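The two options being compared can be sketched in a few lines. This is only an in-memory illustration, not a database design; the `Account` class and its method names are hypothetical:

```python
from dataclasses import dataclass, field
from decimal import Decimal


@dataclass
class Account:
    # Option 1: a stored summary kept on the account line.
    current_balance: Decimal = Decimal("0")
    # Child rows: the individual transactions.
    transactions: list = field(default_factory=list)

    def post(self, amount):
        """Record a transaction and maintain the stored balance at the same time."""
        self.transactions.append(amount)
        self.current_balance += amount

    def rolled_up_balance(self):
        """Option 2: recalculate the balance from the child rows on every lookup."""
        return sum(self.transactions, Decimal("0"))


account = Account()
for amount in ("100.00", "-25.50", "10.00"):
    account.post(Decimal(amount))

print(account.current_balance)      # 84.50
print(account.rolled_up_balance())  # 84.50
```

Both give the same answer; the difference is whether the work is paid on every write (stored balance) or on every read (roll-up).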