Wednesday, July 20, 2011

Turning it around

I wanted to share an insight into the REVERSE operation inspired by a stackoverflow question.

The question was about using the built-in REVERSE function to find the characters at the end of a string. Sticking the REVERSE keywork in the CREATE INDEX statement wasn't doing the job. Adam Musch responded to say that REVERSE should be used solely to avoid contention and that the optimizer doesn't do range scans on reverse key indexes.

Markus Winand suggested a manual function-based index rather than just using the REVERSE keyword in the CREATE INDEX statement. While that might work sometimes, it isn't a solution I'd recommend.

With a quick experiment you can see one reason why.

select 'é', dump('é'), dump(reverse('é')) from dual;

'É'DUMP('É')DUMP(REVERSE('É'))
éTyp=96 Len=2: 195,169Typ=96 Len=2: 169,195

A multi-byte character has the BYTES reversed. As such, the character is not actually preserved and cannot be matched. LIKE, BETWEEN and regular expressions matches against another string (column or literal) become meaningless. The only thing that can be done is to reverse another string and do a direct byte-to-byte comparison.

That means you might be okay if you do a

SELECT * FROM table
WHERE reverse(column) LIKE reverse(:var)||'%'

Unless you can come up with a failure case....

1 comment:

Joel Garry said...

I can't help but wonder if some multibyte character set would have characters that reverse to the equivalent of line feed or wildcard characters.

word: expandl