Sunday, January 31, 2016

Multisessioning with Python

I'll admit that I pretty constantly have at least one window either open into SQL*Plus or at the command line ready to run a deployment script through it. But there's time when it is worth taking a step beyond.

One problem with the architecture of most SQL clients is they connect to a database, send off a SQL statement and do nothing until the database responds back with an answer. That's a great model when it takes no more than a second or two to get the response. It is cumbersome when the statement can take minutes to complete. Complex clients, like SQL Developer, allow the user to have multiple sessions open, even against a single schema if you use "unshared" worksheets. But they don't co-ordinate those sessions in any way.

Recently I needed to run a task in a number of schemas. We're all nicely packaged up and all I needed to do was execute a procedure in each of the schemas and we can do that from a master schema with appropriate grants. However the tasks would take several minutes for each schema, and we had dozens of schemas to process. Running them consecutively in a single stream would have taken many hours and we also didn't want to set them all off at once through the job scheduler due to the workload. Ideally we wanted a few running concurrently, and when one finished another would start. I haven't found an easy way to do that in the database scheduler.

Python, on the other hand, makes it so darn simple.
[Credit to Stackoverflow, of course]

proc connects to the database, executes the procedure (in this demo just setting the client info with a delay so you can see it), and returns.
Strs is a collection of parameters.
pool tells it how many concurrent operation to run. And then it maps the strings to the pool, so A, B and C will start, then as they finish D,E,F and G will be processed as threads become available.

I could my collection was a list of the schema names, and the statement was more like 'begin ' + arg + '.task; end;'

#!/usr/bin/python

"""
Global variables
"""

db    = 'host:port/service'
user  = 'scott'
pwd   = 'tiger'

def proc(arg):
   con = cx_Oracle.connect(user + '/' + pwd + '@' + db)
   cur = con.cursor()
   cur.execute('begin sys.dbms_application_info.set_client_info(:info); end;',{'info':arg})
   time.sleep(10)   
   cur.close()
   con.close()
   return
   
import cx_Oracle, time
from multiprocessing.dummy import Pool as ThreadPool 

strs = [
  'A',  'B',  'C',  'D',  'E',  'F',  'G'
  ]

# Make the Pool of workers
pool = ThreadPool(3) 
# Pass the elements of the array to the procedure using the pool 
#  In this case no values are returned so the results is a dummy
results = pool.map(proc, strs)
#close the pool and wait for the work to finish 
pool.close() 
pool.join() 

PS. In this case, I used cx_Oracle as the glue between Python and the database.
The pyOraGeek blog is a good starting point for that.

If/when I get around to blogging again, I'll discuss jaydebeapi / jpype as an alternative. In short, cx_Oracle goes through the OCI client (eg Instant Client) and jaydebeapi takes the JVM / JDBC route.