So I have a program that calculates the pay for contractors that deliver logs to sawmills. Every week a given mill might receive 500 to 1000 loads. For each load, the mill might have to make separate payments to the trucker, the logging contractor, the landowner, etc. Let's say for this example that the mill receives 1,000 loads and must make payments to 6 separate vendors, for a total 6,000 transactions. Calculating the payment on each transaction is not simple and requires side calculations to contend with things like overloaded trucks, volume conserions, and rate schedule lookups based on the species, product type, location, terrain. etc.
My program takes all the rate schedule information and marries it up to each load and calculates the transaction amount. The proecessing time for a pay run of this size is about 15 minutes, or about .15 seconds per transaction (= .9 seconds per load). Not a long time, except that when clerks make data entry mistakes and/or rate mistakes, the routine has to be run multiple times. The time can really add.
The good thing about this problem is that the pay for one load is independent from another load. This is where I got the grand idea of making the pay routine a multi-threaded adventure. I figure if I have a quad-core machine, I could make use of the extra power and spin up to say 4 threads at a time and greatly reduce the calculation time.
The pseudo-code for the base calculation function goes something like this:
Sub CalcPay
Dim i as integer
Dim ldRS as recordset
ldRS = LoadRecordset("SELECT * FROM LOADS WHERE LOAD_DATE BETWEEN '2014-05-01' AND '2014-05-07'") 'Load up a recordset of loads
Do Until ldRS.EOF = False
Dim aLoad as clsLoad
aLoad = new clsLoad
aLoad.Load(ldRS("ID").value) 'Load up the data from the load recordset into a clsLoad Object
iRet = CalcLoadPay(aLoad) 'Calc the Load Pay
ldRS.Movenext
Loop
End Sub
Sub CalcPay(aLoad clsLoad)
...
Do a Bunch of Calculations
....
id = getNextIndex
UpdateDatabaseTableForPay(id, aLoad)
End Function
Function getNextIndex() as Integer
Dim aRs as recordset
Dim nextID as integer
aRs = LoadRecordSet("SELECT NEXT_ID FROM TBL_INDICES WHERE MYINDEX = 'PAY_CALCS'") 'Get the Next ID
nextID = aRS(0).Value
sql = "UPDATE TBL_INDICES SET NEXT_ID = NEXT_ID + 1 WHERE MYINDEX = 'PAY_CALCS'" 'Update the Indices Table
aConn.Execute
End Function
So, what we have here is a main calculator routine tht loops over the list of loads. For each load in load up an object with all the data for the load. The routine then calls a calculator that does all the pay calculations. When the calcutions are done, it then writes the result to the database. The write the the database takes three steps:
1. Get the next transation ID.
2. Update the tables that stores the transaction IDs (in this example, the table is named TBL_INDICES).
3. Write out the pay data using the transaction ID as the primary key.
The program runs as is. But, I want to make it run faster using a threaded approach.
The changes to the main routine are as follows:
Imports System.Threading
Sub CalcPay
Dim i as integer
Dim ldRS as recordset
Dim parmams as clsParams 'See Below
Dim loadCollection as Collection 'collection of loads currently being processed
loadCollection = new Collection
ldRS = LoadRecordset("SELECT * FROM LOADS WHERE LOAD_DATE BETWEEN '2014-05-01' AND '2014-05-07'") 'Load up a recordset of loads
Do Until ldRS.EOF = False
Dim aLoad as clsLoad
aLoad = new clsLoad
aLoad.Load(ldRS("ID").value) 'Load up the data from the load recordset into a clsLoad Object
payThread = New Thread(AddressOf ProcessLoad)
params = new clsParams
params.aLoad = aLoad 'The Load
loadCollection.Add(aLoad.ID) 'Add load to collection
parmas.loadCollection = loadCollection
payThread.Start(params)
Do Until threadColl.Count < 6 'Don't add another thread until there are less than 6 running
Application.DoEvents()
Loop
ldRS.Movenext
Loop
Do Until threadColl.Count = 0 'Stall until done
Application.DoEvents()
Loop
End Sub
Class clsParams
Public aLoad as clsLoad
Public loadCollection as collection
End class
Sub CalcPay(ldParams as clsParams)
aLoad = ldParams.aLoad
...
Do a Bunch of Calculations
....
id = getNextIndex
UpdateDatabaseTableForPay(id, aLoad)
ldParams.loadCollection.Remove(aLoad.ID) 'Remove load from collection when done!
End Function
Things to note:
1. I have created a collection of load IDs. I add an ID every time I start a new thread. I pass the collection along the thread. When the thread finishes its work, it removes the load ID from the collection. Yes, this works. I then monitor the count of IDs in the collection and don't start a new thread until there are less than six. Otherwise, I would end up spinning up a zillion threads and accomplish nothing.
2. I can't pass more than one parameter to a thread. Thus, I have to make a class (called clsParams). I then add both my load object and my load collection to my class and then pass the class to the thread. More work than I should have to do, but this too does work.
3. So far so good. Everything works for a bit and thing the program crashes and burns. I get a bunch of errors for trying to insert transaction records with duplicate IDs. I study the code for a while and realize that if 2 threads are real close in their timing that thread 1 can get a new ID. And thread 2 can get the same ID IF thread 1 has NOT finished updating the table that stores the indices.
Here is the code again:
Function getNextIndex() as Integer
Dim aRs as recordset
Dim nextID as integer
aRs = LoadRecordSet("SELECT NEXT_ID FROM TBL_INDICES WHERE MYINDEX = 'PAY_CALCS'") 'Get the Next ID
nextID = aRS(0).Value
sql = "UPDATE TBL_INDICES SET NEXT_ID = NEXT_ID + 1 WHERE MYINDEX = 'PAY_CALCS'" 'Update the Indices Table
aConn.Execute(sql)
End function
You can see that if the threads are close enough together that 2 transactions could get the same index. The ordering would be:
1. Thread 1: Read the Next_ID
2. Thread 2: Read the Next_ID
3. Thread 1: Update the TBL_INDICES table.
4. Thread 2: Update the TBL_INDICES table.
What to do?
The first thing I tried was to wrap the 2 statements in a transaction, as so:This does NOT work!
Function getNextIndex() as Integer
Dim aRs as recordset
Dim nextID as integer
aConn.BeginTransaction
aRs = LoadRecordSet("SELECT NEXT_ID FROM TBL_INDICES WHERE MYINDEX = 'PAY_CALCS'") 'Get the Next ID
nextID = aRS(0).Value
sql = "UPDATE TBL_INDICES SET NEXT_ID = NEXT_ID + 1 WHERE MYINDEX = 'PAY_CALCS'" 'Update the Indices Table
aConn.Execute(sql)
aConn.CommitTrans
End function
I had always thought that a transaction would lock the row for selects and updates/inserts. But it does not. It may stop and insert/update into the TBL_INDICES table while the transaction is open, but it does NOT block the select statement.
Thus, I am no better with my duplicates.
So, I Google some and find the syncLock statement.
Basically, the SyncLock statement says: No other thread can execute this code until I am finished with it. By the way, it can be any code, not just database reads and writes.
The structure of syncLock is:
SyncLock(someObject)
...
Do a bunch of stuff
...
End SyncLock
The parameter "someObject" stores the state of the "locked code." At first, I tried making it a local variable (see below)
Function getNextIndex() as Integer
Dim aRs as recordset
Dim nextID as integer
Dim someObject as New Object
syncLock(someObject)
aConn.BeginTransaction
aRs = LoadRecordSet("SELECT NEXT_ID FROM TBL_INDICES WHERE MYINDEX = 'PAY_CALCS'") 'Get the Next ID
nextID = aRS(0).Value
sql = "UPDATE TBL_INDICES SET NEXT_ID = NEXT_ID + 1 WHERE MYINDEX = 'PAY_CALCS'" 'Update the Indices Table
aConn.Execute(sql)
aConn.CommitTrans
End SyncLock
End function
This does NOT work. Although the executing thread knows about someObject, the other threads do not. So they just thumb their noses and plow on.
You have to make someObject a Global variable so that all the threads can see it. Once I made this change, my program finally worked.
Its now time for a DeSchutes IPA