The Salty Economist

Things I Should Have Learned in High School

The Raison d'être for F#

Well, everything I've done up to now I could just as easily have done in VB or C#.  So what's the point?

The thing that really caught my eye about F# was the chance to convert my single-threaded code into multi-threaded code with minimal effort.  I have a number of projects where I end up writing batch processes.  These processes often involve independent operations that could take advantage of multi-core or multi-processor machines.

The problem at hand fits the bill.  I have 100 industries that each need an index calculated, and each index is independent of every other industry's index.  Thus, I couldn't care less in what order the indices are calculated.

First the basics:

A statement such as:

let aFunc = async {return func1 list1}

creates an object of type Async<??>, where the ?? stands for the return type.  Thus, in the statement above, if func1 returns an integer, aFunc is of type Async<int>.  The statement does not actually execute the function; it merely packages a set of instructions into the object so they can be run later.  The idea is to create a set of these objects and then send them off to execute at the same time.
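
To see that building the object really does not run anything, here is a minimal sketch.  The func1 and list1 below are made-up stand-ins (the real ones are not shown in this post), and Async.RunSynchronously is the name current F# releases use for the call that runs a workflow:

```fsharp
// Hypothetical stand-in for func1: sums a list and counts how often it is called.
let callCount = ref 0

let func1 (xs : int list) =
    callCount.Value <- callCount.Value + 1
    List.sum xs

let list1 = [1; 2; 3]

// Building the workflow does not call func1.
let aFunc : Async<int> = async { return func1 list1 }
let countAfterBuild = callCount.Value     // still 0

// Running the workflow is what finally calls func1.
let value = Async.RunSynchronously aFunc
let countAfterRun = callCount.Value       // now 1
```

The counter only moves once the workflow is actually run, which is the behavior the rest of this post relies on.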

Here, courtesy of Amanda Laucher, is the core logic of an asynchronous workflow:

let ListOfAsyncs =
    [ async { return func1 list1 }
      async { return func2 list2}
      async { return func3 list3}]

let Result =
    async { let! asyncWorkflow = Async.Parallel ListOfAsyncs
            return (Seq.fold (fun a b -> a + b) 0 asyncWorkflow) }
           
let R = Async.Run Result

The steps here are:

(1)  Create a list of asynchronous computations.  The async { } keyword wraps a function call in curly braces and turns it into a value of type Async<>.  If, as in the example above, each function returns an integer, each wrapped call is an Async<int>: an object that represents a computation that will eventually produce an integer.  The result is three Async<int> objects, three sets of instructions that are ready to be processed.

(2)  The next statement creates another Async object, named Result.  The method Async.Parallel combines the computations in ListOfAsyncs into a single computation that runs them in parallel and returns their results as an array, one element per computation and in the same order as the list.  The keyword let! tells the workflow to wait at that point until the parallel computation has finished, and then binds its results to asyncWorkflow.  So all the functions in our list are queued up together, and the F# runtime manages the order in which they execute.  We are telling the computer we don't care about the order of execution; just do it and get back to me.

That leaves the cryptic statement:

return (Seq.fold (fun a b -> a + b) 0 asyncWorkflow)

The Seq.fold function is an interesting animal.  The basic syntax is:

Seq.fold (f) 0 aList

where f is some function, 0 is a starting value, and aList is some list.  For example, if you wanted to just add up a list of numbers you could write:

let somenumbers = [1..5]
let sum = Seq.fold (+) 0 somenumbers

This uses fold with the function (+), or add, an initial value of zero (0), and a list of numbers.  The same sum can also be written as:

let somenumbers = [1..5]
let sum = Seq.fold (fun a b -> a + b) 0 somenumbers

In the two lines above I have substituted (fun a b -> a + b) for (+).

What is this?  It is an anonymous (unnamed) function that takes two arguments and returns their sum.  Well, that's great, but how does it work with Seq.fold?  Here's the trick.  Seq.fold walks the list from left to right and feeds each value in turn to the function as an input.

In this example, the first addition takes the starting value of 0 as the first parameter and the first item on the list (=1) as the second parameter.  This, of course, returns 1.  The second addition takes the result of the first operation (=1) as the first parameter and the second item on the list (=2) as the second parameter.  The result of the second operation is 3; this is now used as the first parameter for the third addition, with the third item on the list (=3) as the second parameter.  The result is now 6.

So the series of operations is:

0 + 1 = 1 -> + 2 = 3 -> + 3 = 6 -> + 4 = 10 -> + 5 = 15 

In a sense, fold lets you repeatedly apply a function across a list (always from left to right), where the result of each call serves as the first input parameter to the next.  And because the first call has no previous result to use, you must supply a seed, or starting value, as its first parameter.
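
One way to watch those intermediate values is Seq.scan, which takes the same arguments as Seq.fold but keeps every accumulator value instead of only the last one.  A small sketch, reusing the somenumbers list from above:

```fsharp
let somenumbers = [1..5]

// Seq.scan returns the seed followed by the accumulator after each step.
let runningTotals =
    Seq.scan (fun a b -> a + b) 0 somenumbers |> Seq.toList
// runningTotals = [0; 1; 3; 6; 10; 15]

// Seq.fold returns only the final accumulator.
let sum = Seq.fold (fun a b -> a + b) 0 somenumbers
// sum = 15, the last running total
```

The running totals match the chain of additions written out above, with the seed 0 in front.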

So, looking back at the statement:

return (Seq.fold (fun a b -> a + b) 0 asyncWorkflow)

you can see that it is accumulating the results held in asyncWorkflow.  asyncWorkflow is the collection of values returned when we run all our functions in parallel.  There will be one entry (the return value) per function in our group of async functions.  The sum of the results is what the variable 'Result' will yield.

Note that we still haven't executed anything yet, because of the async wrapper.  This means that the variable Result is not an integer but an object of type Async<int>.  It is just another set of instructions.

(3)  We don't actually run anything until we hit the statement:

let R = Async.Run Result

The function Async.Run actually sends our set of functions off to be executed.  It takes an argument of type Async<??>, waits for all the asynchronous functions to finish, and hands back the final value.  (In later F# releases this function is named Async.RunSynchronously.)
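
Putting the three steps together, here is a small self-contained sketch that should run on a current F# release.  The three functions and their input lists are made-up stand-ins, and the final call uses Async.RunSynchronously rather than Async.Run:

```fsharp
// Hypothetical stand-ins for func1..func3 and their inputs.
let func1 (xs : int list) = List.sum xs
let func2 (xs : int list) = List.length xs
let func3 (xs : int list) = List.max xs

let list1 = [1; 2; 3]
let list2 = [10; 20]
let list3 = [7; 9; 8]

// (1) Describe the work; nothing runs yet.
let listOfAsyncs =
    [ async { return func1 list1 }
      async { return func2 list2 }
      async { return func3 list3 } ]

// (2) Combine: run all three in parallel, then add up the results.
let result =
    async { let! results = Async.Parallel listOfAsyncs
            return Seq.fold (fun a b -> a + b) 0 results }

// (3) Start everything and wait for the total.
let r = Async.RunSynchronously result
// r = 6 + 2 + 9 = 17
```

Only step (3) does any work; steps (1) and (2) just build up the description of what to do.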

So now we have to apply this methodology to our problem.  Most of the work is done by a function I call runIndustryGroup.  This function grabs all of the major company groups within a given industry group.  As noted previously, I have 100 major industry groups.  These are aggregated into 9 basic industry sectors (resources, construction, services, retail, durable manufacturing, etc.).  Once I have all the major groups, I loop over them to build a list of input parameters that can be used to call my BuildIndustryIndex function.  I then create a list of Async objects, each of which pairs one set of input parameters with my BuildIndustryIndex function.  Once the collection is built, I fire it off with an Async.Run command.

----------------------------------------------------------------------------------

// connstr, the indexParam record and BuildIndustryIndex are defined elsewhere in the program.
open System
open System.Collections.Generic
open System.Data.SqlClient

let runIndustryGroup (industry_order : int) (sd : DateTime) (ed : DateTime) =

    // 'use' disposes the connection, command and reader when the function exits
    use conn = new SqlConnection(connstr)
    conn.Open()

    let sql = "SELECT ID, GROUP_TYPE_ID, INDUSTRY_ORDER, MAJOR_ORDER, MINOR_ORDER, INDUSTRY_NAME, TICKER_ID FROM VL_INDUSTRIES WHERE INDUSTRY_ORDER = " + industry_order.ToString()

    use cmd = new SqlCommand(sql, conn)

    use reader = cmd.ExecuteReader()

    // One indexParam record per row returned by the query
    let idxParams = new List<indexParam>()

    while reader.Read() do
        idxParams.Add {id = reader.GetInt32(0); group_type_id=reader.GetInt32(1); industry_order=reader.GetInt32(2); major_order=reader.GetInt32(3); minor_order=reader.GetInt32(4); industry_name=reader.GetString(5); ticker_id=reader.GetInt32(6); sd = sd; ed = ed;}

    // Wrap each BuildIndustryIndex call in an Async<int>; nothing runs yet
    let asyncList = new List<Async<int>>()

    for industry in idxParams do
        let (iret : Async<int>) = async {return BuildIndustryIndex industry.id industry.group_type_id industry.industry_order industry.major_order industry.minor_order industry.industry_name industry.ticker_id industry.sd industry.ed }
        asyncList.Add iret

    // Run all the calls in parallel and add up their return values
    let Result =
        async { let! asyncWorkflow = Async.Parallel asyncList
                return (Seq.fold (fun a b -> a + b) 0 asyncWorkflow) }

    let R = Async.Run Result
    printf "Result = %i\n" R

----------------------------------------------------------------------------------

The first part of my code collects all the major groups within a given industry and builds a list of input parameters.  I first set up a record type called indexParam and then create a list called idxParams.  The indexParam record type contains all the parameters I need for my BuildIndustryIndex function.

     type indexParam = {id: int; group_type_id: int; industry_order: int; major_order: int; minor_order: int; industry_name: string; ticker_id: int; sd : DateTime; ed : DateTime;}


    let idxParams = new List<indexParam>()

    while reader.Read() do
        idxParams.Add {id = reader.GetInt32(0); group_type_id=reader.GetInt32(1); industry_order=reader.GetInt32(2); major_order=reader.GetInt32(3); minor_order=reader.GetInt32(4); industry_name=reader.GetString(5); ticker_id=reader.GetInt32(6); sd = sd; ed = ed;}

My SQL Data reader is used to populate my list of parameters.  In the next snippet:

    for industry in idxParams do
        let (iret : Async<int>) = async {return BuildIndustryIndex industry.id industry.group_type_id industry.industry_order industry.major_order industry.minor_order industry.industry_name industry.ticker_id industry.sd industry.ed }
        asyncList.Add iret

I loop over my list of parameters and build my collection of Async<int> objects.  Note that for each item in the idxParams list, I create an object like:

let (iret : Async<int>) = async {return BuildIndustryIndex ....}

and add it to my list asyncList.

asyncList.Add iret

Once I have my asyncList, I can fire it off with an Async.Run command.

    let Result =
        async { let! asyncWorkflow = Async.Parallel asyncList
                return (Seq.fold (fun a b -> a + b) 0 asyncWorkflow) }
               
    let R = Async.Run Result
    printf "Result = %i\n" R

This code is in the exact same format as above.  The value R is the sum of the return values from all of the BuildIndustryIndex calls.
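
For comparison, the same build-then-run shape can also be written without the mutable List<Async<int>>, using a list expression.  This is only a sketch: the record is a trimmed-down, hypothetical version of indexParam, buildIndexSketch is a made-up stand-in for BuildIndustryIndex that simply returns 1, the sample rows are invented, and Async.RunSynchronously is used in place of Async.Run:

```fsharp
// Trimmed-down, hypothetical version of the indexParam record.
type IndexParamSketch = { id : int; industry_name : string }

// Hypothetical stand-in for BuildIndustryIndex: returns 1 per index built.
let buildIndexSketch (p : IndexParamSketch) =
    printfn "building index %d (%s)" p.id p.industry_name
    1

// Invented sample rows in place of the SQL reader.
let idxParams =
    [ { id = 1; industry_name = "Retail" }
      { id = 2; industry_name = "Construction" }
      { id = 3; industry_name = "Services" } ]

// One Async<int> per parameter record; nothing executes here.
let asyncList = [ for p in idxParams -> async { return buildIndexSketch p } ]

// Run them in parallel and add up the return values.
let total =
    async { let! counts = Async.Parallel asyncList
            return Array.sum counts }
    |> Async.RunSynchronously
// total = 3
```

The "building index" lines may print in any order from run to run, which is exactly the point: the order of execution is left to the runtime.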

The very last part of the program is trivial.  It is just a loop that runs over each of my major industry sectors.

So, at long last we can do some speed tests!

I ran the program twice each way.

With the Async logic, I got times of 3:07 and 2:51.

Without the Async logic (meaning sequentially), I got times of 6:48 and 6:32.

This is pretty big: the parallel version finished in less than half the time.

I know this is not a rigorous test, but it sure gives me confidence that I can give my applications that require a slew of calculations a big boost!
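
A rough way to repeat this kind of comparison on your own machine is sketched below.  It does not use my index code; slowWork is a made-up CPU-bound stand-in, the input list is arbitrary, and the timings will depend entirely on the number of cores available:

```fsharp
open System.Diagnostics

// Hypothetical CPU-bound stand-in for one index calculation.
let slowWork (n : int) =
    let mutable acc = 0L
    for i in 1 .. 5000000 do
        acc <- acc + int64 ((i * n) % 7)
    acc

let inputs = [1 .. 8]

// Run a function and return its result with the elapsed milliseconds.
let time (f : unit -> 'a) =
    let sw = Stopwatch.StartNew()
    let result = f ()
    result, sw.ElapsedMilliseconds

// Sequential: one call after another on the current thread.
let sequential, seqMs =
    time (fun () -> inputs |> List.map slowWork)

// Parallel: the same calls wrapped in asyncs and run together.
let parallelResults, parMs =
    time (fun () ->
        [ for n in inputs -> async { return slowWork n } ]
        |> Async.Parallel
        |> Async.RunSynchronously
        |> Array.toList)

printfn "sequential: %d ms, parallel: %d ms" seqMs parMs
```

Both versions produce the same list of results; only the elapsed time should differ.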


 

 

Posted on Tuesday, June 23, 2009 8:38 AM
