The Salty Economist

Things I Should Have Learned in High School
posts - 56, comments - 0, trackbacks - 0

Getting Started with F#(4)

OK, so now we have a function that will return a vector of prices for a given stock ticker, all indexed to the first price in the list.

My strategy here will be:

(1) to loop over all the companies in any specific industry and retrieve each company's indexed price data

(2) load each company's data into a row of a two-dimensional array, where the columns are dates

(3) once the array is loaded up, calculate the value of any given day as the average value across all the companies that have non-zero entries for the day.

So here is my code:

let BuildIndustryIndex id group_type_id industry_order major_order minor_order industry_name ticker_id (sd : DateTime) (ed : DateTime) =

    let selsql = getCompanyCountSQL group_type_id industry_order major_order
    let conn = new System.Data.SqlClient.SqlConnection(connstr)
    let iOpen = conn.Open()

    let cmd = new System.Data.SqlClient.SqlCommand(selsql, conn)

    let reader = cmd.ExecuteReader()
    let iRead = reader.Read()
    let nCo = reader.GetInt32(0)
    reader.Close()
   
    let xArray = Array2.create nCo ((1+Convert.ToInt32(ed.ToOADate()))-(Convert.ToInt32(sd.ToOADate()))) 0.0

    let indArray = Array.create ((1+Convert.ToInt32(ed.ToOADate()))-(Convert.ToInt32(sd.ToOADate()))) 0.0

    let selsql = getCompanySQL group_type_id industry_order major_order
    let cmd = new System.Data.SqlClient.SqlCommand(selsql, conn)
    let reader = cmd.ExecuteReader()

    let mutable i = -1
    let mutable j = 0
    //
    //Loop over each company in the group
    //
    while reader.Read() do
    
        i <- i+1
        //
        //Get the Prices for the company
        //
        let prList = getPriceData({tick = reader.GetInt32(1); sdate = sd.ToString(); edate = ed.ToString()})

        //
        //Fill up the Price Array
        //
        for xPrice in prList do
            j <- Convert.ToInt32(Convert.ToDateTime(xPrice.adate).ToOADate())-Convert.ToInt32(sd.ToOADate())
            xArray.[i, j] <- xPrice.price
        ()
    ()       
    //
    // Loop over the dates and calculate the average index
    //
    let nDates = 1+Convert.ToInt32(ed.ToOADate())-Convert.ToInt32(sd.ToOADate())
    let mutable (xObs : int) = 0
    for jj = 0 to nDates-1 do
        let mutable (nObs : int) = 0
        for ii = 0 to nCo-1 do
             indArray.[jj] <- indArray.[jj] + xArray.[ii, jj]
             if xArray.[ii, jj] > 0.001 then
                  nObs <- nObs + 1
        ()
        xObs <-
            if nObs > xObs then nObs else xObs
                 
        if (nObs > 0 && nObs >= xObs) then
             indArray.[jj] <- indArray.[jj] / Convert.ToDouble(nObs)
        else
             indArray.[jj] <- 0.0   
    ()
    //
    // OK, we should be done, so let's write it out
    //

Now let's go through the code line by line

The function prototype:

let BuildIndustryIndex id group_type_id industry_order major_order minor_order industry_name ticker_id (sd : DateTime) (ed : DateTime) =

builds an index for a specific industry.  It takes the following arguments:

id : a unique group identifier

group_type_id:  an indicator for the type of group.  I am working with two types of groups here:  whole sectors, such as resources or durable manufacturing, and individual major industries, such as petroleum refining or computer software.  So, my group_type_id's will either be 1 or 2.

industry_order:  Indicates the specific sector

major_order:  Indicates the specific major industry

minor_industry:  A placeholder that I may use in the future if I decide to break down a major industry into sub industries.

ticker_id:  Every industry group is assigned a pseudo ticker_id.  I use this ticker identifier to store my results in the same table as all my other stock prices.

sd, ed: The start and end date for the index.

In order to build my two-dimensional array, I first need to know the bounds.  Thus, I first need to know the number of companies in an industry.  The following code creates a SQL select statement (returned from the function getCompanyCountSQL and executes it.

    let selsql = getCompanyCountSQL group_type_id industry_order major_order
    let conn = new System.Data.SqlClient.SqlConnection(connstr)
    let iOpen = conn.Open()

    let cmd = new System.Data.SqlClient.SqlCommand(selsql, conn)

    let reader = cmd.ExecuteReader()
    let iRead = reader.Read()
    let nCo = reader.GetInt32(0)
    reader.Close()

The statement: nCo = reader.GetInt32(0) returns the number of companies in the industry group.  My function getCompanyCountSQL is shown below:

let getCompanyCountSQL group_type_id industry_order major_order =
    match group_type_id with
    | 1 -> " SELECT COUNT(*) FROM (( TBL_TICKERS T " +
            " INNER JOIN VL_INDUSTRY_TICKERS IT ON T.ID = IT.TICKER_ID ) " +
            " INNER JOIN VL_INDUSTRIES IG ON IT.GROUP_ID = IG.ID ) " +
            " WHERE IG.INDUSTRY_ORDER = " + industry_order.ToString()
    | 2 -> " SELECT COUNT(*) FROM (( TBL_TICKERS T " +
            " INNER JOIN VL_INDUSTRY_TICKERS IT ON T.ID = IT.TICKER_ID ) " +
            " INNER JOIN VL_INDUSTRIES IG ON IT.GROUP_ID = IG.ID ) " +
            " WHERE IG.MAJOR_ORDER = " + major_order.ToString() +

       " AND IG.INDUSTRY_ORDER = " + industry_order.ToString()
    | _ -> ""

This function returns a different SQL statement depending on the group_type_id that is passed in.  Don't worry about the SQL text.  The most important thing to take away from this code snippet is the structure of the 'match' statement in F#, which is equivalent to a 'Select' statement in VB  or 'Switch' statement in C#.

The basic structure is:

match expression with
| option1 -> result1
| option2 -> result2
| option3 result3
| _ -> defaultResult

Each pipe (|) defines a case or condition, the -> basically means "if the expression equals the option, return the result". The underscore (_) is the default result, meaning it matches anything that is not explicitly enumerated.  In general, you always need the default case to return a result (dummy) that matches the type of all the other results.  This is because, all the cases must return the same data type.  If you omit the default case the F# compiler will implicitly add a default case with a return type of the 'unit type.'  This type is not the same as a string or integer or double, thus you will get an error.

So what is a unit type?

According to Microsoft:

"The unit type has a single value, and that value is indicated by the token ()."

I suppose it goes without saying that all the options must be of the same type.

Note that options can be chained.  For example:

match expression with
| option1 | option4 -> result1
| option2 | option5 | option6 -> result2
| option3 result3
| _ -> defaultResult

chains options 1 & 4 together, as well as 2, 5 & 6.

There is also another syntax for simple matches.

The following two examples are equivalent:

let some_function_name expression =
    match expression with
    | option1 -> result1
    | option2 -> result2
    | option3 result3
    | _ -> defaultResult

let some_function_name = function
    | option1 -> result1
    | option2 -> result2
    | option3 result3
    | _ -> defaultResult


The latter example implicitly accepts an input parameter (as the same type as the options) and matches it to the options.   Though the former may be more elegant and require less typing,  personally, I prefer the former as the latter syntax is too opaque.

So, now let's get off the this tangent and get back to the main track.

We now have obtained the number of companies in an industry in form of the variable nCo.  We can now instantiate our array of prices.  The statement:

let xArray = Array2.create nCo ((1+Convert.ToInt32(ed.ToOADate()))-(Convert.ToInt32(sd.ToOADate()))) 0.0

uses the F# function Array2.create to instantiate a two-dimensional array.  The first argument is the number of rows; the second argument is the number of columns; and the third argument is initial value to fill the array.  Arrays cannot be resized after they are initialized.  The array dimensions are zero-indexed, meaning an array with 10 rows is indexed 0 to 9.

In the example above nCo (my number of companies defines the number of rows. 

The statement:

((1+Convert.ToInt32(ed.ToOADate()))-(Convert.ToInt32(sd.ToOADate())))

defines the number of columns.  What the devil is that anyway?  The function ed.ToOADate() turns the variable ed (end date) into a floating point (OLE Automation) number.

This is from Microsoft:

"An OLE Automation date is implemented as a floating-point number whose value is the number of days from midnight, 30 December 1899. For example, midnight, 31 December 1899 is represented by 1.0; 6 A.M., 1 January 1900 is represented by 2.25; midnight, 29 December 1899 is represented by -1.0; and 6 A.M., 29 December 1899 is represented by -1.25.

The base OLE Automation Date is midnight, 30 December 1899. The maximum OLE Automation Date is the same as maxvalue the last moment of 31 December 9999."

I convert the date into a number so I can use it as an index for my array.  But, alas, I cannot use a floating point number, so I must use Convert.ToInt32 to convert my floating point number into an integer.  I then decide that the first (or zeroth) element should be the start date.  This means that my maximum array bound is the last date minus the first date (both now integers) plus 1.  Ultimately, I have an array with the number of columns equal to the number of days between the first date and the last date that are passed in.

I next instantiate a vector (one dimensional array)

let indArray = Array.create ((1+Convert.ToInt32(ed.ToOADate()))-(Convert.ToInt32(sd.ToOADate()))) 0.0

that will be used to hold my results.

The next block of code:

    let selsql = getCompanySQL group_type_id industry_order major_order
    let cmd = new System.Data.SqlClient.SqlCommand(selsql, conn)
    let reader = cmd.ExecuteReader()

opens a data reader that contains a list of all the company tickers in the industry group.

Now, the next thing I tried did not work and I still don't know why.

type rsListType = {aList : List<clsprice>}
let inpList = new List<inputParam>()

while reader.Read() do
    inpList.Add ({tick = reader.GetInt32(0); sdate = sd.ToString(); edate = ed.ToString()})
    ()
         
let rsList = new List<rsListType>()
let rsList = List.map getPriceData inpList

What I wanted to do here was to set up a list of input parameters (my list inpList) and then apply my function getPriceData to the list using the List.map function.  The results would come back in a List of Lists, where each List is of type clsprice.

When I try to execute the above code a get this error:

"This expression has type  List<inputParam> but is here used with type  inputParam list."

I've tried figuring this one out with no luck.

So, I decided to take another tack:

    let mutable i = -1
    let mutable j = 0
    //
    //Loop over each company in the group
    //
    while reader.Read() do
    
        i <- i+1
        //
        //Get the Prices for the company
        //
        let prList = getPriceData({tick = reader.GetInt32(1); sdate = sd.ToString(); edate = ed.ToString()})

        //
        //Fill up the Price Array
        //
        for xPrice in prList do
            j <- Convert.ToInt32(Convert.ToDateTime(xPrice.adate).ToOADate())-Convert.ToInt32(sd.ToOADate())
            xArray.[i, j] <- xPrice.price
        ()
    ()       

The statements:

    let mutable i = -1
    let mutable j = 0

allow me to create two array indices that are 'mutable,'  meaning the act like real variables and can change their value.  Obviously, the key word here is mutable.  The way to reassign a value to a mutable variable is through the syntax:

         i <- i+1

The operator '<-' is the key.  Also, you don't need the word 'let'

I then use the following code:

        let prList = getPriceData({tick = reader.GetInt32(1); sdate = sd.ToString(); edate = ed.ToString()})

        //
        //Fill up the Price Array
        //
        for xPrice in prList do
            j <- Convert.ToInt32(Convert.ToDateTime(xPrice.adate).ToOADate())-Convert.ToInt32(sd.ToOADate())
            xArray.[i, j] <- xPrice.price
        ()

To retrieve a list of stock prices for a ticker (my getPriceData function).  I then loop over each price observation in my price list and insert it into my xArray at index i,j.

j <- Convert.ToInt32(Convert.ToDateTime(xPrice.adate).ToOADate())-Convert.ToInt32(sd.ToOADate())

Like above, I use the ToDateTime() function to count the number of days from the start date to use as my column index.

Because the stock market is not open every day (weekends, holidays, 9/11), not all the columns have data.  Also, since my starting point of 01/01/1990, many companies (e.g. Google) have been created and do not have a full stock price history.

The statement:

xArray.[i, j] <- xPrice.price

is used to assign a price value to an array element.  Note, an array is mutable (by definition), so we need to use the '<-' syntax without the word 'let'.  Also note that the array reference uses a dot '.' notation.  Beats me why.  I seem to recall that Pascal just used square brackets (e.g. xArray[i, j]) to assign array values.

At the end of the loop we have our xArray filled up with nCo rows of  stock prices (indexed) for all days between 01/01/1990 and today.  We now need to calculate the average index per day.  Before, I go into the code, I need to mention that I have stock prices for companies listed both the New York and Toronto exchanges.  I note this because it turns out that each exchange has different holidays, and therefore the number of company observations I have on a specific day can vary over the life of the index.  For example, New York is closed on July 4th; Toronto is closed July 1st.  Becuase I want my index to cover a consistent set of companies, I want to exclude those days where there are "missing" price observations.  I do this by counting the number of valid (non-zero) observations.  If the count is lower than the preceding day's count, I set the average value to zero.  And, later, when I write out my results, I skip the days with a zero average.

So here's my code:

    let nDates = 1+Convert.ToInt32(ed.ToOADate())-Convert.ToInt32(sd.ToOADate())
    let mutable (xObs : int) = 0
    for jj = 0 to nDates-1 do
        let mutable (nObs : int) = 0
        for ii = 0 to nCo-1 do
             indArray.[jj] <- indArray.[jj] + xArray.[ii, jj]
             if xArray.[ii, jj] > 0.001 then
                  nObs <- nObs + 1
       
        xObs <-
            if nObs > xObs then nObs else xObs
                 
        if (nObs > 0 && nObs >= xObs) then
             indArray.[jj] <- indArray.[jj] / Convert.ToDouble(nObs)
        else
             indArray.[jj] <- 0.0   
   
    //
    // OK, we should be done, so let's write it out
    //
 

I first create a variable nDates that is a count of the number of days I need to loop over.  I then use jj as my index to loop over each day.  This statement shows the format of a do loop in F#

for jj = 0 to nDates-1 do

There is no closing statement (like next jj) nor are there curly braces.  Everything is handled by indenting.  Also, the index jj is apparently mutable by definition.

The statement:

        let mutable (nObs : int) = 0

sets up a mutable integer variable nObs to keep track of the number of valid (non-zero) observations for any given day.  I then loop over the days:

        for ii = 0 to nCo-1 do
             indArray.[jj] <- indArray.[jj] + xArray.[ii, jj]
             if xArray.[ii, jj] > 0.001 then
                  nObs <- nObs + 1

and accumulate the total in my indArray array.  If the price is non-zero, I add it to my daily total of valid obverations.  Once I have finished a day, I check the total valid observations against my current valid count:

        xObs <-
            if nObs > xObs then nObs else xObs

If it is greater, I increase my valid count.  I should not that I am only looking at currently active companies.  Thus, companies can come into the index over time, they never leave (just like the Hotel California).

Lastly, I calculate the average index

        if (nObs > 0 && nObs >= xObs) then
             indArray.[jj] <- indArray.[jj] / Convert.ToDouble(nObs)
        else
             indArray.[jj] <- 0.0   

if the number of observations is greater then zero and the count is greather than or equal to my current running count of companies.

Note && means 'and,' and || would mean 'or.'

Once I have created my industry index, I write it to my database.

Print | posted on Monday, June 22, 2009 5:05 PM |

Powered by:
Powered By Subtext Powered By ASP.NET