Introduction
The ability to create a user-defined function (UDF) is a new feature added to SQL Server 2000. Developers have been asking Microsoft to add this feature for several versions of the product, so let's take a quick look at how to create a few simple UDFs to see if they can help you in your programming endeavors.
Creating a Simple UDF
A user-defined function is a database object that encapsulates one or more Transact-SQL statements for reuse. This definition is similar to the one for stored procedures, but there are many important differences between user-defined functions and stored procedures—the most pronounced being what types of data they can return. Let's create one so you can see how easy they are to create and reference.
The following statement shows how to create a function that accepts two input parameters, sums them together and then returns the sum to the calling statement.
CREATE FUNCTION fx_SumTwoValues ( @Val1 int, @Val2 int ) RETURNS int AS BEGIN RETURN (@Val1+@Val2) END
The structure of the CREATE FUNCTION statement is fairly straightforward. You provide an object name (fx_SumTwoValues), input parameters (@Val1 and @Val2), the type of data the function will return () and the statement(s) the function executes are located between the BEGIN…END block. The following SELECT statement calls the function. Note that the two-part name (owner.object_name) is required when calling this function.
SELECT dbo.fx_SumTwoValues(1,2) AS SumOfTwoValues SumOfTwoValues -------------- 3
When the SELECT is executed, the input parameters 1 and 2 are added together and the sum 3 is returned. You can use any values that either are, or can be, implicitly converted to an int data type for the input parameters. Keep in mind, though, that only an int can be returned, so the following statement will not produce the desired results.
SELECT dbo.fx_SumTwoValues(1.98,2.78) AS SumOfTwoValues SumOfTwoValues -------------- 3
The function returns a 3, which indicates the decimal portion of the parameters are truncated before the calculation occurs.
SQL Server's ability to implicitly convert data allows the following to execute successfully.
SELECT dbo.fx_SumTwoValues('7','7') AS SumOfTwoValues SumOfTwoValues -------------- 14
When values that cannot be converted to an int are passed to the function, the following error message is generated.
SELECT dbo.fx_SumTwoValues('Y','7') AS SumOfTwoValues Server:Msg 245,Level 16,State 1,Line 1 Syntax error converting the varchar value 'Y'to a column of data type int.
Three Types of User-Defined Functions
Now that you have seen how easy it is to create and implement a simple function, let's cover the three different types of user-defined functions and some of the nuances of how they are implemented.
Scalar Functions
A scalar function returns a single value of the data type referenced in the RETURNS clause of the CREATE FUNCTION statement. The returned data can be of any type except text, ntext, image, cursor, or timestamp.
The example we covered in the previous section is a scalar function. Although the previous example only contained one statement in the BEGIN…END block, a scalar function can contain an unlimited number of statements as long as only one value is returned. The following example uses a WHILE construct to demonstrate this.
CREATE FUNCTION fx_SumTwoValues2 ( @Val1 int, @Val2 int ) RETURNS int AS BEGIN WHILE @Val1 <100 BEGIN SET @Val1 =@Val1 +1 END RETURN (@Val1+@Val2) END go SELECT dbo.fx_SumTwoValues2(1,7) AS SumOfTwoValues SumOfTwoValues -------------- 107
The @Val1 input parameter is set to 1 when the function is called, but the WHILE increments the parameter to 100 before the RETURN statement is executed. Note that the two-part name (owner.object_name) is used to call the function. Scalar functions require that their two-part names be used when they are called. As you will see in the next two sections, this is not the case with the other two types of functions.
Inline Table-Valued Functions
An inline table-valued function returns a variable of data type table whose value is derived from a single SELECT statement. Since the return value is derived from the SELECT statement, there is no BEGIN/END block needed in the CREATE FUNCTION statement. There is also no need to specify the table variable name (or column definitions for the table variable) because the structure of the returned value is generated from the columns that compose the SELECT statement. Because the results are a function of the columns referenced in the SELECT, no duplicate column names are allowed and all derived columns must have an associated alias.
The following uses the Customer table in the Northwind database to show how an inline table-valued function is implemented.
USE Northwind go CREATE FUNCTION fx_Customers_ByCity ( @City nvarchar(15) ) RETURNS table AS RETURN ( SELECT CompanyName FROM Customers WHERE City =@City ) go SELECT * FROM fx_Customers_ByCity('London') CompanyName ---------------------------------------- Around the Horn . . . Seven Seas Imports
Multi-Statement Table-Valued Functions
The multi-statement table-valued function is slightly more complicated than the other two types of functions because it uses multiple statements to build the table that is returned to the calling statement. Unlike the inline table-valued function, a table variable must be explicitly declared and defined. The following example shows how to implement a multi-statement table-valued function that populates and returns a table variable.
USE Northwind go CREATE FUNCTION fx_OrdersByDateRangeAndCount ( @OrderDateStart smalldatetime, @OrderDateEnd smalldatetime, @OrderCount smallint ) RETURNS @OrdersByDateRange TABLE ( CustomerID nchar(5), CompanyName nvarchar(40), OrderCount smallint, Ranking char(1) ) AS BEGIN --Statement 1 INSERT @OrdersByDateRange SELECT a.CustomerID, a.CompanyName, COUNT(a.CustomerID)AS OrderCount, 'B' FROM Customers a JOIN Orders b ON a.CustomerID =b.CustomerID WHERE OrderDate BETWEEN @OrderDateStart AND @OrderDateEnd GROUP BY a.CustomerID,a.CompanyName HAVING COUNT(a.CustomerID)>@OrderCount --Statement 2 UPDATE @OrdersByDateRange SET Ranking ='A' WHERE CustomerID IN (SELECT TOP 5 WITH TIES CustomerID FROM (SELECT a.CustomerID, COUNT(a.CustomerID)AS OrderTotal FROM Customers a JOIN Orders b ON a.CustomerID =b.CustomerID GROUP BY a.CustomerID) AS DerivedTable ORDER BY OrderTotal DESC) RETURN END
The main difference between this example and the one in the previous section is that we were required to specify the structure of the @OrdersByDateRange table variable used to hold the resultset and list @OrdersByDateRange in the RETURNS clause. As you can see from the input parameter list, the function accepts a start date, an end date and an order count value to filter the resultset.
The first statement (--Statement 1) uses the input parameters to populate the table variable with customers who meet the specified criteria. The second statement (-Statement 2) updates the rows in table variable to identify the top five overall order placers. The IN portion of the UPDATE may seem a little confusing at first glance, but all its doing is using a derived table to select the CustomerID values of the top five order producers. Derived tables are discussed in Chapter 4. You can use the following to find the companies who have submitted more than two orders between 1/1/96 and 1/1/97.
SELECT * FROM fx_OrdersByDateRangeAndCount ('1/1/96','1/1/97',2) ORDER By Ranking CustomerID CompanyName OrderCount Ranking ---------- ------------------------------ ---------- ------- ERNSH Ernst Handel 6 A FOLKO Folk och fä HB 3 A HUNGO Hungry Owl All-Night Grocers 5 A QUICK QUICK-Stop 6 A SAVEA Save-a-lot Markets 3 A SEVES Seven Seas Imports 3 B SPLIR Split Rail Beer &Ale 5 B ...
The rows ranking values of 'A' indicate the top five order placers of all companies. The function allows you to perform two operations with one object. Retrieve the companies who have placed more than two orders between 1/1/96 and 1/1/97 and let me know if any of these companies are my top five order producers.
One of the advantages of using this type of function over a view is that the body of the function can contain multiple SQL statements to populate the table variable, whereas a view is composed of only one statement. The advantage of using multi-statement table-valued function versus a stored procedure is that the function can be referenced in the FROM clause of a SELECT statement while a stored procedure cannot. Had a stored procedure been used to return the same data, the resultset could only be accessed with the EXECUTE command.
A Real-World Example
Now that you have an idea of the different types of functions available in SQL Server 2000, let's wrap up this article with an example you might be able to use on one of your projects. The following statements create a function that determines the last day of the month (LDOM) for a given date parameter.
CREATE FUNCTION fx_LDOM ( @Date varchar(20) ) RETURNS datetime AS BEGIN --ensure valid date IF ISDATE(@Date) = 1 BEGIN --determine first day of month SET @Date = DATEADD(day,-DAY(@Date)+1,@Date) --determine last day of month SET @Date = DATEADD(day,-1,DATEADD(month,1,@Date)) END ELSE SET @Date = '1/1/80' RETURN @Date END
The function's parameter (@Date) is defined as varchar(20), so error-checking code can be implemented. The ISDATE function makes sure the supplied value is a valid date. When an invalid date value is supplied, the function returns '1/1/80' to the calling statement. If you do not use this type of error-checking, the call to the function will fail when an invalid date is supplied.
When a valid date value is supplied, the DATEADD function is used to:
- Determine the first day of the month, and
- Determine the last day of the month.
If you have never used DATEADD before this may seem a little confusing, but a quick explanation should eliminate any that might exist. You use DATEADD to add or substract a date/time unit from a given date. The first parameter (in this case 'day') indicates the portion of the date that should be incremented. You can also specify year, quarter, month...millisecond. The second parameter is the number of units to add or substract. When subtracting, you simply make the value negative as shown in the example. The third parameter is the date value on which the calculation is performed.
The first day of the month is determined by calculating the number of elapsed days in the supplied parameter with the DAY function, adding 1 and then substracting it from the parameter. For an @Date value of 1/15/01, it simply subtracts 14 (15-1) days to get 1/1/01.
The last day of the month is determined by adding 1 month to the current month value and subtracting one day. So, 1/1/01 plus 1 month is equal to 2/1/01 and when you substract one day you get: 1/31/01.
The following shows how fx_LDOM is used in a SELECT statement to calculate the number of days remaining in a month.
CREATE TABLE fx_Testing (DateValue datetime) go INSERT fx_Testing VALUES ('1/1/01') INSERT fx_Testing VALUES ('2/15/01') INSERT fx_Testing VALUES ('2/15/02') INSERT fx_Testing VALUES ('2/15/03') INSERT fx_Testing VALUES ('2/15/04') SELECT DateValue, dbo.fx_LDOM(DateValue) AS LDOM, DATEDIFF(day,DateValue,dbo.fx_LDOM(DateValue)) AS DaysLeftInMonth FROM fx_Testing DateValue LDOM DaysLeftInMonth ------------------------ ----------------------- --------------- 2001-01-01 00:00:00.000 2001-01-31 00:00:00.000 30 2001-02-15 00:00:00.000 2001-02-28 00:00:00.000 13 2002-02-15 00:00:00.000 2002-02-28 00:00:00.000 13 2003-02-15 00:00:00.000 2003-02-28 00:00:00.000 13 2004-02-15 00:00:00.000 2004-02-29 00:00:00.000 14
The DATEDIFF function is used to determine the difference between two dates. In this case, the number of days between the value in the DateValue column and the last day of the month calculated by fx_LDOM.