Configuring the Server
Added 4 Mar 2009
SQL Server 2005 is not configured to use CLR code by default. You will have to explicitly configure the server to use the CLR. This is easily done in TSQL by calling the system procedure sp_configure.
sp_configure 'clr enabled',
1
RECONFIGURE WITH OVERRIDE
(You are probably already aware that you can run sp_configure with only a partial completion of the first argument, as long as what you have typed is sufficient to uniquely identify the configuration property.)
Now we use the CREATE ASSEMBLY statement to tell SQL Server to copy the assembly into its system tables. Be sure that Management Studio query window is pointing to the database you want (e.g. Northwind) and run the following TSQL:
CREATE ASSEMBLY regX
FROM 'C:\KnowledgeBase\SqlStuff\SQL2005\CLR\UDF\TextFunctions\TextFunctions\bin\Debug\RegEx.dll'
It would be a remarkable coincidence indeed if your path were the same as this one; you might wish to change it.
After the CREATE ASSEMBLY statement is run, the executable code is safely stored in SQL Server's own system tables, away from access by any hacker. If you like, you can confirm this by deleting the dll file. The code will run from SQL Server anyway! If you wish to recompile the .NET assembly and reinstall it into SQL Server, you must either drop it or change the CREATE to ALTER, just as with TSQL statements.
The final step for the installation of our CLR code is telling SQL Server how to match up a Transact SQL request with a CLR function. We do this with a CREATE FUNCTION statement (or CREATE PROCEDURE as appropriate). However, unlike the usual CREATE FUNCTION statement, there is no TSQL code. There is only the EXTERNAL NAME reference to the function. Note that the function name is fully qualified, that is, assemblyName.ClassName.FunctionName. It is also important to be aware that the EXTERNAL NAME specification is case-sensitive!
-- The external name is case sensitive
CREATE Function RegExMatch(@Input NVARCHAR(512),@Pattern
NVARCHAR(127))
RETURNS BIT EXTERNAL NAME regX.RegularExpressions.RegExMatch
GO
CREATE Function RegExReplace(@Input NVARCHAR(512),@Pattern
NVARCHAR(127), @Replacement NVARCHAR(512))
RETURNS NVARCHAR(512) EXTERNAL NAME
regX.RegularExpressions.RegExReplace
Note that the assembly name in the CREATE FUNCTION statement is the name you gave it when you loaded the assembly into SQL Server, not the name of the dll file, which is no longer of any concern to SQL Server. The TSQL function name need not be the same as the CLR function, but I believe it is less confusing if they are the same. My choice of character size for the NVARCHAR declarations is arbitrary; you might feel some other size is more appropriate. Note that the RegExMatch function returns type BIT.
The time has come to test our creation. I suspect you are anxious to get on with it, and have no patience for a regular expression tutorial. So we'll make testing the functions our intro.
Here are few SELECT statements which operate on literal strings:
SELECT dbo.RegExMatch('Fred','.d')
SELECT dbo.RegExMatch('Fred','^.d')
SELECT dbo.RegExMatch('Fred','\d')
SELECT dbo.RegExMatch('Fr3d','\d')
In the first example, '.d' means the letter 'd' after any character. The period matches any character. The caret in the second example matches the beginning of a line of text. '^.d' therefore only matches 'd' if it is the second letter. In the next two examples, the 'd' is preceded by a backslash. This makes it one of the many shortcuts in regular expressions. '\d' matches any digit, i.e. 0-9. '\D' matches anything which is not a digit.
Here are some query examples which use our RegExMatch() function in a WHERE clause to extract matching row data from the Northwind Customers table:
SELECT * FROM Customers WHERE dbo.RegExMatch(CompanyName, '^.{1,3}n'
)=1
-- finds CompanyName where the second, third, or fourth letter is
an 'n'
SELECT * FROM Customers WHERE dbo.RegExMatch(CompanyName, '\sa'
)=1
-- find lower-case letter a after a whitespace character (space,
tab, etc)
SELECT * FROM Customers WHERE dbo.RegExMatch(CompanyName, '\sA'
)=1
-- find upper-case letter A after a whitespace character
SELECT * FROM Customers WHERE dbo.RegExMatch(CompanyName, '(cat|kat)'
)=1
-- find either 'cat' or kat'
SELECT * FROM Customers WHERE dbo.RegExMatch(Address, '\d'
)=0
-- find addresses which do not contain any numerals at all
Now let's have a look at RegExReplace. We will pick a slightly more complex example. We would like to query phone numbers from the Customers table of the Northwind sample database, but we do not want to see either parentheses or hyphens. Parentheses and hyphens, however, are special charcters in regular expressions. We will have to escape them with backslashes.
(\(|\)|\-)
The vertical pipe symbol represents OR in regular expression logic. This peculiar-looking expression can be read \( OR \) OR \-. If you prefer, an open paren OR a close paren OR a hyphen. The entire OR series is enclosed in parentheses. Now we can write our query:
SELECT CompanyName, Phone, dbo.RegExReplace(Phone,'(\(|\)|\-)' ,'' )
FROM Customers
WHERE Phone IS NOT NULL