Thursday, April 12, 2012

SQL Server CHARINDEX and PATINDEX


The CHARINDEX and PATINDEX functions are used to search a character string for a character or set of characters. If the character string being searched contains the characters being searched for, then these functions return a non-zero integer value. This integer value is the starting location of where the character string being searched for is located within the string being searched. The PATINDEX function allows for using wildcard syntax in the pattern being searched for, where as the CHARINDEX function does not support wildcard character searches. Let's look at each one of these functions in a little more detail.

How to use the CHARINDEX Function

The CHARINDEX function returns the starting position of a character, or a string of characters within another character string. The CHARINDEX function is called using the following format:

CHARINDEX ( expression1 , expression2 [ , start_location ] )

Where expression1 is the string of characters to be found in expression2, and start_location is the position where the CHARINDEX function will start looking for expression1 in expression2.

The CHARINDEX function returns an integer. The integer value returned is the position where the characters being search for are located within the string being searched. If the CHARINDEX does not find the characters you are searching for then the function returns a zero integer value. Let say we execute the following CHARINDEX function call:

	CHARINDEX('SQL', 'Microsoft SQL Server')

This function call will return the starting location of the character string "SQL", in the string "Microsoft SQL Server". In this case the CHARINDEX function will return the number 11, which as you can see is the starting position of "S" in string "Microsoft SQL Server".

Now say we have the following CHARINDEX Command:

	CHARINDEX('7.0', 'Microsoft SQL Server 2000')

In this example the CHARINDEX function will return zero, since the character string "7.0" cannot be found in the string "Microsoft SQL Server 2000". Let go through a couple of examples of how you might be able to use the CHARINDEX function to solve some actual T-SQL problems.

For the first example say you would like to display only the last name of the ContactName column, for the first 5 records in the Northwind database Customer table. Here are the first 5 records.

	ContactName 	------------------------------  	Maria Anders 	Ana Trujillo 	Antonio Moreno 	Thomas Hardy 	Christina Berglund 

As you can see, the CustomerName contains both the first and last name of the customer, where first and last name are separated by a single space. I will use the CHARINDEX function to identify the position of the space between the two names. This way we can used the position of the space to parse the ContactName so we can display only the last name portion of the column. Here is some T-SQL code to display only the last name for the first 5 records in the Northwind Customer table.

	select top 5 substring(ContactName,                        charindex(' ',ContactName)+1 ,                        len(ContactName)) as [Last Name]        from Northwind.dbo.customers 

Here is the output from this command:

	Last Name 	------------------------------  	Anders 	Trujillo 	Moreno 	Hardy 	Berglund 

The CHARINDEX function found the space between the First and Last Name, so that the substring function could split the ContactName, thus only the Last Name was displayed. I added 1 to the integer value that CHARINDEX returned, so the Last Name displayed did not start with a space.

For the second example, say you want to count all the records from a table where a given column contains a particular character string. The CHARINDEX function could be used to satisfy your request. To count all of the Addresses in the Northwind.dbo.Customer table where the Address column contains either the word 'Road' or an abbreviation for road ('Rd'), your SELECT statement would look like this:

	select count(*) from Northwind.dbo.Customers  	  where CHARINDEX('Rd',Address) > 0 or CHARINDEX('Road',Address) 	> 1

How Does the PATINDEX Function Work?

The PATINDEX function returns the starting position of a character or string of characters within another string, or expression. As stated earlier the PATINDEX has additional functionality over CHARINDEX. PATINDEX supports wildcard characters in the search pattern string. This makes PATINDEX valuable for searching for varying string patterns. The PATINDEX command takes the following form:

PATINDEX ( '%pattern%' , expression )

Where "pattern" is the character string you are searching for and expression is the string in which you are searching. Commonly the expression is a column in a table. The "%" sign is needed on the front and back of the pattern, unless you are searching for the pattern at the beginning and/or ending of the expression.

Like the CHARINDEX function, the PATINDEX function returns the starting position of the pattern within the string being searched. If you have a PATINDEX function call like so:

PATINDEX('%BC%','ABCD')

Then the result of the PATINDEX function call is 2, which is the same as the CHARINDEX function. The %'s in the above command tells the PATINDEX command to find the position of the "BC" string where the string might have zero or more characters in front of, or after "BC". The % sign is a wildcard character.

If you want to determine if a string starts with a specific set of characters you would leave off the first % sign, and your PATINDEX call would look like this:

PATINDEX('AB%','ABCD')

In this case the PATINDEX function returns a 1, indicating that the pattern 'AB' was found in the expression 'ABCD'.

Now with wildcard characters you can create a much more complicated pattern matching situation then the simple ones I have shown you so far. Say you want to determine if a character string contains the letters A and Z, as well as any numeric number. Then your PATINDEX function call might look like this.

PATINDEX('%[A,Z,0-9]%[A,Z,0-9]%[A,Z,0-9]%','XYZABC123')

Note that the pattern I am looking for in the above example uses a number of wildcard references. Review SQL Server Books Online for other wildcard syntax information. Let's go through a couple of examples in how we might use the PATINDEX command in conjunction with a SELECT statement.

Say you want to find all of the records that contain the words "Bread", or "bread" in the Description text column in the Northwind.dbo.Categories table, then your select statement would look like this:

	select Description from Northwind.dbo.Categories 	  where patindex('%[b,B]read%',description) > 0

Here I used wildcarding to look for either a lower or upper case "b". When I run this SELECT statement against my Northwind database I get the following Description columns displayed:

	Description 	-------------------------------------------------------- 	Desserts, candies, and sweet breads 	Breads, crackers, pasta, and cereal 

Here is another example where I used some additional wildcard references to find some records. This example excludes the Description that has an 'e' as the second letter from the result set in the above example.

	select Description from Northwind.dbo.Categories 	  where patindex('%[b,B]read%',description) > 0 	    and patindex('_[^e]%',description) = 1 

By adding an additional PATINDEX function call, to the where statement, that used the ^ wildcard symbol, I was able to exclude the "Dessert, candies, and sweet breads" description. The above example returned the following single description.

	Description 	-------------------------------------------------------- 	Breads, crackers, pasta, and cereal   

Conclusion

As you can see the CHARINDEX and the PATINDEX perform similar kinds of pattern searches within a character string. The PATINDEX function provides wildcard specifications, allowing it to be used for much different kinds of pattern matches, while the CHARINDEX function does not. Depending on what you need to do, these two functions are great in helping you search, manipulate and parse character strings in SQL Server.


No comments:

Post a Comment