Fuzzy Search Algorithm

This algorithm was developed specifically to search for similar names, spelling variations, and typos in proper names and business names.  For this purpose, it is superior to SOUNDEX, which searches only for similar sounding words.

The code below creates two functions.  The first function is MatchText, which simply strips strings of vowels and non-alphanumeric characters.  The second function is CompareText, which returns an integer between 0 and 100 indicating the similarity between two parameter strings.  For increased accuracy and efficiency, try running parameter values through MatchText before submitting them to CompareText, and for common searches store the MatchText values of names as a separate column in your table.

[code]if exists (select * from dbo.sysobjects where id = object_id(N'[dbo].[MatchText]')

and xtype in (N'FN', N'IF', N'TF'))
drop function [dbo].[MatchText]
GO

CREATE function MatchText (@InputString varchar (50))
returns varchar (50)

begin
--function MatchText
--blindman, 7/2005
--Strips a string of all vowels and non-alphanumeric characters.

-- --test parameters
-- declare    @InputString varchar(50)
-- set    @InputString = 'Bruce a. Lindman'

declare @TempString varchar (50)
declare @OutputString varchar (50)
declare @CharNum integer
declare @TestChar CHAR(1)

--Convert to uppercase and remove noise characters
set @TempString = UPPER(@InputString)
set @TempString = replace(@TempString, 'A', '')
set @TempString = replace(@TempString, 'E', '')
set @TempString = replace(@TempString, 'I', '')
set @TempString = replace(@TempString, 'O', '')
set @TempString = replace(@TempString, 'U', '')

--Build @OutputString with only alphanumeric characters
set @CharNum = 1
set @OutputString = ''
while @CharNum <= len(@TempString)
begin
set @TestChar = substring(@TempString, @CharNum, 1)
if (@TestChar between 'A' and 'Z') OR (@TestChar between '0' and '9')

set @OutputString = @OutputString + @TestChar
set @CharNum = @CharNum + 1
end

return @OutputString
end
GO

if exists (select * from dbo.sysobjects where id = object_id(N'[dbo].[CompareText]')

and xtype in (N'FN', N'IF', N'TF'))
drop function [dbo].[CompareText]
GO

create function CompareText  (@String1 varchar (100), @String2 varchar (100))
returns integer
--Function CompareText
--     This function accepts two string values and returns an integer value between
--zero and one hundred indicating the similarity between the two string.  This
--algorithm was developed specifically to search for similar names, spelling
--variations, and typos in proper names and business names.  For this purpose,
--it is superior to SOUNDEX, which searches only for similar sounding words.
--     Proper name pairs which yield a CompareText value above 80 are very likely to
--represent the same person.  Pair values greater than 60 should be reviewed
--manually to confirm the match.  For greater accuracy and efficiency, run the names
--through the MatchText function to remove spaces and vowels before submitting them
--to comparetext.
--     For efficiency in comparing two large lists of names, it is best to join
--the sets on another column as well, such as zip code, or city name.

--Usage: select dbo.CompareText('Alan Smith', 'Smith, Alan J.')

--blindman 4/2005
--Adapted from MS Access algorithm developed 1997

begin

declare @Possibles integer
declare @Hits integer
declare @Counter integer

set @Possibles = len(@String1) + len(@String2) - 2
set @Hits = 0

set @Counter = len(@String1)-1
while @Counter > 0
begin
if charindex(substring(@String1, @Counter, 2), @String2) > 0 set @Hits =

@Hits + 1
set @Counter = @Counter - 1
end

set @Counter = len(@String2)-1
while @Counter > 0
begin
if charindex(substring(@String2, @Counter, 2), @String1) > 0 set @Hits =

@Hits + 1
set @Counter = @Counter - 1
end

return (100*@Hits)/@Possibles
end

[/code]