SqliteFuzzyPlusExtension is a fuzzy-matching library for SQLite.
- Note:
- This is a beta release, and some of the links are still under construction. All links should be complete before the full non-beta release.
- This fuzzy library is designed so that it can be used with or without SQLite.
- Using SqliteFuzzyPlusExtension
- Function List
- Examples
- Build
- ToDo
- Test Projects and Example Projects
- TestDatabase
The following query finds names that are more than a 90% match for 'David Jorge', using the default distance method.
SELECT Name, Similar(Name, 'David Jorge') AS s FROM SimilarNames WHERE s > .9
The following query compares the results of several distance methods.
SELECT Name
, HowSimilar(Name, 'David Jorge', 'Levenshtein') AS lev
, HowSimilar(Name, 'David Jorge', 'DamerauLevenshtein') AS dlev
, HowSimilar(Name, 'David Jorge', 'LongestCommonSequence') AS lcs
, HowSimilar(Name, 'David Jorge', 'NeedlemanWunsch') AS n
, HowSimilar(Name, 'David Jorge', 'JaroWinkler') AS jw
FROM SimilarNames
The following query returns the names with an edit distance of less than 2.
SELECT Name, fuzzy_damlev(Name, 'David Jorge') AS d FROM SimilarNames WHERE d < 2
For more examples, see Documentation.
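The first query above can be tried without the extension by registering a stand-in scorer. The sketch below is only an illustration: `similar` and `find_similar` are hypothetical helper names, and the scorer is Python's `difflib.SequenceMatcher`, not the extension's default distance method, so the scores will differ from what `Similar` returns in SqliteFuzzyPlusExtension.

```python
import sqlite3
from difflib import SequenceMatcher


def similar(a, b):
    """Stand-in scorer in [0, 1]. ASSUMPTION: this is NOT the extension's default method."""
    if a is None or b is None:
        return None
    return SequenceMatcher(None, a, b).ratio()


def find_similar(names, target, threshold=0.9):
    """Run the README's Similar() query against an in-memory SimilarNames table."""
    conn = sqlite3.connect(":memory:")
    # Register the Python function under the SQL name used in the README query.
    conn.create_function("Similar", 2, similar)
    conn.execute("CREATE TABLE SimilarNames (Name TEXT)")
    conn.executemany("INSERT INTO SimilarNames VALUES (?)", [(n,) for n in names])
    rows = conn.execute(
        "SELECT Name, Similar(Name, ?) AS s FROM SimilarNames "
        "WHERE s > ? ORDER BY s DESC",
        (target, threshold),
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    sample = ["David Jorge", "David Jorje", "Sarah Smith"]
    for name, score in find_similar(sample, "David Jorge"):
        print(name, round(score, 3))
```

With the real extension loaded, the `create_function` call would not be needed, because `Similar` is provided by the DLL.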
SqliteFuzzyPlusExtension is a SQLite fuzzy extension built from both C/C++ and C# libraries. The Visual Studio solution builds two DLLs (one C++ and one C#). 95% of the source is taken from other fuzzy libraries such as SimMetricsCore, SQLean, Edlib, SimMetrics.Net, jaccardsimilarity, Phonix, and Microsoft PhoneticMatching.
- As a SQLite extension, both DLLs are required (SqliteFuzzyPlusExtension.dll and FuzzyPlusCSharp.dll).
- When calling the libraries from source code, add SqliteFuzzyPlusExtension.lib to the build and, for C++ code, include SqliteFuzzyPlusExtension.h.
- If building in C, add the following line before including SqliteFuzzyPlusExtension.h:
#define EXCLUDE_NAMESPACE_SQLITEFUZZYPLUSEXTENSION
When using SqliteFuzzyPlusExtension.dll with an executable like DB Browser for SQLite, the FuzzyPlusCSharp.dll file MUST be in the same directory as the executable (DB Browser for SQLite.exe), whereas SqliteFuzzyPlusExtension.dll can be located anywhere.
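For hosts other than DB Browser for SQLite, loading the extension could look roughly like the Python sketch below. It uses only the standard `sqlite3` calls `enable_load_extension` and `load_extension`; the helper name `open_with_fuzzy_extension` and the paths in the usage comment are hypothetical. This README only documents the DLL placement rule for DB Browser for SQLite, so the idea that FuzzyPlusCSharp.dll would have to sit next to python.exe is an assumption.

```python
import sqlite3


def open_with_fuzzy_extension(db_path, extension_path):
    """Open db_path and load the extension DLL found at extension_path.

    ASSUMPTION: extension_path points at SqliteFuzzyPlusExtension.dll and
    FuzzyPlusCSharp.dll is located where the host process can find it.
    """
    conn = sqlite3.connect(db_path)
    # Some Python builds are compiled without extension-loading support.
    if not hasattr(conn, "enable_load_extension"):
        conn.close()
        raise RuntimeError("this Python build cannot load SQLite extensions")
    try:
        conn.enable_load_extension(True)
        conn.load_extension(extension_path)
        # Switch loading off again so later SQL cannot load arbitrary libraries.
        conn.enable_load_extension(False)
    except Exception:
        conn.close()
        raise
    return conn


# Hypothetical usage (paths are placeholders, not taken from the project):
# conn = open_with_fuzzy_extension("TestData.db", r"C:\some\folder\SqliteFuzzyPlusExtension.dll")
# print(conn.execute("SELECT Similar('David Jorge', 'David Jorje')").fetchone())
```

If the DLL or one of its dependencies cannot be found, `load_extension` raises `sqlite3.OperationalError`, which the helper lets propagate after closing the connection.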
- Fuzzy Functions
- Most fuzzy functions in SqliteFuzzyPlusExtension can be called directly without using SQLite.
- See UsingFuzzyFunctionsOutsideSQLite.md for more details.
- Edit Distance Based Methods
- Levenshtein - (alias = Lev)
- Levenshtein2Distance - (alias = Lev2)
- DamerauLevenshteinDistance - (alias = DamLev)
- NormalizedLevenshteinDistance - (alias = NormLev)
- HammingDistance - (alias = HammDist)
- JaroWinklerDistance - (alias = JaroWin)
- JaroDistance - (alias = Jaro)
- ChapmanLengthDeviation
- EuclideanDistance
- ChapmanMeanLength - This method does not give expected results, and is only here for testing and comparison purposes.
- EdlibDistance - (alias = Edlib)
- fuzzy_damlev - (alias = dlevenshtein)
- fuzzy_editdist - (alias = edit_distance)
- fuzzy_hamming - (alias = hamming)
- fuzzy_jarowin - (alias = jaro_winkler)
- fuzzy_leven - (alias = levenshtein)
- fuzzy_osadist - (alias = osa_distance)
- Sequence Alignment Based Methods
- LongestCommonSequence
- LongestCommonSubsequenceDistance - (alias = LCSQ)
- LongestCommonSubstringDistance - (alias = LCS)
- NeedlemanWunsch
- RatcliffObershelpSimilarityDistance - (alias = Ratcliff)
- SmithWaterman
- SmithWatermanGotoh
- SmithWatermanGotohWindowedAffine
- Token Based Methods
- CosineSimilarity
- JaccardIndex
- JaccardSimilarity
- TanimotoCoefficientDistance - (alias = Tanimoto)
- OverlapCoefficientDistance - (alias = OverlapCoef)
- SorensenDiceDistance - (alias = SorensenDice)
- DiceSimilarity
- BlockDistance
- MatchingCoefficient
- QGramsDistance
- NGramsDistance
- Hybrid Algorithms
- MongeElkan
- Sift4 - Not in the current build, but present in the source code. Will be included in the next build (0.1.5).
- Phrase token methods
- PhraseTokenize - (alias = PhraseDiff)
- SimplePhraseTokenize
- Caverphone2
- fuzzy_phonetic - (alias = phonetic_hash)
- fuzzy_rsoundex - (alias = rsoundex)
- fuzzy_soundex - (alias = soundex)
- fuzzy_translit - (alias = translit)
- fuzzy_caver - (alias = caverphone)
- EnPhoneticDistance
- Soundex2
- SameSound
- RegexMatch - (alias = XMatch)
- RegexReplace - (alias = Regex)
- RegexSearch - (alias = XSearch)
- Distance
- HowSimilar - (alias = Similar)
- SameSound
- SetDefaultDistanceMethod
- fuzzy_script - (alias = script_code)
- HasCharInSameOrder - (alias = HasChr)
- SameName
These minimum and maximum functions are included so that developers can compute a custom percentage distance in a query.
- MaxValue - (alias = MaxVal)
- MaxLength - (alias = MaxLen)
- MinValue - (alias = MinVal)
- MinLength - (alias = MinLen)
- NormalizeNum
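One way such a custom percentage might be built is to divide a raw edit distance by the longer string's length. The sketch below registers Python stand-ins named `Levenshtein` and `MaxLength` in an in-memory database. It assumes, without confirmation from this README, that `Levenshtein` returns a raw edit count and that `MaxLength` returns the longer of two string lengths. The helper `custom_percent` is hypothetical.

```python
import sqlite3


def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]


def max_length(a, b):
    """ASSUMED behavior of MaxLength: length of the longer string."""
    return max(len(a), len(b))


def custom_percent(names, target):
    """Return (name, similarity) pairs where similarity = 1 - distance / max length."""
    conn = sqlite3.connect(":memory:")
    conn.create_function("Levenshtein", 2, levenshtein)
    conn.create_function("MaxLength", 2, max_length)
    conn.execute("CREATE TABLE SimilarNames (Name TEXT)")
    conn.executemany("INSERT INTO SimilarNames VALUES (?)", [(n,) for n in names])
    # Multiplying by 1.0 forces floating-point division; SQLite returns NULL
    # when MaxLength is 0 (both strings empty).
    rows = conn.execute(
        "SELECT Name, 1.0 - Levenshtein(Name, ?) * 1.0 / MaxLength(Name, ?) AS pct "
        "FROM SimilarNames",
        (target, target),
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for name, pct in custom_percent(["kitten", "sitting"], "sitting"):
        print(name, round(pct, 3))
```

Dividing by the longer length keeps the result between 0 and 1, since the Levenshtein distance never exceeds the length of the longer string.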
The "Plus" is in the name because this extension has some extra functions that have little to do with fuzzy logic. The non-fuzzy functions are listed below.
- StringReverse - Returns the string in reverse order. - (alias = StrRev)
- GetDirectoryName - Retrieves the directory information from a given path string. - (alias = GetDirName)
- GetExtension - Retrieves the extension from a given path string.
- GetFileName - Retrieves the file name (including its extension) from a given path string. - (alias = GetFileNameWithExtension)
- GetFileNameWithoutExtension - Retrieves the file name (excluding its extension) from a given path string.
- IsDirExist - Takes the path of the directory as a string argument and returns 1 or 0, where 1 means the directory exists. - (alias = DirExist)
- IsFileExist - Takes a string path to determine if a specified file exists and returns 1 or 0, where 1 means the file exists. - (alias = FileExist)
- SQL examples are listed in the following link: Examples.
- Other examples will be added before the non-beta version is released.
SqliteFuzzyPlusExtension was built using Visual Studio.
- The build was tested on two versions of Visual Studio:
- Microsoft Visual Studio Enterprise 2022 (64-bit) - Preview
- Version 17.14.13 Preview 1.0
- Microsoft Visual Studio Community 2022 (64-bit)
- Version 17.14.14
- All VS example projects were created using Microsoft Visual Studio Community 2022 (64-bit) - Current Version 17.14.14.
- See following link for more details: Visual Studio Example Projects
- GCC projects were built using MinGW.
- See following link for more details: GCC (MinGw) Example Projects
- All VS test projects were created using Microsoft Visual Studio Enterprise 2022 (64-bit) - Preview - Version 17.14.13 Preview 1.0
- The primary purpose of the test projects is to verify at compile time that changes don't break the supported programming languages and frameworks.
- The test projects are also used for the following:
- Create SQL scripts using all the fuzzy functions for the APIs Distance, HowSimilar, and IsSimilar.
- Create SQL scripts using all the phonetic functions for the API SameSound.
- Run a performance test on all the fuzzy functions using SQLite, update the TestData.db database with the results, and create a README.md file showing the results.
- The SQL test is performed on four databases with the following row counts:
- 10,000 rows
- 100,000 rows
- 1,000,000 rows
- 5,000,000 rows
- Add additional documentation for each function with examples.
- Try to figure out a way to load the extension into DB Browser for SQLite without having to copy FuzzyPlusCSharp.dll to the same folder as the executable.
- Add an example C# project that uses the C# fuzzy functions with SQLite without having to attach the extension.
- Finish implementing the APIs SameFirstLastName, SamePhone, SameSocial, SameZip, SameAddress, SameDate, and SameNumber, and add the associated documentation.