David-Maisonave/SqliteFuzzyPlusExtension

SqliteFuzzyPlusExtension

This SQLite extension bundles fuzzy string matching, phonetic, and regex functions for SQLite.

  • Note:
    • This is a beta release, and some of the links are still under construction. All the links should be complete before the full non-beta release.
    • This fuzzy library is designed so that it can be used with or without SQLite.

Content

Example SQLite Usage

Example#1

The following query finds names that are more than 90% similar to "David Jorge" using the default distance method.

select Name, Similar(Name, 'David Jorge') as s FROM SimilarNames where s > .9
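
The same query pattern can be run from a host language with bound parameters instead of inline literals. The following is a minimal Python sketch, assuming the SimilarNames table from the example. The helper name find_similar is hypothetical. Because the extension is not loaded here, a stand-in Similar based on difflib.SequenceMatcher is registered so the sketch runs on its own; it is not the extension's default distance method, so its scores will differ from the real ones.

```python
import sqlite3
from difflib import SequenceMatcher


def find_similar(conn, target, threshold=0.9):
    """Run the Example#1 query with bound parameters (hypothetical helper)."""
    sql = (
        "SELECT Name, Similar(Name, ?) AS s "
        "FROM SimilarNames WHERE s > ? ORDER BY s DESC"
    )
    return conn.execute(sql, (target, threshold)).fetchall()


def _stand_in_similar(a, b):
    # Stand-in only: difflib's ratio, NOT the extension's default distance method.
    return SequenceMatcher(None, a, b).ratio()


conn = sqlite3.connect(":memory:")
# With the real extension loaded, this create_function call would be dropped.
conn.create_function("Similar", 2, _stand_in_similar)
conn.execute("CREATE TABLE SimilarNames (Name TEXT)")
conn.executemany(
    "INSERT INTO SimilarNames VALUES (?)",
    [("David Jorge",), ("David Jorje",), ("Susan Smith",)],
)

for name, score in find_similar(conn, "David Jorge"):
    print(name, round(score, 3))
```

Binding the target name and the threshold keeps quoting issues out of the SQL text and lets one statement be reused for different names.
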
Example#2

The following query lists the results of several distance methods side by side.

select Name
, HowSimilar(Name, 'David Jorge', 'Levenshtein') as lev
, HowSimilar(Name, 'David Jorge', 'DamerauLevenshtein') as dlev
, HowSimilar(Name, 'David Jorge', 'LongestCommonSequence') as lcs
, HowSimilar(Name, 'David Jorge', 'NeedlemanWunsch') as n
, HowSimilar(Name, 'David Jorge', 'JaroWinkler') as jw
FROM SimilarNames
Example#3 (SQLean)

Return the results that have an edit distance of less than 2.

select Name, fuzzy_damlev(Name, 'David Jorge') as d FROM SimilarNames where d < 2
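
To make the "edit distance of less than 2" filter concrete, here is a small Python sketch of a Damerau-Levenshtein style distance. It implements the optimal string alignment variant (insert, delete, substitute, and swap of adjacent characters) and is only an illustration; it is not the code behind fuzzy_damlev, and the function name osa_distance and the sample names are made up for this sketch.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance (restricted Damerau-Levenshtein)."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Swap of two adjacent characters counts as a single edit.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


names = ["David Jorge", "Davdi Jorge", "David George", "Dave Jorge"]
# Same filter as the query: keep names whose distance to the target is below 2.
matches = [n for n in names if osa_distance(n, "David Jorge") < 2]
print(matches)
```

With this sample data only the exact match and the name with two swapped letters pass the filter; the other two names are two edits away.
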

For more examples, see Documentation.

SqliteFuzzyPlusExtension is a SQLite fuzzy extension that is built using both C/C++ and C# libraries. The Visual Studio solution builds two DLLs (one C++ and one C#). 95% of the source is taken from other fuzzy libraries such as SimMetricsCore, SQLean, Edlib, SimMetrics.Net, jaccardsimilarity, Phonix, and Microsoft PhoneticMatching.

Using SqliteFuzzyPlusExtension

Using SqliteFuzzyPlusExtension With SQLite

  • As a SQLite extension, both DLLs are required (SqliteFuzzyPlusExtension.dll and FuzzyPlusCSharp.dll).
  • When calling the libraries from source code, add SqliteFuzzyPlusExtension.lib to the build and, for C++ code, include SqliteFuzzyPlusExtension.h.
  • If building in C, add #define EXCLUDE_NAMESPACE_SQLITEFUZZYPLUSEXTENSION before including SqliteFuzzyPlusExtension.h.

DB Browser for SQLite

When using SqliteFuzzyPlusExtension.dll with an executable like DB Browser for SQLite, the FuzzyPlusCSharp.dll file MUST be in the same directory as the executable (DB Browser for SQLite.exe), while SqliteFuzzyPlusExtension.dll can be located anywhere.
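
The placement rule above can be checked before loading the extension. The following Python sketch assumes that the same rule applies to any host executable (here the Python interpreter), which this README only states for DB Browser for SQLite. The helper names missing_dlls and load_fuzzy_extension are hypothetical, and enable_load_extension is only available when Python's sqlite3 module was built with extension loading support.

```python
import os
import sqlite3
import sys


def missing_dlls(extension_path, exe_dir=None):
    """Return a list of problems with the DLL layout described above."""
    # Assumption: the companion DLL must sit next to the running executable.
    exe_dir = exe_dir or os.path.dirname(sys.executable)
    problems = []
    if not os.path.isfile(extension_path):
        problems.append("extension DLL not found: " + extension_path)
    companion = os.path.join(exe_dir, "FuzzyPlusCSharp.dll")
    if not os.path.isfile(companion):
        problems.append("FuzzyPlusCSharp.dll not found next to the executable: " + companion)
    return problems


def load_fuzzy_extension(conn, extension_path):
    """Load the extension into an open connection after checking the layout."""
    problems = missing_dlls(extension_path)
    if problems:
        raise FileNotFoundError("; ".join(problems))
    conn.enable_load_extension(True)
    try:
        conn.load_extension(extension_path)
    finally:
        # Turn extension loading back off once the extension is in place.
        conn.enable_load_extension(False)


# Report what is missing for a relative path; prints an empty list when all is in place.
print(missing_dlls("SqliteFuzzyPlusExtension.dll"))
```

Checking the layout first gives a readable message about which file is missing, rather than a generic load failure from SQLite.
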

Using SqliteFuzzyPlusExtension Without SQLite

Function List

Fuzzy String Matching Algorithms

Phonetic Fuzzy Functions

Regex Functions

SqliteFuzzyPlusExtension Fuzzy Wrapper Functions

Miscellaneous Fuzzy Functions

Miscellaneous Functions

These minimum and maximum functions are included to allow developers to compute a custom percentage distance in a query.
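
One common way to turn a raw edit distance into a percentage is to divide it by the length of the longer string. The Python sketch below shows that calculation in SQL. It makes two assumptions: since the extension's own minimum and maximum function names are not listed in this section, it uses SQLite's built-in multi-argument max(), and it registers a stand-in distance function named lev_stand_in (a plain Levenshtein implementation) in place of the extension's distance functions.

```python
import sqlite3


def levenshtein(a, b):
    """Plain Levenshtein distance, used here only as a stand-in distance function."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


conn = sqlite3.connect(":memory:")
conn.create_function("lev_stand_in", 2, levenshtein)

# similarity = 1 - distance / length of the longer string.
# The extra 1 in max() avoids a division by zero when both strings are empty.
PERCENT_SQL = """
SELECT 1.0 - CAST(lev_stand_in(:a, :b) AS REAL)
           / max(length(:a), length(:b), 1)
"""


def percent_similarity(a, b):
    return conn.execute(PERCENT_SQL, {"a": a, "b": b}).fetchone()[0]


print(percent_similarity("kitten", "sitting"))
```

For "kitten" and "sitting" the distance is 3 and the longer string has 7 characters, so the result is 1 - 3/7, roughly 0.57.
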

The Plus in SqliteFuzzyPlusExtension

The plus is in the name because this extension has some extra functions that have little to do with fuzzy logic. Here's a list of the non-fuzzy functions.

  • StringReverse - Returns the string in reverse order. - (alias = StrRev)
  • GetDirectoryName - Retrieves the directory information from a given path string. - (alias = GetDirName)
  • GetExtension - Retrieves the extension from a given path string.
  • GetFileName - Retrieves the file name (including its extension) from a given path string. - (alias = GetFileNameWithExtension)
  • GetFileNameWithoutExtension - Retrieves the file name (excluding its extension) from a given path string.
  • IsDirExist - Takes the path of the directory as a string argument and returns 1 or 0, where 1 means the directory exists. - (alias = DirExist)
  • IsFileExist - Takes a string path to determine if a specified file exists and returns 1 or 0, where 1 means the file exists. - (alias = FileExist)
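
The path functions above can be called like any other scalar SQL function. The Python sketch below registers stand-ins under the same names so that the calls can be tried without the extension. The stand-ins are assumptions: they use Python's ntpath module, and details such as whether the real GetExtension returns the leading dot are not stated in this README. IsDirExist and IsFileExist are left out because they depend on the local file system.

```python
import ntpath
import sqlite3

conn = sqlite3.connect(":memory:")

# Stand-ins only: approximate the descriptions above using Windows path rules.
stand_ins = {
    "StringReverse": lambda s: s[::-1],
    "GetDirectoryName": ntpath.dirname,
    "GetExtension": lambda p: ntpath.splitext(p)[1],
    "GetFileName": ntpath.basename,
    "GetFileNameWithoutExtension": lambda p: ntpath.splitext(ntpath.basename(p))[0],
}
for name, func in stand_ins.items():
    conn.create_function(name, 1, func)

row = conn.execute(
    "SELECT StringReverse(?1), GetDirectoryName(?2), GetExtension(?2), "
    "GetFileName(?2), GetFileNameWithoutExtension(?2)",
    ("abc", r"C:\data\report.txt"),
).fetchone()
print(row)
```
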

Examples

  • SQL examples are listed in the following link: Examples.
  • Other examples will be posted before the non-beta release.

Build

SqliteFuzzyPlusExtension Builds

SqliteFuzzyPlusExtension was built using Visual Studio.

  • The build was tested on two versions of Visual Studio:
    • Microsoft Visual Studio Enterprise 2022 (64-bit) - Preview
      • Version 17.14.13 Preview 1.0
    • Microsoft Visual Studio Community 2022 (64-bit)
      • Version 17.14.14

VS Example Projects

  • All VS example projects were created using Microsoft Visual Studio Community 2022 (64-bit) - Current Version 17.14.14.
  • See the following link for more details: Visual Studio Example Projects

GCC Example Projects

VS Test Projects

  • All VS test projects were created using Microsoft Visual Studio Enterprise 2022 (64-bit) - Preview - Version 17.14.13 Preview 1.0
  • The primary purpose of the test projects is to verify at compile time that changes don't break the supported programming languages and frameworks.
  • The test projects are also used for the following:
    • Creating SQL scripts that use all the fuzzy functions for the APIs Distance, HowSimilar, and IsSimilar.
    • Creating SQL scripts that use all the phonetic functions for the API SameSound.
    • Running a performance test on all the fuzzy functions using SQLite, updating the database TestData.db with the results, and creating a README.md file that shows the results.
      • The SQL test is performed on 4 databases with the following row counts:
        • 10,000 rows
        • 100,000 rows
        • 1,000,000 rows
        • 5,000,000 rows
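
The kind of measurement described above can be sketched in a few lines of Python. This is not the test projects' code: the helper time_scalar_function, the table PerfNames, and the stand-in function stand_in_match are all hypothetical, and the row counts are kept small so that the sketch runs quickly.

```python
import sqlite3
import time


def time_scalar_function(conn, function_name, row_count, target="David Jorge"):
    """Time one full-table pass of function_name(Name, target) over row_count rows."""
    conn.execute("DROP TABLE IF EXISTS PerfNames")
    conn.execute("CREATE TABLE PerfNames (Name TEXT)")
    conn.executemany(
        "INSERT INTO PerfNames VALUES (?)",
        ((f"Name {i}",) for i in range(row_count)),
    )
    # function_name is interpolated as an identifier, so it must come from trusted code.
    sql = f"SELECT count({function_name}(Name, ?)) FROM PerfNames"
    start = time.perf_counter()
    evaluated = conn.execute(sql, (target,)).fetchone()[0]
    return evaluated, time.perf_counter() - start


conn = sqlite3.connect(":memory:")
# Stand-in so the sketch runs without the extension; swap in a loaded fuzzy function.
conn.create_function("stand_in_match", 2, lambda a, b: int(len(a) == len(b)))

for row_count in (1_000, 10_000):
    evaluated, seconds = time_scalar_function(conn, "stand_in_match", row_count)
    print(f"{row_count:>7} rows: {seconds:.4f} s ({evaluated} rows evaluated)")
```

Wrapping the call in count() forces SQLite to evaluate the function for every row without transferring the individual results back to the caller.
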

ToDo

  • Add additional documentation for each function with examples.
  • Try to figure out a way to load the extension into DB Browser for SQLite without having to copy FuzzyPlusCSharp.dll to the same folder as the executable.
  • Add an example C# project that uses the C# fuzzy functions with SQLite without having to load the extension.
  • Finish implementing the APIs SameFirstLastName, SamePhone, SameSocial, SameZip, SameAddress, SameDate, and SameNumber, and add the associated documentation.