Prepare

  • Current AgensGraph version is released lastest 2.1 version.
  • Installation (AgensGraph v.2.0)
  • 2.1 version requires a bug fix.

Comment

  • The string similarity example will proceed with the levenshtein distance first.
  • I will summarize more string similarity verification schemes in the future if possible.
  • This post related to the use case rather than the concept of levenshtein distance. I will not mention here because the concept comes from search.
  • The smaller the levenshtein distance cost, the more comparable the value.

Extension

  • There is an extension command in postgresql. Levenshtein distance function is present in the ‘fuzzystrmatch’ extension.
  • Before fuzzystrmatch extension command, When you compile AgensGraph, you must specify the following keywords.
$ make && make install-world
  • There is a branch in the Makefile (install-world).

If AgensGraph compile is complete, you can execute “create extension” in CLI(command line interface).

[agens@5628b77c5e38 agensgraph-2.0.0]$ agens
agens (AgensGraph 2.0.0, based on PostgreSQL 10.4)
Type "help" for help.

agens=#
agens=#
agens=# CREATE EXTENSION fuzzystrmatch;
CREATE EXTENSION
agens=#
  • You can check the function using the ‘\df’ command.
  • If you look at the list, you’ll see a function that takes the type of arguments as text.
\df levenshtein

                                    List of functions
 Schema |    Name     | Result data type |          Argument data types          |  Type
--------+-------------+------------------+---------------------------------------+--------
 public | levenshtein | integer          | text, text                            | normal
 public | levenshtein | integer          | text, text, integer, integer, integer | normal

Using to the levenshtein function example

  • There are two vertice.
  • Each vertice has include property called by name.
  • I will execute Cypher query for test data import
-- first graph path create
CREATE GRAPH test;
-- set graph_path (similar to RDB(mysql, mssql) USE DATABASE)
SET GRAPH_PATH TO test;
CREATE (:person {name:'hsjeon'});
CREATE (:person {name:'jhs'});
  • If you just use the levenshtein function in the AgensGraph cypher query, you will see the following error message.
MATCH (a:person)
RETURN levenshten('jhs', a.name);

ERROR:  function levenshten(jsonb, jsonb) does not exist
LINE 2: RETURN levenshten('jhs', a.name);
               ^
HINT:  No function matches the given name and argument types. You might need to add explicit type casts.
  • So far AgensGraph cypher queries require explicit type casting because the value type is treated as jsonb.
  • Create a to_string function for type casting of arguments.
    • jsonb type -> text type
  • Or type casting as hybrid query.
CREATE LANGUAGE plpythonu;
CREATE OR REPLACE FUNCTION to_string(value jsonb)
RETURNS text AS $$
return str(value.replace('"', ''))
$$ LANGUAGE plpythonu;
  • In cypher query, use levenshtein distance function
MATCH  (a:person)
RETURN  a.name, levenshtein(to_string('jhs'), to_string(a.name));
   name   | levenshtein
----------+------------
 "hsjeon" |           5
 "jhs"    |           0
(2 rows)
  • Hybrid query example
SELECT g1.name
     , levenshtein('jhs', to_string(g1.name)) 
FROM  (MATCH (a:person)
       RETURN a.name) g1;
   name   | levenshtein
----------+-------------
 "hsjeon" |           5
 "jhs"    |           0
  • In the next post, I will use the levenshtein distance function to illustrate some graph modeling after a brief string similarity comparison.