Prepare
- Current AgensGraph version is released lastest 2.1 version.
- Installation (AgensGraph v.2.0)
- 2.1 version requires a bug fix.
Comment
- The string similarity example will proceed with the levenshtein distance first.
- I will summarize more string similarity verification schemes in the future if possible.
- This post related to the use case rather than the concept of levenshtein distance. I will not mention here because the concept comes from search.
- The smaller the levenshtein distance cost, the more comparable the value.
Extension
- There is an extension command in postgresql. Levenshtein distance function is present in the ‘fuzzystrmatch’ extension.
- Before fuzzystrmatch extension command, When you compile AgensGraph, you must specify the following keywords.
$ make && make install-world
- There is a branch in the Makefile (install-world).
If AgensGraph compile is complete, you can execute “create extension” in CLI(command line interface).
[agens@5628b77c5e38 agensgraph-2.0.0]$ agens
agens (AgensGraph 2.0.0, based on PostgreSQL 10.4)
Type "help" for help.
agens=#
agens=#
agens=# CREATE EXTENSION fuzzystrmatch;
CREATE EXTENSION
agens=#
- You can check the function using the ‘\df’ command.
- If you look at the list, you’ll see a function that takes the type of arguments as text.
\df levenshtein
List of functions
Schema | Name | Result data type | Argument data types | Type
--------+-------------+------------------+---------------------------------------+--------
public | levenshtein | integer | text, text | normal
public | levenshtein | integer | text, text, integer, integer, integer | normal
Using to the levenshtein function example
- There are two vertice.
- Each vertice has include property called by name.
- I will execute Cypher query for test data import
-- first graph path create
CREATE GRAPH test;
-- set graph_path (similar to RDB(mysql, mssql) USE DATABASE)
SET GRAPH_PATH TO test;
CREATE (:person {name:'hsjeon'});
CREATE (:person {name:'jhs'});
- If you just use the levenshtein function in the AgensGraph cypher query, you will see the following error message.
MATCH (a:person)
RETURN levenshten('jhs', a.name);
ERROR: function levenshten(jsonb, jsonb) does not exist
LINE 2: RETURN levenshten('jhs', a.name);
^
HINT: No function matches the given name and argument types. You might need to add explicit type casts.
- So far AgensGraph cypher queries require explicit type casting because the value type is treated as jsonb.
- Create a to_string function for type casting of arguments.
- jsonb type -> text type
- Or type casting as hybrid query.
CREATE LANGUAGE plpythonu;
CREATE OR REPLACE FUNCTION to_string(value jsonb)
RETURNS text AS $$
return str(value.replace('"', ''))
$$ LANGUAGE plpythonu;
- In cypher query, use levenshtein distance function
MATCH (a:person)
RETURN a.name, levenshtein(to_string('jhs'), to_string(a.name));
name | levenshtein
----------+------------
"hsjeon" | 5
"jhs" | 0
(2 rows)
- Hybrid query example
SELECT g1.name
, levenshtein('jhs', to_string(g1.name))
FROM (MATCH (a:person)
RETURN a.name) g1;
name | levenshtein
----------+-------------
"hsjeon" | 5
"jhs" | 0
- In the next post, I will use the levenshtein distance function to illustrate some graph modeling after a brief string similarity comparison.