Prepare

  • This post does not explain the concept of networkx or explain the principles of algorithms.
  • This post is an example of how AgensGraph uses the PL / Python library to grapple graph data.

Environment

  • AgensGraph v2.0 release
  • CentOS 7 (python 2.7)
  • enabled ‘pip’ executor

Setting

step 1. AgensGraph install (configure –with-python)

  • If you downloaded AgensGraph 2.0 source, you must be compiled C language files. Because we will use PL/Python in AgensGraph.
  • When you compiled files, you need to configure setting
user@pwd/blah/blah$ ./configure --prefix=$HOME/agraph --with-python
...
user@pwd/blah/blah$ make && make install-world
agens (AgensGraph 2.0.0, based on PostgreSQL 10.4)
Type "help" for help.

agens=#
agens=# CREATE GRAPH community_detection;
CREATE GRAPH
agens=# SET GRAPH_PATH TO community_detection;
SET
agens=#

step 2. AgensGraph plpython language creating

  • If you follow along well, the next step is easy.
  • In CLI, the following
CREATE LANGUAGE plpythonu;

step 3. python networkx library download

  • By using ‘pip’ executor, you can use the download networkx library.
  • And pip install networkx, pip install python-louvain, pip install numpy
  • (pip install pycopg2) is the most popular PostgreSQL adapter. (AgensGraph is based on PostgreSQL.)
  • (account with administrator rights)
user@pwd/blah/blah$ sudo pip install networkx
user@pwd/blah/blah$ sudo pip install python-louvain
user@pwd/blah/blah$ sudo pip install numpy
user@pwd/blah/blah$ sudo pip install psycopg2

step 4. community detection example

  • Next. In AgensGraph CLI mode, you can write the script used python.
  • It is similar to Oracle’s PL/SQL. AgensGraph uses python as the procedure language. Of course, SQL also exists.
  • Dataset is 6 nodes and 3 edges. following
SET GRAPH_PATH TO community_detection;

CREATE (a:person {name:'hsjeon'})-[r:knows]->(b:person {name:'jhs'});
CREATE (a:person {name:'foo'})-[r:knows]->(b:person {name:'bar'});
CREATE (a:person {name:'test'})-[r:knows]->(b:person {name:'test1'});
CREATE OR REPLACE FUNCTION louvain_method()
RETURNS void as $$
import networkx as nx
import psycopg2 as ag
import community

def doQuery(qry):
   try:
      conn = ag.connect("host=127.0.0.1 port=5555 dbname=agens user=agens password=agens")
      cur = conn.cursor()
      cur.execute("SET GRAPH_PATH TO community_detection")

      cur.execute(qry)
      records = cur.fetchall()

      G.add_edges_from([tuple(r) for r in records])
   except Exception, e:
      plpy.error(e)
   finally:
      conn.commit()
      cur.close()
      conn.close()

G = nx.Graph()
query = "MATCH (a)-[r]->(b) RETURN id(a) AS startNode, id(b) AS endNode"
doQuery(query)
partition = community.best_partition(G)
plpy.info("modularity >> "+str(community.modularity(partition,G)))
cluster = 0
for com in set(partition.values()):
    cluster = cluster + 1
    list_nodes = [nodes for nodes in partition.keys() if partition[nodes] == com]
    plpy.info(cluster,list_nodes)

$$ LANGUAGE plpythonu;
  • It is a simple script that runs the networkx louvain method algorithm after viewing the entire graph in AgensGraph.
  • This function execute is following
agens=# SELECT louvain_method();
INFO:  modularity >> 0.666666666667
INFO:  (1, ['6.6', '6.5'])
INFO:  (2, ['6.4', '6.3'])
INFO:  (3, ['6.2', '6.1'])
 louvain_method
----------------

(1 row)
  • In the above test data, we created six nodes with three relationships. You can see that the connected ones are community detection.
  • Then, in the next test, (hsjeon and foo), (jhs and bar) know each other and create a relationship.
MATCH (a:person),(b:person)
WHERE a.name = 'hsjeon'
AND   b.name = 'foo'
CREATE (a)-[r:knows]->(b);

MATCH (a:person),(b:person)
WHERE a.name = 'jhs'
AND   b.name = 'bar'
CREATE (a)-[r:knows]->(b);
  • And check that community detection is working normally to find two communities. ```` agens=# SELECT louvain_method(); INFO: modularity » 0.32 INFO: (1, [‘6.6’, ‘6.5’]) INFO: (2, [‘6.4’, ‘6.3’, ‘6.2’, ‘6.1’]) louvain_method —————-

(1 row) ````

  • Clustering is done using algorithms, clusters may not be found as you might think.
  • In order to increase the modularity, it is necessary to configure the graph modeling according to the priority so that the desired cluster can be obtained.
  • The reason for clustering the graph data is that when the information bound to different clusters is related, it is possible to find out which information is related to the different clusters.

  • In the next post, I will construct a graph modeling with a simple scenario and try to figure out how to clustering.