"HyperGraphQL" query language

(Jeff Hull) #1

Has anyone heard of this effort by “Semantic Integration Ltd.” in the UK?

I just read about it in a blog post about a company called GRAKN.AI, which aims to provide a layer on top of graph databases for creating knowledge graphs.

Here’s the link to the post:

ASIDE: GRAKN.AI uses “hypergraphs” instead of vanilla digraphs to model their knowledge, although under the hood, they map these hypergraphs to digraphs before persisting to whatever vendor DB using Apache TinkerPop. Here’s a link to a post about their “hypergraph” modeling approach.
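To make the hypergraph-to-digraph idea concrete, here is a minimal sketch of one common way to do such a mapping: each hyperedge is reified as its own node, with one ordinary role-labelled edge to every participant. This is a generic illustration, not GRAKN.AI's actual implementation; the `Digraph` class, the `reify_hyperedge` helper, and the example names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Digraph:
    """A tiny in-memory directed graph (hypothetical, for illustration only)."""
    nodes: dict = field(default_factory=dict)  # node id -> label
    edges: list = field(default_factory=list)  # (source, role, target) triples

    def add_node(self, node_id, label):
        self.nodes[node_id] = label

    def add_edge(self, source, role, target):
        self.edges.append((source, role, target))


def reify_hyperedge(graph, relation_id, relation_type, role_players):
    """Store an n-ary relation as a node plus one binary edge per role player."""
    # The hyperedge itself becomes an ordinary node.
    graph.add_node(relation_id, relation_type)
    # Each participant is attached by a plain directed edge labelled with its role.
    for role, player in role_players.items():
        graph.add_edge(relation_id, role, player)


g = Digraph()
for person in ("alice", "bob", "carol"):
    g.add_node(person, "person")

# A three-way relation that a single binary edge cannot express.
reify_hyperedge(
    g,
    "r1",
    "introduction",
    {"introducer": "alice", "introduced": "bob", "recipient": "carol"},
)

for edge in g.edges:
    print(edge)
```

After this step the data is a plain digraph, so in principle it could be persisted through any graph API that handles labelled nodes and edges.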

(Eugen Brochier) #2

Thanks for the information.

It was an interesting read and good input for refreshing my research and updating my knowledge.

RE: Has anyone heard of this effort by “Semantic Integration Ltd.” in the UK?
I did not

  • I worked for some years at a company that is still implementing an intelligent search engine; I was responsible for the technical infrastructure
  • they build a knowledge base from many dumps, such as Wikipedia dumps and other big-data sources; these dumps are read into a Titan (JanusGraph)/Cassandra-based knowledge engine combined with a higher level of annotation and disambiguation
  • my view on big-data projects: many still use Java, including a software stack from Apache that is also partly implemented in Java; in my view this is a performance problem when gigabytes of data are processed

(Jeff Hull) #3

Seems interesting… Have they created a full-featured knowledge graph on top of this data? Do they rely on standard SPARQL / Gremlin? It seems like the state of knowledge graphs, at least the public ones as opposed to whatever Google, Baidu, Facebook etc. have created, is pretty fragmented and very new. I am looking for the best way to control the schema of our knowledge graph and query it efficiently. The GRAKN.AI query language, Graql, along with its associated type system, seems interesting, with high-level abstractions appropriate for a knowledge graph… But from what I can tell, it’s a proprietary language rather than being on some standards track.

Just wondering, as I’m not familiar with the Java-related performance issues you point out. What do you think the problem with Java is?

I have my own reservations about Java… I’ve worked with it a bit and find it to be a bit clunky and verbose. That’s one of the reasons I like Dgraph, being based on concise and innovative Golang. Not to mention the fact that Dgraph is a full-stack database, rather than just a query layer on top of some other back-end datastore.

(Eugen Brochier) #4

We did not test procedures in Java against Golang, but against standard C; these tests showed much better performance in standard C.
Java: using Oracle Java-V.8.xxx
Standard-C: using gcc compiler
VMs: Ubuntu 14.04 LTS in HyperV on IBM 4-Xeon servers

(Eugen Brochier) #5

As far as I know, it's standard Gremlin.
There are different machines:
a) 1 VM with 8 cores for each language (now 6), filled with data from Wikipedia dumps and used for entity linking - but better than Dexter from http://dexter.isti.cnr.it/
b) 1 VM with 8 cores for search and ranking of “documents”, plus storage in Titan with a Cassandra persistence backend

  • plus connectors with Apache-Manifold and ActiveMQ to link to the main application
  • plus in-memory screenshots
  • plus ps2html
  • plus Elasticsearch

Another problem is that the project started with a very small free group and grew with Java and the tools available at the beginning. So it is not easy to switch to newer technologies like Golang or to replace external modules like ActiveMQ.

They also foresee further “splitting” into more VMs to increase performance.

The application itself targets enterprises and government organisations for
a) research
b) organising their own documents, which are stored in various formats (for historical reasons)

  • users can decide to keep their document metadata in the system and the documents themselves externally

(Jeff Hull) #6

Interesting. This is one of the problems we aim to solve also. Enterprise document management at the companies I’ve consulted for is so bad… MS SharePoint turns into a pile of unsearchable, unstructured information. And of course piles of documents on network share drives are even worse.

It’s interesting to me that Java dominates the vast majority of graph databases (Titan, Neo4j, OrientDB, JanusGraph…). Sometimes I wonder if this is because graph databases are relevant to many academic topics in the “AI” space, and Java has a strong foothold in academia, or at least that is my experience.

(Eugen Brochier) #7

Hello again
Yes, that's true - enterprise document management is still an overlooked problem here in Europe.
By the way, SharePoint is a frequent question from some “early birds”. I saw many “Bigs” thinking SharePoint can solve their problems. Explaining to someone what a “document-proof solution” is, is a hard job.

Last, a bit clandestine but not really: my former colleagues told me that they are working on a SharePoint plugin (via Manifold or ??) to crawl through these structures. It was a demand from some enterprises they could reach.

(Eugen Brochier) #8

From my point of view the answer is easy: here in Europe too, Java is the most used development platform. Hadoop, Spark, Solr, etc. are in use in various solutions.
So what I see is a huge heap of software with even larger dependencies on these libraries,
all grown over the years - redeveloping it would be hard (and expensive).

(Eugen Brochier) #9

However, databases are a backbone: they need stability and ACID.
So frontend solutions need them. New methods of working with data have 2 demands:
a) good frontends to transform real needs into requests/solutions
b) a powerful backend to represent those needs in a structured “memory/storage/refindable” backend store -> ACID