Fulltext search across multiple fields

Experience Report for Feature Request

What you wanted to do

Fulltext search across multiple fields

What you actually did

Nothing yet, as it is not possible.

Why that wasn’t great, with examples

As an alternative, we are contemplating creating a duplicate data field that holds all the strings concatenated together, but that breaks the single source of truth and would rely on other mechanisms to keep this field in sync with the others.
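For illustration, here is a minimal sketch of that workaround, assuming a hypothetical denormalized field (call it indexedData) that our own code would have to keep in sync, indexed with Dgraph's @search directive:

type Contact {
  id: ID!
  firstName: String
  lastName: String
  # ...other fields as in the schema further below...
  # hypothetical concatenation of name, address, phone, and note strings,
  # kept in sync by application code rather than by the database
  indexedData: String @search(by: [fulltext])
}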

Any external references to support your case

We are used to being able to do logic within a filter in MySQL, e.g. filtering on a concatenated field.

I will expand on this, maybe to start another think tank session, after just spending 3 hours on the phone trying to work through this.

Context

In our knowledge graph we control access to data by public visibility, users, groups, and group access rights. This means that for each piece of data we have an access control list defining who can access the data (user/group), and then each user in a group has rights that define what group data they are allowed to access. Here is part of our schema for this, without any auth rules/directives, for some simplification:

GraphQL Schema
type ACL {
  id: ID!
  grants: Contact
}
type Contact {
  id: ID!
  isPublic: Boolean
  access: [ACL]
  firstName: String
  lastName: String
  organization: String
  addresses: [Address]
  phones: [Phone]
  notes: [Note]
  isUser: User
  isGroup: Group
  hasMemberships: [Member]
}
type User {
  id: ID!
  username: String
  isContact: Contact
}
type Group {
  id: ID
  isContact: Contact
  members: [Member]
}
type Member {
  id: ID
  group: Group
  member: Contact
  accessRights: [AccessRight] #enum not shown here
}
type Address {
  id: ID!
  isPublic: Boolean
  access: [ACL]
  line1: String
  city: String # just a string for simplicity here
  state: String
  forContacts: [Contact]
}
type Phone {
  id: ID!
  isPublic: Boolean
  access: [ACL]
  number: String
  forContacts: [Contact]
}
type Note {
  id: ID!
  isPublic: Boolean
  date: DateTime
  content: String
  forContact: Contact
}

This schema works beautifully with auth rules to query down to the data that any specific user is able to access. But now we want to enable a “global search” for a specific data type. Given a search string, find all contacts that best match it, looking at the contact’s name, address, phone number, or note content. (We have identified 12 nested fields in our real schema that we want to use right now.)

Normally the best way to do this would be to build an index across all of these fields and search that index for the best matches. This would mean we could run a query against a single field, something like Contact.indexedData, a string holding all the terms related to that Contact, and easily get results. But that does not honor access control rules for the index. Talking through this, we would have to build multiple indexes for every contact and then access control those indexes according to which users and groups with access rights are allowed to see them. Even worse, these indexes would have to be rebuilt/updated on a recurring schedule. I ran some numbers, and we would end up with roughly 4.8 billion nodes of just indexes, containing a lot of duplicate data.
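For contrast, if a single indexed field like Contact.indexedData could honor the access control rules, the whole search would collapse to one generated query, roughly like the sketch below (again assuming the hypothetical indexedData field carries @search(by: [fulltext])):

query GlobalSearchViaIndex($terms: String!) {
  # $terms is the raw search string typed by the user
  queryContact(filter: { indexedData: { anyoftext: $terms } }) {
    id
    firstName
    lastName
    organization
  }
}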

So since we can’t use any kind of prebuilt index, we need to run a query (or multiple queries) to find the Contacts matching a search term across multiple nested fields. Then we need to find which contacts appear most often across these queries, build an ordered hashmap, slice it to get the top 5-10 results, and query those again to display them in the search results.

This sounds (and is becoming) very complicated for something that should be a really fast and light global search across all contacts. I started down this rabbit hole with some pseudocode and logically concluded that we would need to run 12 * (number of words in the search string longer than 2 characters) queries to be able to do partial word matches. For instance, searching for “Home Jack” should return contacts whose organization name is “Home Improvement” (full-text search) as well as contacts having an address with the city of “Jacksonville” (regex partial word match). In the business logic layer we will then take these query results and flatten them down to lists of contact ids. With these lists of contact ids we will find the ones that appear most often across all the queries. Then, with those top 5-10 ids, we will run another query to fetch the actual data to display.
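As a rough sketch, two of those generated queries could be combined into one aliased operation. This assumes the real schema adds @search(by: [fulltext]) to organization and @search(by: [regexp]) to Address.city; the aliases and the trimmed selection sets are purely illustrative:

query GlobalSearch {
  # full-text match: "Home Jack" hits organization "Home Improvement" via the term "home"
  orgMatches: queryContact(filter: { organization: { anyoftext: "Home Jack" } }) {
    id
  }
  # partial word match: the term "jack" hits city "Jacksonville";
  # @cascade drops contacts whose filtered addresses list comes back empty
  cityMatches: queryContact @cascade {
    id
    addresses(filter: { city: { regexp: "/jack/i" } }) {
      id
    }
  }
}

The real operation would repeat this pattern for the other ~10 fields and for each search term longer than 2 characters, returning only ids for the business layer to count.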

The problem I am looking at now is the efficiency of running 12-48+ queries in a single query operation for just a “simple” search. I understand these queries will run in parallel, but I am concerned about the load it will place on the database.

I have not implemented this solution yet, so I can’t even see how it might perform. I guess my next step is to handwrite the 12-48 queries that might get generated for a single search and see how it performs, or if it comes crashing down. :grimacing:

If anyone has any ideas on how to efficiently create such a “global search” across multiple nested fields of a type while honoring access control rules, please chime in. I feel like I have been tasked with a simple search form that in essence requires a custom, in-house, Silicon Valley-sized search application. Dgraph provides a lot of great features, but this is something where I am just stuck wracking my brain, thinking there isn’t a good solution.

If I were in a SQL system, I would form a query that fetched every field, build a temp table, and search over that, but I would still have to handle auth rules somehow.

I think I am stuck with either a solution that searches only a few fields, and only fields at the root level, or a seemingly costly search that might get the wanted results but not within the needed timeframe.
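For completeness, the root-level-only option would look roughly like the sketch below, using the and/or combinators in Dgraph’s generated filters (assuming @search(by: [fulltext]) on these fields; the exact or syntax, single filter vs. list, depends on the Dgraph version):

query RootLevelSearch($terms: String!) {
  queryContact(filter: {
    or: [
      { firstName: { anyoftext: $terms } },
      { lastName: { anyoftext: $terms } },
      { organization: { anyoftext: $terms } }
    ]
  }) {
    id
    firstName
    lastName
    organization
  }
}

This reaches only Contact’s own String fields, though; nested addresses, phones, and notes are still out of reach without the multi-query approach above.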