Full-text tokenizer can't deal with apostrophes

Moved from GitHub dgraph/4633

Posted by Kubera2017:

What version of Dgraph are you using?


Have you tried reproducing the issue with the latest release?


What is the hardware spec (RAM, OS)?

12GB, Ubuntu 19.04

Steps to reproduce the issue (command/config used to run Dgraph).

  1. Set the schema:
    file_content: string @index(fulltext) @lang .
  2. Insert data:
    "file_content@en": "unrelated breaks GIT Buccal micron Standard burst College Overall absorptive paracellular measures advance contains mm protein’s chymosin beyond β-lactotensin permanent respective rigid-body apical corneum information murine medium After supported mCherry—a ZOT fluorometer immobilized fully"
  3. Search for “protein”.

Expected behaviour and actual result.

Full-text search should find “protein” in “protein’s”, but it does not. Neo4j’s Lucene-based search does.
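
To make the expected behaviour concrete, here is a minimal Go sketch of a tokenizer that strips the English possessive suffix before matching. It is not Dgraph's actual tokenizer; the names stripPossessive, tokenize and matches are hypothetical and exist only for illustration. It handles both the ASCII apostrophe (U+0027) and the typographic one (U+2019), which is the character in the sample data above. As far as I know, Lucene's English analyzer does something similar with its possessive filter.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// stripPossessive removes a trailing 's written with either an ASCII
// apostrophe (U+0027) or a typographic apostrophe (U+2019).
// Hypothetical helper, not part of Dgraph.
func stripPossessive(tok string) string {
	for _, suffix := range []string{"'s", "\u2019s"} {
		if strings.HasSuffix(tok, suffix) {
			return strings.TrimSuffix(tok, suffix)
		}
	}
	return tok
}

// tokenize lowercases the text, splits it on whitespace, trims
// non-alphanumeric characters from both ends of each token, and then
// strips the possessive suffix.
func tokenize(text string) []string {
	var out []string
	for _, field := range strings.Fields(strings.ToLower(text)) {
		tok := strings.TrimFunc(field, func(r rune) bool {
			return !unicode.IsLetter(r) && !unicode.IsDigit(r)
		})
		tok = stripPossessive(tok)
		if tok != "" {
			out = append(out, tok)
		}
	}
	return out
}

// matches reports whether term equals one of the tokens of text.
func matches(text, term string) bool {
	term = strings.ToLower(term)
	for _, tok := range tokenize(text) {
		if tok == term {
			return true
		}
	}
	return false
}

func main() {
	text := "contains mm protein\u2019s chymosin beyond"
	fmt.Println(matches(text, "protein")) // true
}
```

With a step like this in the full-text pipeline, a search for “protein” would also hit documents that only contain “protein’s”.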