Full-text tokenizer can't handle apostrophes

diggy · January 21, 2020, 6:18pm

Moved from GitHub dgraph/4633

Posted by Kubera2017:

Dgraph version: 1.1.1

Tried reproducing with the latest release: No

Hardware/OS: 12GB RAM, Ubuntu 19.04

Set the schema:
`file_content: string @index(fulltext) @lang .`
Insert data:
`{"file_content@en": "unrelated breaks GIT Buccal micron Standard burst College Overall absorptive paracellular measures advance contains mm protein’s chymosin beyond β-lactotensin permanent respective rigid-body apical corneum information murine medium After supported mCherry—a ZOT fluorometer immobilized fully"}`
Search for the term "protein" with a fulltext query.

Expected: a fulltext search for "protein" should match "protein’s". Actual: no match is returned. Neo4j's Lucene-based full-text search does find it.
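The mismatch comes down to how a tokenizer treats the apostrophe inside protein’s. Below is a minimal Go sketch, not Dgraph's actual tokenizer: it keeps apostrophes between letters inside a token, so without extra handling protein’s stays one token and never equals protein, and an optional possessive-stripping step (removing a trailing 's) makes the search term match. The names `tokenize`, `isApostrophe`, and `contains` are hypothetical. The post renders the apostrophe as ’ (U+2019), which may be forum typography rather than the original data, so the sketch handles both the ASCII and the curly form.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// isApostrophe reports whether r is an ASCII apostrophe or a typographic
// right single quotation mark (U+2019).
func isApostrophe(r rune) bool {
	return r == '\'' || r == '\u2019'
}

// tokenize splits text into lowercase tokens. Letters, digits and apostrophes
// stay inside a token; every other rune separates tokens. Apostrophes at the
// start or end of a token are trimmed. If stripPossessive is true, a trailing
// 's (ASCII or U+2019) is removed from each token.
func tokenize(text string, stripPossessive bool) []string {
	fields := strings.FieldsFunc(text, func(r rune) bool {
		return !(unicode.IsLetter(r) || unicode.IsDigit(r) || isApostrophe(r))
	})
	tokens := make([]string, 0, len(fields))
	for _, f := range fields {
		t := strings.TrimFunc(strings.ToLower(f), isApostrophe)
		if stripPossessive {
			for _, suffix := range []string{"'s", "\u2019s"} {
				t = strings.TrimSuffix(t, suffix)
			}
		}
		if t != "" {
			tokens = append(tokens, t)
		}
	}
	return tokens
}

// contains reports whether want is one of the tokens.
func contains(tokens []string, want string) bool {
	for _, t := range tokens {
		if t == want {
			return true
		}
	}
	return false
}

func main() {
	text := "mm protein\u2019s chymosin"
	fmt.Println(tokenize(text, false))                      // [mm protein’s chymosin]
	fmt.Println(contains(tokenize(text, false), "protein")) // false
	fmt.Println(tokenize(text, true))                       // [mm protein chymosin]
	fmt.Println(contains(tokenize(text, true), "protein"))  // true
}
```

Only a trailing 's is removed, so a contraction such as don't stays intact; note that it's would also reduce to it.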
