Migrate: Assigns many predicates with datatype 'unknown'

Moved from GitHub dgraph/5546

Posted by gentle-noah:

What version of Dgraph are you using?

Dgraph version   : v20.03.2
Dgraph SHA-256   : aac66a8dc9feb7da845be56f6be1b088339986d29d6d063ac4e8c14823180044
Commit SHA-1     : 7553f0dea
Commit timestamp : 2020-05-15 15:42:30 -0700
Branch           : HEAD
Go version       : go1.14.1

Have you tried reproducing the issue with the latest release?


What is the hardware spec (RAM, OS)?

OS: MacOS Catalina Version 10.15.5
Processor: 2.3 GHz 8-Core Intel Core i9
Memory: 16 GB 2400 MHz DDR4

Steps to reproduce the issue (command/config used to run Dgraph).

Some notes to set context before diving into reproduction steps.

I am trying to use the dgraph migrate command to migrate data from an AWS MySQL RDS instance. The RDS instance is configured as follows:

  1. Aurora MySQL

  2. db.r4.2xlarge

  3. This instance is part of a larger 5-shard cluster

To reproduce:

  1. Create a config.properties file in the project directory.

  2. Set the user, password and db values as follows:

user = <db_username>
password = <db_password>
db = <db_name>

  3. Run the following command:
    dgraph migrate --config config.properties --output_schema schema.txt --output_data sql.rdf --host xxx.us-xxx-1.rds.amazonaws.com --port 3306

Expected behaviour and actual result.

The expected behavior was the following:

  1. A schema.txt file would be generated in the project directory.
    This does actually happen. However, a large number of the generated predicate definitions come back with the datatype unknown. This happens in many instances, but to highlight a few examples:
    1.a) If the ID is a string, such as a CUID or UUID, the datatype comes back as unknown
    1.b) If the column's data type is text, the migrate command labels the predicate as unknown
    1.c) In some instances, when the column's data type is a string, the migrate command labels the predicate as unknown

  2. Once the schema file is generated it automatically starts creating the RDF N-Quads in the sql.rdf file
    This also starts happening as expected. However, once the data is ready to begin writing to the file, an error is thrown because of the predicates with the data type of unknown

  3. Expectation: I should be able to edit the schema.txt file to manually fix the unknown data type predicates and rerun the following command:
    dgraph migrate --config config.properties --output_schema schema.txt --output_data sql.rdf --host xxx.us-xxx-1.rds.amazonaws.com --port 3306

BUT I am prompted with the following: overwriting the file schema.txt (y/N)? If I select y, my manual edits are overwritten and the cycle starts over. If I select N, the command aborts completely.

I have also tried this on two other MySQL data stores, with the same issue: data types for predicates get set to unknown. The assignment of unknown is happening here: https://github.com/dgraph-io/dgraph/blob/master/dgraph/cmd/migrate/datatype.go#L35 I’m wondering whether additional field types just need to be supported, and also whether unknown should default to string so as not to disrupt the process?
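To make the suggestion concrete, here is a minimal sketch of what I mean by defaulting to string. This is not the code from datatype.go; the names (dgraphType, knownTypes, mapSQLType) and the list of MySQL types are my own assumptions for illustration. The idea is to normalize the column type (drop the length and any modifiers), look it up in a table, and fall back to string instead of unknown when nothing matches.

```go
package main

import (
	"fmt"
	"strings"
)

// dgraphType is a hypothetical stand-in for the scalar type written to schema.txt.
type dgraphType string

const (
	typeInt      dgraphType = "int"
	typeFloat    dgraphType = "float"
	typeString   dgraphType = "string"
	typeDatetime dgraphType = "datetime"
)

// knownTypes is an illustrative table, NOT the one used by dgraph migrate.
// Keys are MySQL base type names without length or modifiers.
var knownTypes = map[string]dgraphType{
	"tinyint": typeInt, "smallint": typeInt, "mediumint": typeInt,
	"int": typeInt, "integer": typeInt, "bigint": typeInt,
	"float": typeFloat, "double": typeFloat, "decimal": typeFloat,
	"char": typeString, "varchar": typeString,
	"tinytext": typeString, "text": typeString,
	"mediumtext": typeString, "longtext": typeString,
	"date": typeDatetime, "datetime": typeDatetime, "timestamp": typeDatetime,
}

// mapSQLType normalizes a column type such as "VARCHAR(255)" or
// "bigint unsigned" to its base name and returns the matching type.
// Anything not in the table falls back to string rather than unknown.
func mapSQLType(sqlType string) dgraphType {
	base := strings.ToLower(strings.TrimSpace(sqlType))
	// Drop a length/precision suffix like "(255)" or "(10,2)".
	if i := strings.IndexByte(base, '('); i >= 0 {
		base = base[:i]
	}
	// Drop trailing modifiers like "unsigned".
	if fields := strings.Fields(base); len(fields) > 0 {
		base = fields[0]
	}
	if t, ok := knownTypes[base]; ok {
		return t
	}
	return typeString
}

func main() {
	for _, col := range []string{"char(25)", "text", "varchar(255)", "json", "int(11)"} {
		fmt.Printf("%s -> %s\n", col, mapSQLType(col))
	}
}
```

With a fallback like this, an unsupported column type would still produce a usable string predicate, so RDF generation could continue instead of erroring out. The trade-off is that binary or structured columns would be written as plain strings, which may not be what everyone wants.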

Please let me know if there is any more context I can provide. Also, thank you for Dgraph, it is awesome.