Moved from GitHub dgraph/5546
Posted by gentle-noah:
What version of Dgraph are you using?
Dgraph version : v20.03.2
Dgraph SHA-256 : aac66a8dc9feb7da845be56f6be1b088339986d29d6d063ac4e8c14823180044
Commit SHA-1 : 7553f0dea
Commit timestamp : 2020-05-15 15:42:30 -0700
Branch : HEAD
Go version : go1.14.1
Have you tried reproducing the issue with the latest release?
Yep!
What is the hardware spec (RAM, OS)?
OS: MacOS Catalina Version 10.15.5
Processor: 2.3 GHz 8-Core Intel Core i9
Memory: 16 GB 2400 MHz DDR4
Steps to reproduce the issue (command/config used to run Dgraph).
Some notes to set context before diving into reproduction steps.
I am trying to use the dgraph migrate command to migrate data from an AWS MySQL RDS instance. Here are the configs of the RDS instance:
- Aurora MySQL
- db.r4.2xlarge
- This instance is part of a larger 5-shard cluster
- Create a config.properties file in the project.
- Set the user, password, and db values as such:
user = <db_username>
password = <db_password>
db = <db_name>
- Run the following command:
dgraph migrate --config config.properties --output_schema schema.txt --output_data sql.rdf --host xxx.us-xxx-1.rds.amazonaws.com --port 3306
Expected behaviour and actual result.
The expected behavior was the following:
- A schema.txt file would be generated in the project directory.
This does actually happen. However, a large number of the generated predicate definitions come back with the datatype unknown. This happens in many instances, but to highlight a few examples:
1.a) If the ID is a string, such as a CUID or UUID, the datatype comes back as unknown.
1.b) If the data type is text, the migrate command labels it as unknown.
1.c) In some instances, when the data type is a string, the migrate command labels the data type for the predicate as unknown.
- Once the schema file is generated, the command automatically starts creating the RDF N-Quads in the sql.rdf file.
This also starts happening as expected. However, once the data is ready to be written to the file, an error is thrown because of the predicates with the data type unknown.
- Expectation: I should be able to edit the schema.txt file to manually fix the unknown data type predicates and rerun the following command:
dgraph migrate --config config.properties --output_schema schema.txt --output_data sql.rdf --host xxx.us-xxx-1.rds.amazonaws.com --port 3306
BUT I am prompted with the following: overwriting the file schema.txt (y/N)? If I select y, then my manual work is overwritten and the cycle starts over. If I select N, then the command aborts completely.
I have also tried this on two other MySQL data stores, with the same issues, where data types for predicates get set to unknown. The assignment of unknown is happening here: dgraph/datatype.go at master · dgraph-io/dgraph · GitHub. I'm wondering whether additional field types just need to be supported, and also whether unknown should default to string so as not to disrupt the process?
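To make the suggestion concrete, here is a minimal Go sketch of a type mapping that falls back to string instead of unknown. It is not the code in Dgraph's datatype.go: the function name dgraphTypeFor, the list of MySQL type names it recognizes, and the target type names are all assumptions chosen for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// dgraphTypeFor is a hypothetical helper (not part of Dgraph) that maps a
// MySQL column type such as "varchar(36)" to a Dgraph scalar type name.
// Anything it does not recognize falls back to "string" rather than
// "unknown", so schema generation can continue.
func dgraphTypeFor(mysqlType string) string {
	t := strings.ToLower(mysqlType)
	// Drop a length or precision suffix, e.g. "(36)" in "varchar(36)".
	if i := strings.IndexByte(t, '('); i >= 0 {
		t = t[:i]
	}
	// Keep only the base type name, ignoring modifiers like "unsigned".
	fields := strings.Fields(t)
	if len(fields) == 0 {
		return "string"
	}
	switch fields[0] {
	case "tinyint", "smallint", "mediumint", "int", "integer", "bigint":
		return "int"
	case "float", "double", "decimal":
		return "float"
	case "date", "datetime", "timestamp":
		return "datetime"
	default:
		// Assumed fallback: char, varchar, text, and any unlisted type.
		return "string"
	}
}

func main() {
	for _, c := range []string{"varchar(36)", "text", "int(10) unsigned", "json"} {
		fmt.Printf("%s -> %s\n", c, dgraphTypeFor(c))
	}
}
```

A fallback like this trades precision for continuity: a column with an unrecognized type would be exported as a string, and the generated schema could still be tightened by hand afterwards.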
Please let me know if there is any more context I can provide. Also, thank you for Dgraph, it is awesome.