Confused about the schema file in bulk loading

I’m trying to benchmark Dgraph and other graph databases with the Freebase film 21-million dataset provided by the dgraph-io/benchmarks repo on GitHub.

The dataset was successfully imported with the command below:

     dgraph bulk -f ./21million.rdf.gz -s ./21million.schema

I found that the schema description in 21million.schema seems incomplete. The file is shown below:

    director.film        : [uid] @reverse @count .
    actor.film           : [uid] @count .
    genre                : [uid] @reverse @count .
    initial_release_date : datetime @index(year) .
    rating               : [uid] @reverse .
    country              : [uid] @reverse .
    loc                  : geo @index(geo) .
    name                 : string @index(hash, term, trigram, fulltext) @lang .
    starring             : [uid] @count .
    performance.character_note : string @lang .
    tagline              : string @lang .
    cut.note             : string @lang .
    rated                : [uid] @reverse .
    email                : string @index(exact) @upsert .

The article “Neo4j vs Dgraph - The numbers speak for themselves” mentions that there are 50 distinct types of entities and 132 types of relationships between these entities in this dataset; however, the schema file obviously contains much less information than that.
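To make the gap concrete, here is a quick sanity check I wrote myself (not part of the dataset or the benchmarks repo): it parses the predicate lines of 21million.schema as quoted above and counts them, to compare against the 132 relationship types the article mentions.

```python
# Count the predicates declared in 21million.schema.
# Each non-empty line has the shape "<predicate> : <type> <directives> .".
schema_text = """\
director.film        : [uid] @reverse @count .
actor.film           : [uid] @count .
genre                : [uid] @reverse @count .
initial_release_date : datetime @index(year) .
rating               : [uid] @reverse .
country              : [uid] @reverse .
loc                  : geo @index(geo) .
name                 : string @index(hash, term, trigram, fulltext) @lang .
starring             : [uid] @count .
performance.character_note : string @lang .
tagline              : string @lang .
cut.note             : string @lang .
rated                : [uid] @reverse .
email                : string @index(exact) @upsert .
"""

# Take everything before the first ":" on each non-empty line.
predicates = [line.split(":")[0].strip()
              for line in schema_text.splitlines() if line.strip()]

print(len(predicates))   # 14 declared predicates, far fewer than 132
```

So only 14 predicates are declared explicitly; everything else about the 132 relationship types must come from the data itself.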

I wonder how to get the complete schema of imported data like:

type Task {
    id: ID!
    title: String!
    completed: Boolean!
    user: User!
}

type User {
    username: String! @id
    name: String
    tasks: [Task] @hasInverse(field: user)
}

Is there any way I can handle this?

Hi @jo3yzhu, welcome to the Dgraph community!
Yes, that schema is a minimal schema, just enough so that the loading process completes without errors.
For example, see directives like @lang: while loading, language predicates require that directive.
All other predicates in the schema are generated automatically when you load the data.

The schema above is a GraphQL schema, but the schema you see in Dgraph Ratel is a Dgraph (DQL) schema, not GraphQL.
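If you want to inspect that Dgraph (DQL) schema outside of Ratel, you can also send a `schema {}` query to Alpha’s HTTP endpoint. This is just a sketch assuming the default port 8080; newer Dgraph versions accept `Content-Type: application/dql`, while older ones use `application/graphql+-`.

```shell
# Fetch the current DQL schema (all predicates, types, and indexes)
# from a running Alpha. Adjust the port if you started Alpha with -o.
curl -s -H "Content-Type: application/dql" \
     localhost:8080/query -d 'schema {}'
```

You can also just run `schema {}` in Ratel’s query console; it returns the same predicate list.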

Thanks a lot for your help.
I followed your advice and used the Ratel Dashboard to look for the schema; however, the schema I found on the dashboard page doesn’t look much like my original film RDF dataset.

The commands I ran to start Dgraph were:

dgraph zero
dgraph alpha --zero localhost:5080 -o 100
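As I understand the `-o` flag, it applies a port offset to all of Alpha’s default ports (a note for readers, assuming default ports):

```shell
# -o 100 shifts Alpha's default ports by 100, so the commands above give:
#   HTTP endpoint: 8080 + 100 = 8180   (what Ratel should connect to)
#   gRPC endpoint: 9080 + 100 = 9180   (what client libraries use)
# Zero keeps its default 5080, since it was started without -o.
```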

I think that means I should connect to http://localhost:8180 when setting up the Ratel Dashboard, and that’s what I did.

Any idea whether I should do something else, like activating my imported data?

These are the built-in predicates in Dgraph. If you have the dataset loaded in the cluster, then the schema should be there. Maybe you started a new cluster and haven’t loaded the data yet. I just loaded the data, and the predicates are there for me.

@MichelDiz , anything else I may be missing?

I’m not sure whether I should specify where my data (SSTables) is when starting up Zero and Alpha, because judging from its log, Alpha cannot find the data.

Dgraph creates three directories, p, w, and z, where all the data is stored. If you previously loaded data using the bulk load command, then it is stored in those folders. Make sure you start the cluster in the same working directory where those folders are present. If you restart the cluster in some other location, you won’t be able to see the loaded data.

The message below indicates that we don’t have a GraphQL schema, and that’s OK:
No GraphQL schema in Dgraph:

In this case the bulk files are at out/0/p, and you have to move them to the instance path, or to the cloud, or wherever you have Dgraph running. Remember to always use the same Zero you started the import with.
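A rough sketch of that move (paths are illustrative; adjust to your setup, and stop Alpha first):

```shell
# Place the bulk-loaded posting lists where Alpha expects its p directory,
# then restart Alpha against the SAME Zero that ran during the bulk load.
mv out/0/p ./p                               # out/0/p was produced by `dgraph bulk`
dgraph alpha --zero localhost:5080 -o 100    # started from this same directory
```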

Ignore any predicate prefixed with dgraph.***.

Also, the 21 million dataset doesn’t have a GraphQL schema available. We had started to create one, but it looks like that effort was discontinued.

Sorry for my ignorance, but does that mean I must extract the schema of the 21 million dataset myself if I want to build it and run queries manually? I’m working on a query benchmark between Dgraph and my own graph system.

Not sure what you mean. The 21million RDF dataset has a schema, in DQL, not GraphQL.