Confused about the schema file in bulk loading

I’m trying to benchmark Dgraph against other graph databases using the Freebase film 21million dataset provided by the dgraph-io/benchmarks repo on GitHub.

The dataset was imported successfully with the command below:

     dgraph bulk -f ./21million.rdf.gz -s ./21million.schema

However, the schema in 21million.schema seems incomplete. Here is the file:

    director.film        : [uid] @reverse @count .
    actor.film           : [uid] @count .
    genre                : [uid] @reverse @count .
    initial_release_date : datetime @index(year) .
    rating               : [uid] @reverse .
    country              : [uid] @reverse .
    loc                  : geo @index(geo) .
    name                 : string @index(hash, term, trigram, fulltext) @lang .
    starring             : [uid] @count .
    performance.character_note : string @lang .
    tagline              : string @lang .
    cut.note             : string @lang .
    rated                : [uid] @reverse .
    email                : string @index(exact) @upsert .

The article "Neo4j vs Dgraph - The numbers speak for themselves" mentions that this dataset has 50 distinct types of entities and 132 types of relationships between them, yet the schema file clearly contains far less information.
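One rough way to see how many predicates the data actually uses (as opposed to how many the schema file declares) is to scan the RDF file itself. The sketch below assumes each line is a simple N-Quad of the form `subject <predicate> object .`, where the subject is either an IRI in angle brackets or a blank node like `_:x`; it only counts predicate names and does not try to infer entity types.

```python
import gzip
import re
from collections import Counter

# Matches the subject (IRI or blank node) and captures the predicate IRI, e.g.
#   _:a <director.film> _:b .   ->  "director.film"
# Assumption: simple N-Quad lines; anything that does not match is skipped.
PRED_RE = re.compile(r'^\s*(?:<[^>]*>|_:\S+)\s+<([^>]+)>')


def count_predicates(path):
    """Return a Counter mapping predicate name -> number of triples using it."""
    counts = Counter()
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            m = PRED_RE.match(line)
            if m:
                counts[m.group(1)] += 1
    return counts
```

Calling `count_predicates("./21million.rdf.gz")` and printing `len(...)` plus `most_common(20)` gives a quick overview; predicates that show up here but not in 21million.schema are the ones the loader has to handle without an explicit declaration.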

How can I get the complete schema of the imported data, in a form like this:

    type Task {
        id: ID!
        title: String!
        completed: Boolean!
        user: User!
    }

    type User {
        username: String! @id
        name: String
        tasks: [Task] @hasInverse(field: user)
    }

Is there any way to do this?

Hi @jo3yzhu, welcome to the Dgraph Community!
Yes, that is a minimal schema: it contains just enough for the loading process to complete without errors.
For example, note directives like @lang: predicates with language-tagged values require that directive during loading.
All other predicates are added to the schema automatically when you load the data.
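To see which predicates fall into the "generated automatically" group, and whether any language-tagged predicate is missing @lang, one can compare the schema file against the RDF lines. This is a hypothetical helper, not a Dgraph tool: it assumes one `predicate : type directives .` declaration per schema line and simple N-Quad data lines, and treats a quoted literal followed by `@xx` as language-tagged.

```python
import re

# One DQL schema line: "<predicate> : <type> <directives> ."
SCHEMA_RE = re.compile(r'^\s*([^\s:]+)\s*:\s*(\S+)(.*)\.\s*$')
# Subject (IRI or blank node), then the predicate IRI, then the rest of the line.
TRIPLE_RE = re.compile(r'^\s*(?:<[^>]*>|_:\S+)\s+<([^>]+)>\s+(.*)$')
# A quoted literal immediately followed by a language tag, e.g. "Alien"@en
LANG_LITERAL_RE = re.compile(r'^"(?:[^"\\]|\\.)*"@[A-Za-z]')


def parse_schema(text):
    """Map predicate -> set of directive names found on its line (e.g. {"@lang"})."""
    schema = {}
    for line in text.splitlines():
        m = SCHEMA_RE.match(line)
        if m:
            pred, _type, rest = m.groups()
            schema[pred] = set(re.findall(r'@\w+', rest))
    return schema


def compare(schema_text, rdf_lines):
    """Return (undeclared, missing_lang), both sorted lists of predicate names."""
    schema = parse_schema(schema_text)
    seen, lang_tagged = set(), set()
    for line in rdf_lines:
        m = TRIPLE_RE.match(line)
        if not m:
            continue
        pred, obj = m.groups()
        seen.add(pred)
        if LANG_LITERAL_RE.match(obj):
            lang_tagged.add(pred)
    undeclared = sorted(seen - set(schema))
    missing_lang = sorted(p for p in lang_tagged
                          if "@lang" not in schema.get(p, set()))
    return undeclared, missing_lang
```

For the real files, something like `compare(open("21million.schema").read(), gzip.open("21million.rdf.gz", "rt"))` should work; `undeclared` lists the predicates that only enter the schema when the data is loaded.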

The schema above is a GraphQL schema, but the schema you see in Dgraph Ratel is the Dgraph (DQL) schema, not GraphQL.

Thanks a lot for your help.
I followed your advice and used the Ratel dashboard to look for the schema. However, the schema I found on the dashboard page doesn’t look much like my original film RDF dataset.

These are the commands I ran to start Dgraph:

    dgraph zero
    dgraph alpha --zero localhost:5080 -o 100

I think the -o 100 offset means I should connect to http://localhost:8180 (instead of the default 8080) when setting up the Ratel dashboard, and that’s what I did.

Is there something else I need to do, such as activating my imported data?

These are Dgraph’s built-in predicates. If the dataset is loaded in the cluster, its schema should be there too. Maybe you started a new cluster and haven’t loaded the data into it yet. I just loaded the data, and the predicates show up for me.
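A quick way to check this without Ratel is to send a DQL `schema {}` query to Alpha’s HTTP `/query` endpoint and drop the built-in `dgraph.*` predicates. This sketch assumes Alpha’s HTTP port is 8180 (8080 plus the `-o 100` offset above) and a Dgraph version that accepts `Content-Type: application/dql` (older releases used `application/graphql+-`); the response shape `data.schema[].predicate` is what recent versions return, but treat it as an assumption for your version.

```python
import json
import urllib.request


def fetch_schema(alpha_url="http://localhost:8180"):
    """POST a DQL `schema {}` query to Alpha's /query endpoint; return parsed JSON."""
    req = urllib.request.Request(
        alpha_url + "/query",
        data=b"schema {}",
        headers={"Content-Type": "application/dql"},  # assumption: DQL content type
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def user_predicates(response):
    """Predicate names from a schema response, minus built-in dgraph.* predicates."""
    entries = response.get("data", {}).get("schema", [])
    return sorted(e["predicate"] for e in entries
                  if not e["predicate"].startswith("dgraph."))
```

If `user_predicates(fetch_schema())` returns an empty list, the Alpha you are talking to most likely does not have the bulk-loaded data.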

@MichelDiz , anything else I may be missing?

I’m not quite sure whether I should specify where my data (SSTables) is when starting Zero and Alpha, because Alpha’s log suggests it cannot find the data.

Dgraph creates three directories, p, w, and zw, where all the data is stored. If you previously loaded data with the bulk load command, it is stored in those folders. Make sure you start the cluster from the same working directory where those folders are present; if you restart the cluster from some other location, you won’t be able to see the loaded data.
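Before starting Zero and Alpha, a small check like the one below can confirm you are in the right working directory. It assumes the default directory names (Alpha’s p for postings and w for its WAL, Zero’s zw); if you changed them with flags, adjust the tuple accordingly.

```python
from pathlib import Path

# Default data directories: Alpha uses p (postings) and w (WAL); Zero uses zw.
# Assumption: no custom paths were passed on the command line.
EXPECTED_DIRS = ("p", "w", "zw")


def missing_data_dirs(workdir="."):
    """Return the expected Dgraph data directories that are absent from workdir."""
    base = Path(workdir)
    return [d for d in EXPECTED_DIRS if not (base / d).is_dir()]
```

An empty result from `missing_data_dirs()` means all three folders are present in the current directory.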

The message below only indicates that there is no GraphQL schema, and that’s OK:
No GraphQL schema in Dgraph

In this case, the bulk output files are in out/0/p, and you have to move them to the Alpha instance’s path (or to the cloud, or wherever you have Dgraph running). Always use the same Zero that you used when running the import.
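A minimal sketch of that move, assuming a single-group bulk load (so the output is out/0/p), that Alpha is stopped, and that Alpha will be started from `alpha_dir` with its default postings directory p. The helper name is hypothetical; it copies rather than moves so the bulk output stays available as a backup. Alternatively, Alpha’s `--postings` (`-p`) flag can point it at out/0/p directly.

```python
import shutil
from pathlib import Path


def install_bulk_output(bulk_out="out", alpha_dir=".", group=0):
    """Copy <bulk_out>/<group>/p to <alpha_dir>/p without overwriting an existing p."""
    src = Path(bulk_out) / str(group) / "p"
    dst = Path(alpha_dir) / "p"
    if not src.is_dir():
        raise FileNotFoundError(f"bulk output not found: {src}")
    if dst.exists():
        # An existing p may belong to another cluster; refuse to clobber it.
        raise FileExistsError(f"{dst} already exists; move it aside first")
    shutil.copytree(src, dst)
    return dst
```

After copying, start Alpha from `alpha_dir` against the same Zero that ran the bulk import.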

Ignore any predicate whose name starts with dgraph. (for example, dgraph.type).

Also, the 21million dataset doesn’t have a GraphQL schema available. We started creating one, but that work appears to have been discontinued.

Sorry for my ignorance. Does that mean I must extract the schema of the 21million dataset myself if I want to build it and run queries manually? I’m working on a query benchmark between Dgraph and my own graph system.

Not sure what you mean. The 21million RDF dataset has a schema, but it is in DQL, not GraphQL.