Usage of GART Storage

GART Storage can be used with user-defined data sources.

The workflow of GART is the following steps:

Users can directly input the data as TxnLog to the converter and provide an RGMapping file.

Building

git clone https://github.com/GraphScope/GART.git gart
cd gart

docker image rm gart; docker build -t gart .
docker run -it --name gart0 gart

Input Data Format

Each data change record should be input to the converter in the format of TxnLog.

The sample format of TxnLog is as follows (Debezium style, only necessary information):

{
    "before": null,
    "after": {
        "org_id": "0",
        "org_type": "company",
        "org_name": "Kam_Air",
        "org_url": "http://dbpedia.org/resource/Kam_Air"
    },
    "source": {
        "ts_ms": 1689159703811,
        "db": "ldbc",
        "table": "organisation"
    },
    "op": "c"
}

This sample records the log that inserts a tuple of the organisation table. For delete and update operations, the difference is about the before, after, and op.

For delete operations:

{
    "before": {
        "org_id": "0",
        "org_type": "company",
        "org_name": "Kam_Air",
        "org_url": "http://dbpedia.org/resource/Kam_Air"
    },
    "after": null,
    "source": {
        "ts_ms": 1689159703815,
        "db": "ldbc",
        "table": "organisation"
    },
    "op": "d"
}

For update operations:

{
    "before": {
        "org_id": "0",
        "org_type": "company",
        "org_name": "Kam_Air",
        "org_url": "http://dbpedia.org/resource/Kam_Air"
    },
    "after": {
        "org_id": "0",
        "org_type": "company",
        "org_name": "Peter_Mark",
        "org_url": "http://dbpedia.org/resource/Peter_Mark"
    },
    "source": {
        "ts_ms": 1689159703815,
        "db": "ldbc",
        "table": "organisation"
    },
    "op": "u"
}

RGMapping Format

The sample RGMapping format is in the file rgmapping-ldbc.yaml.

The vertex type is defined under vertexMappings.vertex_types as follows, which dedicates the table name (dataSourceName) and the mapping between relational attributes and vertex properties.

- type_name: organisation
    dataSourceName: organisation
    idFieldName: org_id
    mappings:
    - property: org_id
        dataField:
        name: org_id
    - property: org_type
        dataField:
        name: org_type
    - property: org_name
        dataField:
        name: org_name
    - property: org_url
        dataField:
        name: org_url

The edge type is defined under edgeMappings.edge_types as follows, which dedicates the source and destination vertex types (source_vertex, destination_vertex), the relational table name (dataSourceName), the keys to connecting two vertices, and the mapping between relational attributes and edge properties.

- type_pair:
    edge: org_islocationin
    source_vertex: organisation
    destination_vertex: place
    dataSourceName: org_islocationin
    sourceVertexMappings:
    - dataField:
        name: src
    destinationVertexMappings:
    - dataField:
        name: dst
    dataFieldMappings:
    []