
DatAasee - A Metadata-Lake for Libraries

DatAasee Software Documentation

Version: 0.3

DatAasee is a metadata-lake for centralizing bibliographic and scientific metadata from various sources. It increases research data findability and discoverability as well as metadata availability, and thus supports FAIR research and research reporting in university, research, academic, and scientific libraries.

In particular, DatAasee is developed for and by the University and State Library of Münster, but is openly available under a free and open-source license.

Table of Contents:

Selected Subsections:


Explanations

In this section understanding-oriented explanations are collected.

Overview:

About

Features

Components

DatAasee uses a three-tier architecture with the following separately containerized components, orchestrated by Compose:

| Function | Abstraction | Tier | Product |
|---|---|---|---|
| Metadata Catalog | Multi-Model Database | Data (Database) | ArcadeDB |
| EtLT Processor | Declarative Streaming Processor | Logic (Backend) | Benthos |
| Web Frontend | Declarative Web Framework | Presentation (Frontend) | Lowdefy |

Design

Data Model

The internal data model is based on the one big table (OBT) approach, with the exception of linked enumerated dimensions (look-up tables), making it effectively a denormalized wide table with a star schema. Specifically, the type (table) is named metadata.

EtLT Process

Combining the ETL (Extract-Transform-Load / schema-on-write) and ELT (Extract-Load-Transform / schema-on-read) concepts, processing is built upon the EtLT approach:

Particularly, this means “EtL” happens (batch-wise) during ingest, while “T” occurs when requested.

Security

Secrets:

Infrastructure:

Interface:


How-Tos

In this section, step-by-step guides for real-world problems are listed.

Overview:

Prerequisite

The (virtual) machine deploying DatAasee requires docker-compose on top of docker or podman, see also the container engine compatibility.
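
A quick way to verify that a suitable Compose provider is installed is to query its version, for example:

$ docker compose version  # or: podman compose version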

Resources

The compute and memory resources for DatAasee can be configured via the compose.yaml. Overall, a bare-metal machine or virtual machine requires:

So, a Raspberry Pi would be sufficient. In terms of DatAasee components this breaks down to:

Note that resource and system requirements depend on load; in particular, the database and backend are under heavy load during ingest. After an ingest, (new) metadata records are interrelated, which also causes heavy database load. Generally, the database drives the overall performance. Thus, to improve performance, first try increasing the memory limit (in the compose.yaml) for the database component (e.g. from 4G to 6G).
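
To observe actual per-component resource usage at runtime, for example during an ingest, the container engine's statistics can be consulted:

$ docker stats --no-stream  # one-shot CPU and memory usage per container; with Podman use: podman stats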

Using DatAasee

In this section the terms “operator” and “user” are used: “operator” refers to the party installing, serving, and maintaining DatAasee, while “user” refers to the individuals reading from DatAasee.

Operator Activities

User Activities

This means the user can only use the GET API endpoints, while the operator also uses the POST API endpoints.

Deploy

$ mkdir -p backup  # or: ln -s /path/to/backup/volume backup
$ wget https://raw.githubusercontent.com/ulbmuenster/dataasee/0.3/compose.yaml
$  DB_PASS=password1 DL_PASS=password2 docker compose up -d

NOTE: The required secrets are kept in the temporary environment variables DL_PASS and DB_PASS; the leading space before the docker compose command excludes it from the shell history.

NOTE: To further customize your deploy, use these environment variables. The runtime configuration environment variables can be stored in an .env file.

WARNING: Do not put secrets into the .env file!
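
For example, non-secret settings from the runtime configuration reference could be collected in an .env file like this (all values are placeholders):

$ cat > .env << 'EOF'
DL_VERSION=0.3
DL_PORT=8343
DL_BASE=http://my.url
TZ=CET
EOF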

Probe

wget -SqO- http://localhost:8343/api/v1/ready

NOTE: The default port for the HTTP API is 8343.

Shutdown

$ docker compose down

NOTE: A (database) backup is automatically triggered on every shutdown.

Ingest

$ wget -qO- http://localhost:8343/api/v1/ingest --user admin --ask-password --post-data \
  '{"source":"https://my.url/to/oai","method":"oai-pmh","format":"mods","steward":"https://my.url/identifying/steward"}'

NOTE: A (database) backup is automatically triggered after every ingest.

Backup Manually

$ wget -qO- http://localhost:8343/api/v1/backup --user admin --ask-password --post-data=

NOTE: A custom backup location can alternatively also be specified inside the compose.yaml.

Logs

$ docker compose logs backend --no-log-prefix

NOTE: For better readability, the log output can be piped through grep -E --color '^([^\s]*)\s|$' highlighting the text before the first whitespace, which corresponds to the log level in the DatAasee logs.
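
Put together, the highlighted log output can be obtained like this:

$ docker compose logs backend --no-log-prefix | grep -E --color '^([^\s]*)\s|$'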

Update

$ docker compose pull
$  DB_PASS=password1 DL_PASS=password2 docker compose up -d

NOTE: “Update” means: if available, new images of the same DatAasee version but with updated dependencies will be installed, whereas “Upgrade” means: a new version of DatAasee will be installed.

Upgrade

$ docker compose down
$  DB_PASS=password1 DL_PASS=password2 DL_VERSION=0.3 docker compose up -d

NOTE: docker compose restart cannot be used here because environment variables (such as DL_VERSION) are not updated when using restart.

NOTE: Make sure to put the DL_VERSION variable also into the .env file for a permanent upgrade.
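
For example, the version pin could be appended like this:

$ echo "DL_VERSION=0.3" >> .env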

Reset

$ docker compose restart

NOTE: A reset may become necessary if, for example, the backend crashes during an ingest; a database backup is created during a reset, too.

Web Interface (Prototype)

NOTE: The default port for the web frontend is 80 for a production deployment and 8000 in the development environment.

Index Screenshot

List Screenshot

Filter Screenshot

Query Screenshot

Overview Screenshot

About Screenshot

Fetch Screenshot

Insert Screenshot

Admin Screenshot

API Indexing

Add the JSON object below to the apis array in your global apis.json:

{
  "name": "DatAasee API",
  "description": "The DatAasee API enables research data search and discovery via metadata",
  "keywords": ["Metadata"],
  "attribution": "DatAasee",
  "baseURL": "http://your-dataasee.url/api/v1",
  "properties": [
    {
      "type": "InterfaceLicense",
      "url": "https://creativecommons.org/licenses/by/4.0/"
    },
    {
      "type": "x-openapi",
      "url": "http://your-dataasee.url/api/v1/api"
    }
  ]
}

For FAIRiCat, add the JSON object below to the linkset array:

{
  "anchor": "http://your-dataasee.url/api/v1",
  "service-doc": [
    {
      "href": "http://your-dataasee.url/api/v1/api",
      "type": "application/json",
      "title": "DatAasee API"
    }
  ]
}

References

In this section technical descriptions are summarized.

Overview:

HTTP-API

The HTTP-API is served under http://<your-url-here>/api/v1 (see DL_BASE) and provides the following endpoints:

| Method | Endpoint | Type | Summary |
|---|---|---|---|
| GET | /ready | system | Returns service readiness |
| GET | /api | system | Returns API specification and schemas |
| GET | /schema | metadata | Returns database schema |
| GET | /attributes | metadata | Returns enumerated attributes |
| GET | /stats | data | Returns metadata record statistics |
| GET | /sources | data | Returns ingested metadata sources |
| GET | /metadata | data | Returns queried metadata record(s) |
| POST | /insert | data | Inserts single metadata record |
| POST | /ingest | system | Triggers ingest from metadata source |
| POST | /backup | system | Triggers database backup |
| POST | /health | system | Returns service liveness |
| GET | /export | data | TODO: |

For details see the associated OpenAPI definition and api.csv.

NOTE: The base path for all endpoints is /api/v1.

NOTE: All GET requests are unchallenged, while all POST requests are challenged and handled via “Basic Authentication”.

NOTE: All request and response bodies have content type JSON, and if provided, the Content-Type HTTP header must be application/json or application/vnd.api+json!

NOTE: As the metadata-lake’s data is metadata, a type “data” means metadata, and a type “metadata” means metadata about metadata (global metadata).

NOTE: Responses follow the JSON:API format.

NOTE: The id property is the server’s Unix timestamp.
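
All examples in this reference use wget; as a sketch, a challenged POST can equivalently be sent with curl (assuming the default port and admin user), for example:

$ curl -u admin -H 'Content-Type: application/json' --data '' http://localhost:8343/api/v1/backup  # curl prompts for the password when only the user is given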


/ready Endpoint

Returns boolean answering if service is ready.

NOTE: The ready endpoint can be used as readiness probe.

Status:

Example:

Get service readiness:

$ wget -qO- http://localhost:8343/api/v1/ready

/api Endpoint

Returns OpenAPI specification (without parameters), or request and response schema.

NOTE: In case of a successful request, the response is NOT in the JSON:API format, but the requested JSON file directly.

Statuses:

Examples:

Get OpenAPI definition:

$ wget -qO- http://localhost:8343/api/v1/api

Get ingest endpoint request schema:

$ wget -qO- http://localhost:8343/api/v1/api?request=ingest

Get metadata endpoint response schema:

$ wget -qO- http://localhost:8343/api/v1/api?response=metadata

/schema Endpoint

Returns internal metadata schema.

Statuses:

Example:

Get native metadata schema:

$ wget -qO- http://localhost:8343/api/v1/schema

/attributes Endpoint

Returns list of enumerated attribute values.

Statuses:

Example:

Get all enumerated attributes:

$ wget -qO- http://localhost:8343/api/v1/attributes

Get “languages” enumerated attributes:

$ wget -qO- http://localhost:8343/api/v1/attributes?type=languages

/stats Endpoint

Returns statistics about records.

Statuses:

Example:

$ wget -qO- http://localhost:8343/api/v1/stats

/sources Endpoint

Returns ingested sources.

Statuses:

Example:

$ wget -qO- http://localhost:8343/api/v1/sources

/metadata Endpoint

Fetch from, search, filter or query metadata record(s). Four modes of operation are available:

Paging via page is supported only for the source query and the combined full-text and filter search, sorting via newest only for the latter.

NOTE: Only idempotent read operations are permitted in custom queries.

NOTE: This endpoint’s responses include pagination links, except for custom queries.

NOTE: For searches without id and query, a maximum of 20 results is returned; for by-source queries and custom queries (using query), a maximum of 100 results is returned.

NOTE: An explicitly empty source parameter (i.e. source=) implies all sources.

NOTE: A full-text search always matches for all argument terms (AND-based) in titles, descriptions and keywords in any order, while accepting * as wildcards and _ to build phrases.
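
For illustration (the search terms are made up), a wildcard search and a phrase search could look like this:

$ wget -qO- 'http://localhost:8343/api/v1/metadata?search=Hist*'                # wildcard: matches e.g. History, Histories
$ wget -qO- 'http://localhost:8343/api/v1/metadata?search=History_of_Science'   # phrase: "History of Science"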

Statuses:

Examples:

Get record by record identifier:

$ wget -qO- http://localhost:8343/api/v1/metadata?id=

Search records by single filter:

$ wget -qO- http://localhost:8343/api/v1/metadata?language=chinese

Search records by multiple filters:

$ wget -qO- 'http://localhost:8343/api/v1/metadata?resourcetype=book&language=german'

Search records by full-text for word “History”:

$ wget -qO- http://localhost:8343/api/v1/metadata?search=History

Search records by full-text and filter, oldest first:

$ wget -qO- 'http://localhost:8343/api/v1/metadata?search=Geschichte&resourcetype=book&language=german&newest=false'

Search records by custom SQL query:

$ wget -qO- 'http://localhost:8343/api/v1/metadata?language=sql&query=SELECT%20FROM%20metadata%20LIMIT%2010'

List the second page of records from all sources:

$ wget -qO- 'http://localhost:8343/api/v1/metadata?source=&page=1'

/insert Endpoint

Inserts a new record into the database, parsing it if necessary.

NOTE: This endpoint is meant for metadata records that are not ingestible, for example a report about ingested sources; general use is discouraged. For details on the request body, see the associated JSON schema.
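
That schema can presumably also be retrieved at runtime via the /api endpoint, analogous to the documented request=ingest example (whether insert is an accepted value for the request parameter is an assumption):

$ wget -qO- 'http://localhost:8343/api/v1/api?request=insert'  # assumed: request schema for the insert endpoint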

Status:

Example:

Insert record with given fields: TODO:

$ wget -qO- http://localhost:8343/api/v1/insert --user admin --ask-password --post-file=myinsert.json

/ingest Endpoint

Trigger ingest from data source.

NOTE: To test if the server is busy, send an empty (POST) body to this endpoint: HTTP status 200 means available, while status 503 means an ingest is currently running.
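
Such a busy check could look like this:

$ wget -qO- http://localhost:8343/api/v1/ingest --user admin --ask-password --post-data=''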

NOTE: The method and format are case-sensitive.

Status:

Example:

Start ingest from a given source:

$ wget -qO- http://localhost:8343/api/v1/ingest --user admin --ask-password --post-data \
  '{"source":"https://datastore.uni-muenster.de/oai", "method":"oai-pmh", "format":"datacite", "steward":"forschungsdaten@uni-muenster.de"}'

/backup Endpoint

Trigger database backup.

NOTE: The backup location can be set through the DL_BACKUP environment variable.

Status:

Example:

$ wget -qO- http://localhost:8343/api/v1/backup --user admin --ask-password --post-data=''

/health Endpoint

Returns internal status and versions of service components.

NOTE: The health endpoint can be used as liveness probe.

Status:

Example:

Get service health:

$ wget -qO- http://localhost:8343/api/v1/health --user admin --ask-password --post-data=''

/export Endpoint

TODO:

Ingest Protocols

Ingest Encodings

Currently, XML (eXtensible Markup Language) is the sole encoding for ingested metadata, with the exception of ingesting via the DatAasee protocol, which uses JSON (JavaScript Object Notation).

Ingest Formats

Native Schema

The main type of the metadata-lake database is the metadata vertex type, with the following properties:

| Key | Class | Entry | Internal Type | Constraints | Comment |
|---|---|---|---|---|---|
| schemaVersion | Process | Automatic | Integer | | |
| recordId | Process | Automatic | String | | |
| metadataChecksum | Process | Automatic | String | | |
| metadataQuality | Process | Automatic | String | | |
| dataSteward | Process | Automatic | String | max 4095 | |
| source | Process | Automatic | String | max 4095 | |
| createdAt | Process | Automatic | Datetime | | |
| metadataFormat | Technical | Automatic | String | max 255 | |
| sizeBytes | Technical | Automatic | Integer | min 0 | |
| dataFormat | Technical | Automatic | String | max 255 | |
| dataLocation | Technical | Automatic | String | max 4095, regexp | |
| numberViews | Social | Automatic | Integer | min 0 | |
| keywords | Social | Optional | String | max 255 | Comma separated |
| categories | Social | Optional | List(String) | max 4 | Pass array of strings to API, returned as array of strings from API |
| name | Descriptive | Mandatory | String | max 255 | |
| creators | Descriptive | Mandatory | List(pair) | max 255 | Pass array of pair objects (name, identifier) to API |
| publisher | Descriptive | Mandatory | String | max 255 | |
| publicationYear | Descriptive | Mandatory | Integer | min -9999, max 9999 | |
| resourceType | Descriptive | Mandatory | Link(pair) | resourceTypes | Pass string to API, returned as string from API |
| identifiers | Descriptive | Mandatory | List(pair) | max 255 | Pass array of pair objects (type, identifier) to API |
| synonyms | Descriptive | Optional | List(pair) | max 255 | Pass array of pair objects (type, title) to API |
| language | Descriptive | Optional | Link(pair) | languages | Pass string to API, returned as string from API |
| subjects | Descriptive | Optional | List(pair) | max 255 | Pass array of pair objects (name, identifier) to API |
| version | Descriptive | Optional | String | max 255 | |
| license | Descriptive | Optional | Link(pair) | licenses | Pass string to API, returned as string from API |
| rights | Descriptive | Optional | String | max 65535 | |
| fundings | Descriptive | Optional | List(pair) | max 255 | Pass array of pair objects (project, funder) to API |
| description | Descriptive | Optional | String | max 65535 | |
| message | Descriptive | Optional | String | max 65535 | |
| externalItems | Descriptive | Optional | List(pair) | max 255 | Pass array of pair objects (type, URL) to API |
| rawMetadata | Raw | Optional | String | max 2097151 | Larger raw data is discarded |

NOTE: See also the schema diagram: schema.md

NOTE: The properties related and visited are only for internal purposes and hence not listed here.

NOTE: The preloaded set of categories (see categories.csv) is highly opinionated.

Global Metadata

The metadata type has the custom metadata fields:

| Key | Type | Comment |
|---|---|---|
| version | Integer | Internal schema version (compare against schemaVersion) |
| comment | String | Database comment |

Property Metadata

Each schema property has a label; additionally, the descriptive properties have a comment property.

| Key | Type | Comment |
|---|---|---|
| label | String | For UI labels |
| comment | String | For UI helper texts |

pair Documents

A helper document type used for creators, identifiers, synonyms, subjects, fundings, externalItems link targets or list elements.

| Property | Type | Constraints |
|---|---|---|
| name | String | max 255 |
| data | String | max 4095, regexp |

Interrelation Edges

| Type | Comment |
|---|---|
| isRelatedTo | Base edge type |
| isNewVersionOf | Derived from isRelatedTo |
| isDerivedFrom | Derived from isRelatedTo |
| isPartOf | Derived from isRelatedTo |
| commonExpression | Derived from isRelatedTo |
| commonManifestation | Derived from isRelatedTo |

Edge Metadata

| Key | Type | Comment |
|---|---|---|
| label | String | For UI labels (outbound edge) |
| altlabel | String | For UI labels (incoming edge) |

Ingestable to Native Schema Crosswalk

TODO: Add sub elements

| DatAasee | DataCite | DC | LIDO | MARC | MODS |
|---|---|---|---|---|---|
| name | titles.title | title | descriptiveMetadata.objectIdentificationWrap.titleWrap.titleSet | 245, 130 | titleInfo.title, titleInfo.partName, titleInfo.partNumber, part.text, part.detail.title, part.detail.caption |
| creators | creators.creator | creator | | 100, 700 | name, relatedItem |
| publisher | publisher | publisher | | 260, 264 | originInfo.publisher |
| publicationYear | publicationYear | date | descriptiveMetadata.eventWrap.eventSet | 260, 264 | originInfo.dateIssued, originInfo.dateCreated, originInfo.dateCaptured, originInfo.dateOther, part, recordInfo |
| resourceType | resourceType | type | descriptiveMetadata.objectClassificationWrap.objectWorkTypeWrap.objectWorkType | 007, 337 | genre, typeOfResource |
| identifiers | identifier, alternateIdentifiers.alternateIdentifier | identifier | objectPublishedID | 001, 020, 856 | identifier, recordInfo.recordIdentifier |
| synonyms | titles.title | title | descriptiveMetadata.objectIdentificationWrap.titleWrap.titleSet | 210, 222, 240, 242, 246, 247 | titleInfo.title, titleInfo.subTitle |
| language | language | language | | 008, 041 | language.languageTerm |
| subjects | subjects.subject | | category.Concept | 655, 689 | subject.topic, subject.geographic, subject.genre, subject.temporal, subject.occupation |
| version | version | | | 250 | originInfo.edition |
| license | rightsList.rights | | | | accessCondition |
| rights | | rights | administrativeMetadata.rightsWorkWrap.rightsWorkSet | 506, 540 | accessCondition |
| fundings | fundingReferences.fundingReference | | | | |
| description | descriptions.description | description | descriptiveMetadata.objectIdentificationWrap.objectDescriptionWrap.objectDescriptionSet | 520 | abstract |
| message | | | | 500 | note |
| externalItems | relatedIdentifiers.relatedIdentifier | related | | | identifier |
| keywords | subjects.subject | subject | category.term | | |
| dataLocation | identifier | source | | | |
| dataFormat | formats.format | format | | | |
| sizeBytes | | | | | |
| isRelatedTo | relatedItems.relatedItem, relatedIdentifiers.relatedIdentifier | related | | 773 | relatedItem |
| isNewVersionOf | relatedItems.relatedItem, relatedIdentifiers.relatedIdentifier | | | | relatedItem |
| isDerivedFrom | relatedItems.relatedItem, relatedIdentifiers.relatedIdentifier | | | | relatedItem |
| isPartOf | relatedItems.relatedItem, relatedIdentifiers.relatedIdentifier | | | | relatedItem |
| CommonExpression | | | | | relatedItem |
| CommonManifestation | | | | | recordInfo |

Query Languages

| Language | Identifier | Documentation |
|---|---|---|
| SQL | sql | ArcadeDB SQL |
| Cypher | cypher | Neo4J Cypher |
| GraphQL | graphql | GraphQL Spec |
| Gremlin | gremlin | Tinkerpop Gremlin |
| MQL | mongo | Mongo MQL |
| SPARQL | sparql | SPARQL (WIP) |

Runtime Configuration

The following environment variables affect DatAasee if set before starting.

| Symbol | Value | Meaning |
|---|---|---|
| TZ | CET (Default) | Timezone of database and backend servers |
| DL_PASS | password1 (Example) | DatAasee password (pass only on the command line!) |
| DB_PASS | password2 (Example) | Database password (pass only on the command line!) |
| DL_VERSION | 0.3 (Example) | Requested DatAasee version |
| DL_BACKUP | $PWD/backup (Default) | Path to backup folder |
| DL_USER | admin (Default) | DatAasee admin username |
| DL_BASE | http://my.url (Example) | Outward DatAasee base URL (including protocol and port, but no trailing slash) |
| DL_PORT | 8343 (Default) | DatAasee API port |
| FE_PORT | 8000 | Web Frontend port (development default 8000, release default 80) |

Tutorials

In this section, learning-oriented lessons for newcomers are given.

Overview:

Getting Started

  1. Setup compatible compose orchestrator
  2. Download DatAasee release
     $ wget https://raw.githubusercontent.com/ulbmuenster/dataasee/0.3/compose.yaml
    

    or:

     $ curl https://raw.githubusercontent.com/ulbmuenster/dataasee/0.3/compose.yaml
    
  3. Create or mount folder for backups (assuming your backup volume is mounted under /backup on the host in case of mount)
     $ mkdir -p backup
    

    or:

     $ ln -s /backup backup
    
  4. Start DatAasee service, note the space in front of the command excluding it from the terminal history.
     $  DB_PASS=password1 DL_PASS=password2 docker compose up -d
    

    or:

     $  DB_PASS=password1 DL_PASS=password2 podman compose up -d
    

Now, if started locally, point a browser to http://localhost:8000 to use the web frontend, or send requests to http://localhost:8343/api/v1/ to use the HTTP API directly, for example via wget or curl.
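
For example, a first readiness check against the HTTP API could look like this:

$ wget -qO- http://localhost:8343/api/v1/ready  # or: curl -s http://localhost:8343/api/v1/ready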

Example Ingest

For demonstration purposes the collection of the “Directory of Open Access Journals” (DOAJ) is ingested. An ingest has four phases: First, the administrator needs to collect the necessary information of the metadata source, i.e. URL, protocol, format, and data steward. Second, the ingest is triggered via the HTTP-API. Third, the backend ingests the metadata records from the source to the database. Fourth and lastly, the ingested data is interconnected inside the database.

  1. Check the documentation of DOAJ:
     https://doaj.org/docs
    

    The oai-pmh protocol is available.

  2. Check the documentation about OAI-PMH:
     https://doaj.org/docs/oai-pmh/
    

    The OAI-PMH endpoint URL is: https://doaj.org/oai.

  3. Check the OAI-PMH for available metadata formats:
     https://doaj.org/oai?verb=ListMetadataFormats
    

    A compatible metadata format is oai_dc.

  4. Start an ingest:
     $ wget -qO- http://localhost:8343/api/v1/ingest --user admin --ask-password --post-data \
       '{"source":"https://doaj.org/oai", "method":"oai-pmh", "format":"oai_dc", "steward":"helpdesk@doaj.org"}'
    

    A status 202 confirms the start of the ingest. Here, no steward is listed in the DOAJ documentation, thus a general contact is set. Alternatively, the “Ingest” form of the “Admin” page in the web frontend can be used.

  5. DatAasee reports the start of the ingest in the backend logs:
     $ docker logs dataasee-backend-1
    

    with a message akin to: Starting ingest from https://doaj.org/oai via oai-pmh as oai_dc..

  6. DatAasee reports completion of the ingest in the backend logs:
     $ docker logs dataasee-backend-1
    

    with a message akin to: Finished ingest of 21424 records from https://doaj.org/oai after 0.1h..

  7. DatAasee starts interconnecting the ingested metadata records:
     $ docker logs dataasee-database-1
    

    with a message akin to: Interconnect Started!.

  8. DatAasee finishes interconnecting the ingested metadata records:
     $ docker logs dataasee-database-1
    

    with a message akin to: Interconnect Completed!.

NOTE: The interconnection is a potentially long-running, asynchronous operation, whose status is only reported in the database logs.

NOTE: Generally, the ingest methods OAI-PMH for suitable sources, S3 for multi-file sources, and GET for single-file sources should be used.

Example Harvest TODO:

A typical use-case for DatAasee is to forward all metadata records from a specific source. To demonstrate this, the previous Example Ingest is assumed to have happened.

  1. Check the ingested sources
     $ wget http://localhost:8343/api/v1/sources
    
  2. Request the first set of metadata records from source https://doaj.org/oai (the source needs to be URL encoded):
     $ wget http://localhost:8343/api/v1/metadata?source=https%3A%2F%2Fdoaj.org%2Foai
    

    At most 100 records are returned. For the first page, the parameter page=0 may also be used.

  3. Request the next set of metadata records via pagination:
     $ wget 'http://localhost:8343/api/v1/metadata?source=https%3A%2F%2Fdoaj.org%2Foai&page=1'
    

    The last page will contain fewer than 100 records; all pages before it contain exactly 100 records.

NOTE: When the source filter is used, full records are returned, instead of the search-result form returned without it.
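
As a minimal sketch (the page count and file names are made up for this example), all records from one source could be harvested page by page in a loop:

$ for p in 0 1 2; do wget -qO "doaj-page-$p.json" "http://localhost:8343/api/v1/metadata?source=https%3A%2F%2Fdoaj.org%2Foai&page=$p"; done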

Secret Management

Two secrets need to be managed for DatAasee, the database root password and the backend admin password. To protect these secrets on a host running docker(-compose), for example, the following tools can be used:

sops

$ printf "DB_PASS=password1\nDL_PASS=password2" > secrets.env
$ sops encrypt -i secrets.env
$ sops exec-env secrets.env 'docker compose up -d'

consul & envconsul

$ consul kv put dataasee/DB_PASS password1
$ consul kv put dataasee/DL_PASS password2
$ envconsul -prefix dataasee docker compose up -d

env-vault

$ EDITOR=nano env-vault create secrets.env
$ env-vault secrets.env docker compose -- up -d

openssl

$  printf "DB_PASS=password1\nDL_PASS=password2" | openssl aes-256-cbc -e -a -salt -pbkdf2 -in - -out secrets.enc
$ (openssl aes-256-cbc -d -a -pbkdf2 -in secrets.enc -out secrets.env; docker compose --env-file .env --env-file secrets.env up -d; rm secrets.env)

Container Engines

DatAasee is deployed via a compose.yaml (see How to deploy), which is compatible with the following container and orchestration tools:

Docker-Compose (Docker)

Installation see: docs.docker.com/compose/install/

$ docker compose up -d
$ docker compose ps
$ docker compose down

Docker-Compose (Podman)

Installation see: docs.docker.com/compose/install/

NOTE: Alternatively the package podman-docker can be used to emulate docker through podman.

NOTE: The compose implementation podman-compose is not compatible at the moment.

$ podman compose up -d
$ podman compose ps
$ podman compose down

Kompose (Minikube)

Installation see: kompose.io/installation/

Rename compose.yaml to compose.txt and run:

$ kompose -f compose.txt convert
$ minikube start
$ kubectl create secret generic dataasee --from-literal=database=password1 --from-literal=datalake=password2
$ kubectl apply -f .
$ kubectl port-forward service/backend 8343:8343  # now the backend can be accessed via `http://localhost:8343/api/v1`
$ minikube stop

Container Probes

The following endpoints are available for monitoring the respective containers; here the compose.yaml host names (service names) are used. Logs are written to the standard output.

Backend

Ready:

http://backend:4195/ready

returns HTTP status 200 if ready, see also Benthos ready.

Liveness:

http://backend:4195/ping

returns HTTP status 200 if live, see also Benthos ping.

Metrics:

http://backend:4195/metrics

allows Prometheus scraping, see also Connect prometheus.

Database

Ready:

http://database:2480/api/v1/ready

returns HTTP status 204 if ready, see also ArcadeDB ready.

Frontend

Ready:

http://frontend:3000

returns HTTP status 200 if ready.

Custom Queries

NOTE: All custom query results are limited to 100 items.

SQL

DatAasee uses the ArcadeDB SQL dialect. For custom SQL queries, only single, read-only queries are admissible, meaning:

The vertex type (cf. table) holding the metadata records is named metadata.

Examples:

Get the schema:

SELECT FROM schema:types

Get one-hundred metadata record titles:

SELECT name FROM metadata

Gremlin TODO:

DatAasee supports a subset of Gremlin.

Get one-hundred metadata record titles:

g.V().hasLabel("metadata")

Cypher

DatAasee supports a subset of OpenCypher. For custom Cypher queries, only read-queries are admissible, meaning:

Examples:

Get labels:

MATCH (n) RETURN DISTINCT labels(n)

Get one-hundred metadata record titles:

MATCH (m:metadata) RETURN m

MQL TODO:

DatAasee supports a subset of MQL as JSON queries.

Examples:

Get one-hundred metadata record titles:

{ 'collection': 'metadata', 'query': { } }

GraphQL TODO:

SPARQL TODO:

Custom Frontend

Remove Prototype Frontend

Remove the YAML object "frontend" in the compose.yaml (all lines below ## Frontend # ...).


Appendix

In this section development-related guidelines are gathered.

Overview:

Dependency Docs:

Development Decision Rationales:

Infrastructure

Database

Backend

Frontend

Development Workflows

Development Setup

  1. git clone https://github.com/ulbmuenster/dataasee && cd dataasee (clone repository)
  2. make setup (builds container images locally)
  3. make start (starts development setup)

Compose Setup

Dependency Updates

  1. Dependency documentation
  2. Dependency versions
  3. Version verification (Frontend only)

Schema Changes

  1. Schema definition
  2. Schema documentation
  3. Schema implementation

API Changes

  1. API definition
  2. API architecture
  3. API documentation
  4. API implementation
  5. API rendering
  6. API testing

Dev Monitoring

Coding Standards