DatAasee (0.5)
DatAasee centralizes and interlinks distributed library/research metadata into an API‑first union catalog.

A Metadata-Lake for Libraries
Repository: github.com/ulbmuenster/dataasee (nb sources backup)
Maintainer: Christian Himpe (at University and State Library of Münster)
Licenses: MIT (additionally CC-BY for openapi.yaml)
Function: Metadata-Lake, Metadata Catalog, Metadata Aggregator, Union Catalog
Audience: University Libraries, Research Libraries, Academic Libraries, Scientific Libraries
Documentation
- Dependencies Overview
- Software Documentation
- Architecture Documentation
- Database Schema
- OpenAPI Schema (Swagger UI)
DatAasee: A Metadata-Lake as Metadata Catalog for a Virtual Data-Lake (Companion Paper, Open Access)
Getting Started (Deployment)
Quick Start (prepare a dedicated directory and run inside it:)
$ wget https://raw.githubusercontent.com/ulbmuenster/dataasee/0.5/compose.yaml
$ mkdir -p -m 766 backup
$ DL_PASS=password1 DB_PASS=password2 docker compose up
Web: http://localhost:8000 (API: http://localhost:8343/api/v1/ )
- Depends on docker compose (compatible with docker and podman)
- To deploy, no need to clone, just use the compose.yaml file.
- See the Deploy Documentation for details.
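After `docker compose up`, the services may need a moment before the API answers. The following is a minimal sketch of waiting for the `api/v1/ready` endpoint listed in the API Cheat Sheet. The names `DATAASEE_API`, `api_url` and `wait_ready` are hypothetical helpers, and the sketch assumes that readiness is signaled by a successful HTTP status; see the OpenAPI schema for the actual response.

```shell
#!/bin/sh
# Base URL of the API, as given in the Quick Start (override via environment).
DATAASEE_API="${DATAASEE_API:-http://localhost:8343/api/v1}"

# Build the full URL for an endpoint name, e.g. "ready".
# A trailing slash on the base URL is tolerated.
api_url() {
  printf '%s/%s\n' "${DATAASEE_API%/}" "$1"
}

# Poll the readiness endpoint until it answers successfully
# or the number of attempts (default: 30) is used up.
# Assumption: wget exiting with 0 (successful HTTP status) means "ready".
wait_ready() {
  attempts="${1:-30}"
  while [ "$attempts" -gt 0 ]; do
    if wget -q -O /dev/null "$(api_url ready)"; then
      return 0
    fi
    attempts=$((attempts - 1))
    sleep 2
  done
  return 1
}
```

Usage, for example from a second terminal: `wait_ready 60 && echo "DatAasee is ready"`.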
Tech Stack Canvas
- Setting: Many distributed data and metadata sources
- Goals:
  - Centralize metadata
  - Interlinked metadata catalog
  - Super-index for bibliographic and research data
- Features:
  - Interact through HTTP-API (JSON)
  - Search by filter, full-text, source, doi
  - Custom query via: SQL, Gremlin, Cypher, MQL, GraphQL
- Frontend: Lowdefy (Optional)
- Backend: Connect (fmr. Benthos)
- Data Storage: ArcadeDB (Graph Database)
- Infrastructure: Compose (via Docker or Podman)
- Deployment: via Harbor (at Uni Münster)
- Monitoring: Container Logs (local logging driver)
- Integrations:
  - Protocols: OAI-PMH (HTTP), S3 (HTTP), GET (HTTP), DatAasee (HTTP)
  - Encodings: XML (Plain-Text)
  - Formats: DataCite (XML), DC (XML), LIDO (XML), MARC (XML), MODS (XML)
- Exports: DataCite (JSON), BibJSON (JSON)
- Security: Privileged endpoints (CQRS)
- Testing: check-jsonschema
- Development: Github
Default Ports
- 8343: DatAasee API
- 8000: Web Frontend
- 2480: Database API (Development Container Images Only)
- 9999: Database JMX (Development Container Images Only)
API Cheat Sheet
- GET api/v1/api: Returns API specification and schemas.
- GET api/v1/ready: Returns service readiness.
- GET api/v1/metadata: Returns queried metadata records.
- GET api/v1/sources: Returns ingested metadata sources.
- GET api/v1/schema: Returns database schema.
- GET api/v1/enums: Returns enumerated attributes.
- GET api/v1/stats: Returns metadata record statistics.
- POST api/v1/backup: Triggers database backup.
- POST api/v1/ingest: Triggers async ingest of metadata.
- POST api/v1/insert: Inserts single metadata record.
- POST api/v1/health: Probes and returns service liveness.
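The read-only endpoints above can be tried directly from a shell. Below is a small sketch under stated assumptions: `DATAASEE_API`, `GET_ENDPOINTS`, `is_get_endpoint` and `dl_get` are hypothetical names, the base URL is the Quick Start default, and no query parameters are passed since they are not described here. The POST endpoints are left out because their request bodies and authentication are not covered in this overview; see the OpenAPI schema for those.

```shell
#!/bin/sh
# Base URL of the API, as given in the Quick Start (override via environment).
DATAASEE_API="${DATAASEE_API:-http://localhost:8343/api/v1}"

# Names of the GET endpoints from the cheat sheet.
GET_ENDPOINTS="api ready metadata sources schema enums stats"

# Succeed only if the given name is one of the listed GET endpoints.
is_get_endpoint() {
  case " $GET_ENDPOINTS " in
    *" $1 "*) return 0 ;;
    *) return 1 ;;
  esac
}

# Fetch a GET endpoint and print the response body to stdout.
# Returns 2 for names that are not listed as GET endpoints.
dl_get() {
  is_get_endpoint "$1" || { echo "not a GET endpoint: $1" >&2; return 2; }
  wget -q -O - "${DATAASEE_API%/}/$1"
}
```

With a running deployment, `dl_get sources` or `dl_get stats` would print the respective JSON response.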
Repository Contents
- api/: API definition and message schemas
- assets/: Logos and style definition
- backend/: Processor pipeline and component definitions
- container/: Dockerfiles
- database/: Database initialization, schemas and enumerated data
- docs/: Documentation of software, data and architecture
- frontend/: Prototype frontend definition
- tests/: Test definitions and data
Getting Started (Development)
- Available make targets:
  - make setup: Build server images (builds development images)
  - make start: Start servers
  - make stop: Stop servers
  - make reset: Stop and start servers
  - make build: Build release images (pass REGISTRY= to set container image registry)
  - make empty: Delete database backups
  - make logs: Show logs (requires grep)
  - make peak: Report peak database memory usage (requires grep)
  - make test: Run tests (requires check-jsonschema, busybox, wget)
  - make tidy: List violations of StrictYAML (requires yamllint)
  - make todo: List inline TODOs in repo (requires grep)
- Custom make variable:
  - COMPOSE (set Compose implementation)
Contributors
tl;dr
DatAasee is a centralized metasearch for distributed metadata.