DatAasee (0.9)
DatAasee centralizes and interlinks distributed library / research metadata into an API‑first union catalog.

A Metadata-Lake for Libraries
Repository: github.com/ulbmuenster/dataasee (NB sources backup)
Maintainer: Christian Himpe (at University and State Library of Münster)
Licenses: MIT (additionally CC-BY for openapi.yaml)
Function: Metadata-Lake, Metadata Catalog, Metadata Aggregator, Union Catalog
Audience: University Libraries, Research Libraries, Academic Libraries, Scientific Libraries
DatAasee is currently in pilot stage and not production-ready yet.
Documentation
- Dependencies Overview
- Software Documentation
- Architecture Documentation
- Database Schema (YASQL)
- OpenAPI Schema (Swagger UI)
DatAasee: A Metadata-Lake as Metadata Catalog for a Virtual Data-Lake (Companion Paper, Open Access)
Getting Started (Test Deployment)
Quick Start (prepare a dedicated directory and run inside it):
$ wget https://raw.githubusercontent.com/ulbmuenster/dataasee/0.9/compose.yaml
$ mkdir -p -m 777 backup
$ DL_PASS=password1 DB_PASS=password2 docker compose up
The DL_PASS environment variable sets the password for the DatAasee admin
user, which is required for the POST HTTP-API endpoints. The DB_PASS
environment variable sets the database root password used by the backend.
Web: http://localhost:8000 (API: http://localhost:8343/api/v1/)
- Depends on docker compose (>=2.37), and is compatible with docker and podman.
- To deploy, no need to clone, just use the compose.yaml file.
- See the Deploy Documentation for details.
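After docker compose up, the services may need a moment before they answer requests. The following is a minimal sketch of polling the readiness endpoint from the API cheat sheet. It assumes curl is installed and that a successful (2xx) HTTP status means the service is ready; the response body is not inspected. The names DL_API, dl_url and dl_wait_ready are hypothetical helpers, not part of DatAasee.

```shell
# POSIX sh helpers; source this file, nothing runs at load time.
# Base URL of the API as given in the quick start (override via environment).
DL_API="${DL_API:-http://localhost:8343/api/v1}"

# Build the full URL for an endpoint name such as "ready" or "schema".
dl_url() {
  printf '%s/%s\n' "${DL_API%/}" "$1"
}

# Poll the readiness endpoint until it answers with a 2xx status.
# $1: maximum number of attempts (default 30)
# $2: seconds to wait between attempts (default 2)
# Assumption: a successful HTTP status means the service is ready.
dl_wait_ready() {
  attempts="${1:-30}"
  delay="${2:-2}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # -f turns HTTP errors into a non-zero exit code, -o discards the body.
    if curl -fsS -o /dev/null "$(dl_url ready)" 2>/dev/null; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}
```

Calling dl_wait_ready 60 1 would wait up to roughly a minute and return a non-zero status if the service never became reachable.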
API Cheat Sheet
- GET api/v1/api: Returns API specification and schemas.
- GET api/v1/ready: Returns service readiness.
- GET api/v1/schema: Returns database schema.
- GET api/v1/metadata: Returns metadata records.
- GET api/v1/database: Returns metadata queries.
- POST api/v1/health: Returns service liveness.
- POST api/v1/ingest: Triggers async ingest of metadata.
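The endpoints above can be called with any HTTP client. As a rough sketch, the helpers below wrap curl for the GET and POST endpoints. Several things here are assumptions and should be checked against the OpenAPI schema: that the POST endpoints use HTTP Basic authentication, that the admin user is named admin, and that request bodies are JSON. The names dl_get, dl_post and DL_USER are hypothetical, and no query parameter names or body fields are implied.

```shell
# POSIX sh helpers; source this file, nothing runs at load time.
DL_API="${DL_API:-http://localhost:8343/api/v1}"
DL_USER="${DL_USER:-admin}"  # assumption: name of the admin user

# GET an endpoint. Extra arguments are key=value query parameters,
# which curl URL-encodes and appends to the URL (--get).
dl_get() {
  endpoint="$1"
  shift
  # Rewrite each remaining argument into "--data-urlencode <argument>".
  for param in "$@"; do
    set -- "$@" --data-urlencode "$param"
    shift
  done
  curl -fsS --get "$@" "${DL_API%/}/$endpoint"
}

# POST to a privileged endpoint, optionally sending a JSON file as body.
# Assumption: HTTP Basic auth with DL_USER and the DL_PASS admin password.
dl_post() {
  endpoint="$1"
  shift
  if [ -z "${DL_PASS:-}" ]; then
    echo "DL_PASS is not set" >&2
    return 1
  fi
  if [ "$#" -gt 0 ]; then
    # $1 is the path of a JSON file; its contents are sent unmodified.
    set -- -H 'Content-Type: application/json' --data-binary "@$1"
  fi
  curl -fsS -X POST -u "$DL_USER:$DL_PASS" "$@" "${DL_API%/}/$endpoint"
}
```

For example, dl_get schema fetches the database schema, and dl_post ingest request.json would send the file request.json (a placeholder name, with contents as defined by the OpenAPI schema) to the ingest endpoint.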
Tech Stack Canvas
- Setting: Many distributed data and metadata sources
- Goals:
  - Centralize metadata
  - Interlinked metadata catalog
  - Super-index for bibliographic and research data
- Features:
  - Interact through HTTP API (JSON)
  - Search by filter/facet, full-text, ingest-source, DOI
  - Custom queries via: SQL, OpenCypher, MQL, GraphQL, Redis
- Frontend: Lowdefy (Optional)
- Backend: Connect (formerly Benthos)
- Data Storage: ArcadeDB (Graph Database)
- Infrastructure: Compose (via Docker or Podman)
- Deployment: (Public) Container Images from Harbor (at Uni Münster)
- Monitoring: Container Logs (local logging driver)
- Integrations:
  - Protocols: OAI-PMH (HTTP), S3 (HTTP), GET (HTTP), DatAasee (HTTP)
  - Encodings: XML (Plain-Text)
  - Formats: DataCite (XML), DC (XML), LIDO (XML), MARC (XML), MODS (XML)
- Exports: DataCite (JSON), BibJSON (JSON)
- Security: Privileged endpoints
- Testing: check-jsonschema
- Development: GitHub
Repository Contents
- api/: API definition and message schemas
- assets/: Logos and style definition
- backend/: Processor pipeline and component definitions
- container/: Dockerfiles
- database/: Database initialization, schemas and enumerated data
- docs/: Documentation of software, data and architecture
- frontend/: Prototype frontend definition
- tests/: Test definitions and data
Getting Started (Development)
Local Development (After a git clone)
- Available make targets:
  - make setup: Build development server container images
  - make start: Start servers
  - make stop: Stop servers
  - make reset: Stop and start servers
  - make build: Build release container images (pass REGISTRY= to set registry)
  - make empty: Delete database backups
  - make logs: Show backend processor logs (requires grep)
  - make peak: Report peak database memory usage (requires grep)
  - make test: Run tests (requires check-jsonschema, busybox, wget)
  - make tidy: List violations of StrictYAML (requires yamllint)
  - make todo: List inline TODOs in repo (requires grep)
- Custom make variable: COMPOSE (set Compose implementation)
- Open the development frontend in your browser for manual testing of the backend
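The make targets and the COMPOSE variable can be combined in a small wrapper. The sketch below forwards COMPOSE to make only when it is set. The names dl_make and DL_MAKE are hypothetical, and which values the Makefile accepts for COMPOSE is not specified here, so any value used is an assumption.

```shell
# POSIX sh helper; source this file, nothing runs at load time.
# Program used to run targets (override, for instance, to use gmake).
DL_MAKE="${DL_MAKE:-make}"

# Run one make target, forwarding COMPOSE only if it is set and non-empty.
# $1: target name, for example setup, start, test or stop
dl_make() {
  target="$1"
  if [ -n "${COMPOSE:-}" ]; then
    # A variable given on the make command line overrides an ordinary
    # assignment of the same variable inside the Makefile.
    "$DL_MAKE" "$target" "COMPOSE=$COMPOSE"
  else
    "$DL_MAKE" "$target"
  fi
}
```

With COMPOSE unset, dl_make start is equivalent to make start. Passing the variable on the command line rather than only exporting it is a deliberate choice: an exported environment variable would lose against an ordinary assignment in the Makefile, while a command-line variable takes precedence.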
Contributors
tl;dr
DatAasee provides centralized Metasearch for distributed Metadata.