Research

© Source: Christian Malsch / LUH

Efficient and scalable methods for integrating large amounts of data, and for representing and discovering knowledge, are central challenges in the research program of the Scientific Data Management Research Group. The applications the group develops are used in various domains, especially biomedicine and digital libraries, to turn heterogeneous data into usable knowledge.

The research plan includes developing state-of-the-art infrastructures for managing heterogeneous scientific data, extracting knowledge from these data, and discovering new relationships and patterns. These infrastructures support the integration and analysis of large and complex data sets in scientific knowledge graphs and foster cooperation among all actors in the value chains built around scientific data. The challenges the research group is working on include:

  • Knowledge graphs that encode not only the meaning and connections of scientific data but also knowledge about provenance, privacy, quality, and uncertainty.
  • Domain-specific ontologies and link discovery techniques that promote the interoperability of large, heterogeneous scientific data sets in a scalable manner.
  • Integration methods for large, heterogeneous scientific data sources, e.g., legacy, structured, and unstructured data, static data, and continuous data streams.
  • Storage and distribution of large scientific data sets and knowledge graphs.
  • Access control methods to enforce privacy regulations for sensitive data. 
  • Federated query engines for scientific knowledge graphs.
  • Data analysis and knowledge discovery methods based on scientific knowledge graphs.

The infrastructure components are evaluated on a variety of data sets. Scientific data from publications archived in TIB's databases (e.g., via RADAR or DataCite) are particularly well suited for this purpose. The resulting scientific data management infrastructures are intended to help scientists achieve lasting gains in the effectiveness and productivity of their research.