WissKI already serves, and aims to establish itself, as the Swiss army knife for scholars from diverse disciplines who deal with object-centric documentation. It implements Semantic Web methods for data acquisition, storage and re-use, enriching the data with semantic information. Its flexible data model can be extended with domain-specific or project-specific vocabulary and ensures common interpretability across disciplines.
WissKI makes your institutional or project knowledge fit for web-based publication, long-term reuse and transdisciplinary research.
Read more about selected topics:
- WissKI as a resource
- Create content
- Semantic backend
- Ontology layer cake
- System architecture
- Name authorities
Research projects in academia or memory institutions (museums, archives, libraries and other cultural heritage institutions) produce extensive quantities of high-quality digital data. However, the methods and techniques used for storing and managing these data are often chosen on the basis of ad hoc, short-term, project-specific decisions and thus do not satisfy today's needs regarding quality, standards conformance, and re-use of data.
Existing software solutions for digital knowledge repositories such as Fedora, DSpace or Greenstone are rarely used, because they are not known, do not meet the specific requirements of a project, or require too much technical knowledge or manpower for installation, use and maintenance. Instead, projects fall back on simple solutions such as Word files, Excel sheets or small databases created with Access or FileMaker, which are easily adjusted to the needs of the project. In other words, techniques intended for use on an individual desktop PC are used in distributed environments, leading to systems that do not allow collaborative research. In the end the generated digital information typically resides on a hard disk. After some years, this becomes a two-fold problem. First, you have to be able to read the digital information technically, i.e. you must still support old and/or proprietary file formats. Second, you have to be able to understand the information with regard to its content structure. Often, missing documentation makes it very hard to reconstruct the original meaning and purpose of the data. As a result, persistence of the digital data is not ensured. In recent years, this situation has led to gigantic data silos, often also called digital cemeteries, which contain knowledge of past research that is buried away from current research because it is neither technically accessible nor interpretable in terms of its content.
One of WissKI's driving forces is to remedy this defect and to provide means to easily create data that can be accessed, used, enhanced, and evaluated long after the project that gave them life is gone.
WissKI as a resource
The goal of the WissKI project is to apply the concept of Wikis to the scientific domain and to support transdisciplinary collaboration between scientists and researchers from various domains, enabling them to learn about research results and to work together on research topics of common interest. In general, the system supports scientific communication and a new way of documentation in memory institutions, provides long-term availability and interpretability of research results, assures the identity of authorship and the authenticity of information, enables the persistence of citations, offers quality management tools, and supports the preparation of scientific publications. Scholarly knowledge can be communicated using the platform, where it is stored and can be reused to follow scientific discussions and to prepare publications.
A unique aspect of WissKI in comparison to other projects dealing with Wikis in a scientific context is its primary focus on the combination of data from diverse scholarly disciplines like the humanities as well as social and natural sciences. Through the involvement of the GNM and the ZFMK the system supports scientific documentation of museum collections. The curators themselves are enabled to act as digital knowledge curators and therefore to handle their knowledge about digital objects the same way as they handle their knowledge about real-life objects. To comply with current and future standards of museum documentation and to assure the unity of names and description terms in use, WissKI can dynamically incorporate globally accepted authority files and enables the creation and management of local authority files. One idea of Wikis was to support non-computer scientists creating content to be published in the World Wide Web. The well-known Wikipedia project successfully proved that millions of articles can be created and maintained by non-technical users. This demonstrates that Wikis provide a good platform for creating and sharing information. WissKI is a common platform allowing non-technical users the creation of content for the World Wide Web as well as the creation of enriched content for the semantic web.
To achieve maximum flexibility of the data structure while assuring the homogeneity and processability of the data, WissKI implements a storage facility built entirely on Semantic Web technologies such as technical implementations of formal ontologies. This enables a new kind of scientific workflow and content management. It is also the basis of a semantic text annotation mechanism that lets users submit free text to add structured information to the system. The text is analysed to connect mentioned entities (names, places, dates etc.) to the system's knowledge base. Overall, the system serves as a communication platform for curated knowledge.
Create content
Researchers in different scientific domains record data in different ways. The WissKI system currently supports two ways of creating semantically enriched content: first, entering data in a traditional web form, and second, aggregating data from free text. Both were implemented on the assumption that some scientists are used to, and prefer, recording data manually in a database with a form-based interface. Such data is usually valuable input for scientific discussion and work, but it is hardly available in the Semantic Web. Moreover, scientists communicate and publish data by writing scientific texts (papers, articles, books, etc.). This kind of scientific content requires a clear identity of authorship, authenticity of information and persistence of citations. Different modules of the WissKI system meet these requirements by using well-known techniques from content management systems and other web-based knowledge management approaches. In addition, WissKI supports semi-automatic connections between pieces of content. Such connections allow researchers, for example, to create inter- and transdisciplinary content and to verify their results against facts from other research communities without great expense.
The core of the WissKI system is the "pathbuilder". This tool allows the administrator to construct semantic definitions for content creation based on an uploadable ontology. These definitions are used to automatically construct the forms for data aggregation in the system. Moreover, the AI department is developing a special parser that processes English and German natural-language texts and creates instances of concepts of the application ontology according to these definitions. In its current state of development the parser recognises named entities such as time and date phrases, person names and place names on the basis of several name authority files that can be incorporated dynamically. Because automatic detection and computation of instances in natural-language texts cannot yet reach the quality of annotations made by a human author, the WissKI system provides a WYSIWYG editor enhanced with capabilities for semi-automatic semantic markup. Documents created with this editor are thus semantically annotated. Semi-automatic means that the system automatically computes proposals for annotations, which the author can accept or revise.
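As a rough illustration of the idea behind such definitions, a path can be thought of as an alternating sequence of ontology concepts and properties that leads from the root concept of a form to a literal value. The following sketch is not WissKI's actual implementation: the `Path` class, the `expand` function, the blank-node naming scheme and the `ex:name` datatype property are assumptions made for this example, and only the CRM-style concept and property names come from the CIDOC CRM.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Path:
    """Hypothetical path definition: concepts and properties alternate."""
    label: str              # label of the form field generated from the path
    steps: tuple            # (concept, property, concept, ..., concept)
    datatype_property: str  # property that carries the literal value


def expand(path, root_id, value, ids=None):
    """Turn one value entered in a form field into triples along the path."""
    ids = count(1) if ids is None else ids
    concepts = path.steps[0::2]
    properties = path.steps[1::2]
    triples = [(root_id, "rdf:type", concepts[0])]
    subject = root_id
    for prop, concept in zip(properties, concepts[1:]):
        node = f"_:n{next(ids)}"  # fresh instance for each intermediate concept
        triples.append((subject, prop, node))
        triples.append((node, "rdf:type", concept))
        subject = node
    triples.append((subject, path.datatype_property, value))
    return triples


creator = Path(
    label="Creator",
    steps=("E22_Man-Made_Object", "P108i_was_produced_by", "E12_Production",
           "P14_carried_out_by", "E21_Person"),
    datatype_property="ex:name",  # assumption, not a CRM property
)

triples = expand(creator, "ex:object1", "Example Artist")
for triple in triples:
    print(triple)
```

A single form field labelled "Creator" thus hides a chain of three instances (object, production event, person), which is why the forms can stay simple while the stored data remains fully structured.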
Probably the most important type of scientific communication is the discussion between researchers. The base CMS of WissKI provides several communication functions and modules such as mailing lists, forums and blogs. The CMS also makes it possible to create and manage different user groups, so that different roles and rights can be assigned to users. For example, the WissKI system provides a role that can be described as scientific project manager. For a given project, this role decides which statements from a discussion are integrated into the description of an object or into the free texts, which are incorporated into the knowledge pool, and whether they can be published. This facilitates the cooperative preparation of scientific publications.
Semantic backend
To achieve the stated goal of flexible data structures and to support users in creating semantically annotated scientific texts, the WissKI project developed an interface for integrating semantic back-ends such as a triple store. Triple stores generally provide several interfaces for communication. The most common language for interacting with a triple store is SPARQL, which has been a recommendation of the World Wide Web Consortium (W3C) since January 2008 and is therefore used in the WissKI software. The triple store forms the foundation of a middleware that can deal with the semantic base technology, i.e. ontologies consisting of concepts and properties, and instances of them. Such ontologies can be serialized in a machine-readable way using several common languages. Ontologies in computer science are normally based on formal logics with well-founded semantics. These are used to define, categorize, describe and infer knowledge in a formal way. We decided to use the Web Ontology Language (OWL), which is developed by the W3C. OWL 1 became a recommendation in 2004, and a revised and extended OWL 2 became a recommendation in 2009. OWL 1 defines three sublanguages of increasing expressiveness: OWL Lite, OWL DL and OWL Full. OWL DL is the best fit for this task in the WissKI project, as it is a machine-processable and therefore "understandable" formalism with a maximum of expressiveness while retaining computability. The DL in OWL DL stands for Description Logics, and OWL 1 DL is a syntactic variant of the description logic SHOIN(D). Ontologies in OWL DL have several advantages: the syntax is readable by computers and humans; the formally defined semantics give expressions a clear meaning; SHOIN(D) combines decidable inference with high expressivity; and automated reasoning supports modelling and therefore the creation of ontologies. Tools are already available, including editors (e.g. Protégé), reasoners (e.g. Racer, HermiT, Pellet, FaCT++) and other Semantic Web software.
In contrast to other logic based systems, OWL follows the idea of open world semantics, which, for example, do not interpret unknown facts as false ones. Most of the web software with support for OWL DL uses the serialisation in RDF/XML syntax.
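The practical difference between the two readings can be shown with a toy fact base. This is only a sketch under simplifying assumptions: the facts, the `ex:` identifiers and the explicit `NEGATED` set are invented for the example, and a real OWL reasoner would derive negative answers from axioms such as disjointness or cardinality restrictions rather than from a list of negated statements.

```python
from enum import Enum


class Answer(Enum):
    TRUE = "true"
    FALSE = "false"
    UNKNOWN = "unknown"


# Toy data: statements known to hold and statements known not to hold.
ASSERTED = {("ex:painting1", "ex:created_by", "ex:artist_a")}
NEGATED = {("ex:painting1", "ex:created_by", "ex:artist_b")}


def ask_closed_world(fact):
    """Closed world: whatever is not stated is taken to be false."""
    return Answer.TRUE if fact in ASSERTED else Answer.FALSE


def ask_open_world(fact):
    """Open world: a missing statement is unknown, not false."""
    if fact in ASSERTED:
        return Answer.TRUE
    if fact in NEGATED:
        return Answer.FALSE
    return Answer.UNKNOWN


query = ("ex:painting1", "ex:created_by", "ex:artist_c")
print(ask_closed_world(query).value)
print(ask_open_world(query).value)
```

For documentation that is always incomplete, as is typical for research on museum objects, the open-world reading avoids turning gaps in the record into false negative statements.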
Ontology layer cake
The semantic back-end features the integration of ontologies. Thus the WissKI system is able to load nearly any ontology. However as the system is a communication platform for curated knowledge, a common top level ontology is needed for all WissKI Systems. The chosen top level ontology has been developed for more than ten years by a group of knowledge experts from museums, archives and libraries in conjunction with philosophers and computer scientists. This group is known as the CIDOC Conceptual Reference Model Special Interest Group (CRM SIG) which forms a Working Group of the International Committee for Documentation (CIDOC) as part of the International Council of Museums (ICOM). The ontology created by this group is called the CIDOC Conceptual Reference Model (CRM). The CRM claims to be a "formal ontology intended to facilitate the integration, mediation and interchange of heterogeneous cultural heritage information". Therefore the CRM is strongly recommended as a top level ontology for all WissKI systems to benefit from all features of WissKI.
A special feature of the CRM is its event-centric approach. Every aspect of a real-world item is connected to an event. For instance, a work of art is not simply "made" by an artist; it is the outcome of a so-called "production event". The artist is a person in a specific role who participated in this event. Additional aspects that are normally assigned to the object itself, such as the creation date or the place of origin, are connected with this event too. This approach makes it possible to create a timeline of events (production, modification, destruction etc.) in the lifetime of an object and to attach dates and places to them. It additionally allows new connections between objects via special context information (weather, other people of importance, travel information etc.) connected with their "production event". In 2006, version 3.4.9 of the CRM was accepted as ISO standard 21127. Today it is often taken into account when developing information management systems in the cultural heritage domain. The current version 5.01 defines 86 entities (classes, concepts) and 137 properties (relations, roles), each of them explained by a scope note and illustrated with examples. The CRM SIG provides the ontology only as a paper document. In the run-up to the WissKI project, the AI Department and its partners developed an OWL DL implementation of the CIDOC CRM called Erlangen CRM (ECRM). The ECRM incorporates all entities, properties, scope notes and examples of the CRM. In addition, it defines cardinalities and constraints that are not explicitly defined in the CRM but are mentioned in the scope notes.
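The event-centric idea can be sketched with a few plain records. The class names E12 Production, E11 Modification and E6 Destruction are taken from the CRM, but everything else here is an assumption for illustration: the `Event` record, the helper functions, the `ex:` identifiers and all dates and places are invented toy data, not WissKI structures.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Event:
    kind: str           # CRM event class
    obj: str            # object the event concerns
    when: date          # date attached to the event, not to the object
    place: str          # place attached to the event, not to the object
    actors: tuple = ()  # persons who participated in the event


EVENTS = [
    Event("E11 Modification", "ex:object1", date(1780, 5, 1), "ex:place1", ("ex:restorer",)),
    Event("E12 Production", "ex:object1", date(1505, 1, 1), "ex:place1", ("ex:artist",)),
    Event("E6 Destruction", "ex:object1", date(1945, 1, 2), "ex:place2"),
    Event("E12 Production", "ex:object2", date(1505, 6, 1), "ex:place1", ("ex:goldsmith",)),
]


def timeline(obj):
    """All events in the lifetime of an object, in chronological order."""
    return sorted((e for e in EVENTS if e.obj == obj), key=lambda e: e.when)


def producers(obj):
    """The 'maker' of an object is found via its production event."""
    return [a for e in EVENTS
            if e.obj == obj and e.kind == "E12 Production"
            for a in e.actors]


def produced_in_same_place(obj):
    """Other objects connected to obj through the place of their production."""
    places = {e.place for e in EVENTS
              if e.obj == obj and e.kind == "E12 Production"}
    return sorted({e.obj for e in EVENTS
                   if e.kind == "E12 Production"
                   and e.place in places and e.obj != obj})


for event in timeline("ex:object1"):
    print(event.when.isoformat(), event.kind, event.place)
```

Because dates, places and actors hang on events rather than on the object, the same records answer "who made it", "what happened to it over time" and "what else was made there" without any additional fields on the object itself.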
Since the CRM/ECRM is an upper ontology, most concepts have an abstract character (e.g. E22 Man-Made Object). To refine and specialise the ECRM concepts and relations, the project partners developed the System Ontology. The System Ontology extends the ECRM and provides more specialised concepts where the CRM is not specific enough for an implementation.
The System Ontology serves as an upper ontology for specific application ontologies that can be generated and extended by the users of WissKI. Therefore the WissKI system allows the users to add their project specific application ontology. The only precondition for the application ontologies to be used in the WissKI system is that they have to import and use the System Ontology as upper ontology. Several examples for domain specific ontologies will be given from the ZFMK and the GNM as use cases and as basis for the development of initial application ontologies.
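The layering described above amounts to a subclass chain that leads from a project-specific concept through the System Ontology up to the ECRM. The sketch below assumes a single parent per class for brevity (OWL allows several), and the names `app:Altarpiece` and `sys:Painting` are hypothetical; only the two ECRM classes are taken from the CRM.

```python
# child class -> parent class, one entry per layer of the "cake"
SUBCLASS_OF = {
    "app:Altarpiece": "sys:Painting",                # application ontology (hypothetical)
    "sys:Painting": "ecrm:E22_Man-Made_Object",      # System Ontology (hypothetical)
    "ecrm:E22_Man-Made_Object": "ecrm:E24_Physical_Man-Made_Thing",  # ECRM
}


def ancestors(cls):
    """Walk upwards through the layers and collect all superclasses."""
    chain = []
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        chain.append(cls)
    return chain


def is_a(cls, ancestor):
    """True if cls equals ancestor or is a direct or indirect subclass of it."""
    return cls == ancestor or ancestor in ancestors(cls)


print(ancestors("app:Altarpiece"))
print(is_a("app:Altarpiece", "ecrm:E22_Man-Made_Object"))
```

This is why the import precondition matters: an instance of a project-specific class can still be found by a query phrased purely in ECRM terms, as long as the chain up to the reference ontology is unbroken.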
System architecture
Every WissKI system consists of four layers: the two ontology layers (one for the reference ontologies and one for the application ontologies), the data layer for data storage, and the authorities layer for name authorities. Each layer comes with an API with well-documented interfaces for import and export.
The layers of the reference and application ontologies should be filled when the WissKI system is installed. As stated, any ontology can be used as a reference ontology, but the use of the ECRM and the System Ontology is strongly recommended to benefit from all features of WissKI. The API is currently able to import ontologies in OWL/XML, RDF/XML, N-Triples, Turtle, SPARQL + SPOG, Legacy XML, HTML tag soup, RSS 2.0 and Google Social Graph API JSON. Instance data can be exchanged using OWL-DL/XML or RDF/XML. Moreover, the system supports LIDO as an exchange format. Administrative information about the data, such as its status or the rights involved, is provided via the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH), which serves as a data wrapper. Therefore every WissKI system can act as a data provider for any OAI-PMH harvesting institution, e.g. Europeana, with hardly any additional effort by the user.
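From the harvester's side, an OAI-PMH request is an ordinary HTTP request whose query string names a protocol verb and its arguments. The sketch below builds a `ListRecords` request using the standard `verb`, `metadataPrefix`, `set` and `from` arguments and the `oai_dc` prefix that every OAI-PMH repository must support. The base URL and the function name are assumptions; the actual endpoint of a given WissKI installation is not specified here.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None, from_date=None):
    """Build an OAI-PMH ListRecords request URL for a repository."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec    # optional: restrict harvesting to one set
    if from_date:
        params["from"] = from_date  # optional: only records changed since this date
    return f"{base_url}?{urlencode(params)}"


# Hypothetical endpoint of a WissKI installation.
url = list_records_url("https://wisski.example.org/oai", from_date="2010-01-01")
print(url)
```

A harvester would fetch this URL, parse the XML response, and repeat the request with the returned resumption token until the list is complete.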
The preferred format for authority files is the Simple Knowledge Organization System (SKOS). SKOS is a common data model for sharing and linking knowledge organization systems such as thesauri using RDF. It is developed by the World Wide Web Consortium (W3C), and the specifications are currently at the candidate recommendation stage. Because it is a new standard, not many thesauri are available in this format. For example, the Getty thesauri and lists are provided in a proprietary XML format, so as a by-product we developed a tool to convert this format to SKOS. Since authority files can be huge, it is not feasible to use OAI-PMH and import them as a whole. One possibility is to access those files via a REST API through which only the requested data is transmitted. The Getty Research Institute is currently working on a web API for its lists and thesauri. The import and export mechanisms of WissKI thus support widely adopted standards and are based on standardised vocabulary. Data exchange is possible not only between a WissKI installation and other systems but also between multiple installations of the WissKI system. Through the underlying ontology, knowledge can be seamlessly integrated.
In the future a single WissKI installation should also be able to communicate automatically using these interfaces. This will enable WissKI users to share digital objects, information and knowledge across local system borders to support inter- and transdisciplinary research.
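To give an impression of what a conversion to SKOS involves, the sketch below maps one thesaurus term, held in a plain dictionary, to SKOS statements in N-Triples syntax. It does not reproduce the converter mentioned above or the Getty XML format: the dictionary layout, the function names, the `vocab.example.org` URIs and the labels are assumptions, while the SKOS namespace and the properties `skos:prefLabel`, `skos:altLabel` and `skos:broader` are part of the SKOS vocabulary.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def literal(text, lang):
    """Format a language-tagged literal, escaping backslashes and quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


def to_skos(term):
    """Convert one term record (hypothetical layout) to N-Triples lines."""
    subject = f"<{term['uri']}>"
    lines = [f"{subject} <{RDF_TYPE}> <{SKOS}Concept> ."]
    for lang, label in term.get("pref", {}).items():
        lines.append(f"{subject} <{SKOS}prefLabel> {literal(label, lang)} .")
    for lang, label in term.get("alt", []):
        lines.append(f"{subject} <{SKOS}altLabel> {literal(label, lang)} .")
    if term.get("broader"):
        lines.append(f"{subject} <{SKOS}broader> <{term['broader']}> .")
    return lines


# Toy record; URIs and labels are invented for the example.
term = {
    "uri": "http://vocab.example.org/term/2",
    "pref": {"en": "chalices", "de": "Kelche"},
    "alt": [("en", "goblets")],
    "broader": "http://vocab.example.org/term/1",
}

lines = to_skos(term)
print("\n".join(lines))
```

Preferred labels per language, alternative labels and the broader term cover the core of most thesaurus records, which is why a small mapping of this kind goes a long way.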
Name authorities
Name authorities and controlled vocabularies are lists or otherwise organized sets of terms and names for different kinds of things. They typically serve as standardized reference and look-up lexica. WissKI can be coupled with almost arbitrary name authorities, thus enabling the user to reuse external datasets. This is WissKI's standard way of linking one's own data with the world of Linked Open Data.
Name authorities are also a key enabler for better data hygiene. WissKI therefore makes use of name authorities to support the user in the data creation process. After a name authority has been integrated into the WissKI system, its values are used as normative lists, as values for autocompletion and as values for the automatic semantic annotation in the WYSIWYG editor. By avoiding typos and spelling variants, the system ensures high data quality and easier linkage and retrieval. However, the cultural heritage domain has only a few globally accepted authority files that consider multilingualism at least in part. Some of the best-known authority files are maintained by the Getty Research Institute. The Union List of Artist Names (ULAN) is a list of around 293,000 names of artists worldwide. The Art & Architecture Thesaurus (AAT) is a structured vocabulary containing around 131,000 terms and other information about concepts. It is currently available in Dutch, Chinese and German. The Thesaurus of Geographic Names (TGN) contains around 1,106,000 names of places together with information about historical place names or place names in different languages. Currently those authorities are partly imported and used inside the system. Taking such normative content into account when creating data is an important prerequisite for knowledge exchange.
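How an authority file can drive both autocompletion and the resolution of spelling variants may be sketched as follows. This is a minimal illustration, not the WissKI implementation: the `AUTHORITY` table, the `ex:` identifiers, the function names and the listed variants are assumptions, and the first entry of each list is simply treated as the preferred form.

```python
from bisect import bisect_left

# Toy authority: identifier -> name variants, preferred form first.
AUTHORITY = {
    "ex:duerer": ["Dürer, Albrecht", "Duerer, Albrecht", "Durer, Albrecht"],
    "ex:cranach": ["Cranach, Lucas", "Kranach, Lucas"],
}


def normalise(name):
    """Ignore case and surplus whitespace when comparing names."""
    return " ".join(name.casefold().split())


# Sorted index of (normalised variant, identifier) for prefix search.
INDEX = sorted((normalise(v), uri)
               for uri, variants in AUTHORITY.items() for v in variants)
LOOKUP = dict(INDEX)


def resolve(name):
    """Map any known spelling variant to its authority identifier."""
    return LOOKUP.get(normalise(name))


def autocomplete(prefix, limit=5):
    """Suggest preferred names whose variants start with the typed prefix."""
    key = normalise(prefix)
    i = bisect_left(INDEX, (key, ""))
    suggestions = []
    while i < len(INDEX) and INDEX[i][0].startswith(key) and len(suggestions) < limit:
        preferred = AUTHORITY[INDEX[i][1]][0]
        if preferred not in suggestions:
            suggestions.append(preferred)
        i += 1
    return suggestions


print(autocomplete("Kra"))
print(resolve("duerer,  ALBRECHT"))
```

Whatever variant the user types, the stored value is the identifier of the authority entry, so later retrieval and linking do not depend on spelling.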
If new names are gathered in a project, WissKI supports the creation and export of local name authorities consisting of the data stored in the system. These local name authorities are handled just like the global name authorities. The API supports an easy export for providing such name authorities on the internet for other, similar projects, partners or transdisciplinary work.