Pundit community session at DARIAH-EU meeting

On 17-19 September the DARIAH-EU network organises its fourth General VCC (Virtual Competency Centres) meeting in Rome. The DARIAH-EU infrastructure focuses on bringing together state-of-the-art digital Arts and Humanities activities from across Europe. The annual meeting provides opportunities to work together on topics within the Virtual Competency Centres (VCCs) and to share experiences from various fields within the digital humanities.

This year, the DARIAH-EU general meeting will also host dedicated community sessions alongside the general programme. We are happy to announce that DM2E has been selected to host a community session on Pundit, the semantic annotation tool being developed as part of work package 3.

This community session, entitled "Pundit, a semantic annotation tool for researchers", will take place on Thursday 18 September from 11.15-13.00. The session aims to illustrate Pundit's main features and components (the client, Feed, Ask) and to show how the tool has been used by scholarly communities in the Philosophy, History of Art, Philology and History of Thought domains. Attendees will also be given a practical introduction to Pundit through dedicated exercises designed to give them the basic skills to produce semantic annotations on digital libraries and generic websites.

Attendance at this event is free; registration is possible through Eventbrite.

Mapping the “Polytechnisches Journal” to DM2E

Johann Gottfried Dingler, born 1778, was a German chemist and industrialist who realised that reporting on technological innovations was insufficient in his time. In 1820 he started to publish his "Polytechnisches Journal" on a monthly basis, which included scientific articles in the fields of electrical technology, mining and chemical engineering, as well as translations and discussions of European patent specifications. The journal is often referred to simply as "Dingler" and is seen as a valuable resource:

“The journal was published over a period of 111 years and has hence become an important and European-wide source for the history of knowledge, culture, and technology — in Germany at least it is without compare” (Polytechnisches Journal website).

The Humboldt-Universität zu Berlin provides the digitised edition of the "Polytechnisches Journal" (figure 1), which was created by the Institute for Cultural History and Theory at Humboldt-Universität zu Berlin in cooperation with the Saxon State and University Library Dresden (SLUB). During the DM2E project, the Berlin School of Library and Information Science created a mapping from the metadata of the digitised journal to the DM2E model.

Figure 1: Screenshot from the digitised edition of the Polytechnisches Journal

Facsimile: CC-BY-NC-ND 3.0 (SLUB Dresden), Text: CC-BY-SA 3.0 (HU Berlin)

The schema language used to describe the metadata is unmodified TEI-P5 XML. The logical description of the records follows the recommendations of the TEI guidelines.

Background: The Metadata Format TEI

TEI stands for Text Encoding Initiative, a consortium for the collaborative development of a standard metadata format for the representation of texts in digital form. The guidelines provided by the initiative are standard specifications of encoding methods for machine-readable texts. The TEI guidelines are widely used by libraries, museums, publishers and individual scholars to present texts for online research, teaching and preservation. The most recent version of the guidelines is TEI-P5.

The metadata used for the mappings in DM2E came directly from the owner and creator of the records, the Institute for Cultural History and Theory. For the finalised version of the mapping to the DM2E model, DM2E received local copies of the most recently modified TEI-XML metadata records of the complete journal at volume and at article level.

The current mapping is based on first test mappings that were carried out with the DM2E model v1.0 schema in MINT. Two different ore:Aggregation and edm:ProvidedCHO pairs were created: one for a journal issue, another for a journal article. After the first mapping cycle with MINT, which already covered about two-thirds of the mapping, further steps were carried out by manually editing the MINT output (supported by the Oxygen editor). This was done mainly for readability reasons (the output file was split into separate files for the creation of journal issues and articles), to reduce redundant steps in the mapping workflow (URIs of all classes were created as variables instead of being typed repeatedly) and to include steps that were not possible in MINT (e.g. normalising URIs or creating titles for smaller CHOs). Furthermore, the mappings were first created for the DM2E model v1.0 and then manually adapted to DM2E v1.1; it was much easier and faster to do this step by hand than to repeat the whole mapping in MINT.

The custom XSLT script is structurally based on the XSLT script provided by the Berlin State Library and was further developed for the requirements of the Institute for Cultural History and Theory at Humboldt-Universität zu Berlin.

The TEI data of the Dingler records are mapped at journal, issue, article and page level, since almost all TEI documents encode full texts. Basic descriptive provider metadata from the TEI header is transformed into DM2E without any loss of data. Mandatory DM2E elements that are missing in the source data are completed with default values.

Although all TEI-encoded full texts are based on philological methods, there is almost no semantic markup for persons, corporate bodies or other subjects. In order to produce not only RDF literals but also URI references (resources), full-text literals have to be transformed into URIs during the mapping, or have to be extracted and processed with SILK in a second step, the contextualisation.
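To make this concrete, here is a minimal Python sketch of the kind of normalisation involved in turning a free-text literal into a URI reference. The namespace and slug rules below are invented for illustration; the project's actual transformations live in the XSLT mappings and the SILK linkage rules.

```python
import re
from urllib.parse import quote

# Hypothetical namespace, for illustration only; DM2E's real URI scheme
# is defined in the model specification.
BASE = "http://data.dm2e.eu/data/agent/"

def literal_to_uri(name: str) -> str:
    """Turn a free-text name literal into a normalised URI reference."""
    normalised = re.sub(r"\s+", " ", name).strip()      # collapse whitespace
    slug = re.sub(r"[^\w\s-]", "", normalised.lower())  # drop punctuation
    return BASE + quote(slug.replace(" ", "-"))

print(literal_to_uri("Dingler,  Johann Gottfried "))
# -> http://data.dm2e.eu/data/agent/dingler-johann-gottfried
```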

Representation of Hierarchical Levels

The TEI records include a representation of the hierarchical structure of the journal. The top level is described within the TEI header at article level and includes the basic metadata about both the physical journal and the online journal. The metadata on the journal is mapped to the top-level CHO, which is related to the sub-level CHOs on the next level, the issues of the journal, via the dcterms:hasPart property. Issues include articles, which in turn gather the CHOs on the lowest representational level in the object hierarchy: the pages. All top-down hierarchical relations are described by dcterms:hasPart, and all bottom-up relations by dcterms:isPartOf, as these are inverse properties.

Figure 2 illustrates the hierarchical concept in the Dingler records. The linear relations between the resources on one level are defined with the property edm:isNextInSequence, as proposed in the Europeana Data Model specification.


Figure 2: Hierarchical concept in the Dingler records
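To supplement Figure 2, the sketch below (using Python's rdflib, with invented example URIs) shows how the described hierarchy looks as RDF; only the properties themselves come from the mapping described above.

```python
from rdflib import Graph, Namespace, URIRef
from rdflib.namespace import DCTERMS

EDM = Namespace("http://www.europeana.eu/schemas/edm/")
BASE = "http://example.org/dingler/"  # invented URIs, not the real dataset

g = Graph()
journal = URIRef(BASE + "journal")
issue1, issue2 = URIRef(BASE + "issue1"), URIRef(BASE + "issue2")
article = URIRef(BASE + "issue1/article1")
page = URIRef(BASE + "issue1/article1/page1")

# Top-down containment plus the inverse bottom-up links.
for parent, child in [(journal, issue1), (journal, issue2),
                      (issue1, article), (article, page)]:
    g.add((parent, DCTERMS.hasPart, child))
    g.add((child, DCTERMS.isPartOf, parent))

# Linear order between sibling resources on the same level.
g.add((issue2, EDM.isNextInSequence, issue1))

print(g.serialize(format="turtle"))
```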


Julia Iwanowa, Evelyn Dröge and Violeta Trkulja

Berlin School of Library and Information Science, Humboldt Universität zu Berlin

MINT – Metadata Interoperability Platform

The metadata interoperability platform MINT (http://mint.image.ntua.gr) implements aggregation workflows in the Europeana ecosystem. It was first introduced in the ATHENA project, which made circa 4 million items available to Europeana between 2008 and 2011. Central development has continued, along with customised versions that facilitated several initiatives in the domain, including EUscreen, ECLAP, CARARE, DCA, Linked Heritage, PartagePlus and 3D-Icons. MINT currently supports several projects in the Europeana ecosystem, such as Europeana Photography, Europeana Fashion, AthenaPlus, LoCloud, EUscreenXL and Europeana Sounds. The MINT group also contributes to various infrastructure, technology and policy projects such as Europeana Connect, Indicate, Europeana Awareness, Europeana Creative, Ambrosia and Europeana Space. Finally, MINT is at the core of the Europeana ingestion infrastructure that implements Europeana's internal aggregation and publication workflow, and it contributed to the start-up of the Digital Public Library of America, having succeeded in the respective beta sprint and been invited to present at the first and second plenary meetings.


In DM2E, MINT and D2R were introduced in order to kick off the aggregation tasks before the development of the workflow management component and the respective user interface by work package 2. A dedicated MINT instance was set up, implementing the XSD for the DM2E model, based on the National Technical University of Athens (NTUA)'s implementation of EDM for Europeana. Dedicated training workshops instructed providers in the use of the visual mapping editor for the XSLT language, in order to create test mappings for XML or CSV imports that had to be translated to RDF. The two RDFizer tools and the SILK framework for contextualisation constituted the initial version of the project's interoperability infrastructure.
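The mechanics of that last step can also be reproduced outside MINT: an exported stylesheet is applied to an XML record with any XSLT processor. The sketch below uses lxml in Python with a trivial stand-in stylesheet and record; actual MINT output for the DM2E model is far more elaborate.

```python
from lxml import etree

# Trivial stand-in stylesheet: maps a <record> to one RDF/XML description.
xslt = etree.XML(b"""
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <xsl:template match="/record">
    <rdf:RDF>
      <rdf:Description rdf:about="{@id}">
        <dc:title><xsl:value-of select="title"/></dc:title>
      </rdf:Description>
    </rdf:RDF>
  </xsl:template>
</xsl:stylesheet>
""")

source = etree.XML(b'<record id="http://example.org/1"><title>Test</title></record>')
transform = etree.XSLT(xslt)
print(str(transform(source)))  # RDF/XML output
```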

With the design of the intermediate version of the infrastructure, MINT's REST API was extended to expose the platform's data and services, using the ontology that was introduced in the web-based engine for the specification, composition and execution of workflows. MINT also implemented a preprocessing step that improved the handling of records serialised in the MARC format and the subsequent use of the visual mapping editor. The preview services were also improved with the addition of a Europeana portal preview for EDM and a graph visualisation for RDF instances of the DM2E model. In parallel, an evaluation process led by work package 1 aimed at identifying the benefits and shortcomings of MINT when used with the various input models ingested by DM2E providers. In particular, users were asked to evaluate the four basic aggregation steps: import, mapping creation, validation and XSLT export.

MINT workspace – import

In general, MINT's visual mapping functionality was well received by users. The concept was deemed very intuitive and helped users become familiar with the target data schema. The results of the evaluation pointed out that schemas which are not focused solely on representing descriptive metadata, but also incorporate business processes or hierarchical representations of collections (such as EAD), are difficult to handle with the visual mapping editor, but could still benefit from creating a first version of the XSLT in MINT. Finally, users were able to identify some interesting aspects of working with that version of MINT, which resulted in a set of bug fixes and improvements in the next release.

MINT workspace – mapping

With the adoption of the single sign-on solution (JOSSO), MINT is fully integrated in the DM2E infrastructure, allowing the use of the mapping editor from the browser-based user interface (OmNom). For the final version of the infrastructure, two more MINT services are reused in order to assist providers and the work package 1 content team with improving the quality of publication: the Europeana HTML preview and the validation service for the EDM model, which uses the XSD and Schematron rules.

Overall, the development, integration and evaluation processes resulted in productive discussions, continuously fine-tuned requirements and the evolution of both MINT and the interoperability infrastructure towards stable, intuitive tools for the execution of aggregation workflows for digital cultural heritage objects in the realm of digitised manuscripts.

Nasos Drosopoulos

Senior Researcher, National Technical University of Athens

Open Humanities Awards round 2 – Winners announced


During May we invited humanities academics and technologists to submit innovative ideas for small technology projects that would further humanities research by either using open content, open data and/or open source (in the Open track) or building upon the research, tools and data developed within the DM2E project (in the DM2E track).

We're very pleased to announce that the winners of this second round of the Open Humanities Awards are:

Open track

  • Dr. Rainer Simon (AIT Austrian Institute of Technology), Leif Isaksen & Pau de Soto Cañamares (University of Southampton) and Elton Barker (The Open University) for the project SEA CHANGE
  • Dr.-Ing. Michael Piotrowski (Leibniz Institute of European History (IEG)) for the project Early Modern European Peace Treaties Online

DM2E track

  • Dr. Maximilian Hadersbeck (Center for Information and Language Processing (CIS), University of Munich (LMU)) for the project finderApp WITTFind

All winners will receive financial support to help them undertake the work they proposed and will be blogging about the progress of their project. You can follow their progress via the DM2E blog.


Open track – Award 1: SEA CHANGE

Geographic metadata from Early Geospatial Documents: mapped place references from 40 Latin documents dated between the 1st and 8th centuries AD. Approx. 5,000 places. Data from Pelagios, base map from the AWMC.

The first award of the Open track goes to Dr. Rainer Simon (AIT), Leif Isaksen & Pau de Soto Cañamares (University of Southampton) and Elton Barker (The Open University) for the project Socially Enhanced Annotation for Cartographic History And Narrative GEography (SEA CHANGE). This project will make available high-quality open geographic metadata for Historic Geospatial Documents (historic documents that use written or visual representation to describe geographic space). In the course of two "hackathon"-like workshops, the project will work with academics and students of relevant disciplines (e.g. history, geography), as well as with interested members of the public, on annotating selected documents and making use of the results.

The outcome will be a body of Linked Open Data that enables humanities scholars to “map” and compare the narrative of ancient literary texts, and the contents of early cartographic sources with modern day tools like Web maps and GIS. This data will make it possible to contrast their geographic properties, toponymy and spatial relationships. Contributing to the wider ecosystem of the “Graph of Humanities Data” that is gathering pace in the Digital Humanities (linking data about people, places, events, canonical references, etc.), it will open up new avenues for computational and quantitative research in a variety of fields including History, Geography, Archaeology, Classics, Genealogy and Modern Languages.

SEA CHANGE will complement the ongoing Pelagios research project, a pioneering multi-year initiative funded by the Andrew W. Mellon Foundation, JISC and the AHRC, that aims to aggregate a large corpus of geographic metadata for geospatial documents from Latin, Greek, European medieval and maritime, as well as early Islamic and Chinese traditions. SEA CHANGE will draw content from similar sources as Pelagios (e.g. the Perseus Digital Library, the Open Philology Project, the Internet Archive or Wikisource), and re-use some of its tools (e.g. the Recogito annotation tool). But in contrast to Pelagios, SEA CHANGE will explore a crowdsourcing approach. It will trial different aspects of collaborative geo-annotation, and ascertain their consequences in terms of data quality, resources required, and participant motivation. Most importantly, however, Dr Rainer Simon highlights:

we are convinced that SEA CHANGE is more than just a means to generate exciting new data relevant to humanities research – it is also a chance to engage with a wider audience and, ultimately, build community.

Open track – Award 2: Early Modern European Peace Treaties Online


Early Modern European Peace Treaties Online ("Europäische Friedensverträge der Vormoderne online") is a comprehensive collection of about 1,800 bilateral and multilateral European peace treaties from the period 1450 to 1789, published as an open access resource by the Leibniz Institute of European History (IEG).

Peace treaties between dynasties and states form an important part of our European cultural heritage. They are also essential for research into early modern peacekeeping and diplomacy. Early Modern European Peace Treaties Online bundles manuscripts that are scattered across archives all over Europe, often hard to access and partly undocumented. The digitized manuscripts are annotated with basic metadata, and some particularly important treaties are also available as full-text critical editions. This unique combination of digital facsimiles and critical editions has proved to be a well-received starting point for scholarly research in this area.

The collection data is currently stored in a relational database with a Web front-end and is one of the most popular digital offerings of the IEG. However, it is currently not available as Linked Open Data. This project aims to bring the collection to the Linked Data cloud, which will allow researchers not only to browse the collection but also to use and reuse the data in novel ways and to integrate it with other collections, including Europeana.

The approach foreseen is to represent the key facts of the peace treaties (date, place, signatories, powers, type of treaty, etc.) in RDF using the nanopublications approach, originally developed in the biomedical domain. Publishing the European peace treaties collection as Linked Open Data will make more content and data openly available for researchers to use, and will make it possible to link it to other relevant information, e.g. persons and places via GND/VIAF.
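As a rough sketch of the nanopublications layout (and only of the layout: all URIs and domain properties below are invented, and the project's actual modelling will differ in detail), the following rdflib snippet splits one treaty record into the assertion, provenance and publication-info named graphs that a head graph ties together.

```python
from rdflib import Dataset, Literal, Namespace
from rdflib.namespace import DCTERMS, RDF, XSD

NP = Namespace("http://www.nanopub.org/nschema#")
EX = Namespace("http://example.org/treaties/")  # invented URIs

ds = Dataset()
head = ds.graph(EX["np1/head"])
assertion = ds.graph(EX["np1/assertion"])
provenance = ds.graph(EX["np1/provenance"])
pubinfo = ds.graph(EX["np1/pubinfo"])

# The head graph wires the three parts together.
np1 = EX["np1"]
head.add((np1, RDF.type, NP.Nanopublication))
head.add((np1, NP.hasAssertion, EX["np1/assertion"]))
head.add((np1, NP.hasProvenance, EX["np1/provenance"]))
head.add((np1, NP.hasPublicationInfo, EX["np1/pubinfo"]))

# The assertion: key facts of one (invented) treaty.
treaty = EX["treaty/1"]
assertion.add((treaty, RDF.type, EX.PeaceTreaty))
assertion.add((treaty, DCTERMS.date, Literal("1648-10-24", datatype=XSD.date)))
assertion.add((treaty, EX.signedAt, EX["place/osnabrueck"]))

# Provenance of the assertion and publication info of the nanopublication.
provenance.add((EX["np1/assertion"], DCTERMS.source, EX["scan/1"]))
pubinfo.add((np1, DCTERMS.creator, EX["agent/ieg"]))

print(ds.serialize(format="trig"))
```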


DM2E track – Award: finderApp WITTFind


At his death, the Austrian philosopher Ludwig Wittgenstein (1889-1951) left behind 20,000 pages of philosophical manuscripts and typescripts: Wittgenstein's Nachlass. In 2009 the Wittgenstein Archives at the University of Bergen (WAB), a member of the DM2E project, made 5,000 pages from the Nachlass freely available on the web at Wittgenstein Source. Since 2010, the research group "Wittgenstein in Co-Text" has worked on developing the web front-end finderApp WiTTFind and the "Wittgenstein Advanced Search Tools" (WAST), which provide rule-based search of Wittgenstein's Nachlass in the context of sentences.

The current project, finderApp WiTTFind, offers users and researchers in the humanities a new kind of search engine. Unlike the search capabilities of Google Books and the Open Library project, the tools are rule-based: in combination with an electronic lexicon and various computational tools, the project will provide lemmatised and inverse-lemmatised search and allow queries to the Nachlass which include word forms as well as semantic and sentence-structure specifications. Syntactic disambiguation is done with part-of-speech tagging. Query results are displayed in a web browser as XSLT transformations of the transcribed texts, together with a facsimile of the matching segment in the original. With this information researchers are able to check the correctness of the edition and can explore original handwritten edition texts which are otherwise stored in access-restricted archives.
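The following toy Python sketch illustrates the principle of lemmatised and inverse-lemmatised search, and only the principle: WAST's actual tools are driven by a full electronic lexicon and rule-based grammar, and the lexicon entries here are invented.

```python
# Tiny invented sample of a word-form -> lemma lexicon.
LEXICON = {"spielt": "spielen", "spielte": "spielen", "gespielt": "spielen"}

def lemmatise(token: str) -> str:
    """Map a word form to its lemma; unknown forms are their own lemma."""
    return LEXICON.get(token.lower(), token.lower())

def word_forms(lemma: str) -> set[str]:
    """Inverse lemmatisation: all attested forms of a lemma."""
    return {form for form, lem in LEXICON.items() if lem == lemma} | {lemma}

def search(query_lemma: str, sentences: list[str]) -> list[str]:
    """Return sentences containing any word form of the query lemma."""
    hits = []
    for sentence in sentences:
        tokens = [w.strip(".,;!?").lower() for w in sentence.split()]
        if any(lemmatise(t) == query_lemma for t in tokens):
            hits.append(sentence)
    return hits

sentences = ["Das Kind spielte im Garten.", "Ein Wort hat Bedeutung im Satz."]
print(word_forms("spielen"))         # spielt, spielte, gespielt, spielen
print(search("spielen", sentences))  # matches "spielte" via its lemma
```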

The project consists of three elements:

  1. An extension of the finderApp, which is currently used for exploring and researching only Ludwig Wittgenstein's Big Typescript TS-213 (BT), to the rest of the openly available 5,000 pages of Wittgenstein's Nachlass
  2. Making the tool openly available to other humanities projects by defining APIs and an XML-TEI-P5 tagset which defines the XML structure of the texts processed by the finderApp
  3. Building a git server site which offers the applications and programs to other research projects in the field of Digital Humanities

We congratulate all winners and look forward to seeing the outcomes of their work presented on the DM2E blog and at upcoming DM2E events in the near future.


Project meeting 5, Bergen, 12-13 June 2014

University of Bergen, Department of Philosophy

After a busy first half of 2014, the DM2E project consortium met at the University of Bergen to discuss the progress made as well as the upcoming final period of the project. Antoine Isaac of Europeana was invited to present the new Europeana strategy and to take part in discussing the link between the technical work done in DM2E and Europeana, especially regarding the aggregation, conversion and ingestion of EDM (Europeana Data Model) data into Europeana. Following several more detailed presentations on the ongoing research in the work packages, there was a session devoted to the dissemination of the final results, as well as a debate on the future sustainability of DM2E results, one of the focus points for the final period.

The meeting started with presentations on the results of the last six months by each of the four work package leaders. Their slides are included below.

After this overview, Antoine Isaac gave a presentation on the new Europeana strategy, which is focused on transforming Europeana from a portal into a multi-sided platform, offering distinctive value to end users, creatives and professionals. This was followed by a discussion on the ingestion of EDM data from DM2E into Europeana.

Another part of the meeting was devoted to more detailed presentations of the ongoing research in work packages 2 and 3, as well as a demonstration of the new functionality of the Pundit tool by Net7. Dominique Ritze presented updates on the contextualisation tool SILK (integrated in OmNom), through which new linkage rules can be defined to create links between Linked Data resources.

Kai Eckert showed the link DM2E has with the recently started DCMI RDF Application Profiles Task Group (RDF-AP). This group deals with the development of recommendations regarding the proper creation of data models, in particular the proper reuse of existing data vocabularies. The DM2E project was a driving factor in establishing the task group, and the DM2E model will be one of the main case studies.

At the end of the meeting, Steffen Hennicke from the Humboldt University reported on the progress of the research into the scholarly domain model related to the digital humanities. This research focuses on questions such as what kinds of ‘reasoning’ digital humanists want to see enabled by the data and information available in Europeana, and which types of operations digital humanists expect to apply to this data. Several experiments related to this task will be running in the final six months of the project.

In addition, some time was reserved for discussing the dissemination of project results in the final period: there are busy months ahead, with six more DM2E-related events being organised between July and December 2014. The first of these will be the Open Data in Cultural Heritage workshop (15 July, Berlin): more information is available here. All other events will be announced through the DM2E website in the near future. We look forward to a busy and fruitful final period!

Feeding Digital Humanities

While the Digital Humanities community of information scientists, developers and scholarly enthusiasts is making huge progress in the development of tools and virtual research environments (VREs), the vast majority of scholars in the field of Jewish studies rely on traditional methods of research. At the same time, digitised primary resources for Jewish studies are growing exponentially worldwide.

Jewish Studies was one of the first academic communities to make use of digital resources, with the Responsa Project, which began in 1967. But is it possible that the present advances in Digital Humanities and many researchers in Jewish studies are like ships in the night, about to pass each other by and probably never meet again?

Ketubah, Herat, 5628 Ḥeshvan 16
[1867 November 14], Ket 270

The Judaica Europeana project, a network of libraries, museums and archives holding digital collections, has uploaded millions of digital objects to Europeana. This process continues in the framework of the DM2E and AthenaPlus projects. Led by the European Association for Jewish Culture, the partners in the network will in the coming months integrate into Europeana some of the most valuable resources for Jewish studies: the metadata of the collections from the Center for Jewish History in New York (the YIVO and Leo Baeck Institutes), the JDC Archives, the Jewish Theological Seminary Library, the Jewish Museum in Prague and many others. The latest Judaica Europeana newsletter presents the highlights of some of these collections.

But will anything be done to ensure that these magnificent collections, digitized at great expense with public or charitable funding, are used to their full potential? Will the opportunities of the Linked Open Data web and the growing box of open-source tools find many takers in the Jewish Studies community? As the saying goes: you can lead a horse to water, but you can't make it drink.

Dov Winer, Judaica Europeana’s Scientific Manager, has been arguing for some time that Digital Humanities and LOD have the potential to revolutionize Jewish Studies. The time is ripe: the DM2E project and its experts have been working on modelling the scholarly domain and developing award-winning research tools that respond to the needs of scholars. The DM2E project also provides a platform for the integration of Jewish-content metadata in Europeana.


So what’s next?

DM2E and Judaica Europeana are currently involved in converting vocabularies and encyclopaedias into formats which make them available as Linked Open Data, and therefore capable of enriching the metadata of Jewish content and providing contextual meaning. This initiative, driven by Dov Winer of the EAJC and Kai Eckert of the Research Group on Data and Web Science at Mannheim University, will soon result in the publication of The YIVO Encyclopedia of Jews in Eastern Europe in a LOD format. The Pundit and ASK tools, winners of the 2013 LODLAM Challenge, are freely available, with tutorials in four languages, on the DM2E website. The newsletters of Judaica Europeana, disseminated widely to Jewish studies scholars, have been promoting the Digital Humanities agenda and these new tools to their potential constituencies and users. The work of Judaica Europeana and DM2E will be brought to the attention of participants in the forthcoming Xth Congress of the European Association for Jewish Studies in Paris: on 21 July, our partners will present captivating research in a panel entitled New perspectives on Jewish and non-Jewish relations in modern European culture, based on Judaica Europeana digital collections.

Dov Winer, in his paper 'Feeding Digital Humanities', argues that what is needed to take all these efforts further is an ongoing virtual infrastructure and a community of practice: a network of scholars committed to using a Virtual Research Environment for their research, supported by a small part-time team to lead it. So far, 16 academic researchers from various European universities have expressed a strong interest.


Lena Stanley-Clamp

Coordinator, Judaica Europeana

Director, European Association for Jewish Culture

Open Humanities Awards: 2nd round – Deadline extended to 6 June 2014!


We are excited to announce the second round of the Open Humanities Awards. *The deadline for submissions to the awards has been extended to Friday 6 June 2014.*

There are €20,000 worth of prizes on offer in two dedicated tracks:

  • Open track: for projects that either use open content, open data or open source tools to further humanities teaching and research

  • DM2E track: for projects that build upon the research, tools and data of the DM2E project

Whether you’re interested in patterns of allusion in Aristotle, networks of correspondence in the Jewish Enlightenment or digitising public domain editions of Dante, we’d love to hear about the kinds of open projects that could support your interest!

Why are we running these Awards?

Humanities research is based on the interpretation and analysis of a wide variety of cultural artefacts, including texts, images and audiovisual material. Much of this material is now freely and openly available on the internet, enabling people to discover, connect and contextualise cultural artefacts in ways that were previously very difficult.

We want to make the most of this new opportunity by encouraging budding developers and humanities researchers to collaborate and start new projects that use this open content and data, paving the way for a vibrant cultural and research commons to emerge.

In addition, DM2E has developed tools to support Digital Humanities research, such as Pundit (a semantic web annotation tool), and delivered several interesting datasets from various content providers around Europe. The project is now inviting all researchers to submit a project building on this DM2E research in a special DM2E track.

Who can apply?

The Awards are open to any citizen of the EU.

Who is judging the Awards?

The Awards will be judged by a stellar cast of leading Digital Humanists.

What do we want to see?


Maphub, an open source Web application for annotating digitized historical maps, was one of the winners of the first round of the Open Humanities Awards

For the Open track, we are challenging humanities researchers, designers and developers to create innovative projects that use open content, open data or open source tools to further teaching or research in the humanities. For example, you might want to:

  • Start a project to collaboratively transcribe, annotate, or translate public domain texts

  • Explore patterns of citation, allusion and influence using bibliographic metadata or textmining

  • Analyse and/or visually represent complex networks or hidden patterns in collections of texts

  • Use computational tools to generate new insights into collections of public domain images, audio or texts

You could start a project from scratch or build on an existing project. For inspiration, you can have a look at the final results of our first-round winners, Joined Up Early Modern Diplomacy and Maphub, or check out the open-source tools the Open Knowledge Foundation has developed for use with cultural resources.

As long as your project involves open content, open data or open source tools and makes a contribution to humanities research, the choice is yours!

For the DM2E track, we invite you to submit a project building on the DM2E research: information, code and documentation on the DM2E tools is available through our DM2E wiki, and the data is at http://data.dm2e.eu. Examples include:

  • Building open source tools or applications based on the APIs developed

  • A project focused on the visualisation of data coming from Pundit

  • A deployment of the tools for specific communities

  • A project using data aggregated by DM2E in an innovative way

  • An extension of the platform by means of a practical demonstrative application

Who is behind the awards?

The Awards are being coordinated by the Open Knowledge Foundation and are part of the DM2E project. They are also supported by the Digital Humanities Quarterly.

How to apply

Applications are open from today (30 April 2014). Go to openhumanitiesawards.org to apply. The application deadline has been extended to 6 June 2014, so get going and good luck!

More information…

For more information on the Awards, including the rules and ideas for open datasets and tools to use, visit openhumanitiesawards.org.


Fourth Digital Humanities Advisory Board meeting

On 3 April 2014 the DM2E Digital Humanities Advisory Board held their fourth meeting through Skype. This Board is responsible for steering the research direction of the DM2E project and ensuring that the technical development on the project responds to the needs of scholars.

Attendees from the Digital Humanities Advisory Board included:

  • Sally Chambers (DARIAH)
  • Alastair Dunning (Europeana)
  • Dirk Wintergrün (Max-Planck-Institut für Wissenschaftsgeschichte)
  • Felix Sasaki (W3C)
  • Alois Pichler (University of Bergen)
  • Laurent Romary (INRIA)

In this meeting, Vivien Petras (Project coordinator, Humboldt-Universität) presented a summary of what happened in DM2E in the second project year.

Christian Morbidoni (Net7) gave an overview of the progress of work package 3, which is researching the scholarly practices in the humanities as well as building the tools that respond to the needs of scholars.

Next, the workplan for the research on the digital humanities scholarly primitives was presented by Steffen Hennicke (Humboldt-Universität) and discussed by the Board members. The three principal research objectives of this task are (1) to investigate the functional primitives of Digital Humanists, (2) the kinds of reasoning Digital Humanists want to see enabled, and (3) the types of operations Digital Humanists want to see enabled. More information on this can be found in the paper presented at the DH2013 conference.

In the third project year, the DM2E team is planning to supplement this research paper with a discussion of how the Scholarly Domain Model (SDM) relates to similar activities, and if and how these activities map to the SDM. Another experiment with the Pundit tool, based on a non-philosophical use case, is being investigated, with the aim of demonstrating and evaluating the usefulness and added value of Pundit (especially ASK) and Linked Data: how do relevant research questions translate to the context of Pundit and Linked Data? The additional experiment and the earlier work on the Wittgenstein Incubator will be discussed and analyzed using the terminology of the SDM: which primitives and activities have been enabled by the experiments, and how have operations been enabled through RDF ontologies?

The Advisory Board provided some valuable input on further cooperation with other projects such as Europeana Cloud, DARIAH and Europeana 1914-1918, and approved the proposed workplan.

Finally, all members agreed that Dirk Wintergrün will serve as interim Chair of the Digital Humanities Advisory Board for the next six months.

Open Humanities Awards – Joined Up Early Modern Diplomacy – final update

This is the final blog in a series of guest blog posts by Robyn Adams and Jaap Geraerts, part of the project team that won the DM2E Open Humanities Award at the Center for Editing Lives and Letters. The final report is available here.

The blind spots of network visualizations

In our last blog we discussed some of the issues regarding the use of network visualizations, as well as the limits of the information such visualizations can convey. We will examine this topic in more detail in this blog by employing some of the data derived from our own project. To reiterate a point we made previously, network visualizations often include only one type of relationship, which obscures the various links that connected people to one another. When analysing epistolary networks and the dissemination of information, the omission of such links can be of vital importance, for even when a letter was sent to one person, the recipient was often asked to pass on information to a third party. Letters were often packages, consisting of multiple letters or including other documents that were sometimes addressed to different people, thus turning a recipient (or the first recipient of the package) into a transmission agent: a person responsible for dispatching the documents or the information to their final destination. Besides the fact that people had different roles, how can we include the flow of information that resulted from people meeting in person to convey the information they had received via letters, for instance?
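One way to keep several relationship types in the data is a multigraph whose edges carry a relation attribute, as in the Python networkx sketch below. This is purely an illustration: the project's own figures were produced with Gephi and Inkscape, and the edges here are simplified examples drawn from the case study that follows.

```python
import networkx as nx

# A directed multigraph: two people can be linked by several typed edges.
g = nx.MultiDiGraph()
g.add_edge("Elizabeth I", "Bodley", relation="sent_letter")
g.add_edge("Bodley", "Council of State", relation="conveyed_in_person")
g.add_edge("Bodley", "Burghley", relation="sent_package_with_map")

for sender, receiver, data in g.edges(data=True):
    print(f"{sender} -> {receiver} [{data['relation']}]")
```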

We will illustrate the limits of a visualization of a binary epistolary network by focussing on a case study: the correspondence and the transmission of information regarding the siege of Groningen, which took place in the late spring and summer of 1594. The following image is a traditional network visualization, the edges representing the letters sent between Bodley and his correspondents regarding the siege of Groningen in the period May to August 1594.1

[Network visualization of the correspondence concerning the siege of Groningen, May-August 1594]

This simple visualization tells us, for instance, that only a small number of people wrote about the siege and that Bodley was the main correspondent, yet there is much more information that is not represented in or by this image. The visualization also creates the impression that, while the siege was of great importance to the Dutch authorities, it was discussed only by the English correspondents who were part of the epistolary network. In order to show which people were actually involved in transmitting information, the way in which they did this, and the various links that were forged in the process of disseminating information about the siege, we returned to the primary sources. Extra research was necessary because we wanted to move beyond the basic relationship codified in our dataset, namely the authors and recipients of the letters, to include a larger palette of connections which linked people to one another. The results are visualized in an SDL diagram, which is normally used to depict processes within a particular system (e.g. a computer program), yet whose format enabled us to track the flow of information as well as the various actions and the ensuing relationships. The diagram follows after the figure below, which is a key to the symbols used.

Case study II Groningen (SDL)

SDL diagram explanation

Because of the density of the information included in the diagram, it might initially be more challenging to 'read' the image, but it immediately becomes apparent that many more people were involved in the exchange of information regarding the siege than is shown in the network visualization, including the Dutch stadholder, Maurice of Orange, and Sir Francis Vere. Other relationships are also visible: in her letter to Bodley (letter id. 29), Queen Elizabeth asked him to convey her message to the Council of State, the Estates General, and Maurice of Orange. It is likely that Bodley went to see the members of these political bodies in person in order to pass on the information; hence links were created that existed outside of, but are closely related to, the epistolary network.

The diagram makes it clear that Bodley was not 'just' a correspondent but also acted as a transmission agent, and the people who normally took care of transporting the letters, the bearers, can easily be included in the diagram as well (thus expanding the network and showing more clearly the different people who were involved in the transmission of information and who connected the various correspondents to one another). The sources from which Bodley derived his information are also shown: on July 14, 1594, Bodley wrote to Robert Cecil (letter id. 454) about the siege, mentioning that he had received letters from the army camp at Groningen which provided him with information. The symbols indicate where other documents were enclosed with the letters sent between the correspondents: a letter package from Bodley to Burghley included a map of the siege of Groningen, one of the three maps about the siege Bodley sent to England.2

The SDL diagram is one way of including various types of relationships and different flows of information that are difficult to include in many network visualizations (when using open-source visualization software). Instead of depicting straightforward binary networks consisting of authors and recipients, we can zoom in closer, as it were, and show the material processes of collecting and disseminating information in more detail. Moreover, using such visualizations enables us to capture the complexity of the historical data as well as the diversity of the network. Arguably this comes at a cost, for it is difficult to visualize a large dataset in this way, but it opens up possibilities of visualizing networks without losing too much of the complexity and richness of the historical data which makes it so interesting to study in the first place. It also enhances our understanding of the often idiosyncratic process of gathering and spreading information and the fluid character of early modern information networks, aspects which tend to be ill represented in neatly constructed network visualizations.

1 Gephi and Inkscape have been used to create this visualization. The letters that have been selected all mentioned the city of Groningen in the period in which the siege took place.
2 For the maps, see: Robyn Adams, ‘Sixteenth-Century Intelligencers and Their Maps’, Imago Mundi: The International Journal for the History of Cartography 63:2 (2011), 201-16.

Open Humanities Awards – Maphub final update

*This is the final blog in a series of posts from Dr Bernhard Haslhofer, one of the recipients of the DM2E Open Humanities Award. The final report is available here.*

Semantic Tagging in Maphub – Final Results and Lessons Learned

Maphub (http://maphub.github.io) is an open source Web application which allows people to annotate digitized historical maps. It pulls maps out of closed environments, adds zooming functionality, and assigns Web URIs so that people can talk about them on the Web. It has been built as a demonstrator for the W3C Open Annotation specification (http://www.w3.org/community/openannotation/), which is currently working towards a common, RDF-based specification for annotating digital resources. Here is a screenshot of the prototype application:

[Screenshot of the Maphub prototype]

A first prototype (http://maphub.herokuapp.com) has been bootstrapped with a set of around 6,000 digitized high-resolution historical maps from the Library of Congress' Map Division. It allows users to retrieve maps either by browsing or by searching over the available metadata and user-contributed annotations and tags.

Technical Details

Semantic tagging is part of Maphub's annotation feature: to create an annotation, users mark up regions on the map with geometric shapes such as polygons or rectangles. Once the area to be annotated is defined, they are asked to tell their stories and contribute their knowledge in the form of textual comments. While users are composing their comments, Maphub periodically suggests tags based on either the text contents or the geographic location of the annotated map region. Suggested tags appear below the annotation text. The user may accept tags and deem them relevant to their annotation, or reject non-relevant tags. Unselected tags remain neutral.

The screenshot in the next figure shows an example user annotation created for a region covering the Strait of Gibraltar. While the user entered a free-text comment related to the naming of the area, Maphub queried an instance of Wikipedia Miner (http://wikipedia-miner.cms.waikato.ac.nz/) to perform named entity recognition on the entered text and received a ranked list of Wikipedia resource URIs (e.g., http://en.wikipedia.org/wiki/Mediterranean_sea) in return. Raw URIs should not be exposed to the user, so Maphub displays the corresponding Wikipedia page titles instead (e.g., Mediterranean Sea). Since page titles alone might not carry enough information for the user to disambiguate concepts, Maphub offers additional context information: the short abstract of the corresponding Wikipedia article is shown when the user hovers over a tag.

[Screenshot: an annotation of the Strait of Gibraltar region with suggested tags]

Once tags are displayed, users may mark them as relevant to their annotation by clicking on them once, which turns the labels green. Clicking once more rejects the tags, and clicking again sets them back to their initial neutral state. In the previous screenshot, the user accepts five tags and actively prunes two tags that are not relevant in the context of this annotation.
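Put into code, this suggestion-and-review cycle might look roughly like the Python sketch below. The endpoint URL and response fields are assumptions made for illustration; the actual Wikipedia Miner API and Maphub's client code are not reproduced here.

```python
import requests

# Hypothetical endpoint and response shape, for illustration only.
WIKIFY_URL = "http://example.org/wikipedia-miner/wikify"

def suggest_tags(comment_text: str) -> list[dict]:
    """Request ranked Wikipedia concepts for a comment and prepare them for
    display: the URI is kept internally, the page title is what users see."""
    resp = requests.get(WIKIFY_URL, params={"source": comment_text})
    resp.raise_for_status()
    return [{"uri": "http://en.wikipedia.org/wiki/" + t["title"].replace(" ", "_"),
             "label": t["title"],
             "state": "neutral"}            # neutral until accepted/rejected
            for t in resp.json().get("topics", [])]

def toggle(tag: dict) -> None:
    """Cycle a tag through the three states: neutral -> accepted -> rejected."""
    order = ["neutral", "accepted", "rejected"]
    tag["state"] = order[(order.index(tag["state"]) + 1) % 3]
```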

Sharing Annotations and Semantic Tags

Sharing the collected annotation data in an interoperable way was another major development goal. Maphub is an early adopter of the Open Annotation specification and demonstrates how to apply that model in the context of digitized historical maps and how to expose comments as well as semantic tags. As described in the Maphub API documentation (http://maphub.github.io/api), each annotation becomes a first-class Web resource that is dereferenceable by its URI and therefore easily accessible by any Web client. In that way, while users are annotating maps, Maphub not only consumes data from global data networks but also contributes data back. The following screenshot shows how the previous annotation could be represented following the Open Annotation specification.

[Screenshot: the annotation serialised according to the Open Annotation model]
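In RDF terms, such an annotation might look roughly like the rdflib sketch below. The URIs are invented, and the target is simplified to a fragment identifier, whereas Maphub actually describes annotated map regions with the specification's selector mechanism.

```python
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF

OA = Namespace("http://www.w3.org/ns/oa#")
CNT = Namespace("http://www.w3.org/2011/content#")
EX = Namespace("http://maphub.example.org/")  # invented URIs

g = Graph()
anno, body = EX["annotations/1"], EX["bodies/1"]
tag = URIRef("http://en.wikipedia.org/wiki/Mediterranean_Sea")

g.add((anno, RDF.type, OA.Annotation))
g.add((anno, OA.hasTarget, EX["maps/gibraltar#region1"]))  # simplified target

# The free-text comment as an inline body...
g.add((anno, OA.hasBody, body))
g.add((body, RDF.type, CNT.ContentAsText))
g.add((body, CNT.chars, Literal("On the naming of the strait...")))

# ...and an accepted suggestion as a semantic tag.
g.add((anno, OA.hasBody, tag))
g.add((tag, RDF.type, OA.SemanticTag))

print(g.serialize(format="turtle"))
```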

Tagging Experiments

While working on Maphub, its semantic tagging functionality became our core research interest. We conducted an in-lab user study with 26 participants to find out how semantic tagging differs from label-based tagging, and learned that there was no significant difference in tag production capacity, in the types and categories of tags added, or in overall user task load. Hence, semantic tagging as implemented in Maphub could produce the same result as label-based tagging, with the main difference that semantic tagging yields references to unambiguous Web resources instead of semantically ambiguous labels. More details on the methodology and results of that experiment are described in our report, available at http://arxiv.org/abs/1304.1636.

Enabling Annotations and Semantic Tagging in other Applications

We found that semantic tagging might be useful in other application scenarios as well. Therefore, with the support we received from the Open Humanities Award, we added a semantic tagging feature to Annotorious (http://annotorious.github.io/), a JavaScript image annotation library that can be used on any Website. Annotorious is also compatible with the Open Knowledge Foundation's Annotator (http://annotatorjs.org/) tool. Our next research and development steps will go in two main directions: (i) providing a more efficient and lightweight (semantic) tag suggestion service, and (ii) improving tag recommendation strategies.