Pundit, the open source semantic web annotator tool that is being developed within DM2E, is organising a full day event on 2 April 2014 in Berlin to hear your ideas on its User Interface and User Experience.
After winning prizes, being adopted in various environments and successfully adding semantic information to thousands of web pages using gazillions of Linked Open Data objects, the developers and designers are working their brains off on the next version of the tool.
This new version will make it possible to annotate faster, more easily and with fewer distractions, without sacrificing the tool’s powerful semantic expressivity. Not an easy task: that’s why we want to hear from you!
What do you expect from the new version of Pundit? How can we, together, best develop this open source tool for a better, faster, stronger semantic web? Join us at the Humboldt University in Berlin on 2 April 2014 for the Pundit UX/UI event.
9.30 Registration and coffee
10.00 Introduction to the day and demonstration of the proposed updates to the Pundit user interface
13.00 Hands on session: brainstorming, tests and hacks on prototypes and mockups
Exzellenzcluster »Bild Wissen Gestaltung« / Cluster of Excellence »Image Knowledge Gestaltung«, Humboldt University Berlin
Sophienstraße 22a, Berlin-Mitte, 2nd backyard, 2nd floor, right wing
This is a guest blog post by Jaap Geraerts. Jaap works as part of the team that won the DM2E Open Humanities Award at the Center for Editing Lives and Letters.
Even without knowing what these blue dots mean, the viewer instantly recognizes that this image tells us something about the United States of America. The image, which shows the number of people boarding or alighting flights (and where), can be explained very easily. Even though much is left unexplained, the fact that the context is rather obvious and that the data can easily be used to show density makes this an effective visualization. In other cases, however, the density of a network structure and the resulting ‘hairballs’ (unintelligible clutters of nodes and edges) are more confusing than explanatory. When aiming, for instance, to include another layer of data in an epistolary network (such as the people mentioned in the letters), the result can become something like this:
This is an extreme example, as the visualization has not been modified by using algorithms and filters, but it exemplifies how easily visualizations can become meaningless when substantial layers of data are used without a clear context. It therefore behoves the scholar to consider the particularities of the data and its possible limits, as well as the aim of the visualization, before enthusiastically pouring the data into computer programs. Furthermore, how should the visualizations be integrated with the scholarship that underlies them? And how do the visualizations tie in with the aims of the research, and in what way do they enrich it?
One possible way of dealing with complex data is to use different types of visualizations which show aspects or parts of the data set, perhaps to elucidate an aspect which is otherwise difficult to perceive. Another option is to focus on a specific part of a larger visualization, or to approach the data from a specific angle (e.g. how did topic X flow through an epistolary network?) in order to highlight the specifics of the network. Although countless options are imaginable, incorporating various visualizations into a narrative structure is a promising way of dealing with a complex data set, as the text can provide the much-needed (historical) context while also explaining the limits of the visualization to the reader. This does not mean that each visualization must be accompanied by a lengthy explanation, but rather that the text and the visualizations support each other, so that visualizations are not merely an addition to a story but become part of it.
The point is that complex data can be visualized, but often at the cost of losing some of the complexity which makes the data (or sources) so interesting to study in the first place. When standing on their own, modern research techniques such as visualizations do not always add significantly to the existing scholarship: the crux is to combine these innovative techniques with more ‘traditional’ scholarship and to integrate the methodologies that are used for the gathering and mining of archival data in order to be able to push the boundaries of the research undertaken in the fields in which we are working.
One of the primary goals of the DM2E project is to build a set of tools that can be used to support and further humanities scholarship. Early on in the project, a group of specialists on the influential twentieth-century philosopher Ludwig Wittgenstein was identified as a key scientific community for the tools under development. The scholars based at the Bergen Wittgenstein Archives at the University of Bergen, also a content provider to DM2E, have been consulted throughout the development of the project’s flagship annotation tool, Pundit.
At the end of December members of this very community of Wittgenstein experts and digital humanists gathered in Bergen to give feedback on their experiences using Pundit to annotate Wittgenstein’s digitised manuscripts that have been made available through Wittgenstein Source. Many of those present had been involved in the Agora and Discovery projects, which had undertaken much of the technical groundwork which DM2E has built on.
As preparation for the workshop all participants had been asked to do some exercises and complete a survey with the DM2E annotation tool, Pundit. After a welcome by Alois Pichler from the Wittgenstein Archives, Kristin Dill of DM2E partner the Austrian National Library opened up proceedings with a brief introduction to the project and a presentation of the survey results to get a sense of some early responses to Pundit from the group present.
The DM2E partners behind Pundit, Net7, demonstrated a new and important aspect of the annotation tool, called AskThePundit. Ask enables users who have created annotations in Pundit to share their own “notebooks” and discover those of others. The platform offers an incredibly powerful way to connect users and enable novel presentations of sets of annotations. Pundit is increasingly being taken up and integrated in other tools for Wittgenstein research, such as the splendid search tool WiTTFind developed at the LMU-CIS in Munich.
If you’re interested you can check out the current Beta version of Ask here. The demonstration of the platform was very well received by the participants.
In addition to the demonstration of the AskThePundit platform, the Net7 team showed impressive integrations developed with Pundit that allow users to visualise networks of influence and create timelines from Pundit annotation data.
The second half of the day was dedicated to an open discussion in which the researchers could discuss in detail their experiences using Pundit and suggest possible improvements to the software. Key points and feature requests that emerged from the lively discussion were as follows:
Feature requests for Pundit and AskThePundit
Provide an option for licensing annotations according to how their creator would like them to be used;
Offer more possibilities for visualising the graphs created by annotations;
Make it possible to “reply to” annotations;
Allow users to search for annotations by URL;
Allow annotations to be grouped by the time they were created;
Enable users to delete annotations.
More Linked Open Data and content is needed before scholars can really feel as if they are working in a library within the Linked Data cloud;
Better documentation on the available ontologies is needed so that it is clearer how to use them.
During the latter stages of the day some interesting questions were raised concerning the opportunities for community building around tools like Pundit that offer humanities researchers new ways of working with traditional texts. A key issue that was identified by participants was that many researchers were not accustomed to using digital environments for the creation of annotations, let alone the creation of annotations as Linked Data. It was therefore felt that the best means of engaging the scholarly community in the use of novel digital humanities tools was through working with students and young researchers who were digital natives and more flexible in their approach to working with texts.
The day was wrapped up by Kristin Dill of the Austrian National Library. As a follow up, participants were given an opportunity to rate the various features that had been demonstrated during the day in the form of a survey. Data from this survey will help the DM2E team evaluate how successfully the current version of Pundit responds to the scholars’ needs.
At the end of November the DM2E Consortium met in Athens to review the progress made on the project so far and to strategise about the next six months. The two days also involved presentations from two other Europeana projects, Europeana Cloud and Europeana Inside, both of which overlap with the technical work within DM2E. The presentations were followed by lively debates on how DM2E can best demonstrate its value to the scientific community it serves.
The meeting began with a review from each of the four Workpackages on the last six months. Presentations from each of the Workpackage leaders can be found below:
Following on from the updates, Klaus Thoden gave a presentation of the preliminary results of the contextualisation of the digitised manuscript data made available to the project by the content providers:
During the afternoon of the first day the DM2E Consortium had the opportunity to demo the tools being developed as part of Workpackage 1.
Next, the stage was given over to two related Europeana projects, Europeana Cloud and Europeana Inside, as a basis for discussing possible future collaborations. Gordon McKenna provided some background on the Europeana Inside project; his slides can be found below:
Joris Klerkx followed up with a similar exposition of the work of Europeana Cloud, identifying possible avenues of collaboration between Europeana Cloud and Europeana Inside.
This is a guest blog post by Jaap Geraerts. Jaap works as part of the team that won the DM2E Open Humanities Award at the Center for Editing Lives and Letters.
Since the last update about the Joined Up Early Modern Diplomacy project I have devoted a bit of time to the rationalization of the databases which have been created for this project. As the data of the Bodley project is stored in two databases (an Access database and a MySQL database which powers the website), and both will be used to create the visualizations, it is important to ensure that they contain similar data. Moreover, we tried to see which data stored in the Access database could be included in the MySQL database in order to enhance our understanding of Bodley’s correspondence network. The XML files which contain the transcriptions of the letters visible on the project website had to be updated as well, in order to keep them aligned with the updated databases. All of this shows the work which precedes the creation of the actual visualizations, and I have not even started talking about the process of deciding which visualizations are worth making, a topic which will be addressed in the next blog post.
After populating and updating the databases, it is time to take the next step towards creating the visualizations, which is to prepare the data to be imported into Gephi, the software we use to construct the visualizations. As Gephi requires the data to be presented in a specific format, one which enables the software to connect the authors to the recipients of the letters and thus to construct the network, the data has to be exported from the database in a particular way.
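To make this concrete, here is a minimal sketch of the kind of export Gephi expects: a node table (Id, Label) and an edge table (Source, Target). The letter data below is invented for illustration and does not come from the Bodley databases.

```python
import csv

# Invented sample of letter metadata: (author, recipient) pairs.
letters = [
    ("Thomas Bodley", "Francis Walsingham"),
    ("Thomas Bodley", "William Cecil"),
    ("George Gilpin", "Thomas Bodley"),
]

# Assign each person a numeric Id for the node table.
people = sorted({person for pair in letters for person in pair})
ids = {name: i for i, name in enumerate(people)}

# Gephi can import these two CSV files as node and edge tables.
with open("nodes.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["Id", "Label"])
    for name, i in ids.items():
        writer.writerow([i, name])

with open("edges.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["Source", "Target", "Type"])
    for author, recipient in letters:
        writer.writerow([ids[author], ids[recipient], "Directed"])
```

Marking the edges as directed preserves who wrote to whom, which is exactly the author-to-recipient connection the network is built from.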
Moreover, the way Gephi looks at data poses interesting questions about how we view historical data ourselves. For instance, the issue of how to represent in Gephi the fact that a letter had two or more authors raises the question of whether we should see these historical figures as acting as one entity when writing such a letter. In other cases, especially when aiming to move beyond the ‘mere’ visualization of Bodley’s network by including other layers of information, such as the people and places mentioned in the letters, the question is how to capture the historical context and the wealth of the primary sources in a standardized piece of twenty-first-century software. Furthermore, the editorial decisions made by the research team in the development stage of the correspondence project meant that ‘correspondence’ was a fluid term: the bulk of the corpus comprises letters directly to or from Bodley, but it also includes items sent in letter packets which, although epistolary in concept, do not necessarily have an addressee (or one that is immediately apparent, e.g. Bodley’s passport and cipher).
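The co-authorship question can be made concrete with a small sketch (the names are invented for the example) contrasting the two obvious encodings: one edge per author, or a single edge from a composite node that treats the co-authors as one entity.

```python
# A hypothetical letter with two authors and one recipient.
authors = ["Thomas Bodley", "George Gilpin"]
recipient = "Francis Walsingham"

# Option 1: one directed edge per author. The letter appears twice
# in the edge list, which inflates each author's individual activity.
per_author_edges = [(author, recipient) for author in authors]

# Option 2: a single composite node, preserving the fact that the
# co-authors acted as one entity when writing this letter.
composite = " & ".join(sorted(authors))
composite_edge = [(composite, recipient)]
```

Neither encoding is neutral: the first distorts letter counts, while the second creates an entity that never existed outside this one letter, which is precisely the kind of interpretive choice the software forces on the historian.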
The examples given above bear witness to the fact that when using software the researcher is obliged to engage in a dialogue between the software and the historical sources, and it is exactly at this point that IT skills and the skills of a historian intersect. In addition, these examples serve as a reminder that while software is able to create new insights and helps to address new research questions, a lot of extra work is necessary in order to gain the desired results, which in turn adds scholarly value to the technical resource. In this sense, it is important to remember that the tools embraced by research taking place within the digital humanities do not magically provide extremely interesting results; rather, using some of these tools is like opening Pandora’s box. In this rapidly changing field of research, then, traditional skills such as scholarly diligence are needed more than ever.
This blog post introduces the newest member of the Centre for Editing Lives and Letters project team, Jaap Geraerts. Jaap is the research assistant on the ‘Joined-Up Early Modern Diplomacy’ project, and will be working to generate visualizations from the Bodley project data until the end of December 2013. Jaap is ideally suited to this role: he is nearing completion of a PhD in the UCL History department on the early modern marriage practices of elite Low Countries families, and has solid technical skills from both his higher education and previous work experience.
‘From 1588 to 1597 Thomas Bodley served as the English ambassador in the United Provinces and was stationed in The Hague, while also representing his country in the Dutch Council of State. In this period Bodley sent and received around 1,000 letters, and thanks to the arduous work of Dr Robyn Adams we have access to a wealth of data, such as the places and people mentioned in the letters and the names of the authors and recipients of the letters. My main task as the research assistant of this project is to use this data to provide meaningful and insightful visualisations, which means that the visualisations should increase our understanding of Bodley’s network of correspondents and of the information that was spread through this network (the so-called ‘data-flow’).
In order to get started with the project I began with a survey of the various visualisation projects within and without the Digital Humanities to get an idea of the different ways in which data can be visualised. The Digital Humanities are a hot topic at the moment, with on-going projects such as HISGIS, Mapping the Republic of Letters, Mapping Books, and of course the various projects undertaken here at CELL, to name but a few. Moreover, conferences and seminars aim to discuss the research undertaken in the Digital Humanities and the methodological implications of using computer software such as Geographic Information Systems and Social Network Analysis, among other things.
It immediately became apparent that many different ways to visualise data are used, ranging from boxplots to fancy images that show networks of correspondents and their physical locations. The way in which the data is presented strongly influences the insights provided by the visualisations, and an important part of this project will therefore be to think about how we can best present the data gathered from Bodley’s letters. In this project the visualisations will be made in Gephi, open-source software which is mainly used for Social Network Analysis. One of the advantages of Gephi is that it is constantly updated, making new functionality available and thus keeping up with the latest developments in information technology as well as with the wishes of its users. Furthermore, the program is user-friendly and provides tools for manipulating the data, enabling the user to highlight different aspects of the network, such as the centrality of a specific person. It is important that the software is capable of producing the visualisations we want: although we resort to information technology for our scholarly needs, the desired visualisations are the outcome of our academic interests and should not depend on the capacities of specific software. The goal is not just to produce pretty pictures: after all, we are still historians!
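As a toy illustration of the centrality measure mentioned above (the names and links are invented for the example), degree centrality simply counts how many distinct correspondents each person is connected to, normalised by the number of other people in the network:

```python
from collections import defaultdict

# Invented toy correspondence network: (author, recipient) pairs.
edges = [
    ("Bodley", "Walsingham"),
    ("Bodley", "Cecil"),
    ("Gilpin", "Bodley"),
    ("Gilpin", "Cecil"),
]

# Collect each person's distinct neighbours, ignoring direction.
neighbours = defaultdict(set)
for a, b in edges:
    neighbours[a].add(b)
    neighbours[b].add(a)

# Degree centrality: neighbours divided by the number of other nodes.
n = len(neighbours)
centrality = {person: len(nb) / (n - 1) for person, nb in neighbours.items()}
# Bodley corresponds with all three others, so his centrality is 1.0.
```

This is the sort of measure Gephi computes out of the box; spelling it out by hand shows how directly it falls out of the author/recipient data.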
One of the tasks I have set myself since joining the project is to familiarise myself with the context as well as the content of the network, and its foundation of manuscript correspondence. Early modern letters are a fascinating archival resource with a specific set of features which lend themselves well to networks and systems of mapping social interaction. One of my main priorities during this project is to push the boundaries of historical network analysis and data visualization, and see if our understanding of the aforementioned specifics of epistolary communication (i.e. relating to letters) can be enhanced by the technology available to us for producing visual connections and meaning. Watch this space!’
DM2E project partner Net7 is looking for a front end developer to join the Pundit core development team. Pundit is the award-winning semantic annotation tool for working with digitised manuscripts being developed as part of DM2E Workpackage 3.
If you are a developer interested in building cutting-edge tools for researchers then get in touch. More information on the job and how to apply can be found here.
We’re pleased to announce that Alois Pichler of the University of Bergen, who has been working to integrate the Pundit annotation tool with the Wittgenstein Archives held in Bergen, has co-authored a paper in the journal Literary and Linguistic Computing. The article, entitled “Sharing and debating Wittgenstein by using an ontology”, discusses some of the challenges involved in building an ontology for research about the philosopher Ludwig Wittgenstein, with a special focus on Wittgenstein’s Nachlass.
It also looks at how ontologies can enable the tools being developed as part of DM2E, such as Pundit, to help teachers and researchers work with the arguments found within Wittgenstein’s works. At present an international group of scholars is using Pundit to annotate Wittgenstein’s work as part of a DM2E research experiment called the Wittgenstein Incubator.
University subscribers are able to access the article here.
This post gives a little background to the project that will generate data visualizations from the correspondence of Thomas Bodley. I hope to reveal the context of what Bodley was up to on the continent, and also get a little bit closer to the types of intelligence, news and reports he sent. This lies at the heart of the task to link the data embedded within the project: whether and how the contents of the letters can be reconstructed visually to create networks and patterns of information.
Thomas Bodley was nominated as the replacement for the outgoing English ambassador, and arrived in the Low Countries (now known as the Netherlands) in December 1588. His main role was to sit on the Dutch Council of State as one of two English representatives that Elizabeth I was permitted to appoint as part of the Treaty of Nonsuch of 1585. His brief was to represent English interests in the conflict against the Spanish and Catholic threats against England. He spent the next 9 years in The Hague where the Council was situated, constructing a solid network of correspondence with contacts back home in England and across northern Europe. He made journeys around the Low Countries from time to time, and only made the journey back home to England rarely.
His letters are typically a mix of military and political information: discussing the allegiances of Elizabeth’s fellow European sovereigns as to whether they would join the fight against Spain, charting the efforts of the Spanish to recapture towns that had rejected Spanish rule, and itemizing in scrupulous detail the movements of English, Dutch and enemy troops across the region. He was particularly punctilious in enumerating the quantity of troops, victuals and horses possessed by the enemy or promised by allies: crucial information for the English government when conducting a military campaign from afar.
Bodley was careful to report intelligence and news – even that which was unverified – in order that his political masters back home had the most up-to-date information to hand. A key practice of his was to copy out intercepted letters of intelligence and enclose them within his correspondence. This kind of letter-writing activity demanded substantial resources. Bodley had a secretary and aide, George Gilpin, a veteran of English diplomatic service whose career there had grown out of his mercantile work. Writing and copying out letters together, they would have required copious amounts of paper, ink, quills for writing, knives to shape, cut and sharpen the quills, sand for blotting the ink, and many more accoutrements now lost to the modern experience of handwriting. The archival remains of Bodley’s correspondence during his embassy to the Low Countries suggest that he was sending a letter nearly every day. Many of these letters are five or more folio pages long, and a huge number exist in two or more copies: these were dedicated letter-writing professionals.
Most diplomats of the period were careful to keep a copy of sent correspondence for reference and security – letters could easily get lost through interception and bad weather. The first few lines of letters in this period were often given over to making reference to letters already received, and alerting the reader to any letters previously sent. The method of postage was a royal service which brought the post between the English court and her continental ambassadors. Like the post today, this service was subject to delays, reliant as it was on good weather for a channel crossing, and well-rested horses being available for transport between towns.
When he departed for the Low Countries, Bodley was handed a cipher so that if his letters were intercepted, they would be difficult to read. His patron admitted that the cipher was ‘not verie curiouslie made for avoidinge of trowble to us both but yet sufficient to serve our purpose’. (Most ciphers in this period were endlessly recycled and eminently breakable, and many European rulers had their own ‘black chambers’ staffed by code-breakers).
Yet the majority of his letters remain un-ciphered. Considering the sensitivity of the information contained in his diplomatic dispatches, this is surprising. However, the areas of the letters called the salutation (or opening line), and valediction (or closing line), often make reference to figures appointed to carry the letters personally between England and the Low Countries. These figures were called ‘bearers’, and it was their job to ensure that the letters were not intercepted or tampered with along their route to the recipient, and that they met with the minimum of delay. Along with an added measure of security, these figures add another facet to the correspondence of the early modern period: accompanying the letter, the bearer was frequently called upon by the recipient to furnish extra information; information which is for the most part now lost.
Thomas Bodley’s bearers – early modern ‘frequent flyers’, judging by their numerous cross-channel journeys – form interesting and ancillary nodes of this correspondence project and the task at hand to visualize the networks of information within. They are the human, often shadowy presence behind much of the correspondence, tracing and marking the physical route between recipient and correspondent in England and the Low Countries. Mentioned in the correspondence, and active agents in its successful completion, they remain relatively uncredited. One of the purposes of the Joined Up Early Modern Diplomacy Project is to rehabilitate and recover information about these people (most often they are men). The exciting job at hand is to develop a means to visualize the networks of agents and correspondents in relation to their role within the letter network: can we work out a way of representing the bearer so their agency is visible despite not putting pen to paper?
In my previous blog post I briefly described the two main areas of construction we are working on in order to provide semantic tagging functionality as part of the Annotorious image annotation library. First, we need to add GUI elements that display proposed semantic tags to users and allow them to accept or reject tags. Second, we need a backend service that proposes tags based on textual annotation and context features.
Work on the front end is making good progress and the code base is growing. Recently we have been working on UI design issues, with the goal of arriving at an appealing and intuitive interface. We have also published the first demos on the Annotorious GitHub site.
At the moment, it is possible to use the Annotorious semantic tagging plugin with a running instance of Wikipedia Miner, which returns relevant Wikipedia articles for given textual input. We use this information to generate semantic tags, which are then proposed on the user interface. Connecting Annotorious to instances of DBpedia Spotlight, the Freebase API or other named entity recognition engines is a possible extension point of the current front-end code base.
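In essence, the plugin turns detected topics above a confidence threshold into proposed tags. The sketch below illustrates the idea in Python; the JSON shape only mimics a Wikipedia Miner-style response and is invented for the example, so the real service’s field names may differ.

```python
import json

# Invented example response in the spirit of a Wikipedia Miner call;
# the actual field names of the service may differ.
response = json.loads("""
{
  "detectedTopics": [
    {"title": "Ludwig Wittgenstein", "weight": 0.92},
    {"title": "Vienna", "weight": 0.55},
    {"title": "Logic", "weight": 0.18}
  ]
}
""")

def propose_tags(resp, threshold=0.5):
    """Turn detected topics above the threshold into semantic tag
    proposals (a label plus a Wikipedia URI) that the UI can offer
    to the user to accept or reject."""
    return [
        {
            "label": topic["title"],
            "uri": "http://en.wikipedia.org/wiki/"
                   + topic["title"].replace(" ", "_"),
        }
        for topic in resp["detectedTopics"]
        if topic["weight"] >= threshold
    ]

tags = propose_tags(response)  # low-confidence topics are filtered out
```

Because the front end only consumes a list of label/URI pairs, swapping Wikipedia Miner for another entity recognition engine would mainly mean adapting this response-parsing step.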
Besides connecting the Annotorious semantic tagging plugin with the previously mentioned services, we will also continue working on Contextualism (https://github.com/behas/contextualism), which should be a lightweight alternative operating on simple gazetteers or taxonomies. We have already defined the interface (see the dev branch) and will work on the implementation in the upcoming weeks.