Understanding events in cultural heritage
At the eHumanities Research Group meeting held at the Dutch Meertens Institute on May 8th, Lora Aroyo presented ‘CrowdTruth for Digital Hermeneutics’. In the talk she explained her research on event recognition in cultural heritage metadata. This research is an outcome of the Agora project, in which Sound & Vision is also a partner.
Why are events important?
Events play an important role in contextualizing cultural heritage objects. Almost every object is associated with ‘when’, ‘where’, and ‘who’ information, and recognizing the event behind that information makes it easier to assign linked metadata to the object. Aroyo points out that event recognition takes place in the metadata text associated with the object, not in the (audio)visual object itself. The research used two datasets, one from the Rijksmuseum and one from the Netherlands Institute for Sound & Vision.
What is the definition of an ‘event’?
One of the difficulties in this research, Aroyo admits, is defining the concept of an ‘event’. The definition used by expert annotators differs considerably from a lay person’s idea of what constitutes an event. To address this, Aroyo ‘crowdsourced’ the meaning of ‘event’, looking at where people’s interpretations of events agreed and where they differed, in order to map the different perspectives of the crowd. For the research, an ‘event’ was taken to comprise four aspects:
(1) Event type: Purpose - Arriving or departing - Motion - Communication - Usage - Judgment - Leadership - Success or failure - Sending or receiving - Action - Attack - Political - Other
(2) Location type: Geographical (e.g. continent, region, country, city, state)
(3) Time type: Before - During - After - Repetitive - Timestamp - Date - Century - Year - Week - Day - Part of day - Other
(4) Participants type: Person - Organization - Geographical Region - Nation - Object - Other
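To make the four aspects concrete, they can be pictured as fields of a single annotation record. The sketch below is a hypothetical illustration: the class name, field names and validation are our own assumptions, and only the category labels come from the lists above. It does not represent the data model actually used in the research.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Category labels copied from the lists above; everything else is hypothetical.
EVENT_TYPES = {
    "Purpose", "Arriving or departing", "Motion", "Communication", "Usage",
    "Judgment", "Leadership", "Success or failure", "Sending or receiving",
    "Action", "Attack", "Political", "Other",
}
TIME_TYPES = {
    "Before", "During", "After", "Repetitive", "Timestamp", "Date", "Century",
    "Year", "Week", "Day", "Part of day", "Other",
}
PARTICIPANT_TYPES = {
    "Person", "Organization", "Geographical Region", "Nation", "Object", "Other",
}


@dataclass
class EventAnnotation:
    """One worker's annotation of an event mentioned in a metadata text."""
    text_span: str                      # words in the metadata text denoting the event
    event_type: str                     # one of EVENT_TYPES
    location: Optional[str] = None      # geographical name: continent, region, country, city...
    time_type: Optional[str] = None     # one of TIME_TYPES, if a time is mentioned
    participants: Dict[str, str] = field(default_factory=dict)  # name -> participant type

    def __post_init__(self) -> None:
        # Reject labels that are not part of the predefined vocabularies.
        if self.event_type not in EVENT_TYPES:
            raise ValueError(f"unknown event type: {self.event_type!r}")
        if self.time_type is not None and self.time_type not in TIME_TYPES:
            raise ValueError(f"unknown time type: {self.time_type!r}")
        for name, kind in self.participants.items():
            if kind not in PARTICIPANT_TYPES:
                raise ValueError(f"unknown participant type for {name!r}: {kind!r}")


# What one worker might record for the print title in the image credits.
example = EventAnnotation(
    text_span="komt aaen",
    event_type="Arriving or departing",
    location="Amsterdam",
    participants={"Manschap": "Person"},
)
```

Keeping the categories as closed sets makes it easy to spot answers that fall outside the agreed vocabulary; anything else has to go through the ‘Other’ option.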
How is event recognition crowdsourced?
To identify events in textual metadata, a selection of texts was presented to users, who were asked to annotate them with event information. Aroyo mentioned two useful platforms for the crowdsourcing task:
Amazon Mechanical Turk: MTurk is a platform that calls itself ‘a marketplace for work’. As a worker, you can pick up surveys or small tasks (such as annotation tasks) in exchange for payment. As a requester, you can post these tasks and pay people for their labour.
Crowdflower (now part of Appen): A similar platform for crowdsourcing tasks such as data categorization and enhancement, but also content creation and image moderation. Crowdflower was used in the 2010 Haiti disaster relief effort, when over 12,000 text messages were translated and geotagged by the crowd and passed on to local aid organisations.
One obvious problem in (paid) crowdsourced annotation tasks for cultural heritage is the relatively large amount of spam. Spam can be recognized by nonsensical input in the ‘Other’ text fields and is subsequently filtered out.
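The talk does not say how nonsensical ‘Other’ input was detected. As a hypothetical illustration only, a simple heuristic could flag free-text answers that are very short, contain almost no letters, are dominated by one repeated character, or contain common keyboard mashes, and then drop all judgments from workers whose filled-in answers are mostly flagged. The function names and thresholds below are arbitrary assumptions, not the project's actual filter.

```python
import re
from collections import defaultdict
from typing import List, Tuple


def looks_like_spam(other_text: str) -> bool:
    """Hypothetical heuristic for nonsensical input in an 'Other' field."""
    text = other_text.strip().lower()
    if len(text) < 3:
        return True
    letters = re.sub(r"[^a-z]", "", text)  # keep only ASCII letters
    if len(letters) < 2:
        return True
    # A string dominated by one character ("aaaaaa", "xxxxx") is likely filler.
    most_common = max(letters.count(c) for c in set(letters))
    if most_common / len(letters) > 0.6:
        return True
    # Common keyboard mashes.
    return any(mash in text for mash in ("asdf", "qwer", "zxcv"))


def filter_spammers(
    judgments: List[Tuple[str, str]], max_spam_ratio: float = 0.5
) -> List[Tuple[str, str]]:
    """Drop every judgment from workers whose 'Other' answers are mostly spam.

    judgments: (worker_id, other_text) pairs; other_text is empty when the
    worker picked one of the predefined categories instead.
    """
    filled = defaultdict(int)   # number of non-empty 'Other' answers per worker
    flagged = defaultdict(int)  # number of those answers that look like spam
    for worker, other in judgments:
        if other:
            filled[worker] += 1
            if looks_like_spam(other):
                flagged[worker] += 1
    spammers = {w for w in filled if flagged[w] / filled[w] > max_spam_ratio}
    return [(w, o) for w, o in judgments if w not in spammers]
```

Filtering per worker rather than per answer reflects the assumption that a worker who fills the ‘Other’ field with nonsense is unlikely to have answered the structured questions carefully either.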
What did the tasks look like?
The users on the platforms were shown an example image and its accompanying text, and were then asked to perform a small annotation task on that text.
Important outcomes and conclusions
Aroyo stresses the importance of looking at semantic disagreements about events and trying to understand them, in order to set up automatic processes for event recognition in cultural heritage metadata. Another very important aspect is community management: maintaining strong links with useful annotation communities.
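One way to make disagreement measurable is to aggregate all workers' choices for a single text into a vector and compare it with each possible label. The sketch below is a simplified, hypothetical score loosely modelled on the CrowdTruth idea of treating annotations as vectors; the actual CrowdTruth metrics are more elaborate, and the function names here are our own. Each label's score is the cosine similarity between the text's vector (how many workers chose each label) and that label's unit vector; the highest score serves as a rough ‘clarity’ value.

```python
import math
from collections import Counter
from typing import Dict, List


def label_scores(worker_labels: List[List[str]]) -> Dict[str, float]:
    """Cosine similarity between the text's label-count vector and each label's unit vector.

    worker_labels: one list of chosen event types per worker for the same text.
    """
    # Count each label at most once per worker.
    vector = Counter(label for labels in worker_labels for label in set(labels))
    norm = math.sqrt(sum(v * v for v in vector.values()))
    return {label: count / norm for label, count in vector.items()}


def clarity(worker_labels: List[List[str]]) -> float:
    """Highest label score: 1.0 means all workers agree on a single label."""
    scores = label_scores(worker_labels)
    return max(scores.values()) if scores else 0.0


unanimous = [["Motion"], ["Motion"], ["Motion"], ["Motion"]]
split = [["Motion"], ["Motion"], ["Action"], ["Attack"]]
```

A low clarity value does not necessarily mean the workers were careless: it may point to a text that genuinely supports several readings of the event, which is exactly the kind of disagreement the research sets out to understand.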
The outcomes of this research will also be implemented in a follow-up project called DIVE, which will build a user-friendly interface for historical event-based research and exploration. The collections used are the Delpher database (newspapers) from the National Library of the Netherlands and the catalogue of the Netherlands Institute for Sound & Vision.
More information:
Lora Aroyo’s SlideShare
LoraAroyo.org
Crowdsourcing in the Cultural Heritage Domain: Opportunities and Challenges (Johan Oomen, Lora Aroyo)
Image credits:
Simon Fokke (1753), ‘De uit Slaverny verloste Manschap van 't zelve, komt aaen op 's Lands Scheeps-Timmerwerf te Amsteldam’ - Collectie Rijksmuseum