Research & Development EN blog
First batch upload to Wikimedia Commons using the GWToolset
The Netherlands Institute for Sound and Vision recently finished uploading a batch of videos to Wikimedia Commons using the GLAMwiki Toolset (GWToolset). The GWToolset is still under development, and individuals and organizations have only begun using it in recent months. Sound and Vision would therefore like to report on its experience with the toolset.
The first upload using the GLAMwiki Toolset consisted of about 500 videos of birds in the Netherlands, generously donated by the Stichting Natuurbeelden (Foundation for Nature Footage). The GLAMwiki Toolset, developed by Dan Entous from Europeana, allows cultural heritage organizations to upload their collections, or parts of them, in batch to Wikimedia Commons, the media repository of the Wikimedia Foundation. These collections can then be used in articles on Wikipedia.
For a batch upload using the GWToolset to succeed, some preparation is required:
Step 1: The files for upload have to be stored on a publicly accessible server.
For Sound and Vision, the existing platform Open Images (Open Beelden) served as the publication platform. To upload batches of new items from the Sound and Vision catalogue to Open Images more efficiently, Themistoklis Karavellas created a Python script that takes a .csv list of TaakIDs (the Sound and Vision identifier) as input. The script gathers the metadata for each TaakID from Sound and Vision's Elasticsearch engine, parses it, and adds the corresponding item on the media host.
Step 2: A flat XML file with the metadata must be generated.
The Open Beelden platform works together with a system that gives open access to the metadata of its content. This is OAI-PMH (the Open Archives Initiative Protocol for Metadata Harvesting), which was used to fetch the metadata that was added to Wikimedia Commons. Much of the metadata in the hierarchical XML output of OAI-PMH is not needed for the upload to Wikimedia Commons. A Python script was therefore created that flattens and 'trims' the XML files. (Shown in the original post: first the normal XML, then the flattened XML.)
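The script itself is not reproduced in this post. As a rough illustration of its first stage only, the sketch below reads TaakIDs from a .csv list and prepares one lookup per identifier. It assumes the identifiers sit in the first column; the field name `taakID` and the helper names are hypothetical and not taken from the actual script, and no request is sent to any search engine.

```python
import csv
import io


def read_taak_ids(csv_file):
    """Return the non-empty first-column values, in order, without duplicates."""
    seen = set()
    ids = []
    for row in csv.reader(csv_file):
        if not row:  # csv.reader yields an empty list for blank lines
            continue
        value = row[0].strip()
        if value and value not in seen:
            seen.add(value)
            ids.append(value)
    return ids


def build_query(taak_id, field="taakID"):
    """Build a term-query body for one identifier.

    The field name is an assumption; the real index mapping is not
    described in this post.
    """
    return {"query": {"term": {field: taak_id}}}


# Made-up sample input: two identifiers, a blank line, and a duplicate.
sample = io.StringIO("1001\n1002\n\n1001\n")
ids = read_taak_ids(sample)
queries = [build_query(taak_id) for taak_id in ids]
print(ids)
```

Removing duplicates at this stage means a repeated TaakID in the list does not lead to the same item being published twice.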
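To make the flattening and trimming step concrete, here is a minimal sketch using Python's standard `xml.etree.ElementTree` module. It is not the script used for this upload: the sample record, its values, and the choice of fields to keep are made up for illustration. It walks a nested record, drops namespaces, and keeps only the selected fields as direct children of a single `record` element.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts in front of tags."""
    return tag.rsplit("}", 1)[-1]


def flatten_record(record, keep):
    """Copy the wanted leaf fields of a nested record into a one-level element."""
    flat = ET.Element("record")
    for element in record.iter():  # document order, includes the root itself
        name = local_name(element.tag)
        text = (element.text or "").strip()
        if name in keep and text:
            ET.SubElement(flat, name).text = text
    return flat


# Made-up sample record; real OAI-PMH output contains many more fields.
SAMPLE = """<record>
  <header><identifier>oai:example:1</identifier></header>
  <metadata>
    <dc xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:title>Merel</dc:title>
      <dc:creator>Stichting Natuurbeelden</dc:creator>
      <dc:format>video/webm</dc:format>
    </dc>
  </metadata>
</record>"""

record = ET.fromstring(SAMPLE)
flat = flatten_record(record, keep={"title", "creator"})
flat_xml = ET.tostring(flat, encoding="unicode")
print(flat_xml)
```

Fields that are not listed in `keep`, such as the identifier and format in this sample, are simply left out of the flat output.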
Step 3: Permission for using the GWToolset must be obtained.
The GWToolset is a powerful tool that, if used badly, can cause chaos on Wikimedia Commons. It is therefore required to ask for permission to use it both in the Beta (test) environment and in the production environment.
a) Commons Beta server: contact a developer or bureaucrat on Beta to request the rights for the GWToolset user group there. You can ask in the Commons IRC channel or contact them from these lists:
b) Commons Production server: leave a message on the Commons notice board to request rights for the GWToolset. Please introduce yourself and explain the reason for your request.
The upload must be mentioned on the partnership page of Wikimedia Commons. This is in order for the community to keep track of what is happening with the GWToolset.
Optional but recommended steps
- Subscribe to the GWToolset mailing list
- Create a partner-template
- Create an institution-template
- Decide which categories to add to the batch upload. For more information on which categories to add, see this page.
- Prepare a mapping of the metadata to one of the templates on Wikimedia Commons.
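As an illustration of the last item, a mapping can be drafted and checked before it is entered in the toolset. The sketch below assumes the Information template on Wikimedia Commons (parameters description, date, source, author, permission). The field names on the left-hand side and the sample values are hypothetical. This is only a way to sanity-check a draft; it is not how the GWToolset stores or applies a mapping.

```python
# Parameters of the Information template on Wikimedia Commons.
INFORMATION_PARAMETERS = ("description", "date", "source", "author", "permission")

# Hypothetical flat metadata field names mapped to template parameters.
FIELD_MAP = {
    "abstract": "description",
    "created": "date",
    "identifier": "source",
    "creator": "author",
    "license": "permission",
}


def unmapped_parameters(field_map, parameters):
    """Return the template parameters that no metadata field is mapped to."""
    mapped = set(field_map.values())
    return [name for name in parameters if name not in mapped]


def apply_mapping(record, field_map):
    """Translate one flat metadata record into template parameter values."""
    return {
        parameter: record[field]
        for field, parameter in field_map.items()
        if field in record
    }


# Made-up sample record for illustration.
sample_record = {"abstract": "A blackbird singing", "creator": "Stichting Natuurbeelden"}
values = apply_mapping(sample_record, FIELD_MAP)
missing = unmapped_parameters(FIELD_MAP, INFORMATION_PARAMETERS)
print(values, missing)
```

An empty `missing` list means every template parameter has a source field in the draft, which is easier to confirm before a large batch upload than after it.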
Step 4: Stimulate reuse
- Make mention of larger batch uploads in the appropriate places
a) The Village Pump on Wikimedia Commons
b) The Village Pumps of the relevant Wikipedia language versions, depending on the language of the content.
c) Notify your local Wikimedia Chapter
d) The project pages of running projects of Wikimedia Chapters or community projects (for these bird videos, a notice was posted to the portal for Wiki Loves Earth)
e) Any relevant third parties
- Organize an editathon or other event to bring your collections to the attention of the Wikimedia community
- Set up a contest with small prizes that can be won by people who use media from your collections (see this example of a contest page, in Dutch).
Results from the first upload
Only two weeks ago we finalized the upload of over 500 videos of birds. The results have been fantastic. Before the new upload, the category of the Foundation for Nature Footage on Wikimedia Commons contained 57 videos, of which 12 were used in articles on Wikipedia. Only three days after the upload, the category contained 587 items, of which 253 had already been used in articles on the Dutch and other language versions of Wikipedia.
Working with the GWToolset does require some preparation, but with the current workflow in place Sound and Vision can now easily make more contributions to Wikimedia Commons. We are currently preparing an upload of over 1200 newsreel videos from the 1930s to the 1970s.
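The figures above can be turned into usage shares with a few lines of arithmetic. The sketch below uses only the counts reported in this post; the function name is just for illustration.

```python
def usage_share(used, total):
    """Percentage of items in the category that are used in articles."""
    return round(100 * used / total, 1)


before = usage_share(12, 57)    # before the batch upload
after = usage_share(253, 587)   # three days after the batch upload
new_items = 587 - 57            # items added to the category

print(before, after, new_items)  # 21.1 43.1 530
```

So the share of items used in articles roughly doubled, from about 21% to about 43%, while the category itself grew by 530 items.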
Natuurbeelden upload: The full collection can be found here.
People who use the items on the Dutch Wikipedia can win a number of small prizes. The contest page can be found here (in Dutch).
The scripts developed for editing the XML file (OAI-PMH to GLAMwiki flat) and for uploading from the catalogue to Open Images will soon be made available on GitHub.
If you are interested in uploading videos to Wikimedia Commons using the Open Images platform and infrastructure, feel free to contact Jesse de Vos at firstname.lastname@example.org to discuss the possibilities.