Synchronizing Multilingual Wikis

CoSyne is a content synchronization system that helps users and organizations maintain multilingual wikis. It lets users explore the diversity of multilingual content through a monolingual view: users who read a wiki page in one language can view overlapping and non-overlapping information that is automatically translated from linked pages in other languages. CoSyne suggests content modifications based on additional or more specific information found in other language versions, and enables seamless integration of automatically translated sentences.

At the same time, it gives users the flexibility to edit, correct and control any changes to the wiki page. To support these tasks, CoSyne employs state-of-the-art machine translation and natural language processing techniques. Additional information is displayed at a position that maintains the coherence of the page. More specific information is displayed as an alternative to existing content. The system highlights the sentences that it suggests adding or replacing, and users may accept, ignore or correct these suggestions.

The system is developed by a European R&D project that was introduced in an earlier blog post. The Netherlands Institute for Sound & Vision is a partner of the project and is, as an end-user, responsible for the integration and evaluation of the software. 

Multilingual System

The system is a distributed web application. A central server handles user requests, provides the business logic, and communicates with the following components:

  1. Structural analysis of wiki pages [2, 8];
  2. Classification of user edits from page revision histories [1];
  3. Cross-lingual textual entailment [3, 4, 5];
  4. Context-sensitive machine translation [6, 9].

The system is deployed at different research centers (as illustrated below). The integration server and the test wikis are deployed at the Netherlands Institute for Sound & Vision. The project's languages include Bulgarian, Dutch, English, German, Italian and Turkish.


Wiki Gadget

For wiki users, CoSyne is a gadget that can be activated via the user’s preferences page. The gadget communicates with the web application and provides the following features:

  • Explore: per-sentence display of overlapping and non-overlapping information from other language versions;
  • Suggest: highlighting of insertion points for additional information and sentences that may be replaced by more specific information from other language versions;
  • Edit: seamless integration of automatically translated sentences from other language versions, flexible editing and correction by users, standard preview and saving of changes to the wiki page.

The system considers three page revisions as input:

  • Target: the page that the user reads;
  • Source: a linked page written in another language;
  • Earlier source: optionally, an earlier revision of the source page, relevant when users synchronize pages on a regular basis.

Structural analysis identifies concepts and topics in order to align source and target content. It also provides an ordering of sentences for selecting insertion points on the target page.
If an earlier source revision is present, the two source revisions are compared and changes are classified as factual edits or fluency edits. 
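
The three inputs can be pictured as a small data structure. The following is a minimal sketch only: the names `Revision` and `SyncRequest` and all of their fields are hypothetical and do not come from the CoSyne code base. It shows that the earlier source revision is optional, and that edit classification only applies when it is present.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Revision:
    """One revision of a wiki page (hypothetical structure, for illustration)."""
    language: str      # language code of the wiki, e.g. "nl" or "en"
    title: str         # page title
    revision_id: int   # identifier of this revision in the page history
    text: str          # page content of this revision


@dataclass(frozen=True)
class SyncRequest:
    """The three page revisions that one synchronization run takes as input."""
    target: Revision                            # the page that the user reads
    source: Revision                            # a linked page in another language
    earlier_source: Optional[Revision] = None   # optional earlier source revision

    def __post_init__(self):
        # The source is described as a page written in another language.
        if self.source.language == self.target.language:
            raise ValueError("source and target must be in different languages")
        # An earlier source revision must belong to the same language version.
        if (self.earlier_source is not None
                and self.earlier_source.language != self.source.language):
            raise ValueError("earlier source must be in the source language")

    @property
    def needs_edit_classification(self) -> bool:
        # Source revisions are compared only if an earlier one is supplied.
        return self.earlier_source is not None
```

A request without an earlier source revision skips the comparison of source revisions; a request with one triggers the classification of changes as factual or fluency edits.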

Cross-lingual textual entailment determines whether a source sentence is equivalent, more specific, less specific or unrelated to a target sentence. Context-sensitive machine translation provides a suitable source translation for the given target context. The system may suggest adding a translated source sentence at a particular place on the page, replacing a target sentence with a translated source sentence, or keeping a target sentence as is.
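
One way to read the paragraph above is as a mapping from entailment relations to suggestions. The sketch below is an assumption about how such a mapping could look, not the project's actual decision logic: the names `Relation`, `Action` and `suggest` are hypothetical, and the precedence rules (replace before keep, add only when nothing is related) are chosen for illustration.

```python
from enum import Enum


class Relation(Enum):
    """Relation of a source sentence to one target sentence."""
    EQUIVALENT = "equivalent"
    MORE_SPECIFIC = "more specific"
    LESS_SPECIFIC = "less specific"
    UNRELATED = "unrelated"


class Action(Enum):
    """Suggestion shown to the user."""
    ADD = "add"
    REPLACE = "replace"
    KEEP = "keep"


def suggest(relations):
    """Pick a suggestion for one source sentence.

    `relations` holds one Relation per target sentence, in page order.
    Returns (action, index); index is the target sentence concerned,
    or None when the source sentence would be added as new content.
    """
    # Assumed rule: a more specific source sentence is offered as an
    # alternative to the target sentence it refines.
    for i, rel in enumerate(relations):
        if rel is Relation.MORE_SPECIFIC:
            return Action.REPLACE, i
    # Assumed rule: the target already covers the source content.
    for i, rel in enumerate(relations):
        if rel in (Relation.EQUIVALENT, Relation.LESS_SPECIFIC):
            return Action.KEEP, i
    # Unrelated to every target sentence: treat as additional information.
    return Action.ADD, None
```

In this sketch the insertion point for an added sentence is left open; per the description above, it would come from the sentence ordering produced by structural analysis.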

Interface

The interactive user interface allows users to:

  • invoke the synchronization process and explore its results;
  • integrate translated sentences, edit and correct them;
  • preview the modified version and save changes to the wiki;
  • provide online feedback on system suggestions and on translation quality.

A simple synchronization example is illustrated below:

References

  1. Bronner, A., and Monz, C. User Edits Classification Using Document Revision Histories. In Proc. EACL (2012). 
  2. Fahrni, A., Nastase, V., and Strube, M. HITS’ Graph-based System at the NTCIR-9 Cross-lingual Link Discovery Task. In Proc. NTCIR-9 (2011). 
  3. Mehdad, Y., Negri, M., and Federico, M. Towards Cross-lingual Textual Entailment. In Proc. NAACL HLT (2010), 321–324.
  4. Mehdad, Y., Negri, M., and Federico, M. Using Bilingual Parallel Corpora for Cross-lingual Textual Entailment. In Proc. ACL HLT (2011).
  5. Mehdad, Y., Negri, M., and Federico, M. Detecting Semantic Equivalence and Information Disparity in Cross-lingual Documents. In Proc. ACL (2012).
  6. Monz, C. Statistical Machine Translation with Local Language Models. In Proc. EMNLP (2011), 869–879. 
  7. Monz, C., Nastase, V., Negri, M., Fahrni, A., Mehdad, Y., and Strube, M. CoSyne: a Framework for Multilingual Content Synchronization of Wikis. In Proc. WikiSym (2011), 217–218.
  8. Nastase, V., Strube, M., Börschinger, B., Zirn, C., and Elghafari, A. WikiNet: a Very Large Scale Multilingual Concept Network. In Proc. LREC (2010).
  9. Yahyaei, S., and Monz, C. Decoding by Dynamic Chunking for Statistical Machine Translation. In Proc. MT Summit XII (2009), 160–167.