BalkaNet - Design and Development of a Multilingual Balkan WordNet
This is the status of the final version of the monolingual Wordnets produced by the end of the project:
BalkanSearch - BalkaNet Prototype Search Engine
A working prototype of the search engine (under continuous improvement) can be accessed from here. Please check regularly for updates.
Systems & Tools
Cilix is a prototype tool for virtual data representation. Cilix graphically represents the semantic lexical data stored in various Wordnets so that end users can easily manage and understand it, even without lexicographic expertise. The current version lets users navigate the concepts and terms of a Wordnet while showing each term's semantics, the conceptual ontologies to which it belongs, and its semantic relatedness to other terms stored in Wordnets. The contents of Cilix are continuously updated and enriched, and currently include Wordnets developed mainly for the Balkan languages and English.
A set of tools for the extraction and processing of linguistic information from machine-readable dictionaries has been developed. More specifically, these tools can perform:
Integrated Language Resource Management Tool (ILReMaT) supports the development of a wordnet in accordance with the EuroWordNet paradigm and enables its integration with other lexical resources. It has been designed as a complement to the VisDic software, a tool accepted by all participants in the BalkaNet project for wordnet development. It also enables the integration of a Wordnet with a bilingual word list during the development process, which can help in the translation of literal strings and in the checking of the existing cross-language relations.
The lemmatizer for the Greek language takes a Greek word as input, analyzes it, and returns its dictionary citation form. It has served as the basis for a tool that counts the occurrences of words in a Greek corpus across all their inflected forms. Given a number of Greek texts, this tool produces a list with the total frequency of each word in the texts, regardless of the inflected form in which the word appears.
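The frequency counting described above can be illustrated with a minimal sketch. It is not the BalkaNet lemmatizer: the lookup table TOY_LEMMAS and the functions lemmatize and lemma_frequencies are hypothetical names introduced here, and a real lemmatizer would analyze word morphology rather than consult a five-entry table.

```python
import re
from collections import Counter

# Hypothetical toy table mapping inflected forms to citation forms.
# A real lemmatizer would derive these through morphological analysis.
TOY_LEMMAS = {
    "άνθρωποι": "άνθρωπος",
    "ανθρώπου": "άνθρωπος",
    "ανθρώπων": "άνθρωπος",
    "βιβλία": "βιβλίο",
    "βιβλίου": "βιβλίο",
}

# Invented example texts, not taken from any BalkaNet corpus.
EXAMPLE_TEXTS = ["Οι άνθρωποι και τα βιβλία", "Το βιβλίο του ανθρώπου"]


def lemmatize(word):
    """Return the citation form, or the lowercased word if it is unknown."""
    lowered = word.lower()
    return TOY_LEMMAS.get(lowered, lowered)


def lemma_frequencies(texts):
    """Count word occurrences per lemma across all texts."""
    counts = Counter()
    for text in texts:
        # \w+ matches runs of Unicode letters, including accented Greek.
        for token in re.findall(r"\w+", text):
            counts[lemmatize(token)] += 1
    return counts


if __name__ == "__main__":
    for lemma, freq in lemma_frequencies(EXAMPLE_TEXTS).most_common():
        print(lemma, freq)
```

Counting by lemma rather than by surface form is what merges the occurrences of a word's different inflected forms into a single frequency entry.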
VisDic is a graphical application for viewing and editing dictionary databases stored in XML format. Most of the program's behaviour and the dictionary design can be configured. There are 6 types of view, adaptable for each dictionary independently. VisDic was primarily developed for browsing and editing wordnets, but it can be configured for any type of dictionary: monolingual, translational, thesaurus, or just a plain corpus. VisDic can work with up to 10 dictionaries at the same time. The program is developed on Linux using the GTK libraries and is cross-compiled for Windows with the Cygwin compiler. It is available for both Linux and Windows platforms.
The WMS is a large-scale, distributed, service-oriented Wordnet Management System that acts as the interconnection and communication liaison between a user and a number of interlinked Wordnets. Semantic information is accessed through a distributed network of servers, forming a large-scale multilingual semantic network. WMS attempts to overcome the limitations of standalone Wordnet tools by enabling collaborative authoring and interconnection of individual cross-lingual Wordnets, and by giving developers the means to integrate Wordnet resources and services into third-party applications. WMS also defines a protocol for loosely coupling individual Wordnets, which keeps the system open and allows individual Wordnets to be handled at small cost. The distributed nature of the WMS also paves the way for network effects in Wordnets with respect to multilinguality: a situation in which the utility of an individual Wordnet depends on the number of other Wordnets incorporated into the WMS.
Visit the WMS home page
The WordNet Validator (WNV) is a Web-based system for validating (and correcting) the completeness and consistency of a WordNet. The system works with the VisDic XML file format. The WordNet Validator has the following main functions:
The WordNet Validator can be used in the practical work of constructing the monolingual WordNets of Balkan languages, as well as for evaluation of the completeness and consistency of different WordNets.
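As an illustration of what completeness and consistency checks over an XML wordnet might look like, here is a minimal sketch. It does not reproduce the WordNet Validator: the three checks, the function validate, and the simplified element layout (a WN root wrapping SYNSET records with ID, SYNONYM/LITERAL and ILR children) are assumptions made for this example only.

```python
import xml.etree.ElementTree as ET

# Invented sample data in an assumed, simplified synset layout.
SAMPLE = """<WN>
  <SYNSET><ID>s1</ID><POS>n</POS>
    <SYNONYM><LITERAL>dog<SENSE>1</SENSE></LITERAL></SYNONYM>
    <ILR>s2<TYPE>hypernym</TYPE></ILR>
  </SYNSET>
  <SYNSET><ID>s2</ID><POS>n</POS>
    <SYNONYM><LITERAL>animal<SENSE>1</SENSE></LITERAL></SYNONYM>
  </SYNSET>
  <SYNSET><ID>s3</ID><POS>n</POS>
    <SYNONYM></SYNONYM>
    <ILR>s9<TYPE>hypernym</TYPE></ILR>
  </SYNSET>
</WN>"""


def validate(xml_text):
    """Return a list of problem tuples found in the wordnet XML."""
    root = ET.fromstring(xml_text)
    synsets = root.findall("SYNSET")
    ids = [(s.findtext("ID") or "").strip() for s in synsets]
    known = set(ids)
    problems = []

    # Consistency: every synset ID should be unique.
    seen = set()
    for sid in ids:
        if sid in seen:
            problems.append(("duplicate_id", sid))
        seen.add(sid)

    for synset, sid in zip(synsets, ids):
        # Completeness: a synset should contain at least one literal.
        if not synset.findall("SYNONYM/LITERAL"):
            problems.append(("no_literal", sid))
        # Consistency: every relation should point at an existing synset.
        for ilr in synset.findall("ILR"):
            target = (ilr.text or "").strip()
            if target not in known:
                problems.append(("dangling_relation", sid, target))
    return problems


if __name__ == "__main__":
    for problem in validate(SAMPLE):
        print(problem)
```

In the sample, synset s3 is reported twice: it has no literal, and its hypernym relation points at s9, which does not exist in the file.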
George Orwell's 1984 Greek Corpus
In the framework of BalkaNet, a text corpus based on the Greek translation of George Orwell's Nineteen Eighty-Four has been created. More specifically, the text has been fully aligned at the sentence level with the original English text and then annotated with morpho-lexical information; the citation form (lemma) of each word has been included next to that word.
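One way to picture a sentence-aligned, lemma-annotated corpus is the small data model below. It is only a sketch: the class names Token and AlignedSentence, the tag labels, and the example sentence are invented for illustration and do not reflect the corpus's actual file format or contents.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Token:
    form: str   # word as it appears in the Greek text
    lemma: str  # dictionary citation form
    tag: str    # hypothetical morpho-lexical label


@dataclass
class AlignedSentence:
    sent_id: str
    english: str              # aligned sentence from the English original
    greek_tokens: List[Token]

    def greek_text(self):
        """Rebuild the Greek sentence from its word forms."""
        return " ".join(t.form for t in self.greek_tokens)

    def lemmas(self):
        """Return the citation form of every token, in order."""
        return [t.lemma for t in self.greek_tokens]


# Invented example pair, not quoted from the novel or its translation.
EXAMPLE = AlignedSentence(
    sent_id="demo.1",
    english="The clock struck.",
    greek_tokens=[
        Token("Το", "ο", "DET"),
        Token("ρολόι", "ρολόι", "NOUN"),
        Token("χτύπησε", "χτυπώ", "VERB"),
    ],
)

if __name__ == "__main__":
    print(EXAMPLE.english)
    print(EXAMPLE.greek_text())
    print(EXAMPLE.lemmas())
```

Storing the lemma beside each word form means that lookups by citation form need no further analysis of the text.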