BalkaNet - Design and Development of a Multilingual Balkan WordNet

Wordnets

The final versions of the monolingual Wordnets produced by the end of the project have the following status:

Language                     Bulgarian     Czech     Greek  Romanian   Turkish   Serbian
Synsets                          21441     28456     18461     19839     14626      8059
Nouns                            14174     21009     14426     13345     11059      5919
Verbs                             4169      5155      3402      4808      2725      1803
Adjectives                        3088      2128       617       852       802       324
Adverbs                              9       164        16       834        40        13
Literals                         44956     43918     24366     33690     20310     13295
Literal/Synset                     2.1      1.54      1.33       1.7      1.39      1.65
BC1s                              1218      1218      1218      1218      1220      1219
BC2s                              3471      3471      3462      3471      3479      3469
BC3s                              3827      3827      3825      3827      3794      1369
Domain Specific Synsets           2065       304       238       286       300       305
Balkan Specific Synsets            220       257       309       151       103       117
Language Specific Synsets          116       257        52       545       204       206
Language Internal Relations      28599     25683     24368     25885     19834     12787

BC1s, BC2s and BC3s give the number of synsets implemented from the first, second and third BalkaNet base concept sets.
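The Literal/Synset row is the average number of literals per synset, that is, the Literals row divided by the Synsets row. As a quick cross-check, the Python snippet below recomputes it from the counts in the table; the `COUNTS` dictionary and the `literals_per_synset` helper are illustrative names, not part of any BalkaNet tool.

```python
# Synset and literal counts copied from the table above: language -> (synsets, literals).
COUNTS = {
    "Bulgarian": (21441, 44956),
    "Czech": (28456, 43918),
    "Greek": (18461, 24366),
    "Romanian": (19839, 33690),
    "Turkish": (14626, 20310),
    "Serbian": (8059, 13295),
}


def literals_per_synset(synsets: int, literals: int) -> float:
    """Average number of literals per synset, rounded to two decimals."""
    return round(literals / synsets, 2)


for language, (synsets, literals) in COUNTS.items():
    print(f"{language:<10} {literals_per_synset(synsets, literals):.2f}")
```

Rounded to two decimals, this reproduces the listed ratios for Bulgarian (2.10), Czech, Romanian (1.70), Turkish and Serbian; for Greek it gives 1.32, slightly below the 1.33 shown in the table.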



BalkanSearch - BalkaNet Prototype Search Engine


BalkanSearch is an original Internet search engine that exploits the Wordnets of the Balkan languages. Unlike other search engines, it offers linguistic information services. It also uses the conceptual structure of the Wordnets to classify web pages according to their content.

A working prototype of the search engine, which is under continuous improvement, is available online. Please check regularly for updates.
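This page does not describe how BalkanSearch maps pages to concepts. One simple way to use a Wordnet's conceptual structure for classification is sketched below in Python: each recognised word votes for the top-level category reached by following its hypernym links, and the page receives the category with the most votes. The toy lexicon, the synset names and the `CATEGORIES` set are hypothetical, and this is not claimed to be BalkanSearch's actual algorithm.

```python
from collections import Counter

# Hypothetical toy Wordnet fragment: literal -> synset, synset -> hypernym.
LITERAL_TO_SYNSET = {
    "car": "car.n.01", "truck": "truck.n.01",
    "apple": "apple.n.01", "pear": "pear.n.01",
}
HYPERNYM = {
    "car.n.01": "motor_vehicle.n.01", "truck.n.01": "motor_vehicle.n.01",
    "motor_vehicle.n.01": "vehicle.n.01",
    "apple.n.01": "fruit.n.01", "pear.n.01": "fruit.n.01",
}
# Assumed set of top-level synsets used as page categories.
CATEGORIES = {"vehicle.n.01", "fruit.n.01"}


def category_of(synset):
    """Follow hypernym links upwards until a category synset is reached."""
    while synset is not None:
        if synset in CATEGORIES:
            return synset
        synset = HYPERNYM.get(synset)
    return None


def classify(text):
    """Return the category that receives the most votes from the words of `text`."""
    votes = Counter()
    for token in text.lower().split():
        synset = LITERAL_TO_SYNSET.get(token.strip(".,;:!?"))
        category = category_of(synset) if synset else None
        if category:
            votes[category] += 1
    return votes.most_common(1)[0][0] if votes else None


print(classify("A car overtook a truck near the apple market."))
```

Walking up to a fixed set of top-level concepts lets different words ("car", "truck") reinforce the same class even though they never match each other literally.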


Systems & Tools


Systems and tools that were developed and used during the BalkaNet project.

Tool                   Developer
Cilix                  RACTI
E-Dictionaries Tools   UOA
ILReMaT                MATF
Lemmatizer for Greek   UOA
VisDic                 FI MU
WMS                    DBLAB
WordNet Validator      IBL-DCMB
WordNetBank            DBLAB

Cilix

Cilix is a prototype tool for virtual data representation. With Cilix, the semantic lexical data stored in various Wordnets are represented graphically and can be easily managed and understood by end users, even those without lexicographic expertise. The current version of Cilix supports navigation through the concepts and terms of the Wordnets and provides information on each term's semantics, the conceptual ontologies to which it belongs, and its semantic relatedness to other terms stored in the Wordnets. The contents of Cilix are continuously updated and enriched; they currently include Wordnets developed mainly for the Balkan languages and English.

Download the latest version of Cilix
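The measure Cilix uses for semantic relatedness is not documented here. As an illustration of how relatedness can be read off a Wordnet's hierarchy, the Python sketch below uses the common path-based measure 1 / (1 + shortest path length) over hypernym and hyponym links; the tiny `HYPERNYMS` hierarchy is invented for the example, and the measure is a stand-in, not necessarily the one Cilix implements.

```python
from collections import deque

# Invented toy hierarchy: each node (standing in for a synset) maps to its hypernym.
HYPERNYMS = {
    "dog": "canine", "wolf": "canine", "canine": "carnivore",
    "cat": "feline", "feline": "carnivore", "carnivore": "animal",
}


def neighbours(node):
    """Nodes one hypernym or hyponym link away from `node`."""
    linked = {child for child, parent in HYPERNYMS.items() if parent == node}
    if node in HYPERNYMS:
        linked.add(HYPERNYMS[node])
    return linked


def path_length(a, b):
    """Breadth-first search for the shortest hypernym/hyponym path from a to b."""
    seen = {a}
    queue = deque([(a, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == b:
            return dist
        for nxt in neighbours(node):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def path_similarity(a, b):
    """1 / (1 + shortest path length); None if the nodes are not connected."""
    dist = path_length(a, b)
    return None if dist is None else 1 / (1 + dist)


print(path_similarity("dog", "wolf"), path_similarity("dog", "cat"))
```

"dog" and "wolf" share a direct hypernym (path length 2), whereas "dog" and "cat" only meet at "carnivore" (path length 4), so the first pair scores higher.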

E-Dictionaries Tools

A set of tools for the extraction and processing of linguistic information from machine-readable dictionaries has been developed. More specifically, these tools can perform:

  • Extraction of POS-related information.
  • Extraction of linked lemmata and lemmata acting as compounds.
  • Extraction of synonyms and antonyms.
  • Search for antonymic relations in the definitions of lemmata.
  • Search for semantic relations.
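The entry format of the source dictionaries is not given on this page. The following Python sketch assumes a hypothetical one-line-per-entry layout with `Syn:` and `Ant:` fields and illustrates two of the listed tasks: extracting explicit synonyms and antonyms, and searching definitions for simple antonymy cues such as "not X" or "opposite of X". Both the field layout and the cue list are assumptions.

```python
import re

# Assumed cue phrases that often signal antonymy inside a definition.
ANTONYM_CUE = re.compile(r"\b(?:opposite of|not)\s+(\w+)", re.IGNORECASE)


def parse_entry(line):
    """Parse a hypothetical 'headword | POS | definition | Syn: ... | Ant: ...' line."""
    fields = [field.strip() for field in line.split("|")]
    headword, pos, definition = fields[:3]
    entry = {"headword": headword, "pos": pos, "synonyms": [], "antonyms": []}
    for field in fields[3:]:
        label, _, values = field.partition(":")
        items = [value.strip() for value in values.split(",") if value.strip()]
        if label.strip().lower() == "syn":
            entry["synonyms"].extend(items)
        elif label.strip().lower() == "ant":
            entry["antonyms"].extend(items)
    # Antonym candidates found by searching the definition text itself.
    entry["antonyms_in_definition"] = ANTONYM_CUE.findall(definition)
    return entry


print(parse_entry("cold | adj | not hot; having a low temperature | Syn: chilly, cool | Ant: hot"))
```

Cue-based matches from definitions are only candidates; they would normally be reviewed before being added as antonymy relations.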


Download the latest version of E-Dictionaries Tools

ILReMaT - Integrated Language Resource Management Tool

The Integrated Language Resource Management Tool (ILReMaT) supports the development of a wordnet in accordance with the EuroWordNet paradigm and enables its integration with other lexical resources. It was designed to complement VisDic, the tool adopted by all BalkaNet participants for wordnet development. It also allows a Wordnet to be integrated with a bilingual word list during development, which helps in translating literal strings and in checking existing cross-language relations.

Download the latest version of ILReMaT
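The sketch below illustrates, in Python, how a bilingual word list can support both uses mentioned above: suggesting translations for the literals of a synset, and flagging cross-language links where none of the suggested translations appears among the synset's literals. The data layout (synsets keyed by hypothetical interlingual index identifiers, Romanian as the target language) and the `review` function are assumptions and do not reflect ILReMaT's internals.

```python
# Hypothetical data, keyed by interlingual index (ILI) identifiers.
ENGLISH = {"ILI-1": ["dog", "domestic dog"], "ILI-2": ["house", "home"]}
LOCAL = {"ILI-1": ["câine"], "ILI-2": ["masă"]}  # "masă" (table) is a deliberately wrong link
BILINGUAL = {"dog": ["câine"], "house": ["casă"], "home": ["casă", "cămin"]}


def review(ili):
    """Suggest missing translations and flag a possibly wrong cross-language link."""
    candidates = {
        translation
        for literal in ENGLISH.get(ili, [])
        for translation in BILINGUAL.get(literal, [])
    }
    present = set(LOCAL.get(ili, []))
    return {
        "suggested": sorted(candidates - present),
        # Suspicious: the word list offers translations, but none is in the synset.
        "suspicious_link": bool(candidates) and not (candidates & present),
    }


for ili in ENGLISH:
    print(ili, review(ili))
```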

Lemmatizer for Greek

The lemmatizer for Greek takes a Greek word as input, analyzes it and finds its dictionary citation form. It has been used as the basis for a tool that counts the occurrences of words in a Greek corpus across all their inflected forms: given a set of Greek texts, this tool produces a list of the total number of occurrences of each word in the texts, regardless of the inflected form in which the word appears.

Download the latest version of Lemmatizer for Greek
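The frequency-counting tool can be sketched in Python as follows. Here a small lookup table, `FORM_TO_LEMMA`, stands in for the actual Greek lemmatizer, which performs a real morphological analysis; unknown forms are simply counted as their own lemma. The table and the function names are assumptions for illustration.

```python
import re
from collections import Counter

# Hypothetical stand-in for the Greek lemmatizer: inflected form -> lemma.
FORM_TO_LEMMA = {
    "σπίτι": "σπίτι", "σπιτιού": "σπίτι", "σπίτια": "σπίτι",
    "γράφω": "γράφω", "γράφει": "γράφω", "γράφουν": "γράφω",
}


def lemmatize(form):
    """Return the citation form; unknown words are treated as their own lemma."""
    return FORM_TO_LEMMA.get(form, form)


def lemma_frequencies(texts):
    """Count occurrences of each lemma across all texts, whatever the inflected form."""
    counts = Counter()
    for text in texts:
        for token in re.findall(r"\w+", text.lower()):
            counts[lemmatize(token)] += 1
    return counts


texts = ["Γράφει για το σπίτι.", "Τα σπίτια γράφουν ιστορία."]
print(lemma_frequencies(texts).most_common(2))
```

Both "σπίτι" and "σπίτια" are credited to the lemma σπίτι, and "γράφει" and "γράφουν" to γράφω, so each lemma is counted twice.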

VisDic - Visual Dictionary

VisDic is a graphical application for viewing and editing dictionary databases stored in XML format. Most of the program's behaviour and the dictionary design can be configured. There are six types of view, which can be adapted independently for each dictionary. VisDic was developed primarily for browsing and editing wordnets, but it can be configured for any type of dictionary: monolingual, translational, a thesaurus, or just a plain corpus. VisDic can work with up to 10 dictionaries at the same time. The program is developed on Linux using the GTK libraries and is cross-compiled for Windows with the Cygwin compiler, so it is available for both Linux and Windows.

Visit the VisDic home page
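For orientation, the Python sketch below reads synsets from a wordnet XML file with the standard library's `xml.etree.ElementTree`. It assumes a BalkaNet-style layout in which each `SYNSET` holds an `ID`, a `POS`, `LITERAL` elements with a `SENSE` child inside `SYNONYM`, `ILR` relation elements with a `TYPE` child, and a `DEF`. The sample content is invented, and the element names should be checked against the actual files before relying on them.

```python
import xml.etree.ElementTree as ET

# Invented sample in an assumed BalkaNet/VisDic-style layout.
SAMPLE = """<WN>
  <SYNSET>
    <ID>s2</ID>
    <POS>n</POS>
    <SYNONYM>
      <LITERAL>dog<SENSE>1</SENSE></LITERAL>
      <LITERAL>domestic dog<SENSE>1</SENSE></LITERAL>
    </SYNONYM>
    <ILR>s1<TYPE>hypernym</TYPE></ILR>
    <DEF>a domesticated canid</DEF>
  </SYNSET>
</WN>"""


def read_synsets(xml_text):
    """Map each synset ID to its POS, (literal, sense) pairs, relations and definition."""
    synsets = {}
    for node in ET.fromstring(xml_text).iter("SYNSET"):
        synsets[node.findtext("ID")] = {
            "pos": node.findtext("POS"),
            "literals": [((lit.text or "").strip(), lit.findtext("SENSE"))
                         for lit in node.iter("LITERAL")],
            # An ILR element's text is the target synset ID; its TYPE child names the relation.
            "relations": [(ilr.findtext("TYPE"), (ilr.text or "").strip())
                          for ilr in node.iter("ILR")],
            "definition": node.findtext("DEF"),
        }
    return synsets


print(read_synsets(SAMPLE))
```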

WMS - WordNet Management System

WMS is a large-scale, distributed, service-oriented Wordnet Management System that acts as the link between a user and a number of interlinked Wordnets. Semantic information is accessed through a distributed network of servers that together form a large-scale multilingual semantic network. WMS aims to overcome the limitations of standalone Wordnet tools by enabling the collaborative authoring and interconnection of individual cross-lingual Wordnets, and by giving developers the means to integrate Wordnet resources and services into third-party applications. WMS also defines a protocol for loosely coupling individual Wordnets, which keeps the system open and allows individual Wordnets to be handled at low cost. Its distributed nature also opens the way to network effects in multilingual Wordnets: a situation in which the utility of an individual Wordnet depends on the number of other Wordnets incorporated into the WMS.

Visit the WMS home page
(You can use Username: balkanet Password: balkanet or create a new account)
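The following Python sketch illustrates the loose-coupling idea with in-process objects instead of networked servers: each Wordnet only needs to answer lookups by literal and by interlingual index identifier, and a registry routes cross-lingual queries between them, so adding a language is a single registration. The `Wordnet` and `Registry` classes and the identifiers are hypothetical and are not the WMS protocol.

```python
class Wordnet:
    """Hypothetical stand-in for one Wordnet service: ILI identifier -> literals."""

    def __init__(self, language, synsets):
        self.language = language
        self.synsets = synsets

    def find(self, literal):
        """ILI identifiers of all synsets containing `literal`."""
        return [ili for ili, literals in self.synsets.items() if literal in literals]


class Registry:
    """Routes cross-lingual queries between independently registered Wordnets."""

    def __init__(self):
        self.wordnets = {}

    def register(self, wordnet):
        self.wordnets[wordnet.language] = wordnet

    def translate(self, literal, source, target):
        """Literals in `target` that share an ILI identifier with `literal` in `source`."""
        results = []
        for ili in self.wordnets[source].find(literal):
            results.extend(self.wordnets[target].synsets.get(ili, []))
        return results


registry = Registry()
registry.register(Wordnet("en", {"ILI-1": ["dog", "domestic dog"]}))
registry.register(Wordnet("bg", {"ILI-1": ["куче"]}))
registry.register(Wordnet("el", {"ILI-1": ["σκύλος"]}))
print(registry.translate("dog", "en", "el"))
```

Each newly registered Wordnet becomes reachable from every other one, a small-scale version of the network effect described above.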

WordNet Validator

The WordNet Validator (WNV) is a Web-based system for validating (and correcting) the completeness and consistency of a WordNet. The system works with the VisDic XML file format. The WordNet Validator has the following main functions:

  1. Automatic correction of XML syntax
  2. Validation of WordNet completeness and consistency
  3. Search for a given synset
  4. Visualization of semantic trees

The WordNet Validator can be used in the practical construction of the monolingual WordNets of the Balkan languages, as well as to evaluate the completeness and consistency of different WordNets.

Visit the WordNet Validator home page
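Two kinds of check that a validator of this sort can perform, completeness (every synset has literals, every relation points at an existing synset) and consistency (no cycles in the hypernym hierarchy), are sketched below in Python. The input layout and the messages are assumptions; the WordNet Validator's actual rules are not listed on this page.

```python
def validate(synsets):
    """Report completeness and consistency problems.

    `synsets` maps a synset ID to {"literals": [...], "hypernyms": [...]} (assumed layout).
    """
    problems = []
    # Completeness: every synset has literals and every hypernym link has a target.
    for sid, data in synsets.items():
        if not data["literals"]:
            problems.append(f"{sid}: no literals")
        for target in data["hypernyms"]:
            if target not in synsets:
                problems.append(f"{sid}: hypernym {target} does not exist")

    # Consistency: depth-first search with white/grey/black marking finds hypernym cycles.
    # (Recursive for brevity; very deep hierarchies would need an explicit stack.)
    WHITE, GREY, BLACK = 0, 1, 2
    colour = dict.fromkeys(synsets, WHITE)

    def visit(sid):
        colour[sid] = GREY
        for target in synsets[sid]["hypernyms"]:
            if target not in synsets:
                continue
            if colour[target] == GREY:
                problems.append(f"{sid}: hypernym cycle through {target}")
            elif colour[target] == WHITE:
                visit(target)
        colour[sid] = BLACK

    for sid in synsets:
        if colour[sid] == WHITE:
            visit(sid)
    return problems


example = {
    "a": {"literals": ["x"], "hypernyms": ["b"]},
    "b": {"literals": ["y"], "hypernyms": ["a"]},
    "c": {"literals": [], "hypernyms": ["missing"]},
}
for problem in validate(example):
    print(problem)
```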

WordNetBank


WordNetBank is a system for the collaborative semantic annotation of the occurrences of a set of lemmas in suitably preprocessed sentences. It was designed as a corpus semantic annotation module and used to evaluate the validity and completeness of the senses in the Greek Wordnet. Through a GUI, users assign to each occurrence of a lemma in a sentence one or more of the lemma's senses found in the respective Wordnet or, if this is not possible, specify the reason why. Once a suitable number of sentence/lemma-occurrence pairs have been processed, all the information gathered during annotation is presented on demand to the user, who can then decide which actions, if any, to take on the Wordnet using the tool of choice.

Download the latest version of WordNetBank
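The summary step described above can be sketched in Python: given the annotations, it counts how often each sense was assigned, lists the senses that were never used, and collects the reasons given when no sense fitted. The annotation tuples, the `SENSES` inventory and the sense identifiers are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical sense inventory taken from the Wordnet: lemma -> sense identifiers.
SENSES = {"bank": ["bank#1", "bank#2", "bank#3"]}


def summarise(annotations):
    """Aggregate (lemma, assigned_senses, reason) triples into a per-lemma report.

    `reason` is only meaningful when no sense could be assigned (empty list).
    """
    used = defaultdict(Counter)
    no_fit = defaultdict(list)
    for lemma, senses, reason in annotations:
        if senses:
            used[lemma].update(senses)
        else:
            no_fit[lemma].append(reason)
    return {
        lemma: {
            "sense_counts": dict(used[lemma]),
            "never_used": [s for s in inventory if used[lemma][s] == 0],
            "no_fitting_sense": list(no_fit[lemma]),
        }
        for lemma, inventory in SENSES.items()
    }


annotations = [
    ("bank", ["bank#1"], None),
    ("bank", ["bank#1", "bank#2"], None),
    ("bank", [], "sense missing from the Wordnet"),
]
print(summarise(annotations))
```

Senses that are never chosen, and occurrences for which no sense fits, are the natural starting points for deciding what to change in the Wordnet.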


Textual Resources

George Orwell's 1984 Greek Corpus

Within the framework of BalkaNet, a text corpus was created from the Greek translation of George Orwell's Nineteen Eighty-Four. The text was fully aligned at the sentence level with the original English text and then annotated with morpho-lexical information; the citation form (lemma) of each word in the corpus was also added next to the word.
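One possible in-memory representation of such a corpus is sketched below in Python: each aligned pair holds the annotated Greek tokens (form, morpho-lexical tag, lemma) and the corresponding English sentence, which makes it easy to retrieve all aligned sentences containing a given lemma. The classes, the tag names and the sample sentence are invented for illustration and are not taken from the corpus or its file format.

```python
from dataclasses import dataclass


@dataclass
class Token:
    form: str
    tag: str    # morpho-lexical tag (hypothetical tag names)
    lemma: str  # citation form


@dataclass
class AlignedPair:
    greek: list[Token]
    english: str


def concordance(corpus, lemma):
    """Greek sentences containing `lemma`, each with its aligned English sentence."""
    return [
        (" ".join(token.form for token in pair.greek), pair.english)
        for pair in corpus
        if any(token.lemma == lemma for token in pair.greek)
    ]


# Invented example pair, not taken from the corpus.
corpus = [
    AlignedPair(
        greek=[Token("Η", "DET", "ο"), Token("μέρα", "NOUN", "μέρα"),
               Token("ήταν", "VERB", "είμαι"), Token("κρύα", "ADJ", "κρύος")],
        english="The day was cold.",
    )
]
print(concordance(corpus, "κρύος"))
```

Searching by lemma rather than by surface form finds every inflected occurrence, which is the main benefit of storing the citation form next to each word.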

For more information contact Harry Kornilakis (UOA) at


