Extracting information from large document collections still relies predominantly on the vector space approach developed in the early days of information retrieval. This approach, however, suffers from serious problems, notably low precision: the user has to search through many irrelevant documents returned by the search engine in order to find the piece of information he or she actually needs. A related problem is that the user interfaces of typical search engines usually support only keyword-based search, which seriously restricts the form of possible queries.
The search method based on "ontology capsules" that we propose builds on a deeper and more general interpretation of the concept of search, as well as on thorough consideration of the characteristics of man-machine synergy. In every act of search, the user has a preliminary idea of the information sought that is much richer and more complex than a mere handful of keywords. This preliminary knowledge is semantic in nature, more or less well structured and, naturally, schematic. Retrieval based on ontology capsules thus relies on the semantic competence of the person doing the search to a much greater extent than other search engines do. In contrast to the prevailing paradigm, when searching with ontology capsules the search engine is given a partial (local) description of the information sought, in the form of an ontology fragment that the user creates as a model of what he or she is interested in. This ontology fragment then guides the actual search and, with the help of linguistic heuristics, makes it possible to find much more accurate textual information than the prevailing methods allow.
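The idea of letting an ontology fragment guide the search can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the system's actual design: the fragment is modelled as a list of (subject, relation, object) triples, the names Triple, capsule_score and rank are hypothetical, and the "linguistic heuristics" are replaced by a crude stand-in that only checks whether the two concepts of a triple occur in the same sentence (the relation itself is ignored).

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """One edge of a hypothetical ontology fragment: subject -relation-> object."""
    subject: str
    relation: str  # kept for readability; this toy scorer does not use it
    obj: str


def sentences(text):
    """Split text into lower-cased sentences on ., ! and ? (very rough)."""
    return [s.strip().lower() for s in re.split(r"[.!?]+", text) if s.strip()]


def capsule_score(capsule, text):
    """Fraction of triples whose subject and object co-occur in one sentence."""
    if not capsule:
        return 0.0
    sents = sentences(text)
    matched = sum(
        1
        for t in capsule
        if any(t.subject.lower() in s and t.obj.lower() in s for s in sents)
    )
    return matched / len(capsule)


def rank(capsule, docs):
    """Return (document name, score) pairs, best match first."""
    scored = [(name, capsule_score(capsule, text)) for name, text in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Invented example data: a two-triple fragment and three toy "patents".
capsule = [
    Triple("valve", "controls", "coolant"),
    Triple("sensor", "measures", "temperature"),
]
docs = {
    "patent_a": "The valve regulates the coolant flow. "
                "A sensor measures the temperature of the engine.",
    "patent_b": "The valve is made of steel. Coolant is stored in a separate tank.",
    "patent_c": "A temperature sensor is attached to the housing.",
}
ranking = rank(capsule, docs)
```

In this toy data, patent_b contains both "valve" and "coolant" but never relates them within a sentence, so it scores lower than a plain keyword match would suggest. A realistic system would need proper linguistic analysis rather than substring matching.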
Another characteristic of the system, due to its use of ontology fragments, is that it aims to go beyond simple "type-in-the-search-box" interfaces and offer the user an interactive interface built on psychological principles that characterize human thinking.
The prototype system will search documents (primarily technical patents) written in English and Hungarian. The core of the technology, however, will comprise a language-independent operation package that enables the system to handle documents written in any other language, provided that the appropriate linguistic resources are available.