• k-Connect

    KConnect provides medical-specific multilingual text processing services, consisting of semantic annotation, semantic search, search log analysis, document classification and machine translation.

    The overall objective of the KConnect project is to create a medical text Data-Value Chain with a critical mass of participating companies using cutting-edge commercial cloud-based services for multilingual Semantic Annotation, Semantic Search and Machine Translation of Electronic Health Records and medical publications. The commercial cloud-based services will result from productising the multilingual medical text processing tools developed in the Khresmoi FP7 project, allowing wide adoption of these tools by industry. The critical mass will be created by the KConnect Professional Services Community, which will consist of at least 30 companies by the end of the project. These companies will be trained to build solutions based on the KConnect services, and will thus serve as multipliers for the commercial exploitation of those services.

    The KConnect project will make it straightforward to adapt the commercialised services to new languages by providing toolkits that allow the adaptation to be done by people with a software engineering skillset, as opposed to the rarer language engineering skillset. The KConnect services will also be adapted to handle text in Electronic Health Records, which is particularly challenging due to misspellings, neologisms, organisation-specific acronyms, and heavy use of negation and hedging.

    The consortium is driven by a core group of four innovative SMEs following complementary business perspectives related to medical text analysis and search. These companies will build solutions for their customers based on KConnect technology. Two partners from the medical domain will use KConnect services to solve their medical record analysis challenges. Two highly used medical search portal providers will implement the KConnect services to innovate the services offered by their search portals. Through these search portals, the KConnect technologies will be used by over 1 million European citizens before the end of the project.

  • Self-Optimizer

    The goal of Self-Optimizer is to develop a tool that automatically, without manual intervention, optimizes semantic matching performance for a specific text query. This gives a company a new and effective way to rely on an automatic process for retrieving prior art documentation and to efficiently support decisions in its information workflows. As a result, inventions and R&D projects can be handled with better quality control, at lower cost and in less time.

    Self-Optimizer is a Eurostars-funded project that aims to support the innovation process for industries active in various technical domains, e.g. telecommunications and pharmaceuticals. Within the Self-Optimizer project, Uppdragshuset Sverige AB (UH) and Technische Universitaet Wien (TU Wien) will develop a tool which automatically optimizes indexes customized to a specific technical field. During a trial period, UH has used a manual approach for establishing sub-collections which reflect a specific technical domain. This manual approach has been successful but is very time-consuming and needs to be converted into an automatic process. The automatic process will build upon the years of experience in mining patent text at the Institute of Software Technology and Interactive Systems at TU Wien. From 2009 to 2013, the CLEF-IP (text retrieval in the intellectual property domain) evaluation task was organized by TU Wien. CLEF-IP was launched to investigate and evaluate Information Retrieval techniques for patent retrieval.

    UH was founded in 2000 with a passion for the art of finding, which it has continuously developed. The company has since grown and widened its range of services and products. Today, UH offers analysis of company strategies based upon intellectual property rights, high-quality advice concerning risks and possibilities in the innovation process, and development of smart administration systems that support innovation processes.

    In Self-Optimizer, we will combine the TU Wien expertise in patent text mining and information retrieval with the UH experience in supporting companies’ innovation processes with different types of searches (e.g. prior art, freedom-to-operate and invalidity searches). We will thereby enhance the traditional Boolean search method predominant in patent search with search technology based on automatic query formulation, language modelling and statistical semantics.

  • MUCKE - Multimedia and User Credibility Knowledge Extraction

    The project addresses multimodal search on the Web and in social media.

    Web 3.0 entered the public vocabulary more than five years ago. While its definition remains unclear, what has become clear in the last half decade is that the web has become a support for social media. Directly from cameras, phones, tablets or computers, users are pushing multimedia data towards their peers and the world at large. MUCKE addresses this stream of multimedia social data with new and reliable knowledge extraction models designed for multilingual and multimodal data shared on social networks. It departs from current knowledge extraction models, which are mainly quantitative, by giving high importance to the quality of the processed data, in order to protect the user from an avalanche of equally topically relevant data.

    It does so using two central innovations: automatic user credibility estimation for multimedia streams and adaptive multimedia concept similarity. Credibility models for multimedia streams are a highly novel topic, which will be cast as a multimedia information fusion task and will constitute the main scientific contribution of the project. Adaptive multimedia concept similarity departs from existing models by creating a semantic representation of the underlying corpora and assigning a probabilistic framework to them. The utility of these two innovations will be demonstrated in an image retrieval system. Extensive evaluation will be performed in order to assess the reliability of the extracted knowledge against representative datasets. Additionally, a new, shared evaluation task focused on user credibility estimation will be proposed.

    The two core innovations rely on innovative text processing, image processing and fusion methods. Text processing will concentrate on tasks such as word sense disambiguation, concept recognition and anaphora resolution. Image processing will include parsimonious content description, large-scale concept detection and detector robustness. Multimedia fusion will focus on a flexible combination of text and image modalities based on a probabilistic framework. All proposed methods will be designed to take advantage of the structural properties of the social networks. Particular focus will be placed on proposing scalable algorithms that cope with large-scale, heterogeneous data.