Projects

Current projects

Contextual advertising

The aim of this project is to develop a set of statistical algorithms that match the content type of a page with the most appropriate ad. The goal is to display ads to users dynamically so as to maximize the click-through rate. This is a standard web problem researched by many online content providers; the novelty lies in focusing on a new set of generative algorithms. To reach this goal, the ad network, which acts as the mediator between the publisher and the advertiser, has to select the best ads to place on a web page, where "best" means most relevant to the content of that particular page.

The current state-of-the-art algorithms are based on topic models, and recent developments in contextual advertising show that there is still room to improve these models and build a better advertising system. The project builds its own ad-network system on top of these models. The system will subsequently be used by the Czech company Seznam, which gives a unique opportunity to measure its performance and quality in a real-world application; to our knowledge, such an evaluation has not been published before.
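
As a rough illustration of relevance-based ad selection, the sketch below picks the ad whose topic distribution is closest to that of the page, using cosine similarity. The topic vectors, the ad names, and the functions `cosine` and `best_ad` are hypothetical; the project's actual generative models are not described here, so this is only one simple way such a match could be scored.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equally long topic vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def best_ad(page_topics, ads):
    """Return the name of the ad most similar to the page (assumed relevance score)."""
    return max(ads, key=lambda name: cosine(page_topics, ads[name]))

# Hypothetical topic mixes over three made-up topics: travel, finance, sport.
page = [0.7, 0.2, 0.1]
ads = {
    "flight deals": [0.8, 0.1, 0.1],
    "mortgage offer": [0.1, 0.8, 0.1],
    "running shoes": [0.1, 0.1, 0.8],
}
print(best_ad(page, ads))
```

In a real system the vectors would come from a topic model fitted to pages and ad texts, and the score would be validated against measured click-through rates.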

Team: Tomáš Tunys


Named Entity Recognition

This project aims to classify the words in a sentence into predefined categories of entities. Such an entity could be, for example, a City, Name, Business brand or Landmark. Here is an example:

INPUT: Profits soared at Boeing Co., easily topping forecasts on Wall Street, as their CEO Alan Mulally announced first quarter results.

OUTPUT: Profits soared at [Company Boeing Co.], easily topping
forecasts on [Location Wall Street], as their CEO [Person Alan Mulally]
announced first quarter results.

Our approach treats the task as a supervised machine learning problem, so a labeled training set is required. We take advantage of www.freebase.com for the Czech language. We will start with an HMM for part-of-speech tagging, a classification approach, etc. The resulting model will be used for entity recognition in Seznam.cz's S-klik advertisement system to increase the quality of the current algorithms.
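
To make the HMM starting point more concrete, here is a minimal sketch of Viterbi decoding, the standard way to find the most probable tag sequence under an HMM. The same procedure applies whether the tags are parts of speech or entity labels; this sketch uses entity-style labels from the example above. All probabilities, the tag set, and the function name `viterbi` are made-up toy values, not the project's trained model.

```python
import math

def viterbi(words, tags, start_p, trans_p, emit_p, unknown_p=1e-6):
    """Return the most probable tag sequence for `words` under a toy HMM."""
    # best[i][tag] = (log-probability of the best path ending in `tag`, previous tag)
    best = [{
        tag: (math.log(start_p[tag]) + math.log(emit_p[tag].get(words[0], unknown_p)), None)
        for tag in tags
    }]
    for word in words[1:]:
        column = {}
        for tag in tags:
            emit = math.log(emit_p[tag].get(word, unknown_p))
            column[tag] = max(
                (best[-1][prev][0] + math.log(trans_p[prev][tag]) + emit, prev)
                for prev in tags
            )
        best.append(column)
    # Follow the back-pointers from the best final tag.
    path = [max(tags, key=lambda tag: best[-1][tag][0])]
    for column in reversed(best[1:]):
        path.append(column[path[-1]][1])
    return path[::-1]

# Toy parameters (assumptions for illustration only, not normalized estimates).
TAGS = ["O", "Company", "Person"]
START = {"O": 0.8, "Company": 0.1, "Person": 0.1}
TRANS = {
    "O": {"O": 0.8, "Company": 0.1, "Person": 0.1},
    "Company": {"O": 0.4, "Company": 0.5, "Person": 0.1},
    "Person": {"O": 0.4, "Company": 0.1, "Person": 0.5},
}
EMIT = {
    "O": {"at": 0.3, "CEO": 0.2},
    "Company": {"Boeing": 0.5, "Co.": 0.4},
    "Person": {"Alan": 0.5, "Mulally": 0.4},
}
sentence = ["at", "Boeing", "Co.", "CEO", "Alan", "Mulally"]
print(list(zip(sentence, viterbi(sentence, TAGS, START, TRANS, EMIT))))
```

In practice the start, transition and emission probabilities would be estimated from the labeled training set, with proper smoothing for unseen words instead of the fixed `unknown_p` used here.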

Team: Antonín Novák 


The Gamers Behaviour Analysis

In this project we are developing a statistical model to spot repeated patterns in user behaviour. We use data from a computer pool live tour game made by the company Geewa. The game is simple: a user plays a game of pool against an opponent for virtual money. Winners advance to higher levels and challenge more experienced players. The task is to find common patterns in players' behaviour, for example: at what point they proceed to a higher level, what kind of opponents they prefer, when they buy a better cue, and when they exchange virtual money for real dollars or the other way around. The purpose of the analysis is to propose a statistical model that helps modify the game to increase monetization.

Team: Ondřej Pluskal, Ondřej Borovec, Vladimír Kunc


Automatic Synonym Creation

Synonyms are words with the same or similar meanings. They can belong to any part of speech (such as nouns, verbs, adjectives, adverbs or prepositions), as long as both words are the same part of speech.

Synonyms are important for Natural Language Processing. For example, in full-text search one of the signals for finding a relevant web page is the presence of the query words, or of similar words, in the document. To exploit this, the search query is rewritten using all possible synonyms to make the search as broad as possible.
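
The query rewriting described above can be sketched as follows. The synonym table and the function `expand_query` are hypothetical; in the project the synonym lists would be generated from corpora rather than written by hand.

```python
from itertools import product

# Hypothetical synonym lists, hand-written for illustration only.
SYNONYMS = {
    "cheap": ["inexpensive", "affordable"],
    "flat": ["apartment"],
}

def expand_query(query, synonyms):
    """Return every rewrite of `query` obtained by swapping words for their synonyms."""
    alternatives = [[word] + synonyms.get(word, []) for word in query.lower().split()]
    return [" ".join(choice) for choice in product(*alternatives)]

for variant in expand_query("cheap flat Prague", SYNONYMS):
    print(variant)
```

The number of rewrites grows multiplicatively with the number of synonyms per word, which is one reason why a measure of how useful each synonym is for search matters.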

The task in this project is to generate lists of synonyms from text corpora. We will try to adapt corpus-based methods together with methods for solving textual similarity problems. The second, very challenging task is to define criteria describing how useful the selected synonyms are for successful search.

Team: Tomáš Veselý


Learning to rank algorithms

Learning to rank is a family of modern machine learning methods used in many Natural Language Processing tasks. In particular, it is used for ranking web pages in web search. We want to design and test an algorithm for ranking URLs by relevance given a user search query. The project will follow these milestones:

  • Analyze the most important learning to rank algorithms adequate for the task

  • Analyze and describe applicable precision measures for this task

  • Provide feature vector data sets with manually assigned ranked results for training and testing

  • Analyze the training data and the feature vector signals, and estimate the discriminative power of the signals

  • Implement the chosen algorithm

  • Test and evaluate the designed algorithm
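
As an example of the kind of quality measure the second milestone refers to, the sketch below computes NDCG (normalized discounted cumulative gain), one widely used measure for ranked results. Choosing NDCG here is an assumption for illustration; the measures actually selected by the project are not stated above. The relevance grades are made-up values.

```python
import math

def dcg(relevances, k):
    """Discounted cumulative gain of the first k results (exponential gain form)."""
    return sum((2 ** rel - 1) / math.log2(rank + 2)
               for rank, rel in enumerate(relevances[:k]))

def ndcg(relevances, k):
    """DCG divided by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevances, reverse=True), k)
    return dcg(relevances, k) / ideal if ideal > 0 else 0.0

# Manually assigned relevance grades (0 = irrelevant, 3 = perfect), in ranked order.
print(ndcg([3, 2, 1, 0], k=4))  # ideal ordering
print(ndcg([0, 1, 2, 3], k=4))  # worst ordering of the same results
```

A measure like this can be computed on the manually ranked test set to compare candidate algorithms.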


Team: Marek Modrý


People search

In this project we are developing methods for searching for people by name and affiliation. The aim is to develop a Java-based package that can be included in server as well as client people search apps. We will proceed in the following steps:

  • Basic functionality will use the simplest search supported by an SQL database, implementing Czech language specifics. Then simple database synchronization with delta updates will be implemented.
  • Advanced search will allow users to limit the search to people from one location.
  • Spelling correction: a statistical algorithm will apply appropriate spelling corrections.
  • Suggestion: a statistical algorithm will provide smart functionality such as suggestions based on phonetic similarity, the affiliation of the user, location, etc.
  • Finally, we will implement simple speech recognition to allow search by voice.
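
One possible starting point for the Czech-specific matching and the spelling-correction step is sketched below: names are compared after removing diacritics, and close matches are found by edit distance. The project itself targets Java; Python is used here only to keep the sketch short. The directory entries, the threshold, and the function names `fold`, `edit_distance` and `suggest` are hypothetical, and plain edit distance stands in for the statistical algorithm, which is not specified above.

```python
import unicodedata

def fold(text):
    """Lower-case and strip diacritics, so 'Novák' and 'novak' compare equal."""
    decomposed = unicodedata.normalize("NFD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def edit_distance(a, b):
    """Levenshtein distance, computed one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution or match
        prev = curr
    return prev[-1]

def suggest(query, names, max_distance=2):
    """Return names within `max_distance` edits of the query, closest first."""
    folded = fold(query)
    scored = sorted((edit_distance(folded, fold(name)), name) for name in names)
    return [name for distance, name in scored if distance <= max_distance]

directory = ["Jan Novák", "Jana Nováková", "Petr Svoboda"]  # made-up entries
print(suggest("jan nowak", directory))
```

A production version would likely add phonetic keys and use affiliation or location to rank candidates, as the suggestion step describes.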

Team: Tomáš Pikous, Tonda Novák


Eucalyptus

Eucalyptus is an open-source cloud operating system that makes it possible to deploy a private IaaS cloud with the industry-standard Amazon API on basically any Linux x86 hardware. At our university, we have built one such cloud from workstations in several computer classrooms. It serves us for experiments with the cloud environment and lets other research groups leverage the computing power of the classroom machines for scientific computations. We are now testing similar systems to determine the best upgrade path.

Team: Tomáš Vondra
Project home page, blog


Cloud Gunther - Tame Cloud with Fun

Cloud Gunther is a SaaS application for automated management and administration of a cloud platform. It allows users to put computational tasks into a queue and executes them automatically. The application is written in Ruby on Rails and uses the Amazon Web Services API, which makes it compatible with all clouds that offer this de facto standard API. We use this application for experiments in image processing with the free Eucalyptus cloud operating system.

Team: Tomáš Vondra
Project home page, blog


Kos Android Client

In this project we are designing and implementing an Android client for accessing the CTU information system. In the first phase, the application will let the user display the profiles of academic staff and students and information about courses, including the courses the user is signed up for. In the second phase, the application will allow the user to follow the most important university portals. The FIT team is implementing the KOS API, which is our basic interface to the KOS information system.

Team: Dušan Jenčík, Adam Šimek 



Cold-standby Cloud Server Backup

Everyone is talking about hybrid clouds, but few have seen them. This project aims to bring something that can fall under the designation "hybrid cloud" to the smallest businesses, which run their web server, built from commodity hardware, at the office on an unreliable internet line. This setup is very cheap: if the server is reasonably loaded, it is cheaper than the public cloud. However, to achieve high availability, you would have to double your investment. If the project succeeds, even small companies will be able to afford 99.9% uptime through the use of a public cloud cold-standby server.
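
To put the 99.9% figure in perspective, the short calculation below converts an availability target into a yearly downtime budget. The function name `allowed_downtime_hours` is illustrative, and a 365-day year is assumed.

```python
HOURS_PER_YEAR = 365 * 24  # assumes a non-leap year

def allowed_downtime_hours(availability):
    """Hours per year a service may be down while still meeting `availability`."""
    return (1.0 - availability) * HOURS_PER_YEAR

for target in (0.99, 0.999):
    print(f"{target:.1%} uptime allows {allowed_downtime_hours(target):.2f} h of downtime per year")
```

At 99.9% the budget is about 8.76 hours per year, so any failover to the standby server has to fit comfortably within that margin.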

Team: Tomáš Vondra

Project home page, blog



TremAn Parkinson's Disease - tremor detection

In this project we build on the PhD work of Zdenka Sturcova and cooperate with Prof. Evzen Ruzicka. The original work developed an image-processing analysis of a patient's tremor from a 20-second video. The original program was developed for the Windows desktop and works in batch mode. In this project we are converting the program to a real-time version for tablets. As soon as the doctor points the tablet at a patient, the program shows an initial estimate of the tremor frequency. The next step is to enhance the program to estimate the tremor amplitude.
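
The method used by the original program is not described here, so the following is only a generic sketch of how a tremor frequency could be estimated once the video has been reduced to a one-dimensional motion signal (for example, the position of a tracked point in each frame). It picks the frequency with the largest magnitude in a discrete Fourier transform. The frame rate, the synthetic 5 Hz test signal, and the function name `dominant_frequency` are assumptions.

```python
import cmath
import math

def dominant_frequency(signal, sample_rate):
    """Return the frequency in Hz with the largest DFT magnitude, ignoring the mean."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]
    best_k, best_magnitude = 1, 0.0
    for k in range(1, n // 2 + 1):
        coefficient = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                          for t, x in enumerate(centered))
        if abs(coefficient) > best_magnitude:
            best_k, best_magnitude = k, abs(coefficient)
    return best_k * sample_rate / n

# Synthetic stand-in for a 20-second recording at an assumed 30 frames per second.
fps = 30
samples = [math.sin(2 * math.pi * 5.0 * t / fps) for t in range(fps * 20)]
print(dominant_frequency(samples, fps))
```

A real-time version would more likely run an FFT over a sliding window of recent frames, which also allows the estimate to be refined as more video arrives.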

Team: Tomas Vesely


The Magic Cloud Provisioner

Tell it what you want to run in the cloud and it will tell you how many instances you need, what the cost will be, and what's more, it will run and configure the instances for you. At least that's the idea. The ingredients will be performance models of popular web programming language and database combinations, price lists of some cloud providers and an automatic server configuration system like Opscode Chef.
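
The core calculation behind this idea can be sketched in a few lines: a performance model gives the throughput of one instance, and a price list gives its hourly cost. All numbers, the 730-hour month, and the function name `plan` are hypothetical placeholders, not measured values or real provider prices.

```python
import math

def plan(target_rps, rps_per_instance, price_per_hour, hours=730):
    """Return (instances needed, cost for `hours` of operation)."""
    instances = math.ceil(target_rps / rps_per_instance)
    return instances, instances * price_per_hour * hours

# Hypothetical inputs: 1000 requests/s target, 150 requests/s per instance, 0.10 per hour.
instances, monthly_cost = plan(target_rps=1000, rps_per_instance=150, price_per_hour=0.10)
print(f"{instances} instances, about {monthly_cost:.2f} per month")
```

The hard part of the project is obtaining a trustworthy `rps_per_instance` for each language and database combination; the arithmetic on top of it is simple.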

Team: Tomáš Vondra
Project home page, blog


CloudSim

Many articles on cloud computing use the CloudSim framework in their evaluation sections. The simulator makes it possible to test theories about infrastructure that is too large to build in a lab. We decided to harness the power of this simulator, verify its accuracy by comparing it to a lab setup of manageable scale, and then simulate our autoscaling algorithms.

Team: Tomáš Vondra, Juan Francisco Munoz Castro
Project home page, blog