Recommender & Data Mining Playground

Student Project and Bachelor Thesis

A recommendation engine aims to filter the most relevant items from the set of all items. Top-N filtering means that the result does not contain the whole data set, but only the first N elements of an ordered list, so only the best recommendations are offered to the user. For instance, the 10 most watched videos or the 3 products with the best average rating can be presented to an anonymous user.
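Top-N filtering by average rating can be sketched as follows. This is a minimal illustration, not part of the project specification: the function name `top_n_by_average_rating` and the sample data are assumptions made for the example.

```python
from collections import defaultdict


def top_n_by_average_rating(ratings, n):
    """Return the ids of the n items with the highest average rating.

    `ratings` is an iterable of (item_id, rating) pairs (hypothetical format).
    """
    totals = defaultdict(lambda: [0.0, 0])  # item -> [sum of ratings, count]
    for item, rating in ratings:
        totals[item][0] += rating
        totals[item][1] += 1
    averages = {item: total / count for item, (total, count) in totals.items()}
    # Order by average (descending); break ties by item id for stable output.
    ranked = sorted(averages, key=lambda item: (-averages[item], item))
    return ranked[:n]


# Illustrative data only.
sample_ratings = [
    ("tv", 5), ("tv", 4), ("radio", 3),
    ("phone", 5), ("phone", 5), ("lamp", 2),
]
top_three = top_n_by_average_rating(sample_ratings, 3)
```

Because this ranking does not depend on who is asking, it also works for anonymous users without any profile.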

In Content-Based Filtering, an element is compared to another element solely on the basis of its content information or metadata. An item can thus be very similar or dissimilar to another item. To determine this, each attribute has to be examined and compared to the corresponding attribute of the other elements.
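One simple way to compare items attribute by attribute is to count how many shared attributes have equal values. The sketch below assumes items are plain dictionaries of metadata; the attribute names and the equal-weight matching rule are illustrative assumptions, and real systems usually weight attributes or use per-attribute distance measures.

```python
def attribute_similarity(item_a, item_b):
    """Fraction of shared attributes whose values are equal (0.0 to 1.0)."""
    shared = set(item_a) & set(item_b)
    if not shared:
        return 0.0  # nothing to compare
    matches = sum(1 for key in shared if item_a[key] == item_b[key])
    return matches / len(shared)


# Hypothetical metadata records.
movie_a = {"genre": "drama", "language": "en", "year": 2010}
movie_b = {"genre": "drama", "language": "de", "year": 2010}
similarity = attribute_similarity(movie_a, movie_b)  # two of three attributes match
```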

Collaborative Filtering (CF) is the most common approach for Web 2.0 technologies. The simple comparison of elements is extended by data on consumer behavior. Thus, a recommendation engine is able to predict items based on the characteristics of other users. In CF, the user is related to items, often by a value: for instance, a numerical rating, a Boolean value for “like it” or “don't like it”, or even the time a user spends watching the item.
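A user-based variant of CF can be sketched as a similarity-weighted average of other users' ratings. This is only one of several possible formulations; the function names, the nested-dictionary layout, and the assumption of strictly positive ratings (e.g. 1 to 5) are choices made for this example.

```python
import math


def cosine_similarity(ratings_u, ratings_v):
    """Cosine similarity of two users over the items both have rated."""
    common = set(ratings_u) & set(ratings_v)
    if not common:
        return 0.0
    dot = sum(ratings_u[i] * ratings_v[i] for i in common)
    norm_u = math.sqrt(sum(ratings_u[i] ** 2 for i in common))
    norm_v = math.sqrt(sum(ratings_v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v)


def predict_rating(ratings, user, item):
    """Similarity-weighted average of the ratings other users gave to `item`.

    Returns None if no other user with a nonzero similarity rated the item.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        sim = cosine_similarity(ratings[user], other_ratings)
        weighted_sum += sim * other_ratings[item]
        weight_total += sim
    return weighted_sum / weight_total if weight_total else None


# Illustrative user -> {item -> rating} data.
user_ratings = {
    "alice": {"a": 5, "b": 3},
    "bob": {"a": 5, "b": 3, "c": 4},
    "carol": {"a": 1, "b": 5, "c": 2},
}
prediction = predict_rating(user_ratings, "alice", "c")
```

In this data, bob rates exactly like alice, so the prediction for item "c" lies closer to bob's rating than to carol's.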


Your tasks:

You will implement a simple reference app (native Android, iOS, or a web app) that collects data in everyday situations. Your main task is to develop different data mining approaches in order to automatically adapt this app to the users' needs. To do so, you will need to analyze use cases, select appropriate approaches, and implement the corresponding algorithms. Examples of approaches are:

  • Association Rules
  • Sequential Rule Mining 
  • Learning User Preferences
  • Rating Prediction (e.g. Slope One)
  • Pattern Recognition (SVM, Neural Networks, etc.)
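As an illustration of the rating prediction approach named in the list, the following is a minimal sketch of weighted Slope One. The data layout (user -> {item -> rating}) and the function name are assumptions for this example; the sample values are a small textbook-style data set, not project data.

```python
from collections import defaultdict


def slope_one_predict(ratings, user, target):
    """Predict `user`'s rating for `target` with weighted Slope One.

    Returns None if no item rated by `user` was co-rated with `target`.
    """
    diff_sums = defaultdict(float)  # item -> sum of (rating(target) - rating(item))
    counts = defaultdict(int)       # item -> number of users who rated both
    for other_ratings in ratings.values():
        if target not in other_ratings:
            continue
        for item, value in other_ratings.items():
            if item != target:
                diff_sums[item] += other_ratings[target] - value
                counts[item] += 1

    numerator = 0.0
    denominator = 0
    for item, value in ratings[user].items():
        count = counts.get(item, 0)
        if item == target or count == 0:
            continue
        average_diff = diff_sums[item] / count
        # Each estimate (average_diff + value) is weighted by its support.
        numerator += (average_diff + value) * count
        denominator += count
    return numerator / denominator if denominator else None


sample = {
    "john": {"A": 5, "B": 3, "C": 2},
    "mark": {"A": 3, "B": 4},
    "lucy": {"B": 2, "C": 5},
}
lucy_a = slope_one_predict(sample, "lucy", "A")  # 13/3, about 4.33
```

Slope One needs only pairwise average differences, which makes it cheap to update incrementally as new ratings arrive on a mobile or web client.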


Required skills:

  • Good programming/prototyping skills in web technologies, such as HTML, JS, CSS, or native app SDKs
  • Knowledge of server-side programming languages, such as Java, and common relational and/or NoSQL database technologies
  • Optional: High-level understanding of data mining/recommendation engines
  • Creative ideas, analytical skills, and the ability to work independently


Related Technologies:

  • Recommendation Engines
  • Predictive Data Mining


Related FAME Projects:


Contact:

Christopher Krauß 

Developing Linked Data Applications

Student Project

The nature of the World Wide Web has evolved from a web of linked documents to a web including Linked Data. Traditionally, we were able to publish documents on the Web and create links between them. Those links, however, only allowed us to traverse the document space, without understanding the relationships between the documents and without linking to particular pieces of information.

Linked Data allows us to create meaningful links between pieces of data on the Web. The adoption of Linked Data technologies has shifted the Web from a space connecting documents to a global space where pieces of data from different domains are semantically linked and integrated to create a global Web of Data. Linked Data enables operations to deliver integrated results as new data is added to the global space. This opens new opportunities for applications such as search engines, data browsers, and various domain-specific applications.

Over the course of this project, the students will develop a system leveraging or supporting the field of Linked Data. All details of the project will be defined together with the students depending on the size of the group, shared interests, technical skills, and background. The group will go through the entire process of developing a working solution, from understanding and defining a problem, through designing a solution, to developing and finally testing a system. Students will be assisted by a supervisor at all stages of the project.

Contact:

Marcin Wylot

Analysis and Suggestion: How User Interfaces in Recommendation Engines (should not) Affect Personal Behavior

Student Project (1-3 participants) and Bachelor Thesis

The rapid development of computing power, the increasing convergence of multimedia, and the steadily growing influence of social media services enable start-ups and established providers to create innovative new services based on big data. These services stand and fall with the users' feedback in the form of consumption data, comments, likes, or ratings, regardless of whether the offering is a physical product or a service. Data analysts try to process this vast quantity of data in order to gain comprehensive knowledge about the customers concerned, the offerings, and their usage. The information gained can be used for internal market research, for instance to optimize services and products or to plan the launch of new offerings. Moreover, it makes it possible to create personalized services that filter the best-fitting offer from a variety of available items, making the service more comfortable to use.

FAME introduced the TV Predictor, an approach for obtaining personalized recommendations for TV programs. It can be displayed directly on Internet-enabled smart TVs or as a web application on multi-screen devices, such as smartphones, tablets, PCs, and laptops. Since this recommendation engine is reusable in different contexts targeting the same domain of TV programs, we compared the mined data and made preliminary inferences on the psychological impact of user interfaces and on influence coefficients related to the recommendation engine. As a result, we identified markedly different rating distributions within the mined data.

Your tasks:

Your task is to analyze possible influence factors (e.g. the effect of GUI elements such as rating scales, top-N items vs. an unsorted list, or the choice of algorithms), integrate a simple recommendation engine, and prototype alternatives for specific components in order to conduct A/B testing, measurements, surveys, and interviews (e.g. think-aloud). The final result shall be a catalogue of criteria listing dos and don'ts for recommendation engines.
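For the A/B testing part, participants need to be assigned to interface variants in a stable way, so that a returning user always sees the same alternative. The sketch below shows one common way to do this with a hash of the user id; the experiment name, variant labels, and function name are hypothetical and only serve the example.

```python
import hashlib


def assign_variant(user_id, experiment, variants):
    """Deterministically map a user to one of the given variants."""
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    # Interpret the hash as an integer and reduce it to a variant index.
    return variants[int(digest, 16) % len(variants)]


# Hypothetical experiment comparing two rating widgets.
rating_widgets = ["five_stars", "thumbs"]
variant_for_user = assign_variant("user-42", "rating-scale", rating_widgets)
```

Including the experiment name in the hashed key keeps assignments independent across experiments, so a user who sees variant A in one test is not systematically placed in variant A of the next.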

Required skills:

  • Good programming/prototyping skills in a language of your choice
  • Optional: High-level understanding of data mining/recommendation engines
  • Creative ideas, analytical skills, and the ability to work independently

Related technologies:

  • Recommendation Engines
  • Predictive Data Mining


Related FAME Projects:


Contact:

Gamification for Leveraging Recommendation Engines in Digital Learning Environments

Student Project (1-5 participants) and Master Thesis

Given the accelerating pace of technical progress and increasingly extensive guidelines, standards, and laws, employed persons have to keep learning continuously from graduation until retirement. Educational offerings demand a strict schedule and learning discipline from employed part-time students. A direct consequence of the lack of time is a short-term, exam-oriented learning strategy.

Our recommender system predicts learning objects and thus extends Learning Management Systems (LMS). It focuses on a blended-learning approach for universities, chambers of crafts, and adult education centers. Students can keep track of their individual predicted knowledge level for different learning objects at any point in time and get personalized learning recommendations based on the determined learning need value.


Your tasks:

In order to leverage the learner's intrinsic and extrinsic motivation, you shall choose specific gamification elements to integrate into an existing Learning Companion Application. Each element shall be analyzed in theory and evaluated with real participants.

Required skills:

  • Good programming/prototyping skills in web technologies, such as HTML, JS, CSS
  • Knowledge of server-side programming languages, such as Java, and common relational and/or NoSQL database technologies
  • Optional: High-level understanding of data mining/recommendation engines
  • Creative ideas, analytical skills, and the ability to work independently


Related technologies:

  • Gamification
  • Recommendation Engines
  • Predictive Data Mining


Related FAME Projects:


Contact:


Processing of Contextual Data: Personalized Predictions through Ontology-based Algorithms

Student Project (1-3 participants) and Master Thesis

The processing of Big Data in modern internet services often suffers from a lack of descriptive information about the users and about the services and products presented. Predicting user behavior through data mining algorithms is nevertheless crucial. The missing data must be captured via semantic analysis, abstracted into reusable ontologies, and finally optimized for use in recommendation algorithms.

In order to improve web services, data is collected about the services and products presented as well as about the users and their behavior. The information gained can be used, among other things, for market analyses, for improving and optimizing the offer and the products, and for personalizing the services provided to the respective user. The main application area of this project is the improvement of recommendation engines, which provide personalized recommendations to users and make the customer's decision process shorter and more effective.

Known issues of recommendation algorithms result from the so-called “Cold Start Problem”. This issue is caused by a lack of sufficient data about users, items, or content, which is essential for calculating context-sensitive influences. Closely related is the “Sparsity Problem”, which describes recommendation systems that receive too little user feedback, such as likes, ratings, and views. As a consequence, collaborative and knowledge-based filtering algorithms are unable to make precise predictions, which leads to a decline in customer satisfaction. If metadata is lacking as well, the calculation of similarities through content-based filtering algorithms is likely to fail too.
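Both problems can be quantified on a user-item matrix. The following sketch measures sparsity as the share of missing user-item feedback and flags users with too few ratings as cold-start cases. The threshold `min_ratings`, the data layout, and the function names are assumptions made for illustration.

```python
def sparsity(ratings, n_items):
    """Share of user-item pairs without feedback (0.0 = dense, 1.0 = empty)."""
    n_users = len(ratings)
    if n_users == 0 or n_items == 0:
        return 1.0
    observed = sum(len(user_ratings) for user_ratings in ratings.values())
    return 1.0 - observed / (n_users * n_items)


def cold_start_users(ratings, min_ratings=2):
    """Users with fewer than `min_ratings` ratings, sorted by id."""
    return sorted(
        user for user, user_ratings in ratings.items()
        if len(user_ratings) < min_ratings
    )


# Illustrative user -> {item -> rating} data for a catalogue of 4 items.
feedback = {
    "alice": {"a": 5, "b": 3},
    "bob": {"a": 4},
    "carol": {},
}
matrix_sparsity = sparsity(feedback, n_items=4)  # 3 of 12 cells filled
new_users = cold_start_users(feedback)
```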

Most of the data needed exists on the World Wide Web; however, it is stored in unstructured text form by external service providers. A few current projects analyze external sources to gain data, but they mainly concentrate on cleaning up metadata that has already been set up and abstracted by ontologies that feed the user-item matrix.

In order to address the problems cited above, I propose this project, whose key research question is as follows: Is it possible to gain significant data through learning-based semantic analysis of text sources, which can then be transformed into usable data for recommendation systems?

Your tasks:

The objective of this thesis/project is to improve recommendation algorithms from the initial start through to the permanent operation of the system. For this, it is necessary to gain descriptive data that improves the comparability of users and items, and to collect data on interests that can be used for training.

The preference ontologies shall contain the main keywords of the user's texts and the negative or positive tone of their writing. The data gained can then be transferred directly to the recommendation systems. For example, if the semantic analysis states that a user prefers a particular movie, similar items they might like as well will be suggested to them in the future. Alternatively, models must be trained on the existing metadata of the system, captured via different preference filtering approaches or machine learning; for example, users who wrote positive comments about their mobile phones also liked action movies.
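The idea of storing keywords together with tone can be illustrated with a deliberately simple sketch. The word lists below are tiny hypothetical lexicons invented for this example, and counting words is only a stand-in for the learning-based semantic analysis the project asks for.

```python
import re
from collections import Counter

# Hypothetical toy lexicons; a real system would use trained models
# or curated language resources instead.
POSITIVE = {"great", "love", "excellent", "good"}
NEGATIVE = {"bad", "boring", "terrible", "hate"}
STOPWORDS = {"the", "a", "an", "i", "this", "is", "was", "and", "it", "but"}


def preference_entry(text, max_keywords=3):
    """Extract the most frequent content words and a coarse tone label."""
    tokens = re.findall(r"[a-z]+", text.lower())
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        tone = "positive"
    elif score < 0:
        tone = "negative"
    else:
        tone = "neutral"
    # Keywords: words that are neither stopwords nor sentiment words.
    content = [t for t in tokens if t not in STOPWORDS | POSITIVE | NEGATIVE]
    keywords = [word for word, _ in Counter(content).most_common(max_keywords)]
    return {"keywords": keywords, "tone": tone}


entry = preference_entry("I love this movie, the movie is great")
```

An entry of this shape (keywords plus tone) could then be mapped to items in the catalogue, which is the step where an ontology would replace plain string matching.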

Required skills:

  • Good programming/prototyping skills in a language of your choice
  • Optional: High-level understanding of ontologies/data mining/recommendation engines
  • Creative ideas, analytical skills, and the ability to work independently


Related technologies:

  • Recommendation Engines
  • Predictive Data Mining


Related FAME Projects:


Contact:

Knowledge Discovery in Discussion Threads: Mining of Topical Correlations and Semantic Associations in Social eLearning Forums

Student Project (2-5 participants) and Master Thesis

Given the accelerating pace of technical progress and increasingly extensive guidelines, standards, and laws, employed persons have to keep learning continuously from graduation until retirement. Educational offerings demand a strict schedule and learning discipline from employed part-time students. A direct consequence of the lack of time is a short-term, exam-oriented learning strategy.

University lectures often offer online discussion forums where students can discuss and solve issues with other students and instructors. Correlating a student's participation in a discussion forum with their performance in the course is a subject of current research.

Your tasks:

Your task is to extract meaningful information from learners' texts in order to automatically predict their knowledge level on specific topics. This helps students get better learning recommendations and monitor their knowledge, and it helps teachers adjust their lectures.

The following investigation approaches and expected results are to be addressed:

  • Distinction between textual passages (corpora) of domain-based content.
  • Identification of the speech act, for example: is it a question, a review, or a summary?
  • Analysis of the textual tone (sentiment), i.e. whether the author is content or discontent in the written text.
  • Identification of semantic keywords through the analysis of common generic terms (hypernyms).
  • Distance calculation between texts in order to analyze content similarities.
  • The set-up of ontologies and their processing by the recommendation systems.
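The distance calculation between texts mentioned in the list above can be sketched with the Jaccard distance on word sets. This is one simple choice among many (TF-IDF with cosine distance or embedding-based measures are common alternatives); the tokenizer and function names are assumptions for this example.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard_distance(text_a, text_b):
    """1 - |A ∩ B| / |A ∪ B| on word sets; 0.0 = same words, 1.0 = disjoint."""
    words_a, words_b = tokenize(text_a), tokenize(text_b)
    union = words_a | words_b
    if not union:
        return 0.0  # two empty texts are treated as identical
    return 1.0 - len(words_a & words_b) / len(union)


# Two hypothetical forum posts: 4 shared words out of 9 distinct words.
post_a = "How do I train an SVM"
post_b = "How do I train a neural network"
distance = jaccard_distance(post_a, post_b)
```

Word-set overlap ignores word order and synonyms, which is exactly the gap that the hypernym and ontology steps in the list are meant to close.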

In addition, there are further research questions and requirements concerning the objective:

  • Is it possible to automatically distinguish different languages (German and English)? Can the system be extended to support additional languages? Can colloquial language be recognized and normalized as well?
  • Can the system be designed to be distributed and scalable so that it supports real-time analysis?
  • Will migrating existing language databases to graph databases improve performance and accuracy?
  • Is it possible to reuse the system for other fields of application, for example semantic search?

Required skills:

  • Good programming/prototyping skills in a language of your choice
  • Optional: High-level understanding of Natural Language Processing techniques, such as part-of-speech tagging, speech acts, sentiment analysis, and semantics
  • Optional: High-level understanding of data mining/recommendation engines
  • Creative ideas, analytical skills, and the ability to work independently

Related technologies:

  • Natural Language Processing
  • Predictive Data Mining

Related FAME Projects:


Contact: