Research Area GenAI for Media

Content Retrieval & Generative AI

© Fraunhofer FOKUS
Multi Modal Content Understanding

AI-based course advisor: Structuring Unstructured Data with LLMs — © Fraunhofer FOKUS
Structuring Unstructured Data with LLMs

© Fraunhofer FOKUS
LLM-powered Assistants

The rise of Large Language Models (LLMs) has popularized new approaches to retrieving content from different modalities based on Semantic Search. Instead of relying mostly on matching concrete search words within a document, LLM-based Semantic Search strategies make it possible to search for the actual meaning of a text rather than for a sequence of characters.
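The difference to keyword matching can be illustrated with a minimal sketch. It is not the implementation used by Fraunhofer FOKUS: the three-dimensional vectors are hand-made stand-ins for the embeddings that a real model would produce, and the names `DOC_EMBEDDINGS`, `cosine_similarity` and `semantic_search` are hypothetical.

```python
import math

# Hypothetical, hand-made "embeddings". A real system would obtain
# high-dimensional vectors from an embedding model instead.
DOC_EMBEDDINGS = {
    "How to stream video in high quality": [0.9, 0.1, 0.0],
    "Recipes for a quick dinner": [0.0, 0.2, 0.9],
    "Improving playback of movies": [0.8, 0.3, 0.1],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def semantic_search(query_embedding, doc_embeddings, top_k=2):
    """Return the top_k document texts ranked by cosine similarity."""
    scored = [
        (cosine_similarity(query_embedding, vector), text)
        for text, vector in doc_embeddings.items()
    ]
    scored.sort(reverse=True)
    return [text for _, text in scored[:top_k]]


# Assumed embedding of the query "watch films without stuttering",
# which shares no search word with the two video-related documents.
query = [0.85, 0.2, 0.05]
results = semantic_search(query, DOC_EMBEDDINGS)
print(results)
```

In this toy setup the two video-related documents rank above the recipe document, although none of them contains a word of the query.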

Gen AI applications allow for a flexible way of interacting with unstructured data, for example by retrieving semantically relevant parts of a document to enrich the prompt of an LLM call. Retrieving relevant content based on Semantic Similarity in this way enables us to build flexible chatbot-based and content retrieval systems. Another technique, Guided Generation, influences the output of LLMs by restricting the generation space to a predefined set of rules. This makes it possible to generate structured search queries from user queries in natural language, or to annotate unstructured texts following rules set by data specifications.
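The idea behind Guided Generation can be sketched as follows. This is a toy illustration under stated assumptions, not an actual LLM integration: `toy_scores` stands in for a language model, and the hand-written `GRAMMAR` and the function `guided_decode` are hypothetical. At every step, only tokens permitted by the grammar are considered, so the output is always a well-formed query of the form field = value.

```python
# Tiny hand-written grammar: each state maps an allowed token to the next state.
GRAMMAR = {
    "start": {"genre": "genre_eq", "year": "year_eq"},
    "genre_eq": {"=": "genre_value"},
    "year_eq": {"=": "year_value"},
    "genre_value": {"comedy": "end", "drama": "end"},
    "year_value": {"2023": "end", "2024": "end"},
}


def toy_scores(prefix):
    """Stand-in for a language model: fixed scores that favor free text."""
    return {
        "I": 5.0, "think": 4.0, "genre": 2.0, "year": 1.0,
        "=": 0.5, "comedy": 3.0, "drama": 1.5, "2023": 0.3, "2024": 0.2,
    }


def guided_decode(score_fn, grammar, state="start"):
    """Greedy decoding restricted to the tokens the grammar allows."""
    output = []
    while state != "end":
        allowed = grammar[state]
        scores = score_fn(output)
        # Tokens outside the grammar are never candidates, however high they score.
        token = max(allowed, key=lambda t: scores.get(t, float("-inf")))
        output.append(token)
        state = allowed[token]
    return output


result = guided_decode(toy_scores, GRAMMAR)
print(" ".join(result))
```

Without the restriction, the highest-scoring token would be free text ("I"); with it, the decoder can only emit a structured query.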

Vision Language Models (VLMs) extend these ideas to image and video data and allow for new applications of multimodal content understanding.

Usage & Needs Analytics

HbbTV Research Toolkit Live Statistics — © Fraunhofer FOKUS

Prior to content creation, an automated demand analysis and identification of success factors is carried out in order to precisely meet the end consumer’s needs.

A significant aspect of utilizing AI methods for media is to analyze media usage. A few exemplary stakeholders and use cases that benefit from our solutions are described in the following:

Content providers: broadcasters and/or publishers, for example, are required to be well-informed about their audiences in order to select appropriate advertisements or optimize the content curation.
Operators want to understand whether their services work well and if undesired behavior is shown under specific conditions.
Service providers evaluate user flows in order to optimize the service’s appearance and design.
Educational staff analyze the learning activities of students in order to adapt lectures and media offerings to the learners’ needs.

Smart Audience Measurements

Standard broadcast measurement solutions satisfy the demand for digital and real-time solutions in order to measure linear broadcasting on an international level. This solution gathers key performance indicators and compares them with those of Over-the-Top (OTT) and digital distribution methods. All tracking data is available in real-time and accessible via the web-based HbbTV Research Toolkit. Additionally, an HbbTV-based overlay app can be used to monitor the channel’s live performance within the broadcaster’s linear TV program. The Research Toolkit offers customizable visualizations for live and historical data, report templates and automated reports.

Related Solution: HbbTV Measurement and TV Research Tool

Social Network Analysis

Fraunhofer FOKUS introduces an approach that analyzes texts from third-party service providers, such as social networks and website reviews. This approach enriches the content and user data of recommender systems, thus allowing optimized personalization. To go beyond the scope of direct user input from social media networks (like Facebook), where users can like, share or rate items, the system analyzes opinions (in terms of sentiments), extracted keywords and preferences in the social community.

Publication: Preference Ontologies based on Social Media for compensating the Cold Start Problem

Content Creation & Adding Meta-Data

Deep Video Analytics Screenshot interface — © Fraunhofer FOKUS
Deep Video Analytics

In the content creation phase, intelligent tools support such productions by recommending and suggesting improvements to the medium and its content.

Another important aspect of this lifecycle is content metadata, which, if chosen well, can substantially reduce data rates and, as a result, computational expenses.

To analyze such content, we’ve gone beyond analyzing standard video attributes (such as resolution, framerate, etc.) and deep into unique features using AI techniques:

Similar scene detection: while similar scenes can be obvious to the human eye, the ‘backend’ functionality of detecting them allows our researchers to determine patterns within video content as well as their correlation with the resulting encoding data.
Scene change detection: building on similar scene detection, scene change detection spots changes between scenes via color threshold parameters, fade-in/out frames, etc. This analysis is also highly correlated with content encoding requirements, since certain parts of the media content may require higher or lower bitrates than the standard recommended bitrates, for example. As a result, it provides further insight into video characteristics that may not be obvious to the human eye.
Video metadata and audio content: examples of such analysis include speech-to-text and semantic keyword extraction. With such analysis, our researchers can map the resulting information to (open) data to enable semantic search. This type of ‘speech recognition’ technology has not only proven beneficial for the media industry (in terms of substantially reducing data rates) but has also made great strides in the general fields of Big Data and Deep Learning.
Through the exploration of computer vision algorithms, additional extracted features that highly correlate with content quality include spatial and temporal information, color saturation, contrasts, pixel intensities, image energy/entropy, and other aesthetic qualities. More information to follow.
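Threshold-based scene change detection, as mentioned in the list above, can be sketched in a simplified form. The example assumes that each frame is given as a flat list of 8-bit intensity values and only covers hard cuts, not fades. The function names, the bin count and the threshold are illustrative assumptions, not the parameters of the Deep Video Analytics tool.

```python
def histogram(frame, bins=4, max_value=256):
    """Normalized intensity histogram of one frame (flat list of pixel values)."""
    counts = [0] * bins
    for pixel in frame:
        counts[pixel * bins // max_value] += 1
    total = len(frame)
    return [count / total for count in counts]


def detect_scene_changes(frames, threshold=0.5):
    """Return the indices of frames whose histogram differs strongly from the previous frame."""
    changes = []
    previous = histogram(frames[0])
    for index in range(1, len(frames)):
        current = histogram(frames[index])
        # L1 distance between two normalized histograms lies in [0, 2].
        distance = sum(abs(a - b) for a, b in zip(previous, current))
        if distance > threshold:
            changes.append(index)
        previous = current
    return changes


# Four tiny synthetic "frames": two dark ones followed by two bright ones.
frames = [
    [10, 20, 30, 40],
    [12, 22, 28, 41],
    [200, 220, 240, 250],
    [205, 215, 245, 251],
]
cuts = detect_scene_changes(frames)
print(cuts)
```

Detecting fade-in/out frames would require comparing frames over a longer window instead of only adjacent pairs.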

Content Encoding & Storage

AI and Machine Learning: Deep Encode Encoding Storage — © Fraunhofer FOKUS

After its creation, video streaming content needs to be encoded. Because content differs in terms of complexity, it requires title-specific encoding settings to achieve a certain visual quality.

Per-Title, Scene and Context-aware Encoding

The FAMIUM Deep Encode tool leverages different machine learning techniques in order to avoid computationally heavy test encodes. These techniques include regression and classification algorithms such as decision trees and Deep Neural Networks. Through constant validation and retraining of our models, the accuracy and efficiency of the algorithms are improved with each input and the entire system learns on its own.

Several current and future machine learning activities within the media context include:
Real-time predictions of bitrate, storage size and video quality (e.g., VMAF, PSNR) for incoming content
Prediction of appropriate encoding settings (e.g., ABR, 2-pass, etc.)
In-picture region of interest recognition
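Of the quality metrics named above, PSNR is simple enough to compute directly from its standard definition, PSNR = 10 · log10(MAX² / MSE). The following sketch assumes 8-bit samples given as flat lists; the function name `psnr` and the sample values are illustrative. VMAF is a considerably more complex, model-based metric and is not covered here.

```python
import math


def psnr(reference, distorted, max_value=255):
    """Peak signal-to-noise ratio in dB between two equally sized sample lists."""
    if not reference or len(reference) != len(distorted):
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10 * math.log10(max_value ** 2 / mse)


# Illustrative 8-bit sample values of a reference and an encoded picture.
reference = [52, 55, 61, 66]
distorted = [54, 55, 60, 64]
value = psnr(reference, distorted)
print(f"{value:.2f} dB")
```

A prediction model such as the one described above would estimate this value for incoming content before any test encode is run, whereas the sketch computes it after the fact.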

Offering & Discovery

Adaptive systems are designed to adapt their presentation layer, offered content selection or navigation support to the customers’ needs. This feature is called “personalization”, because each user interacts with a personal user interface. In order to understand user demands and preferences, data that best represents user behavior must be collected and analyzed. Based on this information, offerings can then be personalized for individuals.

Currently, most users are not able to grasp all offerings of a media web service at once, or even in a lifetime. A recommender system helps its users decide which products or media may be of interest to them. Therefore, a set of different approaches can be utilized in order to obtain predictions of the users' behaviors and preferences.

To develop such systems, our researchers utilize state-of-the-art techniques and novel high-quality algorithms (based on recommender system techniques) to personalize media web services.

TV Predictor

In order to generate the optimal and most accurate recommendations, the recommendation system, “TV Predictor”, combines the best fitting algorithms via hybrid-switching, cascading or merging. The usage of these algorithms depends on the user’s request:

Find TV programs similar to the selected one by using common content-based filtering and unsupervised learning algorithms.
Retrieve program highlights for a specific time period based on the favorite programs of similar users and program rating predictions.
Calculate a personalized program guide by changing the channel automatically. The TV Predictor uses clustering techniques and rating predictions to identify programs that best fit the user’s interests.
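A switching hybrid, one of the combination strategies named above, can be sketched as follows. The two recommenders are deliberately simplified stand-ins (genre overlap and ratings of like-minded users) and not the algorithms of the TV Predictor. All program titles, user names, thresholds and function names are hypothetical.

```python
def content_based(selected, catalog, top_k=2):
    """Rank the other programs by Jaccard overlap of their genre tags."""
    target = catalog[selected]
    scored = [
        (len(target & tags) / len(target | tags), title)
        for title, tags in catalog.items()
        if title != selected
    ]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [title for _, title in scored[:top_k]]


def collaborative(user, ratings, top_k=2, liked_from=4):
    """Rank unseen programs by the average rating of users who share a liked program."""
    liked = {p for p, r in ratings[user].items() if r >= liked_from}
    collected = {}
    for other, other_ratings in ratings.items():
        other_liked = {p for p, r in other_ratings.items() if r >= liked_from}
        if other == user or not liked & other_liked:
            continue
        for program, rating in other_ratings.items():
            if program not in ratings[user]:
                collected.setdefault(program, []).append(rating)
    ranked = sorted(collected, key=lambda p: (-sum(collected[p]) / len(collected[p]), p))
    return ranked[:top_k]


def recommend(user, selected, catalog, ratings, min_ratings=2):
    """Switch to collaborative filtering only when enough ratings exist."""
    if len(ratings.get(user, {})) >= min_ratings:
        return collaborative(user, ratings)
    return content_based(selected, catalog)


CATALOG = {
    "News at Eight": {"news"},
    "Crime Night": {"crime", "drama"},
    "Detective Files": {"crime", "documentary"},
    "Cooking Live": {"cooking"},
}
RATINGS = {
    "anna": {"Crime Night": 5, "News at Eight": 4},
    "ben": {"Crime Night": 5, "Detective Files": 4, "Cooking Live": 2},
    "cleo": {},
}
for_anna = recommend("anna", "Crime Night", CATALOG, RATINGS)
for_cleo = recommend("cleo", "Crime Night", CATALOG, RATINGS)
print(for_anna, for_cleo)
```

The switch addresses the situation in which a new user has no rating history yet: the system then falls back to the content of the currently selected program.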

Publication: TV predictor: personalized program recommendations to be displayed on SmartTVs

Smart Learning Dashboard — © Fraunhofer FOKUS

Adaptive Learning Platforms

Fraunhofer FOKUS develops Adaptive Learning Platforms, such as the Smart Learning App, where students use AI-powered components to keep track of their personal predicted knowledge level regarding various learning objects at any point in time and receive personalized learning recommendations to overcome individual learning weaknesses. In addition to content metadata, such as exam relevance, lecture times and prerequisites, adaptive platforms take several factors into account for each student and learning object:

User self-assessments
User interactions with the content
Assessment exercise performances
Forgetting curves
Classmate learning progress
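How a forgetting curve can feed into a predicted knowledge level is sketched below. The exponential form R = exp(-t / S) is a common textbook model and an assumption here, since the text does not state which curve the Smart Learning App uses. The function names and the stability value of seven days are likewise hypothetical.

```python
import math


def retention(days_since_review, stability_days):
    """Exponential forgetting curve: share of knowledge retained after a given time."""
    return math.exp(-days_since_review / stability_days)


def predicted_knowledge(last_score, days_since_review, stability_days=7.0):
    """Scale the last assessed score (0..1) by the estimated retention."""
    return last_score * retention(days_since_review, stability_days)


level_today = predicted_knowledge(0.9, 0)
level_after_week = predicted_knowledge(0.9, 7)
print(round(level_today, 3), round(level_after_week, 3))
```

A real platform would combine such an estimate with the other factors listed above, for example self-assessments and exercise performance.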

At the same time, teachers can make use of this data to get an overview of the students’ overall progress and keep up with any potential knowledge gaps.

Related Working Area: Learning Technologies

Content Delivery & Playback

© Fraunhofer FOKUS
Broadcast Probing System – Cloud based Monitoring of DVB-T/T2 networks

Screenshot FAMIUM SAND Dashboard — © Fraunhofer FOKUS

The last steps of the media life cycle focus on an optimal content delivery and the best user experience.

Broadcast Quality Assurance

Digital broadcast services suffer from misconfigurations at the source and from constantly changing signal propagation conditions. The lack of feedback leads to undetected service degradation and a poorer user experience. Our Broadcast Probing System offers Cloud-based real-time monitoring of broadcast networks by utilizing massively distributed low-cost probes. Whether controlled individually or in groups, the probes are securely instructed to execute scheduled jobs, like scanning, tuning and transport stream inspection.

Related Solution: Broadcast Probing System

Intelligent Video Analytics

SAND metric reporting enables video players (e.g., MPEG-DASH) to provide streaming performance information like average throughput, buffer level, representation switch events and initial playout delay (QoE metrics defined in ISO/IEC 23009-1). SAND-shared resource allocation allows network components to control how much bandwidth a client should use. This feature is useful in scenarios in which multiple DASH clients share the same network and compete for available bandwidth (e.g., stadiums, trains, airplanes, etc.). Artificial Intelligence algorithms support the technical staff by analyzing vast amounts of mined data, observing video playback patterns and producing notifications about outliers and anomalies of particular clients.
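Flagging outlier clients from reported metrics can be sketched with a robust statistic. The example uses the modified z-score based on the median absolute deviation (MAD) with the commonly used cutoff of 3.5. It is a generic technique chosen for illustration, not the method of the FAMIUM SAND dashboard, and the client names and buffer levels are made up.

```python
from statistics import median


def find_outlier_clients(metric_by_client, cutoff=3.5):
    """Return the clients whose metric deviates strongly from the median."""
    values = list(metric_by_client.values())
    center = median(values)
    mad = median(abs(value - center) for value in values)
    if mad == 0:
        return []  # no spread, so nothing can be called an outlier
    outliers = []
    for client, value in metric_by_client.items():
        # 0.6745 rescales the MAD so that the score is comparable to a z-score.
        score = 0.6745 * abs(value - center) / mad
        if score > cutoff:
            outliers.append(client)
    return outliers


# Hypothetical buffer levels in seconds, as reported by five DASH clients.
buffer_levels = {
    "client-a": 12.1,
    "client-b": 11.8,
    "client-c": 12.4,
    "client-d": 1.3,
    "client-e": 12.0,
}
flagged = find_outlier_clients(buffer_levels)
print(flagged)
```

Median-based statistics are used here because a single badly performing client would distort a mean and standard deviation computed over the same data.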

Data Science

Data is the New Currency

© Fraunhofer FOKUS
Artificial Intelligence Venn diagram

In order to improve web and media services, data is collected from the aforementioned services and products, users and their way of conduct. The accrued information is crucial for general understanding of usage, business analytics, improvement and overall optimization of the offer as well as personalization and individualization of the services. Without the knowledge of users and behaviors, the current media industry would not survive. However, humans are not capable of manually processing and analyzing all the data. Therefore, software components may assist the human stakeholders, provide hints and suggestions for improvements and even make (semi-)automatic decisions.

A set of concepts and terms arose in the past few years: Big Data, Data Science, Artificial Intelligence, Recommender Systems, Machine Learning, Artificial Neural Networks and Deep Learning – just to mention a few. While the meaning of the term “Big” in Big Data changes with the advance of time and technology, the other terms have some common definitions:

Data Science is an academic discipline that processes data in order to extract knowledge.
Artificial Intelligence, in contrast, describes the concepts and methods of computer systems that perform tasks normally carried out by humans or imitate intelligent human behavior.
Machine Learning is a sub-class of Artificial Intelligence whose methods improve their operations over time by learning from data.
Artificial Neural Networks and Deep Learning are specialized methods within the Machine Learning discipline.

A typical Data Science Project

Fraunhofer FOKUS’ AI for Media research team is heavily experienced in realizing data-driven projects for different industries, produces end-to-end solutions and focuses on applicable data science topics.

The most important aspect of data science and engineering is defining valuable research questions that form the foundation for further data analysis. Fraunhofer data scientists help users/customers in understanding the core questions and assist in translating business requirements into technical approaches. If the customer data is readily available, it can then be cleaned, structured, and integrated.

Next, different algorithms can be applied, for example regression (predicting numerical values, e.g., for forecasts) or classification (labeling data points). To this end, Fraunhofer researchers develop and apply techniques for pattern recognition and prediction tasks based on supervised and unsupervised learning, using different flavors of Neural Networks, Support Vector Machines, Decision Trees, Random Forests, Gradient Boosted Decision Trees and many more.

These methods recognize and learn patterns in the data, which are then applied to previously unseen situations.

The training and optimization of ML models is enabled through our end-to-end machine learning pipeline, which includes automatic training, hyperparameter tuning and model serving. The pipeline builds on TensorFlow (enabling GPU and TPU usage) and produces accurate, real-time predictions. Results are automatically evaluated with the help of well-specified evaluation frameworks that can be adjusted to the customers’ needs. An evaluation framework defines how data is fed to the algorithms in a comparable and reliable way, and how the quality, accuracy and performance of the results are measured.
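What feeding data to the algorithms "in a comparable and reliable way" can mean in practice is sketched below with k-fold cross-validation: every candidate model is trained and tested on exactly the same splits and scored with the same error measure. The sketch uses plain Python rather than the TensorFlow-based pipeline described above. The two models, the function names and the data points are synthetic and purely illustrative.

```python
def k_fold_indices(n_samples, k):
    """Deterministic folds: sample i belongs to fold i % k."""
    return [list(range(fold, n_samples, k)) for fold in range(k)]


def fit_mean(xs, ys):
    """Baseline model that always predicts the mean of the training targets."""
    mean = sum(ys) / len(ys)
    return lambda x: mean


def fit_line(xs, ys):
    """Ordinary least squares for a single input feature."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def cross_validate(fit, xs, ys, k=3):
    """Mean absolute error over all held-out samples of a k-fold split."""
    errors = []
    for test_indices in k_fold_indices(len(xs), k):
        held_out = set(test_indices)
        train_x = [x for i, x in enumerate(xs) if i not in held_out]
        train_y = [y for i, y in enumerate(ys) if i not in held_out]
        model = fit(train_x, train_y)
        errors.extend(abs(model(xs[i]) - ys[i]) for i in test_indices)
    return sum(errors) / len(errors)


# Synthetic data points that roughly follow y = 2x.
xs = [1, 2, 3, 4, 5, 6]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]
mae_mean = cross_validate(fit_mean, xs, ys)
mae_line = cross_validate(fit_line, xs, ys)
print(f"baseline MAE: {mae_mean:.2f}, linear model MAE: {mae_line:.2f}")
```

Because both models see identical splits, the difference between the two error values can be attributed to the models rather than to the way the data was divided.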

Finally, the results and algorithmic conclusions are presented to the various stakeholders. To this end, the data is first reprocessed, simplified and explained, and then visualized so that the results answer the research questions in a form that is useful to the people concerned.

Artificial Intelligence and Machine Learning in the Media Sector

Contact Press / Media

Dr.-Ing. Christopher Krauß