Mar. 2011
This paper describes Text-to-Video, a technology that generates video content from text. Text-to-Video is based on TVML (TV program Making Language), a scripting language from which CG (Computer Graphics)-based video content is generated automatically. Since Text-to-Video is itself a neutral technology with no fixed purpose, this paper focuses on four major research targets. We examine the issues and requirements of each Text-to-Video application, then develop and evaluate trial systems for each. Through this approach, the universal value of Text-to-Video is revealed.
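To give a sense of the script-based approach, the following is a minimal, hypothetical sketch in the spirit of a TVML script; the exact event names and parameters are illustrative assumptions, not quoted from the TVML specification.

```
// Hypothetical TVML-style script fragment (event names are illustrative).
// Each line is an event: a character speaks, then the camera reacts.
character: talk(name=BOB, text="Welcome to our program.")
camera: closeup(name=BOB)
character: talk(name=MARY, text="Today we introduce Text-to-Video.")
```

A Text-to-Video engine interprets such a script line by line and renders the corresponding CG scene, so writing a program becomes as simple as writing text.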
The first research target is to realize an easy-to-use video UGC (user-generated content) production system for amateur users. For amateur users, usability and user-friendliness matter more than the quality of the resulting video. To this end, we have developed an Internet TV system called TV4U, in which a user can type a script much like a blog post, publish it as video content, and have it viewed by others. The video content is divided into three components―the script, production directions, and a set of material data―and each is handled by a separate method for production and distribution. The user writes a script in a word processor, chooses from prepared directions, and, simply by linking materials, can easily create a TV program, upload it to a server, and release it for viewing.
The second research target is to develop a hyper-video system on the video UGC platform, together with an appropriate manner of representation for hyper-video. The most characteristic feature of the Internet is hyperlinking between users' content. We have therefore developed a hyper-video system on TV4U in which users can link their content directly to one another's. For this system, we derived design requirements for representing hyper-video UGC by analogy with hypertext, and proposed a "TV-like hyper-video representation" based on those requirements. Finally, user evaluation experiments showed that the proposed representation method is highly user-friendly.
The third research target is to realize an efficient video content production system for professionals using Text-to-Video. Video content such as a TV program demands quality and detailed controllability, as well as efficiency in cost, time, and labor. To achieve this, we have developed a real-time CG character controller called the Ad-lib system. Because the Ad-lib system operates a CG character in real time, the character can hold ad hoc dialogues with the real cast during a show. Since the system is script-based, the behavior of the CG character can be edited or modified on the production site. In addition, a real human voice can be used as the CG character's voice, with lip-sync. Since the Ad-lib system runs on a laptop computer, it is compact and practical.
The fourth research target is to develop a web service based on Text-to-Video. As the cloud computing trend continues to grow, real-time CG rendering on the server side is no longer a distant prospect. We have therefore developed a live server-side CG rendering system using Text-to-Video. Users can access the Text-to-Video function over the Internet from anywhere, without restrictions on devices or software. Because the interface can be kept simple, this web service supports not only simple Text-to-Video conversion but also mash-up services built by connecting to other external services. We evaluated the system, especially the scalability of the service and the delay before playback, and identified the remaining issues to be solved.
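A simple interface of this kind can be exercised from any HTTP-capable client. The following Python sketch shows how such a client might package a script and submit it for server-side rendering; the endpoint URL, JSON field names, and the `render` path are all assumptions for illustration, not the actual service's API.

```python
# Hypothetical client for a server-side Text-to-Video rendering service.
# The endpoint and payload schema below are illustrative assumptions.
import json
import urllib.request


def build_render_request(script_text: str,
                         endpoint: str = "http://example.com/ttv/render"):
    """Package a TVML-style script into a POST request for rendering.

    The server would interpret the script, render the CG video, and
    (in this sketch) reply with a URL of the playable result.
    """
    payload = json.dumps({"script": script_text,
                          "format": "mp4"}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Build (but do not send) a request for a one-line script.
req = build_render_request('character: talk(name=BOB, text="Hello")')
```

Because the client only needs to construct plain HTTP requests, the same interface can be invoked from browsers, mobile devices, or other web services, which is what makes mash-ups with external services straightforward.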
Through these four research efforts, we clarify the value of Text-to-Video technology and its future challenges.