10/10/2014: Together with Danny Bickson (GraphLab), John Hannon (Boxfish) and Hassan Sayyadi (Comcast Labs DC) I am organizing the First Workshop on Recommendation Systems for Television and Online Video (RecSysTV) on October 10th, 2014, as part of the 2014 RecSys conference. Please check out the workshop web page for more information about the topic and the exciting program. We hope to see you there!

7/2013: I presented a talk at the yearly meeting of the National Cable & Telecommunications Association (NCTA) on how to combine TV usage statistics with social signals such as Twitter for improved popularity prediction of TV shows. Here is the link to the paper. (Also, see here for an interesting take on our use of random forests :-) http://www.multichannel.com/blog/translation-please/best-ncta-s-tech-papers-part-2/373214)

8/31/2010: During the summer of 2010 I led a research team at the summer workshop of the Center for Language and Speech Processing at Johns Hopkins University. For 6 weeks we worked on combining text and video analysis to identify and localize complex actions in broadcast videos. You can find more detailed information about our project and our results here.


Research Manager at Comcast Labs, Washington, DC, a research lab which is part of CoMPASS (Comcast Meta-data Products and Search Services).

The best way to reach me is via email at either
jan_neumann AT cable DOT comcast DOT com (work) or jankneumann AT gmail DOT com (private).

Research Overview

Research Manager, Comcast Labs DC (2013 - now)

Since 2013, I have managed the video content analysis group within Comcast Labs DC and have also expanded my focus to novel algorithms and product prototypes in the areas of personalized media recommendations, large-scale machine learning, and big data analysis, specifically applications that combine video content analysis with big data within Comcast.

Lead Researcher, StreamSage/Comcast (2009 - 2013)

For the last couple of years I have been working as a Lead Researcher as part of CoMPASS (Comcast Meta-data Products and Search Services, see this news article for some additional information), where I research novel audio/video information retrieval and content discovery technologies to help millions of households discover video and music content on their TV, PC, phone, and mobile devices.
Before that, I focused on computer vision and machine learning problems related to search and discovery of multimedia content in various domains such as news, sports, and entertainment. We specifically focused on video summarization and chaptering, object and person recognition, and activity classification in premium broadcast video using multi-modal approaches that combined methods from video, speech, sound and natural language processing.

Siemens Corporate Research (2004 - 2008)

At Siemens, I worked from August 2004 until December 2008 as a Research Scientist for the Real-time Vision and Modeling Department of Siemens Corporate Research (Princeton, NJ) and the Information & Communication Division of Siemens Corporate Technology (Munich, Germany). I mainly worked on pedestrian detection as part of a night vision system for cars (video of a comparison of our system against BMW and Daimler-Chrysler, in German) and for surveillance applications (see my CVPR 2007 paper).

Doctoral Research at the University of Maryland, College Park (1997 - 2004)

During my graduate studies in the Computer Vision Lab at the University of Maryland (Advisor: Yiannis Aloimonos), I studied the geometry and statistics of visual space-time, i.e. representations of 3D shape and movement that can be extracted from images. This work had applications in many areas of computer vision and graphics. For example, it led to
•  a better understanding of the geometric structure of the space of light rays,
•  the design of novel image sensors, and the development of new vision algorithms that utilize the special properties of these new sensors (example: 3D structure from motion),
•  new approaches to capture and analyze the 3D shape and motion of non-rigidly moving humans and objects; also take a look at the 3D Photography Challenge that we organized as part of the Second International Symposium on 3DPVT (3D Data Processing, Visualization, and Transmission),
•  and methods rooted in geometry and statistics to track independently moving objects in videos.

More detailed information about my research can be found on my research overview page and in my publications.

While this site is under construction, you can find more information about my thesis work at my old university web page here.

Selected Publications

(see Publications for the full list)

  1. Recognizing Manipulation Actions in Arts and Crafts Shows using Domain-Specific Visual and Textual Cues.
    Benjamin Sapp, Rizwan Chaudhry, Xiaodong Yu, Gautam Singh, Ian Perera, Francis Ferraro, Evelyne Tzoukermann, Jana Kosecka and Jan Neumann.
    In 3rd International Workshop on Video Event Categorization, Tagging and Retrieval for Real-World Applications (VECTaR 2011 in conjunction with ICCV 2011), Nov 2011
    (link to published paper) (link to poster) 

  2. Language Models for Semantic Extraction and Filtering in Video Action Recognition.
    Evelyne Tzoukermann, Jan Neumann, Jana Kosecka, Cornelia Fermuller, Ian Perera, Frank Ferraro, Ben Sapp, Rizwan Chaudhry and Gautam Singh
    In Language-Action Tools for Cognitive Artificial Agents: Papers from the 2011 AAAI Workshop (WS-11-14), Aug 2011
    (link to published paper)

  3. Predicate Logic based Image Grammars for Complex Pattern Recognition.
    Vinay D. Shet, Maneesh Singh, Claus Bahlmann, Visvanathan Ramesh, Jan Neumann and Larry Davis.
    International Journal of Computer Vision (IJCV), Special Issue on Stochastic Image Grammars (In Press), 2010
    (Link to preprint)

  4. Bilattice based Logical Reasoning for Human Detection.
    Vinay D. Shet, Jan Neumann, Visvanathan Ramesh and Larry S. Davis.
    IEEE Computer Vision and Pattern Recognition (CVPR), Minneapolis, MN, USA, 2007
    (link to preprint) (link to conference poster)

  5. Microphone Arrays as Generalized Cameras for Integrated Audio Visual Processing.
    Adam O'Donovan, Ramani Duraiswami and Jan Neumann.
    Oral presentation at IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Minneapolis, MN, USA, 2007
    (link to preprint) (slides with supplemental materials) (commercialization: http://www.visisonics.com/)
    (project web page) (movie .wmv) (movie2 .wmv)

  6. Compound Eye Sensor for 3D Ego Motion Estimation.
    Jan Neumann, Cornelia Fermüller, Yiannis Aloimonos and Vladimir Brajovic.
    IEEE International Conference on Robotics and Automation, 2004, Vol. 4, Pages: 3712-3717
    (link to published paper)
    (link to preprint)

  7. A hierarchy of cameras for 3D photography.
    Jan Neumann, Cornelia Fermüller, and Yiannis Aloimonos.
    Computer Vision and Image Understanding, Volume 96, Issue 3, December 2004, Pages 274-293,
    Special issue on model-based and image-based 3D scene representation for interactive visualization
    (Link to published paper) (Link to preprint)

  8. Plenoptic video geometry.
    Jan Neumann and Cornelia Fermüller.
    Visual Computer, Volume 19, Number 6, Pages 395-404, October 2003.
    (Link to published paper)
    (Link to preprint)

  9. Spatio-temporal stereo using multi-resolution subdivision surfaces.
    Jan Neumann and Yiannis Aloimonos.
    International Journal of Computer Vision, 47(1/2/3):181-193, 2002.
    (Link to published paper) (Link to preprint)

"The reasonable man adapts himself to the world; the unreasonable one persists in trying to adapt the world to himself. Therefore all progress depends on the unreasonable man." G.B. Shaw