Breitenstein M.D.,ETH Zurich |
Reichlin F.,LiberoVision AG |
Leibe B.,RWTH Aachen |
Koller-Meier E.,ETH Zurich |
And 2 more authors.
IEEE Transactions on Pattern Analysis and Machine Intelligence | Year: 2011
In this paper, we address the problem of automatically detecting and tracking a variable number of persons in complex scenes using a monocular, potentially moving, uncalibrated camera. We propose a novel approach for multiperson tracking-by-detection in a particle filtering framework. In addition to final high-confidence detections, our algorithm uses the continuous confidence of pedestrian detectors and online-trained, instance-specific classifiers as a graded observation model. Thus, generic object category knowledge is complemented by instance-specific information. The main contribution of this paper is to explore how these unreliable information sources can be used for robust multiperson tracking. The algorithm detects and tracks a large number of dynamically moving people in complex scenes with occlusions, does not rely on background modeling, requires no camera or ground plane calibration, and only makes use of information from the past. Hence, it imposes very few restrictions and is suitable for online applications. Our experiments show that the method yields good tracking performance in a large variety of highly dynamic scenarios, such as typical surveillance videos, webcam footage, or sports sequences. We demonstrate that our algorithm outperforms other methods that rely on additional information. Furthermore, we analyze the influence of different algorithm components on the robustness. © 2011 IEEE.
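The graded observation model described in this abstract can be illustrated with a minimal bootstrap particle filter for a single target. This is a sketch under stated assumptions, not the authors' implementation: the random-walk motion model, the weighted-sum form of the particle weight, and every name and parameter (`step`, `estimate`, `detector_conf`, `classifier_conf`, `w_det`, `w_conf`, `w_clf`) are hypothetical choices made for illustration.

```python
import math
import random


def step(particles, detection, detector_conf, classifier_conf, rng,
         motion_sigma=2.0, det_sigma=5.0, w_det=1.0, w_conf=0.5, w_clf=0.5):
    """One predict / weight / resample cycle for a single tracked person.

    particles       -- list of (x, y) image positions
    detection       -- (x, y) of an associated high-confidence detection, or None
    detector_conf   -- callable (x, y) -> continuous detector confidence (assumed)
    classifier_conf -- callable (x, y) -> instance-specific classifier score (assumed)
    """
    # Predict: diffuse each particle with Gaussian noise (random-walk assumption).
    moved = [(x + rng.gauss(0.0, motion_sigma), y + rng.gauss(0.0, motion_sigma))
             for x, y in particles]

    # Weight: combine the three information sources into one graded score.
    weights = []
    for x, y in moved:
        w = w_conf * detector_conf(x, y) + w_clf * classifier_conf(x, y)
        if detection is not None:
            d2 = (x - detection[0]) ** 2 + (y - detection[1]) ** 2
            w += w_det * math.exp(-0.5 * d2 / det_sigma ** 2)
        weights.append(w + 1e-12)  # small floor keeps the weights valid

    # Resample: draw particles in proportion to their weights.
    return rng.choices(moved, weights=weights, k=len(moved))


def estimate(particles):
    """Target position estimate: the mean of the particle cloud."""
    n = len(particles)
    return (sum(x for x, _ in particles) / n, sum(y for _, y in particles) / n)
```

When no detection is associated with the target, only the continuous confidence terms contribute, which is the situation in which a graded observation model is most useful.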
Novosad L.,Czech Technical University |
Ziegler R.,LiberoVision AG
Proc. of the IADIS Int. Conf. - Computer Graphics, Visualization, Computer Vision and Image Processing, CGVCVIP 2010, Visual Commun., VC 2010, Web3DW 2010, Part of the MCCSIS 2010 | Year: 2010
Virtual scenes used in state-of-the-art computer games and animated movies appear as realistic as possible by using the latest graphics algorithms and hardware available. This work aims to improve the realism in the way the scene is perceived by the user. Usually a pinhole camera is placed into the scene and the projected image is presented to the viewer. We want to improve the image which is presented to the user by simulating an eye, which is divided into different layers. Each one of these layers provides individual features. The approach is based on wave optics and is able to simulate effects of refraction, diffraction, high-dynamic range lighting, and depth-of-field. This simulation is implemented by using an NVIDIA CUDA device with its GPGPU capabilities and unified shader architecture. Our framework offers a simple interface which only requires access to the frame buffer and depth buffer. As a consequence it may be plugged into existing engines in a straightforward manner as a simple extension. © 2010 IADIS.
LiberoVision AG | Date: 2011-07-22
What is disclosed is a computer-implemented image-processing system and method for the automatic generation of video sequences that can be associated with a televised event. The methods can include the steps of: defining a reference keyframe from a reference view from a source image sequence; from one or more keyframes, automatically computing one or more sets of virtual camera parameters; generating a virtual camera flight path, which is described by a change of virtual camera parameters over time, and which defines a movement of a virtual camera and a corresponding change of a virtual view; and rendering and storing a virtual video stream defined by the virtual camera flight path.
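A flight path "described by a change of virtual camera parameters over time" can be sketched as interpolation between two parameter sets. The example below is only an illustration: the parameter set (`position`, `pan_deg`, `tilt_deg`, `focal_mm`), the smoothstep easing, and all function names are assumptions, not details taken from the disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CameraParams:
    """Hypothetical virtual camera parameter set."""
    position: tuple   # (x, y, z) in scene units
    pan_deg: float
    tilt_deg: float
    focal_mm: float


def lerp(a, b, t):
    """Linear interpolation between a and b for t in [0, 1]."""
    return a + (b - a) * t


def smoothstep(t):
    """Ease-in/ease-out curve so the camera starts and stops gently."""
    return t * t * (3.0 - 2.0 * t)


def flight_path(start, end, n_frames):
    """Return one CameraParams per frame, moving from start to end.

    Angles are interpolated directly; wrap-around at 360 degrees is
    deliberately ignored in this sketch.
    """
    if n_frames < 2:
        raise ValueError("a flight path needs at least two frames")
    path = []
    for i in range(n_frames):
        s = smoothstep(i / (n_frames - 1))
        path.append(CameraParams(
            position=tuple(lerp(a, b, s)
                           for a, b in zip(start.position, end.position)),
            pan_deg=lerp(start.pan_deg, end.pan_deg, s),
            tilt_deg=lerp(start.tilt_deg, end.tilt_deg, s),
            focal_mm=lerp(start.focal_mm, end.focal_mm, s),
        ))
    return path
```

Each element of the returned list would then drive the renderer for one frame of the virtual video stream.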
LiberoVision AG | Date: 2012-04-02
A method of processing image data includes providing an image sequence, such as a video sequence or a camera transition; identifying a region-of-interest in at least one image of the image sequence; defining a transition region around the region-of-interest and defining the remaining portion of the image to be a default region or background region; and applying different image effects to the region-of-interest, the transition region, and the background region.
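The three-region idea can be sketched on a small grayscale image stored as nested lists. The rectangular region-of-interest, the linear falloff across the transition band, and the darkening effect applied to the background are all assumptions chosen for illustration; the names `region_weight`, `apply_effects`, `band`, and `background_gain` are hypothetical.

```python
def region_weight(x, y, roi, band):
    """Return 1.0 inside the region-of-interest, 0.0 in the background,
    and a linearly decreasing value inside the transition band.

    roi  -- (x0, y0, x1, y1), with x1 and y1 exclusive
    band -- width of the transition region in pixels
    """
    x0, y0, x1, y1 = roi
    # Chebyshev distance from the pixel to the rectangle (0 when inside).
    dx = max(x0 - x, 0, x - (x1 - 1))
    dy = max(y0 - y, 0, y - (y1 - 1))
    d = max(dx, dy)
    if d == 0:
        return 1.0
    if d > band:
        return 0.0
    return 1.0 - d / (band + 1)


def apply_effects(image, roi, band, background_gain=0.4):
    """Keep the region-of-interest unchanged, darken the background, and
    blend between the two effects inside the transition region."""
    out = []
    for y, row in enumerate(image):
        new_row = []
        for x, value in enumerate(row):
            w = region_weight(x, y, roi, band)
            gain = w + (1.0 - w) * background_gain
            new_row.append(value * gain)
        out.append(new_row)
    return out
```

The same weight map could drive other effect pairs, for example sharp versus blurred or saturated versus desaturated.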
Germann M.,ETH Zurich |
Popa T.,ETH Zurich |
Ziegler R.,LiberoVision AG |
Keiser R.,LiberoVision AG |
Gross M.,ETH Zurich
Proceedings - 2011 International Conference on 3D Imaging, Modeling, Processing, Visualization and Transmission, 3DIMPVT 2011 | Year: 2011
We propose a data-driven, multi-view body pose estimation algorithm for video. It can operate in uncontrolled environments with loosely calibrated and low-resolution cameras and without restrictive assumptions on the family of possible poses or motions. Our algorithm first computes a rough pose estimate using a spatial and temporal silhouette-based search in a database of known poses. The estimated pose is improved in a novel pose consistency step acting locally on single frames and globally over the entire sequence. Finally, the resulting pose estimate is refined in a spatial and temporal pose optimization consisting of novel constraints to obtain an accurate pose. Our method performed well on low-resolution video footage from real broadcasts of soccer games. © 2011 IEEE.
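The first stage, a silhouette-based search in a database of known poses, can be illustrated with a simple multi-view lookup. This sketch covers only a spatial match and leaves out the temporal search, the consistency step, and the optimization. The intersection-over-union score, the representation of a silhouette as a set of pixel coordinates, and the names `iou` and `best_pose` are assumptions rather than the authors' method.

```python
def iou(a, b):
    """Intersection-over-union of two silhouettes given as sets of (x, y) pixels."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def best_pose(query_views, database):
    """Return (label, score) of the database pose that best matches the query.

    query_views -- list of silhouettes, one per camera view
    database    -- list of (label, views), where views has one silhouette per
                   camera, in the same camera order as query_views
    """
    def score(entry):
        _, views = entry
        # Average the per-view overlap so every camera contributes equally.
        return sum(iou(q, v) for q, v in zip(query_views, views)) / len(query_views)

    best = max(database, key=score)
    return best[0], score(best)
```

A usage example with two camera views:

```python
database = [
    ("standing", [{(0, 0), (0, 1), (0, 2)}, {(1, 0), (1, 1)}]),
    ("crouching", [{(0, 0), (1, 0)}, {(1, 0), (2, 0)}]),
]
query = [{(0, 0), (0, 1)}, {(1, 0), (1, 1)}]
label, score = best_pose(query, database)
```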