Beijing, China

Zhang N., Ryerson University | Duan L.-Y., Peking University | Li L., CAS Institute of Computing Technology | Huang Q., CAS Institute of Computing Technology | and 3 more authors.
ACM Transactions on Intelligent Systems and Technology | Year: 2012

Various innovative and original works have been proposed and applied in the field of sports video analysis. However, individual works have focused on sophisticated methodologies for particular sport types, and there has been a lack of scalable and holistic frameworks in this field. This article proposes a solution in the form of a systematic and generic approach, which is evaluated on a relatively large-scale sports video collection. The system addresses event detection in an input video through an ordered, sequential process. First, local descriptors that are independent of domain knowledge are extracted homogeneously from the input video sequence. A video representation is then created by adopting a bag-of-visual-words (BoW) model. Next, the video's genre is identified by applying k-nearest neighbor (k-NN) classifiers to this representation, and various dissimilarity measures are evaluated analytically. Subsequently, an unsupervised approach based on probabilistic latent semantic analysis (PLSA) is applied to the same histogram-based video representation, assigning each frame of the video sequence to one of four view groups: close-up view, mid view, long view, and outer-field view. Finally, a hidden conditional random field (HCRF) structured prediction model is used to detect events of interest. In the experiments, the k-NN classifier using the KL-divergence measure achieves the best genre categorization accuracy, 82.16%. For view classification, supervised SVM and unsupervised PLSA achieve average accuracies of 82.86% and 68.13%, respectively. The HCRF model achieves 92.31% accuracy using labels from the unsupervised PLSA as input, which is comparable to the 93.08% accuracy obtained with labels from the supervised SVM. In general, such a systematic approach can be applied generically to processing massive volumes of video. © 2012 ACM 2157-6904/2012/05-ART46 $10.00.
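The genre-categorization step (BoW histograms compared by KL divergence inside a k-NN vote) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. The codebook of visual words is assumed to be given (for example, learned by clustering training descriptors), each descriptor is assumed to be hard-assigned to its nearest visual word, the direction D_KL(query || training video) and the smoothing constant are assumptions the abstract does not specify, and the names quantize, bow_histogram, kl_divergence, and knn_genre are hypothetical.

```python
import math
from collections import Counter


def quantize(descriptor, codebook):
    """Return the index of the nearest visual word (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda k: sum((d - c) ** 2 for d, c in zip(descriptor, codebook[k])),
    )


def bow_histogram(descriptors, codebook, eps=1e-10):
    """Build a normalized bag-of-visual-words histogram.

    eps is an assumed smoothing constant added to every bin so that no bin
    is zero, which keeps the KL divergence below finite.
    """
    counts = Counter(quantize(d, codebook) for d in descriptors)
    raw = [counts.get(k, 0) + eps for k in range(len(codebook))]
    total = sum(raw)
    return [v / total for v in raw]


def kl_divergence(p, q):
    """D_KL(p || q) for two strictly positive discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def knn_genre(query_hist, train_hists, train_labels, k=3):
    """Majority vote among the k training videos with the smallest KL divergence."""
    ranked = sorted(
        range(len(train_hists)),
        key=lambda i: kl_divergence(query_hist, train_hists[i]),
    )
    votes = Counter(train_labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]
```

With a two-word codebook, a query whose descriptors all fall on the first visual word lies closest, in KL divergence, to the training videos dominated by that word, so the vote follows their label. Other dissimilarity measures, such as those the article compares, could be substituted for kl_divergence without changing the rest of the sketch.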
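The unsupervised view-grouping step can be illustrated with a standard PLSA model fitted by expectation-maximization, treating each frame's BoW counts as a document and the view groups as latent topics. This is a hedged sketch assuming the textbook PLSA formulation: the abstract does not give the article's training procedure, initialization, or iteration count, and plsa and assign_views are hypothetical names. The returned topic indices are unnamed clusters; mapping them to close-up, mid, long, or outer-field view would require an additional step that is not shown.

```python
import random


def plsa(counts, n_topics, n_iter=100, seed=0):
    """Fit PLSA with EM.

    counts[d][w] is the number of times visual word w occurs in frame d.
    Returns (p_z_d, p_w_z): p_z_d[d][z] = P(z | d), p_w_z[z][w] = P(w | z).
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])

    def normalize(row):
        total = sum(row)
        return [v / total for v in row]

    # Random, strictly positive initialization of both conditionals.
    p_z_d = [normalize([rng.random() + 0.1 for _ in range(n_topics)]) for _ in range(n_docs)]
    p_w_z = [normalize([rng.random() + 0.1 for _ in range(n_words)]) for _ in range(n_topics)]

    for _ in range(n_iter):
        exp_w_z = [[0.0] * n_words for _ in range(n_topics)]
        exp_z_d = [[0.0] * n_topics for _ in range(n_docs)]
        for d in range(n_docs):
            for w in range(n_words):
                n = counts[d][w]
                if n == 0:
                    continue
                # E-step: P(z | d, w) is proportional to P(z | d) * P(w | z).
                post = [p_z_d[d][z] * p_w_z[z][w] for z in range(n_topics)]
                total = sum(post)
                for z in range(n_topics):
                    # Expected count of (d, w) explained by topic z.
                    share = n * post[z] / total
                    exp_w_z[z][w] += share
                    exp_z_d[d][z] += share
        # M-step: renormalize expected counts; the tiny constant keeps every
        # probability strictly positive so the E-step never divides by zero.
        p_w_z = [normalize([v + 1e-12 for v in row]) for row in exp_w_z]
        p_z_d = [normalize([v + 1e-12 for v in row]) for row in exp_z_d]
    return p_z_d, p_w_z


def assign_views(counts, n_views=4, **kwargs):
    """Label each frame with its most probable latent topic index."""
    p_z_d, _ = plsa(counts, n_views, **kwargs)
    return [max(range(n_views), key=lambda z: row[z]) for row in p_z_d]
```

Because PLSA is fitted without labels, frames that share visual words tend to be grouped under the same topic index, which is what allows it to stand in for the supervised SVM labels at lower accuracy.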
