Abstract
Soccer video content-based analysis remains a challenging problem due to the lack of structure in a soccer game. To automate game and tactic analysis, we need to detect and track important activities in a soccer video, such as ball possession, which is highly correlated with the camera’s field-view. In this paper, we present a system that tracks the camera’s field-view in a soccer video in real time. It uses a host of content-based visual cues obtained by independent threads running in parallel. The result is visualized as an active rectangular bounding box, approximating the camera’s field-view, superimposed on a virtual soccer field. Experimental results show that the system can reliably track the camera field-view as the game progresses.
ICASSP 2003 Lecture paper (4-page), PDF download
The following are screen dumps of our Win32 executable running on a Dell Pentium 4 1.7 GHz with 128 MB RAM. The video is an 80-second MPEG clip taken from a digitized video of the FIFA 2002 World Cup final. It contains rapid tactical play that effectively spans from one end of the soccer field to the other. The system is able to track the camera field-view and keep focus on the play area with respect to the entire field.
Figure 1: Play starts near mid-field
Figure 2: Play moves back to the left goal
Figure 3: Play is intercepted at mid-field and moves rapidly towards the left field
Figure 4: Play moves towards the right goal area
Figure 5: The ball is floated from the top-right towards the center-goal area
Application
The ability to track the camera view translates into the ability to track the tactical flow of play on the field. This is useful for summarizing play and tactics for coaching and team review. It also offers a computational mechanism for tracking the excitement level of the game, since play near the goal-mouth area is normally associated with high excitement.
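As a rough illustration of the excitement cue, the following Python sketch labels a field-view box by how much of it covers either penalty area. It assumes the box is already expressed in virtual-field coordinates in metres, a 105 m x 68 m pitch and standard penalty-area dimensions. The class `Box`, the function `play_zone` and the 0.2 overlap threshold are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass

# Assumed pitch geometry in metres (not from the paper).
FIELD_LENGTH = 105.0
FIELD_WIDTH = 68.0
PENALTY_DEPTH = 16.5
PENALTY_WIDTH = 40.32


@dataclass
class Box:
    """Axis-aligned rectangle in virtual-field coordinates (x along the field length)."""
    x0: float
    y0: float
    x1: float
    y1: float

    def area(self) -> float:
        # Negative extents (empty intersections) clamp to zero area.
        return max(0.0, self.x1 - self.x0) * max(0.0, self.y1 - self.y0)

    def intersect(self, other: "Box") -> "Box":
        return Box(max(self.x0, other.x0), max(self.y0, other.y0),
                   min(self.x1, other.x1), min(self.y1, other.y1))


_PY0 = (FIELD_WIDTH - PENALTY_WIDTH) / 2
LEFT_PENALTY = Box(0.0, _PY0, PENALTY_DEPTH, _PY0 + PENALTY_WIDTH)
RIGHT_PENALTY = Box(FIELD_LENGTH - PENALTY_DEPTH, _PY0, FIELD_LENGTH, _PY0 + PENALTY_WIDTH)


def play_zone(view: Box, min_overlap: float = 0.2) -> str:
    """Label where play is from the share of the field-view box inside a penalty area."""
    total = view.area()
    if total == 0.0:
        return "unknown"
    left = view.intersect(LEFT_PENALTY).area() / total
    right = view.intersect(RIGHT_PENALTY).area() / total
    if max(left, right) < min_overlap:
        return "midfield"
    return "left goal area" if left >= right else "right goal area"
```

A running count of "goal area" labels over time could then serve as a crude excitement proxy; the threshold trades off sensitivity against false alarms from wide shots.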
Abstract
We present an efficient soccer video annotation system that combines automatic meta-data extraction and indexing tools with standard-compliant components such as open-source XML, database and web-based JSP. Video frames are first processed to segment the shot boundaries. Additional multi-modal shot attributes are then computed. For the visual modality, we extract the camera field-view movement to track the location of play relative to the soccer field. For the audio modality, we detect excited commentator speech segments. Together with the shot key-frame, these attributes let us quickly browse the video for segments of interest, using an annotation GUI to key in textual entries conforming to the MPEG-7 standard. A client web browser can access the enriched content by: (1) browsing from a chronological index; (2) searching for video segments by player or event name; (3) viewing a game summary in video and text. Subjective tests show that our annotation system yields a higher-quality video skim than an alternative system.
ICME 2003 Demo (2-page), PDF download
Screen-Dump
Annotation Visual-based Shot Boundary Detection in action.
The detector is tuned to the soccer video domain: it also detects the soccer field lines to determine whether play has shifted significantly with respect to the soccer field. In soccer content annotation, this is important because significant tactical play can be discerned from such movement. Our method improves on traditional color-histogram-based methods, which do not work well here because the green field color dominates soccer video.
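The field-line logic itself is not reproduced here. Instead, this Python sketch shows one simple, hypothetical way to reduce the green-dominance problem in a plain color-histogram detector: pixels that fall in green-dominant bins are ignored before consecutive histograms are compared. The bin count, the 0.5 threshold and all function names are assumptions for illustration, not the method described above.

```python
from typing import Dict, List, Sequence, Tuple

Pixel = Tuple[int, int, int]
LEVELS = 4  # quantization levels per channel -> 64-bin RGB histogram


def histogram(frame: Sequence[Pixel], drop_green: bool) -> Dict[Tuple[int, int, int], float]:
    """Normalized coarse RGB histogram, optionally ignoring green-dominant bins."""
    counts: Dict[Tuple[int, int, int], int] = {}
    for r, g, b in frame:
        key = (r * LEVELS // 256, g * LEVELS // 256, b * LEVELS // 256)
        if drop_green and key[1] > key[0] and key[1] > key[2]:
            continue  # pixel falls in a green-dominant bin (assumed to be field)
        counts[key] = counts.get(key, 0) + 1
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()} if total else {}


def hist_distance(h1, h2) -> float:
    """Half L1 distance; for two non-empty histograms, 0 = identical and 1 = disjoint."""
    return 0.5 * sum(abs(h1.get(k, 0.0) - h2.get(k, 0.0)) for k in set(h1) | set(h2))


def shot_boundaries(frames: Sequence[Sequence[Pixel]], threshold: float = 0.5,
                    drop_green: bool = True) -> List[int]:
    """Indices i where a cut is declared between frames[i-1] and frames[i]."""
    hists = [histogram(f, drop_green) for f in frames]
    return [i for i in range(1, len(hists))
            if hist_distance(hists[i - 1], hists[i]) > threshold]
```

With a frame that is 90% green field plus a few white pixels, followed by a frame with the same field and a few red pixels, the green-suppressed distance is 1.0 while the plain histogram distance is only 0.1: the shared green mass hides the change, which is the failure mode the text describes.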
Annotation Audio-based Commentary Excitement Detector in action.
This incorporates an audio-based processing module that detects raised pitch in the voiced commentary. The prosodic movement in the commentary is an accurate cue for highlight moments.
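The raised-pitch cue can be sketched as follows. This is not the module's actual algorithm: it assumes a mono signal given as a list of floats, uses a crude autocorrelation pitch estimator with a hypothetical voicing threshold of 0.3, and flags frames whose pitch exceeds 1.3 times the median voiced pitch. The frame length, search range and factor are all illustrative assumptions.

```python
from typing import List, Sequence


def estimate_pitch(frame: Sequence[float], sample_rate: int,
                   fmin: float = 80.0, fmax: float = 400.0) -> float:
    """Crude autocorrelation pitch estimate in Hz; 0.0 means 'treated as unvoiced'."""
    n = len(frame)
    mean = sum(frame) / n
    x = [s - mean for s in frame]
    energy = sum(s * s for s in x)
    if energy == 0.0:
        return 0.0
    lag_min = max(1, int(sample_rate / fmax))
    lag_max = min(int(sample_rate / fmin), n - 1)
    best_lag, best_r = 0, 0.0
    for lag in range(lag_min, lag_max + 1):
        # Autocorrelation at this lag, normalized by the frame energy.
        r = sum(x[i] * x[i + lag] for i in range(n - lag)) / energy
        if r > best_r:
            best_lag, best_r = lag, r
    if best_lag == 0 or best_r < 0.3:  # hypothetical voicing threshold
        return 0.0
    return sample_rate / best_lag


def raised_pitch_frames(signal: Sequence[float], sample_rate: int,
                        frame_len: int = 400, factor: float = 1.3) -> List[int]:
    """Indices of frames whose pitch exceeds `factor` times the median voiced pitch."""
    pitches = [estimate_pitch(signal[i:i + frame_len], sample_rate)
               for i in range(0, len(signal) - frame_len + 1, frame_len)]
    voiced = sorted(p for p in pitches if p > 0.0)
    if not voiced:
        return []
    baseline = voiced[len(voiced) // 2]  # median voiced pitch as the "calm" reference
    return [i for i, p in enumerate(pitches) if p > factor * baseline]
```

Using the median of the clip's own voiced frames as the baseline keeps the rule speaker-relative, so a naturally high-pitched commentator is not flagged throughout.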
is then returned by the system and can be browsed by the end-viewer, who can inspect and verify the automatically generated result and add further textual annotation for the final description.
has also been created to facilitate video streaming and browsing. It uses standard-compliant Java, JSP and .NET components.
Application
Broadcast studios can use the system for automated highlight detection in breaking newscasts. Content providers can also use the semi-automatic annotation system to disseminate real-time text summaries of game highlights (via SMS) and video summaries (via video-phone).
Abstract
In this paper, we describe a working system that detects and segments goal-mouth appearances in soccer video in real time. Operating on sub-optimal-quality images after MPEG decoding, the system constrains Hough Transform-based line-mark detection to only the dominant green regions typically seen in soccer video. The vertical goal-posts and horizontal goal-bar are then isolated by color-based region (pole) growing. We demonstrate its application to quick video browsing and virtual content insertion. Extensive tests over ~15 hours of MPEG-1 soccer video show the robustness of our method.
ACM MM 2003 Short paper (4-page), PDF download
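A minimal sketch of restricting Hough voting to the field, in Python and under stated assumptions: candidate line-mark pixels are near-white, a pixel counts as "field green" if its green channel exceeds red and blue by a hypothetical margin of 20, and only white pixels with a green neighbor are allowed to vote. The thresholds and function names are illustrative; the paper's own green-region segmentation and the pole-growing step for goal-posts are not reproduced.

```python
import math
from collections import Counter
from typing import List, Optional, Tuple

Pixel = Tuple[int, int, int]


def is_green(p: Pixel) -> bool:
    r, g, b = p
    return g > r + 20 and g > b + 20  # hypothetical "field green" rule


def is_white(p: Pixel) -> bool:
    return min(p) > 200  # hypothetical "line-mark white" rule


def field_line_points(img: List[List[Pixel]]) -> List[Tuple[int, int]]:
    """(x, y) of white pixels that touch at least one green pixel in their 3x3 neighborhood."""
    h, w = len(img), len(img[0])
    pts = []
    for y in range(h):
        for x in range(w):
            if not is_white(img[y][x]):
                continue
            neigh = [img[j][i] for j in range(max(0, y - 1), min(h, y + 2))
                     for i in range(max(0, x - 1), min(w, x + 2))]
            if any(is_green(q) for q in neigh):
                pts.append((x, y))
    return pts


def strongest_line(points: List[Tuple[int, int]],
                   theta_steps: int = 180) -> Optional[Tuple[int, float, int]]:
    """(rho, theta in degrees, votes) of the best line x*cos(t) + y*sin(t) = rho."""
    acc: Counter = Counter()
    for x, y in points:
        for k in range(theta_steps):
            t = math.pi * k / theta_steps
            acc[(round(x * math.cos(t) + y * math.sin(t)), k)] += 1
    if not acc:
        return None
    (rho, k), votes = acc.most_common(1)[0]
    return rho, 180.0 * k / theta_steps, votes
```

Because off-field white pixels (scoreboards, banners, crowd) never vote, they cannot create spurious accumulator peaks, which is the practical benefit of constraining the transform to green regions.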
Application
Soccer has a global following. Advertising is an attractive business model for broadcasters of high-profile games. Virtual advertising augments the advertising portfolio of a company wishing to increase global awareness of its brand or trademark. Virtual advertising can also be demographic-sensitive: advertising can be customized for the local audience or even for the individual home viewer. Reliable detection of the goal-mouth provides a viable way to achieve this.
Screen-Dump
left goal-mouth detection result (bmp) right goal-mouth detection result (bmp)
Virtual advertising example 1:
original goal-post image (bmp) inserted goal-post image (bmp)
Virtual advertising example 2:
original goal-post image (bmp) inserted goal-post image (bmp)
Demo
There are also 4 MPEG videos that demonstrate the workings of the real-time system. These were captured using a SONY-DSC1 camera. Although the quality is not very good, they still give a good view of the system in operation. The game between
The first video shows a left-goal detection. The second video shows a right-goal detection.
The next video shows a virtual message implanted above the left goal-mouth. The last video shows a similar implant above the right goal-mouth.
A complete demo (in WMV format, 29 MB) can also be downloaded here.
Abstract
We describe soccer highlight generation using only the audio stream of the video. A novel audio feature is used to detect parts of the commentary corresponding to dominant and excited speech. It is computed by a twice-iterated Composite Fourier Transform (CFT) on short-time windows, wherein the magnitude spectrum of the first transform is input to a second transform. Dominant speech portions are found to be robustly characterized by increased density in the peak profile. We verify the robustness of CFT via large-scale empirical testing and explain its working based on a pulse-train postulate of the dominant speech signal. Our audio-only approach results in a compute-efficient system deployable on current-generation set-top boxes and digital video recording devices.
ICME 2004 Poster paper (4-page), PDF download
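The two-stage transform can be illustrated with a short Python sketch. It follows only the one-sentence description above (the magnitude spectrum of a windowed frame fed into a second transform); the Hann window, the direct O(N^2) DFT, the use of bins 1 to N/2 - 1 and the 10% peak threshold are our assumptions, not the paper's parameters. Feeding it ideal pulse trains, in the spirit of the pulse-train postulate, shows why a shorter pitch period yields a denser peak profile.

```python
import cmath
import math
from typing import List, Sequence


def dft_magnitude(x: Sequence[float]) -> List[float]:
    """|DFT| of a real sequence via the direct O(N^2) sum (fine for short windows)."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n)]


def twice_iterated_spectrum(frame: Sequence[float]) -> List[float]:
    """Magnitude spectrum of a Hann-windowed frame, fed into a second DFT."""
    n = len(frame)
    hann = [0.5 - 0.5 * math.cos(2 * math.pi * t / n) for t in range(n)]  # periodic Hann
    first = dft_magnitude([w * s for w, s in zip(hann, frame)])
    return dft_magnitude(first)


def peak_count(profile: Sequence[float], rel_threshold: float = 0.1) -> int:
    """Local maxima in bins 1..N/2-1 that exceed rel_threshold times the profile maximum."""
    floor = rel_threshold * max(profile)
    return sum(1 for k in range(1, len(profile) // 2)
               if profile[k] > floor
               and profile[k] >= profile[k - 1] and profile[k] >= profile[k + 1])


def peak_density(frame: Sequence[float]) -> float:
    """Hypothetical density measure: peak count per bin in the first half of the profile."""
    profile = twice_iterated_spectrum(frame)
    return peak_count(profile) / (len(profile) // 2 - 1)
```

For a 128-sample window, a pulse train with period 8 gives a magnitude spectrum that repeats every 16 bins, so the second transform peaks at multiples of 8 and six of those peaks pass the threshold; a pulse train with period 32 yields a single peak, at bin 32.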
Abstract
We describe efficient audio/visual features and their multimodal combination to detect highlights in soccer video. A novel audio feature first detects dominant speech portions in the commentary that coincide with segments of high excitement in the game. Verification is then performed in the visual domain by detecting the presence of the goal-mouth in the current shot and a high frequency of camera shot changes in the subsequent shots. The cascaded process filters spurious candidate highlights from the noisy audio. The impressive results obtained on a large video test set belie the technical simplicity of the system, which may now enable rapid generation of highlights on low-cost devices such as household set-top boxes.
ICPR 2004 Lecture paper (4-page), PDF download
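The cascade can be pictured as a filter over audio-detected candidate times. In this hypothetical Python sketch, each shot carries a boolean from a goal-mouth detector (assumed to exist upstream), and a candidate survives only if its shot shows the goal-mouth and the next few shots are cut at a rate above a threshold. The window of three shots and the 0.2 shots-per-second threshold are illustrative assumptions, not values from the paper.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Shot:
    start: float          # seconds
    end: float            # seconds
    has_goal_mouth: bool  # assumed output of an upstream goal-mouth detector


def shot_index(shots: Sequence[Shot], t: float) -> Optional[int]:
    """Index of the shot containing time t, or None."""
    return next((i for i, s in enumerate(shots) if s.start <= t < s.end), None)


def cut_rate_after(shots: Sequence[Shot], idx: int, window: int) -> float:
    """Shots per second over the `window` shots that follow shots[idx]."""
    following = shots[idx + 1: idx + 1 + window]
    if not following:
        return 0.0
    duration = following[-1].end - following[0].start
    return len(following) / duration if duration > 0 else 0.0


def verify_candidates(candidates: Sequence[float], shots: Sequence[Shot],
                      window: int = 3, min_rate: float = 0.2) -> List[int]:
    """Shot indices of audio candidates that pass both visual checks."""
    kept: List[int] = []
    for t in candidates:
        idx = shot_index(shots, t)
        if idx is None or idx in kept:
            continue
        if shots[idx].has_goal_mouth and cut_rate_after(shots, idx, window) >= min_rate:
            kept.append(idx)
    return kept
```

Ordering the cheap audio test first means the visual checks only run on the few candidate shots, which fits the low-cost set-top-box target.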
Application
· Quick soccer video highlight browsing on current-generation low-cost set-top boxes
· Applicable on current-generation Personal Video Recorders (PVR)
Euro 2004 fever was throughout
1. Manually set the Hauppauge MPEG-1 video capture program to start and end recording at the scheduled broadcast time. Note that a fully automatic system requires the broadcast programming schedule to be exported via the EPG.
2. One of the team members wakes up at the end of the recording time to do the following:
   a. Use an MPEG video editor to remove the half-time and full-time advertisements (we use the publicly available TMPGEnc software).
   b. Run the detection software on the edited soccer-only video.
   c. Retrieve the highlight segments and export them to the GUI visualizer.
We did this for the quarter-finals, the semi-finals and, of course, the final. A typical work order looks something like this:
· Set the end of recording to 5:30 am (play may go into extra time; e.g., the England vs Portugal quarter-final lasted about 2.5 hr).
· Run the detection software on the new MPEG file. To expedite processing, the audio-only detection module (ICME 2004 paper) is invoked; it runs at ~4x the video frame rate, i.e., ~100 fps. Meanwhile, go brush up and have breakfast!
· Rush to the office.
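As a rough check on the timing, assume 25 fps video (consistent with ~4x the video frame rate being ~100 fps) and, as an upper bound, that the whole 2.5-hour recording is processed without first removing the advertisements:

```latex
t_{\text{proc}}
  = \frac{2.5\ \text{h} \times 3600\ \text{s/h} \times 25\ \text{frames/s}}{100\ \text{frames/s}}
  = \frac{225\,000\ \text{frames}}{100\ \text{frames/s}}
  = 2250\ \text{s} = 37.5\ \text{min}
```

Equivalently, the processing time is the recording length divided by the 4x speed-up, so even an extra-time match is handled well within an hour.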
Some screen dumps:
Abstract
We describe novel methods to automatically augment content into video highlights detected from soccer and tennis video. First, we extract generic and domain-specific features from the video to isolate key audio-visual events that we have empirically found to correlate well with the ground-truth highlights. Next, using a set of heuristic rules designed to minimize viewing disruption, we segment spatial regions in the image frames of these highlight segments for content augmentation. Preliminary subjective viewing trials indicate a high level of acceptance for the content insertions.
Non-intrusive Content Insertion Framework
Homogeneous Region Segmentation
Samples of Static Regions Detected
Example of Dynamic Insertions using Color Quantization
PCM 2004 Invited paper (4-page), PDF download
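One plausible reading of the static-region step is a block-based test: a block qualifies if it is flat (low intensity variance) in every frame of the highlight segment and its mean intensity barely changes over time. The Python sketch below assumes grayscale frames given as nested lists; the 8x8 block size and both thresholds are hypothetical, and neither the paper's own segmentation nor its color-quantization step for dynamic insertions is reproduced.

```python
from statistics import pvariance
from typing import List, Sequence, Tuple

Frame = List[List[float]]  # grayscale frame, row-major


def _block(frame: Frame, by: int, bx: int, size: int) -> List[float]:
    return [frame[y][x] for y in range(by, by + size) for x in range(bx, bx + size)]


def static_homogeneous_blocks(frames: Sequence[Frame], size: int = 8,
                              max_var: float = 25.0,
                              max_drift: float = 5.0) -> List[Tuple[int, int]]:
    """Top-left corners (row, col) of blocks that stay flat and unchanged across all frames."""
    h, w = len(frames[0]), len(frames[0][0])
    keep = []
    for by in range(0, h - size + 1, size):
        for bx in range(0, w - size + 1, size):
            blocks = [_block(f, by, bx, size) for f in frames]
            if max(pvariance(b) for b in blocks) > max_var:
                continue  # textured in some frame: an insertion would hide detail
            means = [sum(b) / len(b) for b in blocks]
            if max(means) - min(means) > max_drift:
                continue  # mean intensity changes over time: region is not static
            keep.append((by, bx))
    return keep
```

Adjacent surviving blocks could then be merged into larger candidate regions; requiring flatness in every frame is a conservative choice that favors areas such as sky, stands in shadow or empty court.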
A useful application of our sports highlights is in a mobile scenario, where viewers who cannot catch the live action on TV may still be alerted to interesting and noteworthy events that have happened in the game.
It is within the wireless scenario that we incorporate the business case of Virtual Content Insertion (VCI). Our argument is that the traditional broadcast advertising model of 15- or 30-second commercial clips is not viable here, since air-time is still substantially expensive and will likely remain so for the near future. A better means of advertising, in our opinion, is non-intrusive VCI. Though the exposure is limited and may merely constitute a branding exercise rather than a full-blown advertising run, that is exactly what we advocate: wireless media appear to allow only a branding form of advertising, at least for the moment.
An obvious quality-based model follows immediately: premium subscribers see only the original highlight video without the insertions, while users on a basic plan view it with the insertions. Hopefully, the non-intrusive insertions will minimize any loss of viewer enjoyment.
The key technical challenges in this work are (1) the real-time requirement, and (2) the system does not have the benefit of the entire video when making an adaptive selection of highlight segments. Our brief foray into the wireless video world led us to Nokia phones, which support both 3GPP and RealVideo formats. Our preliminary architecture is as follows:
Sample Videos Viewable on Nokia Phones
Soccer highlight video (with insertion) (RM format, 50 kbps)
Soccer highlight video (with insertion) (RM format, 100 kbps)
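Challenge (2) above means highlight selection must be causal. A minimal sketch, assuming each incoming segment already has a scalar excitement score (hypothetical), compares every new score against running statistics of the scores seen so far, updated with Welford's algorithm. The k = 2 standard-deviation rule and the warm-up of 10 segments are illustrative choices, not the system's actual selection logic.

```python
import math
from typing import Iterable, Iterator, Tuple


def online_highlights(scores: Iterable[float], k: float = 2.0,
                      warmup: int = 10) -> Iterator[Tuple[int, float]]:
    """Yield (index, score) for scores that exceed the running mean by k std devs.

    Each decision uses only the scores seen before it, so the selector is causal.
    """
    n, mean, m2 = 0, 0.0, 0.0
    for i, s in enumerate(scores):
        if n >= max(warmup, 2):
            std = math.sqrt(m2 / (n - 1))  # sample standard deviation of past scores
            if s > mean + k * std:
                yield i, s
        # Welford's update folds the current score into the running statistics.
        n += 1
        delta = s - mean
        mean += delta / n
        m2 += delta * (s - mean)
```

Being a generator, it can sit directly on a live feed and emit a highlight as soon as its segment is scored, at the cost of occasionally missing events that only look exceptional in hindsight.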
Abstract
The global appeal of sports content makes it widely expected to be a key content driver for DTV interactivity. Its regular structure is amenable to automatic analysis and semantics extraction, leading to significant opportunities in interactive advertising and time-shift viewing. This paper reviews the state of the art in automatic sports content analysis and reports on some recent R&D results at the Institute for Infocomm Research (I2R).
We also envisage “desktop set-top boxes”, where PC cards are fully capable of receiving DTV signals. Coupled with superior Java UI support, the desktop arena can become a conducive and cost-efficient environment for the rapid development, testing and deployment of new and compelling iTV services.
Speech at Broadcast Asia 2004 (7-page PDF)
Abstract
While most current approaches to sports video analysis are based on broadcast video, in this paper we present a novel approach for highlight detection and automatic replay generation for soccer videos taken by the main camera. This research matters because soccer highlight detection and replay generation from a live game is currently a labor-intensive process. A robust multi-level, multi-model event detection framework is proposed to detect events and event boundaries in the video taken by the main camera. This framework explores the possible analysis cues, using a mid-level representation to bridge the gap between low-level features and high-level events. The event detection results and the mid-level representation are used to generate replays, which are automatically inserted into the video. Experimental results are promising, and the generated replays are found to be comparable with those produced by broadcast professionals.
ACM Multimedia 2004 Full paper (8-page), PDF download
The following is a simple visual comparison of the kinds of Sports MTV generated by Muvee and by the Media Analysis Lab.
Comparison (three pairs of clips): MA Lab | Muvee
Contact
Kongwah WAN (kongwah@i2r.a-star.edu.sg)