REAL-TIME CAMERA FIELD-VIEW TRACKING IN SOCCER VIDEO

Abstract

Content-based analysis of soccer video remains a challenging problem due to the lack of structure in a soccer game. To automate game and tactic analysis, we need to detect and track important activities, such as ball possession, that are highly correlated with the camera’s field-view. In this paper, we present a system that tracks the camera’s field-view in a soccer video in real time. It utilizes a host of content-based visual cues that are obtained by independent threads running in parallel. The result is visualized as an active rectangular bounding box that approximates the camera’s field of view, superimposed on a virtual soccer field. Experimental results show that the system can reliably track the camera field-view as the game progresses.

ICASSP 2003 Lecture paper (4-page)  pdf download

Screen-Dump

The following are screen-dumps of our Win32 executable running on a Dell Pentium 4 1.7 GHz machine with 128 MB RAM. The video is an 80-second MPEG clip taken from a digitized video of the FIFA 2002 World Cup final. Play moves rapidly, effectively spanning from one end of the soccer field to the other. The system is able to track the camera field-view and maintain focus on the play area with respect to the entire field.

Figure 1: Play starts near mid-field

Figure 2: Play moves back to left goal

Figure 3: Play is intercepted at mid-field, moves rapidly towards left field

Figure 4: Play moves towards right goal area

Figure 5: The ball is floated from the top-right towards the center-goal area

 

Application

The ability to track the camera view translates to the ability to track the tactical flow of play on the field. This is useful for play/tactics summarization for coaching and team review purposes. It also offers a computational mechanism to track the excitement level of the game, since views of the goal-mouth area are normally associated with high excitement.

 


 

AN EFFICIENT ANNOTATION SYSTEM FOR SOCCER VIDEO

Abstract

We present an efficient soccer video annotation system that combines automatic meta-data extraction and indexing tools with standard-compliant components such as open source XML, database and web-based JSP. Video frames are first processed to segment the shot boundaries. Additional multi-modal shot attributes are then computed. For the visual modality, we extract the camera field-view movement to track the location of play relative to the soccer field. For audio, we detect excitatory commentator speech segments. Together with the shot key-frame, these attributes let us quickly browse the video for segments of interest, using an annotation GUI to key in textual entries conforming to the MPEG-7 standard. A client web-browser can access the enriched content by: (1) browsing from a chronological index; (2) searching for video segments based on player or event name; (3) viewing a game summary in video and text. Subjective tests show that our annotation system yields a higher quality of video skim than an alternative one.

ICME 2003 Demo (2-page)  pdf download

 

Screen-Dump

Annotation: Visual-based Shot Boundary Detection in action.

This incorporates an awareness of the soccer video domain: it also detects the soccer field lines to ascertain whether play has shifted significantly with respect to the soccer field. In soccer content annotation, this is important because significant tactical play can be discerned from such movement. Our method improves on traditional color-histogram-based methods, which do not work well here because of the dominance of the green field color in soccer video.
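To illustrate why a plain color histogram struggles in this domain, here is a minimal Python sketch. It is not the system's code: the frame representation (a flat list of RGB tuples), the 4-level quantization, the sample colors and the function names are all assumptions made for this example.

```python
from collections import Counter

def color_histogram(frame, levels=4):
    """Normalized histogram of RGB pixels quantized to `levels` bins per channel."""
    step = 256 // levels
    counts = Counter((r // step, g // step, b // step) for r, g, b in frame)
    total = len(frame)
    return {bin_: n / total for bin_, n in counts.items()}

def histogram_distance(h1, h2):
    """L1 distance between two normalized histograms (0 = identical, 2 = disjoint)."""
    bins = set(h1) | set(h2)
    return sum(abs(h1.get(b, 0.0) - h2.get(b, 0.0)) for b in bins)

# Hypothetical colors for grass, line marks and the stands.
GREEN, WHITE, CROWD = (40, 160, 50), (240, 240, 240), (120, 110, 100)

# Hypothetical 100-pixel frames; both field views are 90% grass.
left_view = [WHITE] * 10 + [GREEN] * 90    # field lines on the left of the frame
right_view = [GREEN] * 90 + [WHITE] * 10   # play has shifted; lines now on the right
crowd_view = [CROWD] * 70 + [GREEN] * 30   # cut to a crowd shot

same_field = histogram_distance(color_histogram(left_view), color_histogram(right_view))
field_to_crowd = histogram_distance(color_histogram(left_view), color_histogram(crowd_view))
```

In this toy case the two field views produce identical histograms even though the line marks have moved across the frame, while the cut to the crowd produces a large distance. A histogram discards spatial layout, which is why a cue such as field-line position is needed to notice a shift of play within the green field.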


Annotation: Audio-based Commentary Excitement Detector in action.

This incorporates an audio-based processing module that detects raised pitch in the voiced commentary. The prosodic movement in the commentary is an accurate cue for highlight moments.
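The idea of flagging raised pitch can be sketched in a few lines of Python. The page does not say how pitch is measured, so the autocorrelation estimator, the 80-500 Hz search range, the 1.3x threshold and the synthetic tones below are all assumptions chosen for illustration.

```python
import math

def estimate_pitch_hz(samples, sample_rate, fmin=80.0, fmax=500.0):
    """Estimate pitch as the autocorrelation peak within an assumed voice range."""
    min_lag = int(sample_rate / fmax)
    max_lag = int(sample_rate / fmin)
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Unnormalized autocorrelation of the window at this lag.
        score = sum(samples[i] * samples[i + lag] for i in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag

def is_raised(pitch_hz, baseline_hz, ratio=1.3):
    """Hypothetical rule: pitch counts as raised when it exceeds ratio * baseline."""
    return pitch_hz > ratio * baseline_hz

def tone(freq_hz, sample_rate=8000, n=800):
    """Synthetic stand-in for a voiced frame (a pure sine, 100 ms at 8 kHz)."""
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]

calm = estimate_pitch_hz(tone(125.0), 8000)
excited = estimate_pitch_hz(tone(200.0), 8000)
flagged = is_raised(excited, calm)
```

Real commentary is far noisier than a sine tone, so a practical detector would also need voicing detection and smoothing over time; this sketch only shows the comparison of an estimated pitch against a baseline.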


The Final Preview List is then returned by the system and can be browsed by the end-viewer, who can inspect and verify the validity of the automatically generated result and add more textual annotation for the final description.


A web-based system has also been created to facilitate video streaming and browsing. It uses standard-compliant Java, JSP and .NET components.


Application

The system can be used by broadcast studios for automated highlight detection in breaking news-casts. The semi-automatic annotation system can also be used by content providers to disseminate real-time text summaries of game highlights (via SMS) and video summaries (via video-phone).

 


 

Real-time Goal-mouth Detection in MPEG Soccer Video

Abstract

In this paper, we describe a working system that detects and segments goal-mouth appearances in soccer video in real time. Operating on sub-optimal quality images after MPEG decoding, the system constrains the Hough Transform-based line-mark detection to only the dominant green regions typically seen in soccer video. The vertical goal-posts and horizontal goal-bar are then isolated by color-based region (pole)-growing. We demonstrate its application to quick video browsing and virtual content insertion. Extensive tests over ~15 hours of MPEG-1 soccer video show the robustness of our method.
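The first step, restricting Hough line detection to the green field region, can be sketched as follows. This is a minimal pure-Python illustration and not the paper's implementation: the color rules, the 4-neighbour test, the 1-degree angle resolution and the synthetic frame are all assumptions, and the subsequent region-growing step for the goal-posts is not shown.

```python
import math
from collections import Counter

def is_field_green(rgb):
    """Assumed rule: a pixel is 'field' when green is its strongest channel."""
    r, g, b = rgb
    return g > r and g > b

def is_line_white(rgb):
    """Assumed rule: line marks are bright in all three channels."""
    return min(rgb) > 200

def line_mark_candidates(image):
    """Bright pixels with at least one field-green 4-neighbour."""
    height, width = len(image), len(image[0])
    points = []
    for y in range(height):
        for x in range(width):
            if not is_line_white(image[y][x]):
                continue
            neighbours = [(x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)]
            if any(0 <= nx < width and 0 <= ny < height and is_field_green(image[ny][nx])
                   for nx, ny in neighbours):
                points.append((x, y))
    return points

def hough_peak(points, n_theta=180):
    """Vote in (rho, theta) space; return (rho, theta_degrees, votes) of the strongest line."""
    votes = Counter()
    for x, y in points:
        for t in range(n_theta):
            theta = math.pi * t / n_theta
            rho = round(x * math.cos(theta) + y * math.sin(theta))
            votes[(rho, t)] += 1
    # Break ties in favour of the smallest angle so the result is deterministic.
    (rho, t), count = max(votes.items(), key=lambda kv: (kv[1], -kv[0][1]))
    return rho, t * 180 // n_theta, count

GREEN, WHITE, GREY = (40, 160, 50), (240, 240, 240), (120, 120, 120)

# Synthetic 12x10 frame: grass in columns 0-8, grey stands in columns 9-11.
image = [[GREEN if x < 9 else GREY for x in range(12)] for y in range(10)]
for y in range(10):
    image[y][4] = WHITE    # a vertical field line inside the grass
image[3][10] = WHITE       # a bright pixel in the stands, away from any grass

points = line_mark_candidates(image)
rho, theta_deg, count = hough_peak(points)
```

Only the ten pixels of the line inside the grass reach the accumulator; the bright pixel in the stands is discarded before voting. Limiting the vote to the field region in this way both reduces computation and keeps clutter from the stands out of the Hough space.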

ACM MM 2003 Short paper (4-page)  pdf download

ACM MM 2003 Demo pdf download

 

Application

Soccer has a global following, and advertising is an attractive business model for broadcasters of high-profile games. Virtual advertising is about augmenting the advertising portfolio of a company wishing to increase global awareness of its brand or trade mark. Virtual advertising can also be demographics-sensitive, meaning that advertising can be tailored to the needs of the local audience or even to the individual home-viewer. Reliable detection of the goal-mouth provides a viable way to achieve that.

 

Screen-Dump

left goal-mouth detection result (bmp)                                                    right goal-mouth detection result (bmp)

           

 

Virtual advertising example 1:

original goal-post image (bmp)                                          inserted goal-post image (bmp)

                         

Virtual advertising example 2:

original goal-post image (bmp)                                          inserted goal-post image (bmp)

                    

 

Demo

There are also four MPEG videos that demonstrate the workings of the real-time system. These were captured using a SONY-DSC1 camera. Though the quality is not very good, they still give a good view of the system in operation. The game between Germany and Brazil in the FIFA 2002 World Cup final is used. In each video, the left screen shows the original video playing, and the right screen shows the processed video. For goal-mouth detection, we demonstrate the segmentation by showing only the pixels of the goal-line, the two vertical uprights and the horizontal cross-bar. For virtual content insertion (VCI), the system displays a virtual message implanted above the horizontal cross-bar.

The first video shows a left-goal detection. The second video shows a right-goal detection.         

The next video shows a virtual message implanted above the left goal-mouth. The last video shows a similar implant above the right goal-mouth.

A complete demo (in WMV format, 29MB) can also be downloaded here.

 

 


 

Robust Soccer Highlight Generation with a Novel Dominant-Speech Feature Extractor

Abstract

We describe soccer highlight generation from only the audio stream in the video. A novel audio feature is used to detect parts of the commentary corresponding to dominant and excited speech. It is computed by a twice-iterated Composite Fourier Transform (CFT) on short-time windows, wherein the magnitude spectrum of the first transform is input to a second transform. Dominant speech portions are found to be robustly characterized by increased density in the peak profile. We verify the robustness of CFT via large scale empirical testing and explain its working based on a pulse train postulate of dominant speech signal. Our audio-only approach results in a compute-efficient system deployable on current generation set-top-boxes and digital video recording devices.
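The twice-iterated transform described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: it uses a naive full-length DFT, feeds the whole magnitude spectrum to the second transform, and uses an ideal pulse train as a stand-in for the postulated dominant-speech signal. The function names, the window length of 64 and the peak threshold are hypothetical.

```python
import cmath

def dft_magnitude(samples):
    """Magnitude of the discrete Fourier transform (naive O(n^2) version)."""
    n = len(samples)
    return [
        abs(sum(samples[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)))
        for k in range(n)
    ]

def composite_fourier_transform(window):
    """Apply the transform twice: the first magnitude spectrum is input to the second."""
    first = dft_magnitude(window)
    return dft_magnitude(first)

def peak_bins(spectrum, threshold):
    """Hypothetical peak picker: indices whose value exceeds a fixed threshold."""
    return [k for k, value in enumerate(spectrum) if value > threshold]

# Ideal pulse train: one unit pulse every 8 samples in a 64-sample window.
pulse_train = [1.0 if t % 8 == 0 else 0.0 for t in range(64)]
cft = composite_fourier_transform(pulse_train)
peaks = peak_bins(cft, threshold=1.0)
```

For this ideal input, the first magnitude spectrum is itself a pulse train (energy only at the harmonics), so the second transform again concentrates energy into regularly spaced peaks. How the density of that peak profile is measured in the actual system is not specified here, so `peak_bins` should be read only as a placeholder.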

ICME 2004 (Poster paper, 4-page)  pdf download

 

                                                                                                                                                                                          

Efficient Multimodal Features for Automatic Soccer Highlight Generation

Abstract

We describe efficient audio/visual features and their multimodal combination to detect highlights in soccer video. A novel audio feature first detects dominant speech portions in the commentary coincident with segments of high excitement in the game. Verification is then performed in the visual domain by detecting the presence of goal-mouth in the current shot and a high frequency of camera shot change in the subsequent shots. The cascaded process filters spurious candidate highlights from the noisy audio. The impressive results obtained on a large video test-set belie the technical simplicity in the system, which may now enable rapid generation of highlights on low-cost devices such as household set-top-boxes.
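The cascade described above (audio candidates first, then visual verification) can be sketched as follows. This is a hedged illustration only: the `Shot` record, its boolean flags, the 20-second look-ahead window and the minimum of three cuts are hypothetical choices, standing in for the outputs of the actual audio and visual detectors.

```python
from dataclasses import dataclass

@dataclass
class Shot:
    start: float            # seconds from the start of the video
    end: float
    dominant_speech: bool   # assumed output of the audio feature detector
    goal_mouth: bool        # assumed output of the goal-mouth detector

def detect_highlights(shots, window_s=20.0, min_cuts=3):
    """Return indices of shots that pass the audio stage and both visual checks."""
    highlights = []
    for i, shot in enumerate(shots):
        # Stage 1 (audio): keep only shots with dominant, excited speech.
        if not shot.dominant_speech:
            continue
        # Stage 2a (visual): the goal-mouth must be present in the current shot.
        if not shot.goal_mouth:
            continue
        # Stage 2b (visual): count shot changes shortly after the candidate.
        cuts = sum(1 for later in shots[i + 1:] if later.start < shot.end + window_s)
        if cuts >= min_cuts:
            highlights.append(i)
    return highlights

shots = [
    Shot(0, 30, False, False),
    Shot(30, 40, True, True),     # excited speech, goal-mouth, then rapid cuts
    Shot(40, 44, False, False),
    Shot(44, 48, False, True),
    Shot(48, 53, False, False),
    Shot(53, 90, True, False),    # excited speech but no goal-mouth: rejected
    Shot(90, 150, True, True),    # goal-mouth but no rapid cuts afterwards: rejected
    Shot(150, 220, False, False),
]
result = detect_highlights(shots)
```

Because the inexpensive audio test runs first, the visual checks are only evaluated on the few shots that survive it, which is one way such a cascade can stay light enough for low-cost devices.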

ICPR 2004 Lecture paper (4-page)  pdf download

 

Application

·        Quick Soccer Video Highlights Browsing on current generation low cost Set-top-boxes

·        Applicable on current generation Personal Video Recorders (PVR)

 

                                                                                                                            

An In-house Live Trial of Euro 2004 Soccer Highlight Extraction @ I2R

Euro 2004 fever swept through Singapore in June this year. We wanted to take the opportunity to trial the soccer highlight detection work we have done so far. Unfortunately, we had not yet integrated the detection software with a live MPEG video capture card. Moreover, our detection software requires that only soccer video be input to the system, i.e., we need to manually filter out the half-time advertisements from the live broadcast stream. So we had to settle for a work-flow that requires manual intervention:

1.     Manually set the Hauppauge MPEG-1 video capture program to start and end recording at the scheduled broadcast times. Note that a fully automatic system requires that the broadcast programming schedule be exported on EPG.

2.     One of the team members wakes up at the end of the recording time to do the following:

a.     Use an MPEG video editor to remove the half-time and full-time advertisements. (We use the publicly available TMPGENC software.)

b.     Run the detection software on the edited soccer-only video.

c.     Retrieve the highlight segments and export them to the GUI visualizer.

We did this for the Quarter-finals, Semi-finals and of course the Finals. A typical work order looks something like this:

·        10:00pm:    Set start of recording at 2:30am  (typical local start time in Singapore of the live game played in Portugal)

                             Set end of recording to 5:30am (given that play may go into overtime, eg, the QF game England vs Portugal lasted about 2.5hr)

·          5:30am:    Manually remove advertisements.

·          6:00am:    Finished ad-removal. (TMPGENC takes time to write a new MPEG file)

          Run detection software on new MPEG file

          To expedite processing, the audio-only detection module (ICME 2004 paper) is invoked, which runs at ~4x the video frame rate, i.e., ~100fps

          Meanwhile, go brush up and have breakfast!!

·          7:30am:    Detection completed. Export results to GUI viewer.

                             Rush to office

·          8:15am:    Hook up to the I2R plasma TV.

·          8:30am:    First few staff arrive in the office to view the highlight results

 

Some screen dumps:

   

 

                                                                                                                            

Automatic Sports Highlight Extraction with Content Augmentation

Abstract

We describe novel methods to automatically augment content into video highlights detected from soccer and tennis video. First, we extract generic and domain-specific features from the video to isolate key audio-visual events that we have empirically found to correlate well with the ground-truth highlights. Next, based on a set of heuristics-driven rules to minimize view disruption, spatial regions in the image frames of these video highlight segments are segmented for content augmentation. Preliminary trials from subjective viewing indicate a high level of acceptance for the content insertions.

Non-intrusive Content Insertion Framework

                                          Homogeneous Region Segmentation

                                          Samples of Static Regions Detected

                                          Example of Dynamic Insertions using Color Quantization

PCM 2004 Invited paper (4-page)  pdf download

                                                                                                                            

Mobile Sports Alert

A useful application of our sports highlight detection is in a mobile scenario, where viewers who cannot catch the live action on TV may still be alerted to interesting and noteworthy events that have happened in the game.

It is within the wireless scenario that we incorporate the business case for Virtual Content Insertion (VCI). Our argument is that the traditional broadcast advertising model, in the form of a 15- or 30-second commercial clip, is not viable here, since air-time is still substantially expensive and will likely remain so for the near future. A better means of advertising, in our opinion, is non-intrusive VCI. Though the exposure is limited and may merely constitute a branding exercise rather than a full-blown advertising run, that is exactly what we advocate: the wireless medium appears to allow only a branding form of advertising, at least for the moment.

An immediate quality-based model is obvious: premium subscribers may see only the original highlight video without the insertions, while users with a basic plan must view those insertions. Hopefully the non-intrusive insertion will minimize disruption to the viewer’s enjoyment of the video.

The key technical challenges in this work are (1) the real-time requirement; and (2) the fact that the system will not have the benefit of the entire video when making adaptive selections of highlight segments. Our brief foray into the wireless video world brings us to the Nokia phone, where both 3GPP and RealVideo formats are supported. Our preliminary architecture is as follows:

 

Sample Video Viewable on Nokia Phones

Soccer highlight video (with insertion) (RM format, 50kbps)

Soccer highlight video (with insertion) (RM format, 100kbps)

 

 

 

 

Automatic Sports Content Analysis – State-of-Art and Recent Results

Abstract

The global appeal of sports content is widely expected to make it a key content driver for DTV interactivity. Its regular structures are amenable to automatic analysis and semantics extraction, leading to significant opportunities in interactive advertising and time-shift viewing. This paper reviews the state of the art in automatic sports content analysis and reports on some recent R&D results at the Institute for Infocomm Research (I2R). We also envisage “desktop set-top-boxes”, where PC cards are fully capable of receiving DTV signals. Coupled with superior Java UI support, the desktop arena can become a conducive and cost-efficient environment for rapid development, testing and deployment of new and compelling iTV services.

Speech at Broadcast Asia 2004 (7-page pdf)

 

 

 

Automatic Replay Generation for Soccer Video Broadcasting

Abstract

While most current approaches for sports video analysis are based on broadcast video, in this paper, we present a novel approach for highlight detection and automatic replay generation for soccer videos taken by the main camera. This research is important as current soccer highlight detection and replay generation from a live game is a labor-intensive process. A robust multi-level, multi-model event detection framework is proposed to detect the event and event boundaries from the video taken by the main camera. This framework explores the possible analysis cues, using a mid-level representation to bridge the gap between low-level features and high-level events. The event detection results and mid-level representation are used to generate replays which are automatically inserted into the video. Experimental results are promising and found to be comparable with those generated by broadcast professionals.

 

ACM Multimedia 2004 (full paper, 8-page)  pdf download

 

 

 

Computer Generation of Sports Music TV

The following is a simple visual comparison of the kinds of Sports MTV generated by Muvee and by the Media Analysis Lab.

MA Lab

Muvee

Fast-pace Soccer (Euro 2004 final)

MA Lab

Muvee

Slow-pace Soccer (Euro 2004 final)

MA Lab

Muvee

Tennis (Ladies Wimbledon 2004 final)

 

 

 

Contact

Kongwah WAN  (kongwah@i2r.a-star.edu.sg)