ResearchChannel - Using Compression Models to Filter Spam and Exploiting Structural Information for Document Categorization

Using Compression Models to Filter Spam and Exploiting Structural Information for Document Categorization

Produced by:
Microsoft Research

11/21/2005

Description: 
In the first part of this talk, I will present a spam filtering method based on statistical data compression models. The nature of these models allows them to be employed as Bayesian text classifiers that operate on character sequences. The models are fast to construct and can be updated incrementally. I will present experimental results indicating that this method performs well in comparison to established spam filters and that it is extremely robust to noise, which should make it difficult for spammers to defeat. I will also give some examples showing that the method is capable of picking up interesting, non-trivial patterns that are indicative of spam or ham (legitimate mail).
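
As a rough illustration of the idea, and not the models used in the talk, the sketch below trains one character-level model per class and assigns a message to the class whose model would encode it in the fewest bits. It assumes a fixed order-2 context with add-one smoothing as a simple stand-in for a real compression model, and equal class priors. The names `CharModel`, `update`, `code_length`, and `classify`, as well as the toy messages, are hypothetical.

```python
import math
from collections import defaultdict


class CharModel:
    """Order-k character model with add-one smoothing.

    Assumption: a deliberately simple stand-in for an adaptive
    compression model; it only estimates how many bits a text would cost.
    """

    def __init__(self, order=2, alphabet_size=256):
        self.order = order
        self.alphabet_size = alphabet_size
        self.counts = defaultdict(lambda: defaultdict(int))
        self.totals = defaultdict(int)

    def update(self, text):
        """Incrementally add one training text to the model."""
        padded = " " * self.order + text
        for i in range(self.order, len(padded)):
            context = padded[i - self.order:i]
            self.counts[context][padded[i]] += 1
            self.totals[context] += 1

    def code_length(self, text):
        """Bits needed to encode `text` under the model (its cross-entropy)."""
        padded = " " * self.order + text
        bits = 0.0
        for i in range(self.order, len(padded)):
            context = padded[i - self.order:i]
            seen = self.counts.get(context)
            count = seen.get(padded[i], 0) if seen else 0
            total = self.totals.get(context, 0)
            bits -= math.log2((count + 1) / (total + self.alphabet_size))
        return bits


def classify(text, models):
    """Pick the label whose model compresses the text best (equal priors)."""
    return min(models, key=lambda label: models[label].code_length(text))


# Toy training data, invented for illustration only.
models = {"spam": CharModel(), "ham": CharModel()}
for message in ["buy cheap viagra now", "cheap pills buy now!!!"]:
    models["spam"].update(message)
for message in ["meeting at noon tomorrow", "please review the attached report"]:
    models["ham"].update(message)

print(classify("buy cheap pills", models))
print(classify("review the report tomorrow", models))
```

Because `update` only increments counts, a newly labeled message can be folded into a model without retraining. Working on characters rather than word tokens means the sketch needs no tokenizer.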

The second part of this talk describes how to exploit structural information for document categorization. Classifier stacking can be used to exploit the structure of semi-structured documents for improved text categorization performance. In this approach, a meta-classifier is used to combine predictions based on different structural elements. It will be shown that this approach consistently outperforms a flat-text linear SVM on a number of standard text categorization datasets, often by a wide margin. I will present selected nomograms that visualize the resulting meta-classifier and give interesting insight into the characteristics of the datasets.
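
The stacking approach described above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the setup from the talk: each structural element (here the hypothetical fields `title` and `body`) gets its own naive Bayes scorer, and a small logistic-regression meta-classifier, trained on leave-one-out scores, combines them. The choice of base and meta learners, all function names, and the toy documents are assumptions made for illustration.

```python
import math
from collections import Counter


def train_nb(texts, labels):
    """Per-class word counts for a two-class multinomial naive Bayes model."""
    counts = {0: Counter(), 1: Counter()}
    for text, y in zip(texts, labels):
        counts[y].update(text.lower().split())
    return counts, set(counts[0]) | set(counts[1])


def nb_score(model, text):
    """Log-odds of class 1 versus class 0 (add-one smoothing, equal priors)."""
    counts, vocab = model
    v = len(vocab) + 1  # +1 leaves room for unseen words
    n0, n1 = sum(counts[0].values()), sum(counts[1].values())
    score = 0.0
    for word in text.lower().split():
        score += math.log((counts[1][word] + 1) / (n1 + v))
        score -= math.log((counts[0][word] + 1) / (n0 + v))
    return score


def field_models(docs, labels, fields):
    """One base classifier per structural element."""
    return {f: train_nb([d[f] for d in docs], labels) for f in fields}


def field_scores(models, doc, fields):
    return [nb_score(models[f], doc[f]) for f in fields]


def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))


def train_stack(docs, labels, fields, epochs=500, lr=0.1):
    """Train the base models and a logistic meta-classifier on top of them."""
    # Leave-one-out: each document is scored by base models that never saw it.
    meta_x = []
    for i, doc in enumerate(docs):
        rest, rest_y = docs[:i] + docs[i + 1:], labels[:i] + labels[i + 1:]
        meta_x.append(field_scores(field_models(rest, rest_y, fields), doc, fields))
    weights, bias = [0.0] * len(fields), 0.0
    for _ in range(epochs):  # plain stochastic gradient descent
        for x, y in zip(meta_x, labels):
            err = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x))) - y
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return field_models(docs, labels, fields), weights, bias


def predict(stack, doc, fields):
    models, weights, bias = stack
    x = field_scores(models, doc, fields)
    return int(sigmoid(bias + sum(w * xi for w, xi in zip(weights, x))) >= 0.5)


# Toy semi-structured documents, invented for illustration (1 = sports, 0 = tech).
fields = ["title", "body"]
docs = [
    {"title": "football match report", "body": "the team scored two goals"},
    {"title": "tennis final result", "body": "the player won the match"},
    {"title": "basketball game recap", "body": "the team won the game"},
    {"title": "new laptop review", "body": "the processor is fast"},
    {"title": "software update released", "body": "the update fixes bugs"},
    {"title": "smartphone camera review", "body": "the camera sensor is sharp"},
]
labels = [1, 1, 1, 0, 0, 0]
stack = train_stack(docs, labels, fields)
print(predict(stack, {"title": "football game result", "body": "the team won"}, fields))
```

The learned meta-level weights indicate how much each structural element contributes to the final decision, which is the kind of information a nomogram of the meta-classifier would display.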

Speaker(s):
Andrej Bratko, researcher, Department of Intelligent Systems, Jozef Stefan Institute; co-founder, Klika Ltd.

Runtime: 01:13:54

Rating: TV-G



Copyright © 2010 ResearchChannel. All Rights Reserved.