Description: This talk focuses on three pieces of our work:
(1) How to utilize unlabeled data in classification? In many real-world machine learning problems, such as web categorization, only a few labeled examples are available because labeling requires human labor, whereas unlabeled data are far easier to obtain. So, naturally, one may wonder whether unlabeled data can be put to use in classification tasks. I will present a simple, powerful, and mathematically clean approach to this problem, and show the strong experimental results that third parties have obtained with it on a number of machine learning benchmarks. Our approach is regarded as state of the art in the machine learning literature.
(2) How to partition directed graphs like the Web? Spectral clustering for undirected graphs has been studied extensively since the mathematician Fiedler's seminal work in the 1970s. The spectral method is so powerful that many people have attempted to generalize it to directed graphs. Perhaps the most popular of these attempts is Jon Kleinberg's HITS algorithm, which both ranks web pages and detects web communities. I will show how we solve this problem using Markov chain theory, and how our approach applies to real-world web data. The approach can be implemented in a few lines of Matlab code.
(3) How to rank objects like images and texts? Link-based ranking has enjoyed huge success in web search engines. In practice, however, many types of data, such as texts and images, have no link structure and are instead modeled as vectors in Euclidean space. A principled way of ranking such data is to explore and exploit their intrinsic geometric, or manifold, structure. I will show how we address this issue in a simple mathematical framework. Our approach has been widely used by communities ranging from image retrieval to bioinformatics.
Speaker(s):
Dengyong (Denny) Zhou, research scientist, Machine Learning Department, NEC Laboratories America