Wednesday, October 31, 2007

Building Fat Pages - Introducing LSI

Something I've been working on lately - although this stuff has been around for at least two years - is LSI, or Latent Semantic Indexing. Essentially, LSI lets computers figure out what we are saying from context: the words that appear alongside a keyword tell the machine what the page is really about.

Dr. Andy Williams has recently released a course on just this point. I've just gotten it myself and will be working through it to make sense of it. It actually covers what I've been talking about in terms of natural keywords (or keywords appearing naturally). Google is apparently taking this quite a bit further in figuring out what a page is "worth" to the reader and the community.

As I learn more about this, I'll let you know. Suffice it to say that as Google evolves to mirror what we humans do, a lot of the cheap and easy ways of exploiting the Internet for a fast buck will disappear.

Dr. Williams wrote a nice, short how-to on the subject as well.

Google Latent Semantic Indexing: "In addition to recording which keywords a document contains, the method examines the document collection as a whole, to see which other documents contain some of those same words. LSI considers documents that have many words in common to be semantically close, and ones with few words in common to be semantically distant. This simple method correlates surprisingly well with how a human being, looking at content, might classify a document collection. Although the LSI algorithm doesn’t understand anything about what the words mean, the patterns it notices can make it seem astonishingly intelligent."
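To make that description concrete, here is a minimal sketch of the idea in Python. The tiny corpus, the choice of two "concepts," and the use of plain numpy are my own illustration, not anything from Google or Dr. Williams: build a term-document matrix, factor it with a singular value decomposition, keep only the strongest patterns, and compare documents in that reduced space.

import numpy as np

# Tiny example corpus: each document is treated as a bag of words.
docs = [
    "gold silver truck",
    "shipment of gold damaged in a fire",
    "delivery of silver arrived in a silver truck",
    "shipment of gold arrived in a truck",
]

# Build the term-document matrix A (rows = terms, columns = documents).
vocab = sorted({w for d in docs for w in d.split()})
A = np.array([[d.split().count(w) for d in docs] for w in vocab], dtype=float)

# Singular value decomposition: A = U * S * Vt.
U, S, Vt = np.linalg.svd(A, full_matrices=False)

# Keep only the top k "concepts"; each document becomes a point in concept space.
k = 2
doc_vecs = (np.diag(S[:k]) @ Vt[:k]).T  # one k-dimensional vector per document

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Documents that share many words (directly, or through terms that co-occur
# with theirs) end up close together, with no understanding of meaning at all.
for i in range(len(docs)):
    for j in range(i + 1, len(docs)):
        print(f"doc {i} vs doc {j}: {cosine(doc_vecs[i], doc_vecs[j]):+.2f}")

Run it and the pairs that share words like "shipment," "gold," and "truck" score near +1 (semantically close), while pairs with little vocabulary overlap score much lower (semantically distant) - exactly the behaviour the quote describes, without the algorithm knowing what any word means.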
