Keyword Density - More Than Meets the Eye
Author: Roberto Grassi
One of the standard elements of web page optimization is keyword
density: until very recently, the ratio of keywords to the rest
of the body text was generally deemed to be one of the most
important factors employed by search engines to determine a web
site's ranking.
However, this basically linear approach is gradually changing:
as mathematical linguistics and automatic content recognition
technology progress, the major search engines are shifting their
focus towards "theme" biased algorithms that no longer rely on
the analysis of individual web pages but, rather, evaluate whole
web sites to determine their topical focus or "theme" and its
relevance to users' search requests.
This is not to say that keyword density is losing importance,
quite the contrary. However, it is turning into a far more
complex matter than a simple computation of word frequency per
web page can handle.
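
To make the "simple computation" concrete, here is a minimal
Python sketch of the naive approach. It assumes that density
means keyword occurrences divided by the total word count, and
that a "word" is any run of letters, digits or apostrophes. Both
are assumptions made for illustration, not the rule of any
particular search engine; the function name and the sample
sentence are made up.

```python
import re


def keyword_density(text: str, keyword: str) -> float:
    """Return occurrences of `keyword` as a fraction of all words in `text`."""
    # Assumption: a word is a run of letters, digits or apostrophes.
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return 0.0
    hits = sum(1 for word in words if word == keyword.lower())
    return hits / len(words)


sample = "Cheap flights to Rome. Book cheap flights today and fly cheap."
print(f"{keyword_density(sample, 'cheap'):.1%}")  # 3 of 11 words -> 27.3%
```

Everything discussed further on is about why such a single
number is less objective than it looks.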
Context analysis now draws on a number of auxiliary linguistic
disciplines and technologies, for example:
* semantic text analysis
* textlexical database technology
* distribution analysis of lexical components (such as nouns,
  adjectives, verbs)
* evaluation of distance between semantic elements
* AI and data mining technology based pattern recognition
* term vector database technology, etc.
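
As a rough illustration of the term vector idea only, the
following Python sketch turns texts into word-count vectors and
compares them by cosine similarity. It is a textbook toy, not a
description of how any search engine actually scores pages; the
function names and sample texts are hypothetical.

```python
import math
import re
from collections import Counter


def term_vector(text: str) -> Counter:
    """Map each word in `text` to the number of times it occurs."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two term vectors (0.0 if either is empty)."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


query = term_vector("blue widgets")
page_one = term_vector("blue widgets and red widgets for sale")
page_two = term_vector("recipes for apple pie")

# The page sharing terms with the query scores higher than the unrelated one.
print(cosine_similarity(query, page_one) > cosine_similarity(query, page_two))
```

The point of the sketch is that relevance here depends on the
whole vocabulary of a text, not on the frequency of one keyword.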
All these are now contributing to the increasing sophistication
of the relevance determination process. If you feel this is
beginning to sound too much like rocket science for comfort, you
may not be very far from the truth: it seems that the future of
search engine optimization will be determined by what the
industry is fond of calling the "word gurus".
A sound knowledge of fundamental linguistic methodology plus more
than a mere smattering of statistical calculus will most
probably be paramount to achieving successful search engine
rankings in the foreseeable future. Merely repeating the well
worn mantra "content is king!", as some of the less qualified
SEO professionals and very many amateurs are currently doing,
may admittedly have a welcome sedative effect by creating a
feeling of fuzzy warmth and comfort. But for all practical
purposes it is tantamount to whistling in the dark and fails
miserably to do justice to the overall complexity of the
process involved.
It should be noted that we are talking present AND future here:
many of the classical techniques of search engine optimization
are still working more or less successfully, but there is little
doubt that they are rapidly losing their cutting edge and will
probably be as obsolete in a few months' time as spamdexing or
invisible text - both optimization techniques well worth their
while throughout the 90s - have become today.
So where does keyword density come into this equation? And how
is it determined anyway?
There's the rub: the term "keyword density" is by no means as
objective and clear-cut as many people (some SEO experts
included) would have it! The reason for this is the inherent
structure of hypertext markup language (HTML) code: as text
content elements are embedded in clear text command tags
governing display and layout, it is not easy to determine what
should or should not be factored into any keyword density
calculation.
The matter is complicated further by the fact that the meta tags
inside an HTML page's header may contain keywords and description
content: should these be added to the total word count or not?
Seeing that some search engines will ignore meta tags altogether
(e.g. Lycos, Excite and Fast/Alltheweb), whereas others are
still considering them (at least partially), it gets even more
confusing. What may qualify for a keyword density of 2% under
one frame of reference (e.g. including meta tags, graphics ALT
tags, comment tags, etc.) may easily be reduced to 1% or less
under another.
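
The effect of the frame of reference can be sketched with
Python's standard html.parser module. The example below counts
the same keyword on the same page twice: once in the body text
only, and once with meta keywords/description content, ALT text
and comments added to the word pool. The class name, the sample
page and the choice of which extras to include are assumptions
for illustration; no search engine is claimed to count this way.

```python
import re
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collect text data and, optionally, meta/ALT/comment text."""

    def __init__(self, include_extras: bool):
        super().__init__()
        self.include_extras = include_extras
        self.chunks = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag in ("script", "style"):
            self._skip_depth += 1
        if not self.include_extras:
            return
        name = (attributes.get("name") or "").lower()
        if tag == "meta" and name in ("keywords", "description"):
            self.chunks.append(attributes.get("content") or "")
        if tag == "img":
            self.chunks.append(attributes.get("alt") or "")

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)

    def handle_comment(self, data):
        if self.include_extras:
            self.chunks.append(data)


def density(html: str, keyword: str, include_extras: bool) -> float:
    """Keyword occurrences divided by total words under the chosen frame."""
    parser = TextCollector(include_extras)
    parser.feed(html)
    parser.close()
    words = re.findall(r"[a-z0-9']+", " ".join(parser.chunks).lower())
    return words.count(keyword.lower()) / len(words) if words else 0.0


PAGE = """<html><head>
<meta name="keywords" content="widgets, blue widgets, cheap widgets">
<meta name="description" content="Widgets for every budget">
</head><body>
<!-- widgets landing page -->
<img src="w.png" alt="blue widgets">
<p>We sell widgets in many colors and sizes for home and office use.</p>
</body></html>"""

print(f"body text only: {density(PAGE, 'widgets', False):.1%}")
print(f"with extras:    {density(PAGE, 'widgets', True):.1%}")
```

On this made-up page the body text alone gives 1 hit in 13
words, while counting the extras gives 7 hits in 27 words. The
shift can just as easily go the other way on a page whose extras
do not contain the keyword.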
Further questions arise: will meta tags following the Dublin
Core convention ("DC tags") be counted in or not? And what about
HTTP-EQUIV tags? Would you really bet the ranch that TITLE
attributes in tables, forms or DIV elements will be ignored?
Etc., etc.
Another fundamental factor generating massive fuzziness left,
right and center is the issue of semantic delimiters: what's a
"word" and what isn't? Determining a lexical unit (aka a "word")
by punctuation is a common though pretty low tech method which
may lead to some rather unexpected results.
Say you are featuring an article by an author named "John Doe"
who happens to sport a Master of Arts degree, commonly
abbreviated as "M.A.". While most algorithms will correctly
count "John" and "Doe" as separate words, the "M.A." string is
quite another story. Some algorithms will actually count this
as two words ("M" and "A") because the period (dot) is
considered a delimiter - whereas others (surprise!) will not.
But how would you know which search engines are handling it in
which way? Answer: you don't, and that's exactly where the
problems start.
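
The "M.A." problem is easy to reproduce. The Python sketch below
applies two plausible but hypothetical tokenizing rules to the
same sentence: one treats every punctuation mark as a delimiter,
the other splits on whitespace and keeps periods inside dotted
abbreviations. Neither rule is attributed to any search engine.

```python
import re

text = "John Doe, M.A., writes about search engines."


def split_on_punctuation(text: str) -> list:
    """Rule A: every run of letters or digits is a word."""
    return re.findall(r"[A-Za-z0-9]+", text)


def keep_abbreviations(text: str) -> list:
    """Rule B: split on whitespace, keep periods inside abbreviations."""
    tokens = []
    for raw in text.split():
        token = raw.strip(",;:!?\"'()")
        # Drop a sentence-final period, but leave "M.A." in one piece.
        if token.endswith(".") and "." not in token[:-1]:
            token = token[:-1]
        if token:
            tokens.append(token)
    return tokens


print(len(split_on_punctuation(text)), split_on_punctuation(text))
print(len(keep_abbreviations(text)), keep_abbreviations(text))
```

Rule A finds 8 words ("M" and "A" counted separately), Rule B
finds 7. Since the word count is the denominator of any density
figure, the same page yields two different densities.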
The only feasible approach to mastering this predicament is trial
and error. The typical beginner's inquiry "What's the best
keyword density for AltaVista?", understandable and basically
rational as it may be, is best answered with the fairly
frustrating but ultimately precise: "It all depends - your
mileage may vary." It is only by experimenting with keyword
densities under standardized, comparable conditions yourself
that you will be able to come to significant and viable
conclusions.
To get going, here are some links to pertinent programs that
will help you determine (and, in one case, even generate)
keyword densities.
About the author:
An all-time classic of client-based keyword density software is
Roberto Grassi's powerful KeyWord Density Analyzer (KDA). It is
immensely configurable and offers a fully featured free
evaluation version for download. Find it here:
http://www.grsoftware.net/search_engines/software/grkda.html
(Expect to pay approx. $99 for the registered version.)