Movable Type Plugin: MTKeywords
MTKeywords is a Movable Type plugin that compiles a list of fairly relevant keywords from the aggregate of the body of an entry, its title, and its comments. While the default purpose of the entry’s Keywords field is to populate the page’s Keywords meta tag, I found that data entry field more useful for other purposes. Plus, there hasn’t been a way to auto-generate a list of relevant keywords. Thus, MTKeywords was born!
If you’re curious how the plugin works, it simply gathers the text from the three basic sources, removes any HTML, filters out more than 100 short common words, counts every remaining word (unigram) and word pair (bigram), and reports back the most common n-grams. Unigrams are gathered case-sensitively, so ‘King’ and ‘king’ are counted separately in both the input and the results. Obviously, this version is designed for English-language blogs, but if you modify the code and substitute 100 or so of the most common words in your own language (including cardinal numbers, prepositions, articles, contractions, and to-be verb conjugations) it should still work pretty well. I doubt it will work for non-Western languages without major modifications, though. Sorry!
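For readers who want to see those steps in code, here is a minimal sketch of the same idea. It is written in Python for illustration and is not the plugin's Perl source. The function names, the tiny stop list, the rule that a bigram must occur at least twice, and the "bigrams first, then unigrams" ordering are all assumptions made for the example.

```python
import re
from collections import Counter
from html import unescape

# Illustrative stop list only (an assumption); the plugin filters 100+ words.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "to", "is", "was", "it", "i", "my"}


def strip_html(text):
    """Crudely drop anything that looks like a tag, then decode entities."""
    return unescape(re.sub(r"<[^>]+>", " ", text))


def extract_keywords(title, body, comments, limit=25):
    """Return up to `limit` keywords from an entry's title, body, and comments."""
    text = strip_html(" ".join([title, body, *comments]))

    # Tokenize on letters (with an optional inner apostrophe) and drop stop words.
    tokens = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    words = [w for w in tokens if w.lower() not in STOP_WORDS]

    # Unigrams keep their original case, so "King" and "king" stay separate.
    unigrams = Counter(words)

    # Bigrams are adjacent surviving words, lowercased. Sentence boundaries
    # are ignored here, which is a simplification.
    bigrams = Counter(f"{a} {b}".lower() for a, b in zip(words, words[1:]))

    # Assumption: only repeated bigrams are interesting enough to report.
    repeated = [pair for pair, count in bigrams.most_common() if count >= 2]
    singles = [word for word, _ in unigrams.most_common()]
    return (repeated + singles)[:limit]


if __name__ == "__main__":
    keywords = extract_keywords(
        "Wake Island",
        "<p>The history of Wake Island is long. Wake Island was a Pan American stop.</p>",
        ["Wake is an atoll."],
    )
    print(", ".join(keywords))
```

Because stop words are removed before pairs are formed, a phrase like "history of Wake" can produce the bigram "history wake", which matches the kind of output shown in the examples on this page.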
Impact on rebuilding seems negligible; I rebuilt the individual archives of this entire blog (consisting at the time of publication of just over 400 entries) in 46 seconds — the same elapsed time whether I used the MTKeywords plugin or not.
To see an example of the plugin in action, my brief synopsis of A Connecticut Yankee in King Arthur’s Court yielded keywords of “king arthur, connecticut yankee, yankee, Connecticut, Yankee, long, connecticut, King, Arthur, only, myself, court, king, fun, classic, Twain, century, england, complete, twain, disbelief, gun, England, took, arthur”. Not perfect, but fairly accurate.
And my history of Wake Island produced “wake island, pan american, peale island, history wake, united states, island, wake, Wake, Island, japanese, Japanese, american, history, American, great, pan, Pan, years, water, war, prisoners, atoll, civilian, been, islands”. The longer the blog entry is (and the more on-topic the comments are), the more representative the results will be.
By the way, I fully recognize that few search engines use the Keywords meta tag anymore, but hope springs eternal for a comeback once the algorithms to reduce keyword spammers are improved.
Download
You can get the latest version of MTKeywords by downloading Keywords.txt.
Installation
Save the file as Keywords.pl in your Movable Type plugins directory and set its permissions to 755.
Usage
Add the <$MTKeywords$> tag into the HEAD section of your Individual Archive Template.
<meta name="keywords" content="<$MTKeywords$>" />
<$MTKeywords delimiter="|"$> — specify your own delimiters.
<$MTKeywords caseSensitive="false"$> — consolidates ‘Richard’ and ‘richard’ into just ‘richard’.
<$MTKeywords includeBigrams="false"$> — skips displaying of common word pairs.
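To make the effect of the three attributes concrete, here is a small Python sketch of how they could shape the final string. It is an illustration only, not the plugin's Perl code. The function name `render_keywords`, its arguments, and the default ", " delimiter (inferred from the sample output on this page) are assumptions.

```python
from collections import Counter


def render_keywords(unigrams, bigrams, delimiter=", ",
                    case_sensitive=True, include_bigrams=True):
    """Join counted n-grams into one keyword string (hypothetical helper)."""
    if not case_sensitive:
        # Fold 'Richard' and 'richard' into a single lowercase entry.
        folded = Counter()
        for word, count in unigrams.items():
            folded[word.lower()] += count
        unigrams = folded

    terms = []
    if include_bigrams:
        terms += [pair for pair, _ in bigrams.most_common()]
    terms += [word for word, _ in unigrams.most_common()]
    return delimiter.join(terms)


if __name__ == "__main__":
    unigrams = Counter({"Richard": 2, "richard": 1, "blog": 1})
    bigrams = Counter({"richard blog": 2})
    print(render_keywords(unigrams, bigrams))
    print(render_keywords(unigrams, bigrams, delimiter="|",
                          case_sensitive=False, include_bigrams=False))
```

The second call corresponds roughly to combining delimiter="|", caseSensitive="false", and includeBigrams="false" in a single tag.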
Version History
1.01: 04/08/2006; add parameters for delimiter, case sensitivity, and bigram usage
0.99b: 01/10/2005; fixed problem with Perl v5.8x
0.99a: 11/08/2004; included more basic file and web extensions to exclude
0.99: 10/19/2004; initial release
@rbird: Check your comments for spam. The plugin “compiles a list of fairly relevant keywords from the aggregate of the body of an entry, its title, and its comments.” It includes keywords from comments so that the keyword list can properly evolve as your online conversation grows. If you want to completely prevent any comments from being included in the list, comment out the following lines:
my $comments = $entry->comments;
for my $comment (@$comments) {
    $body .= " ".$comment->text;
}
(I haven’t actually tested the above modification, but it should work with no problems.)
I’m guessing you must have already figured out that comment spam was the cause of the problem, because the plugin seems to be producing a set of keywords that you should expect.
I always am amused when I see people using my plugins in ways I never imagined. It never occurred to me to display the keywords on the page! I wonder if that will help the page’s Google ranking or hinder it…
Excellent, Richard. That did the trick. Thanks a million. (The entry you referenced above is actually a static archive, not populated using your plugin.)
It seems that your script was pulling content from ALL comments, even those that have been “junked” in Movable Type.
I understand the value of including comments as content, but not when that also includes junked comments. Is there a way to include only *published* comments?
I’m sure there is. Unfortunately, my old Movable Type v2.63 didn’t have the ability to “junk” comments, one of the many reasons I jumped over to WordPress about a year and a half ago and never looked back. So, I cannot develop and properly test that additional functionality myself. If someone does come up with a solution, I will gladly post it here.
With the release of Movable Type v4.0 about six months ago, and its retention of unpublished comment spam, this plugin no longer functions as-is out of the box. Since I transferred all of my blogs to WordPress about two years ago, and I no longer have access to a Movable Type blog installation of any version, I have decided to officially halt further development of the MTKeywords plugin.