MovableType Plugin: MTKeywords
MTKeywords is a Movable Type plugin that compiles a list of fairly relevant keywords from the aggregate of the body of an entry, its title, and its comments. While the default purpose of the entry’s Keywords field is to populate the page’s Keywords meta tag, I found that extra data entry field to be most advantageously used for other purposes. Plus, there hasn’t been a way to auto-generate a list of relevant keywords. Thus, MTKeywords was born!
If you’re curious how the plugin works, it simply gathers the text from the three basic sources, removes any HTML, filters out more than 100 short common words, calculates the unigram and bigram counts for every word and word pair, and reports back the most common n-grams. Both the input and the results are case-sensitive when gathering unigrams. Obviously, this version is designed for English-language blogs, but if you modify the code and substitute 100 or so of the most common words in your language (including cardinal numbers, prepositions, articles, contractions, and to-be verb conjugations), it should still work pretty well. I doubt it will work for non-Western languages without major modifications, though. Sorry!
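The steps described above can be sketched roughly as follows. This is an illustrative Python sketch, not the plugin's Perl code: the function name, the abbreviated stop list, the crude HTML stripping, and the ordering (repeated word pairs first, then single words) are all assumptions. Lowercasing the word pairs is a guess based on the sample output further down, and the limit of 25 matches the number of keywords in those samples.

```python
import re
from collections import Counter

# Hypothetical, heavily abbreviated stop list; the plugin filters 100+ words.
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "it", "of", "the", "to", "was"}


def extract_keywords(title, body, comments, limit=25):
    """Return the most frequent bigrams and unigrams as a list of strings."""
    # 1. Gather text from the three sources.
    text = " ".join([title, body, *comments])
    # 2. Remove HTML (a crude tag stripper, good enough for a sketch).
    text = re.sub(r"<[^>]+>", " ", text)
    # 3. Split into words and drop the common ones.
    words = [w for w in re.findall(r"[A-Za-z']+", text)
             if w.lower() not in STOP_WORDS]
    # 4. Count single words (case-sensitive) and adjacent pairs
    #    (lowercased here, which is an assumption).
    unigrams = Counter(words)
    bigrams = Counter(f"{a} {b}".lower() for a, b in zip(words, words[1:]))
    # 5. Report repeated pairs first, then single words by frequency.
    ranked = [pair for pair, n in bigrams.most_common() if n >= 2]
    ranked += [word for word, _ in unigrams.most_common()]
    return ranked[:limit]


if __name__ == "__main__":
    print(", ".join(extract_keywords(
        "Wake Island",
        "<p>The history of Wake Island. Wake Island is an atoll.</p>",
        ["Wake Island was great"],
    )))
```

Because the common words are dropped before the pairs are formed, a phrase such as "history of Wake" yields the pair "history wake" in this sketch.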
Impact on rebuilding seems negligible; I rebuilt the individual archives of this entire blog (consisting at the time of publication of just over 400 entries) in 46 seconds — the same elapsed time whether I used the MTKeywords plugin or not.
To see an example of the plugin in action, my brief synopsis of A Connecticut Yankee in King Arthur’s Court yielded keywords of “king arthur, connecticut yankee, yankee, Connecticut, Yankee, long, connecticut, King, Arthur, only, myself, court, king, fun, classic, Twain, century, england, complete, twain, disbelief, gun, England, took, arthur”. Not perfect, but fairly accurate.
And my history of Wake Island produced “wake island, pan american, peale island, history wake, united states, island, wake, Wake, Island, japanese, Japanese, american, history, American, great, pan, Pan, years, water, war, prisoners, atoll, civilian, been, islands”. The longer the blog entry is (and the more on-topic the comments are), the more representative the results will be.
By the way, I fully recognize that few search engines use the Keywords meta tag anymore, but hope springs eternal for a comeback once the algorithms to reduce keyword spammers are improved.
Download
You can get the latest version of MTKeywords by downloading Keywords.txt.
Installation
Save the downloaded file as Keywords.pl in your Movable Type plugins directory and set its permissions to 755.
Usage
Add the <$MTKeywords$> tag into the HEAD section of your Individual Archive Template.
<meta name="keywords" content="<$MTKeywords$>" />
<$MTKeywords delimiter="|"$> — specify your own delimiters.
<$MTKeywords caseSensitive="false"$> — consolidates ‘Richard’ and ‘richard’ into just ‘richard’.
<$MTKeywords includeBigrams="false"$> — skips displaying of common word pairs.
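To make the effect of the three attributes concrete, here is a small Python sketch of how such options could shape the final keyword string. It is an illustration only, not the plugin's Perl implementation: the function name, the argument names, and the default ", " delimiter are assumptions.

```python
from collections import Counter


def render_keywords(unigrams, bigrams, delimiter=", ",
                    case_sensitive=True, include_bigrams=True):
    """Join keyword counts into one string, honoring the three options."""
    if not case_sensitive:
        # Fold 'Richard' and 'richard' into a single lowercase entry.
        merged = Counter()
        for word, count in unigrams.items():
            merged[word.lower()] += count
        unigrams = merged
    ranked = []
    if include_bigrams:
        ranked += [pair for pair, _ in bigrams.most_common()]
    ranked += [word for word, _ in unigrams.most_common()]
    return delimiter.join(ranked)


if __name__ == "__main__":
    unigrams = Counter({"Richard": 2, "richard": 1, "blog": 1})
    bigrams = Counter({"richard blog": 2})
    print(render_keywords(unigrams, bigrams))
    print(render_keywords(unigrams, bigrams, delimiter="|",
                          case_sensitive=False, include_bigrams=False))
```

With case sensitivity turned off, the counts for the two spellings are added together before ranking, so the merged word can move up the list.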
Version History
1.01: 04/08/2006; add parameters for delimiter, case sensitivity, and bigram usage
0.99b: 01/10/2005; fixed problem with Perl v5.8x
0.99a: 11/08/2004; included more basic file and web extensions to exclude
0.99: 10/19/2004; initial release
Since keyword selection is case sensitive, a blog about search engines might return AltaVista, Altavista, and altavista as separate keywords. That looks like repeated keywords, but they are counted as distinct words. If that’s not what you meant, then I need to see an example. - RDL
Hi, I’m trying to get my pages other than the index page to hold description and keywords meta tags. I think this plugin might work. Are there any other ways to do this?
That’s what this plugin is designed for. I am not aware of any alternatives, which is the reason I built it in the first place.
- RDL
I love the plugin. I was trying to use it for doing “related tags”, but because it is case sensitive, it repeats stuff. For example, for a Radio Shack blog it returns these related tags: radio shack, Radio, radio, shack, Shack, computer. It should just have Radio Shack, not radio shack, radio, and shack. It seems to convert any word with at least one capital letter into all lowercase and count that as a keyword too, so it often repeats any capitalized word. For instance, Yahoo Messenger might do this: Yahoo Messenger, yahoo messenger, Yahoo, Messenger, yahoo, messenger. Any way to make the script not do this?
Due to a vast number of requests to remove the case sensitivity, I’ve created a new v1.01 that allows just that, plus a few more features! See the new usage instructions above.
By the way, don’t be alarmed by the fact that this particular page/blog is created with WordPress; I still have several MovableType blogs and maintain the MTKeywords plugin for use with them!
Thanks for the upgrade, richard.
Richard, this has been a great plugin and I use it within my blog templates. Is there any way to limit the number of keywords that it generates?
For example, can it limit the list to 10 keywords?
Haven’t tested it, but you could try replacing the text “$keywordcount < 25” (do a search for it, minus the quotes) with “$keywordcount < 10”. That might work…
I really like this plugin, Richard. It’s a great time saver. I’ve noticed, however, that it is returning some keywords that don’t exist in the content. For example, I have a blog page about children’s cold medicines, and your plugin is returning “viagra, cialis, bestmed,” and so on.
Why? And is there a way to build an exclusion list into the script?