MovableType Plugin: MTKeywords

MTKeywords is a Movable Type plugin that compiles a list of fairly relevant keywords from the aggregate of the body of an entry, its title, and its comments. While the default purpose of the entry’s Keywords field is to populate the page’s Keywords meta tag, I found that extra data entry field to be most advantageously used for other purposes. Plus, there hasn’t been a way to auto-generate a list of relevant keywords. Thus, MTKeywords was born!

If you’re curious how the plugin works, it simply gathers the text from the three basic sources, removes any HTML, filters out more than 100 short common words, calculates the unigram and bigram counts for every word and word pair, and reports back the most common n-grams. Both the input and the results are case-sensitive when gathering unigrams. Obviously, this version is designed for English-language blogs, but if you modify the code and substitute 100 or so of the most common words in your own language (including cardinal numbers, prepositions, articles, contractions, and to-be verb conjugations) it should still work pretty well. I doubt it will work for non-Western languages without major modifications, though. Sorry!
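
A minimal sketch of the steps above, written in Python rather than the plugin's Perl, might look like the following. It is not the plugin's code: the function name, the tiny stop list, the regular expressions, and the rule of ignoring word pairs seen only once are all assumptions made for illustration. Lowercasing the bigrams and pairing words after the common words are removed are guesses based on the sample output shown further down.

```python
import re
from collections import Counter

# Hypothetical, heavily abbreviated stop list; the plugin filters 100+ words.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "is", "was", "to", "it"}


def extract_keywords(title, body, comments, max_bigrams=4, max_unigrams=20):
    """Return the most frequent bigrams followed by the most frequent unigrams."""
    # 1. Gather the text from the three sources.
    text = " ".join([title, body] + list(comments))
    # 2. Remove HTML (crudely: anything between angle brackets).
    text = re.sub(r"<[^>]*>", " ", text)
    # 3. Split into words and drop the common ones, keeping original case.
    words = [w for w in re.findall(r"[A-Za-z]+", text)
             if w.lower() not in STOP_WORDS]
    # 4. Count unigrams (case-sensitive) and adjacent pairs (lowercased).
    unigrams = Counter(words)
    bigrams = Counter(f"{a} {b}".lower() for a, b in zip(words, words[1:]))
    # 5. Report the most common n-grams; pairs seen once are skipped (assumption).
    top_bigrams = [b for b, n in bigrams.most_common(max_bigrams) if n > 1]
    top_unigrams = [u for u, _ in unigrams.most_common(max_unigrams)]
    return top_bigrams + top_unigrams
```

Calling `extract_keywords(title, body, comments)` with an entry's title, its body HTML, and a list of comment texts returns a ranked list, with case variants such as "History" and "history" counted separately.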

Impact on rebuilding seems negligible; I rebuilt the individual archives of this entire blog (consisting at the time of publication of just over 400 entries) in 46 seconds — the same elapsed time whether I used the MTKeywords plugin or not.

To see an example of the plugin in action, my brief synopsis of A Connecticut Yankee in King Arthur’s Court yielded keywords of “king arthur, connecticut yankee, yankee, Connecticut, Yankee, long, connecticut, King, Arthur, only, myself, court, king, fun, classic, Twain, century, england, complete, twain, disbelief, gun, England, took, arthur”. Not perfect, but fairly accurate.

And my history of Wake Island produced “wake island, pan american, peale island, history wake, united states, island, wake, Wake, Island, japanese, Japanese, american, history, American, great, pan, Pan, years, water, war, prisoners, atoll, civilian, been, islands”. The longer the blog entry is (and the more on-topic the comments are), the more representative the results will be.

By the way, I fully recognize that few search engines use the Keywords meta tag anymore, but hope springs eternal for a comeback once the algorithms for weeding out keyword spam are improved.

Download
You can get the latest version of MTKeywords by downloading Keywords.txt.

Installation
Save the file as Keywords.pl in your Movable Type plugins directory and set its permissions to 755.

Usage
Add the <$MTKeywords$> tag into the HEAD section of your Individual Archive Template.
<meta name="keywords" content="<$MTKeywords$>" />

<$MTKeywords delimiter="|"$> — specify your own delimiters.
<$MTKeywords caseSensitive="false"$> — consolidates ‘Richard’ and ‘richard’ into just ‘richard’.
<$MTKeywords includeBigrams="false"$> — skips displaying of common word pairs.
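
To illustrate what the three attributes do to the output, here is a rough Python sketch (again, not the plugin's Perl). The function name and parameters are hypothetical stand-ins for the tag attributes, and the sketch post-processes an already ranked list, whereas the plugin may well merge case variants before counting.

```python
def render_keywords(ranked, delimiter=", ", case_sensitive=True, include_bigrams=True):
    """Format a ranked keyword list the way the tag attributes describe."""
    seen = set()
    output = []
    for keyword in ranked:
        # includeBigrams="false": skip word pairs (anything containing a space).
        if not include_bigrams and " " in keyword:
            continue
        # caseSensitive="false": fold 'Richard' and 'richard' into 'richard'.
        if not case_sensitive:
            keyword = keyword.lower()
        if keyword in seen:
            continue
        seen.add(keyword)
        output.append(keyword)
    # delimiter="|": join with whatever separator was requested.
    return delimiter.join(output)
```

For example, `render_keywords(["king arthur", "yankee", "Yankee"], delimiter="|", case_sensitive=False)` keeps the word pair and a single lowercase "yankee", joined with a pipe.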

Version History
1.01: 04/08/2006; add parameters for delimiter, case sensitivity, and bigram usage
0.99b: 01/10/2005; fixed problem with Perl v5.8x
0.99a: 11/08/2004; included more basic file and web extensions to exclude
0.99: 10/19/2004; initial release

Responses

44 Responses to “MovableType Plugin: MTKeywords”

  1. Response #1
    Opinionated Báštárd (IP) on October 25th, 2004 at 2:02 pm

    Slightly better form for the meta tag would have a / before the closing > like the rest of the template has. Also, the meta tag should go in the “HEAD” section…

  2. Response #2
    richard on October 27th, 2004 at 9:34 pm

    Well, you caught an OOPS! and mentioned one of those very obvious instructions that it didn’t even occur to me to point out to the usually clueless masses. And here I am, a veritable preacher for “well-formed-ness”! Points taken. Text updated. Thanks! - RDL

  3. Response #3
    sami (IP) on November 1st, 2004 at 10:42 am

    nice, does it work on 3.12?

  4. Response #4
    richard on November 3rd, 2004 at 9:31 am

    I know of no reason why it shouldn’t work on 3.X, but it has not been tested. - RDL

  5. Response #5
    David Lawrence (IP) on November 30th, 2004 at 8:51 pm

    Is there any reason why the source code of *some* of my MT pages, after installing Keywords.pl, would show an empty set of keywords, content="" (from http://thedavidlawrenceshow.com/002336.html, a page filled with words about the newscaster in Cleveland that got naked for a sweeps-month stunt), while other pages *do* have the results of your plugin? Does it take time to spread through a large blog? All pages have plenty of text to examine. Thanks for this plug-in!
    [We narrowed the "problem" down to the fact that on his errant entries he was only including real-time feeds from other sources using a JavaScript RSS reader, and not writing his own body of text. MTKeywords cannot index keywords from text from sources other than MT. - RDL]

  6. Response #6
    Chris Pirillo (IP) on December 9th, 2004 at 1:19 am

    Useless use of a variable in void context at /www/sites/mt.lockergnome.com/mt/plugins/Keywords.pl line 33. What’s that?

  7. Response #7
    Tim (IP) on December 10th, 2004 at 4:28 pm

    I am getting the following when I run it: Use of uninitialized value in concatenation (.) or string at plugins/Keywords.pl line 38. It repeats three times. It still generates the keywords, but I’m wondering if the error can be fixed. FYI I’m using MT 3.12 or whatever the current release is.

  8. Response #8
    Anonymous (IP) on December 29th, 2004 at 1:49 am

    I just upgraded to MT3.14 and have run the upgrade script and it pointed out the following: **** WARNING: Parentheses missing around “my” list at plugins/Keywords.pl line 33. **** WARNING: Useless use of a variable in void context at plugins/Keywords.pl line 33. Thought you would want to know!

  9. Response #9
    Ian (IP) on January 2nd, 2005 at 6:05 pm

    Nice plugin - thanks - but I’m receiving the same “Useless use” error as others. I have MT3.14 and Perl 5.8.0. I hope this helps.

  10. Response #10
    richard on January 10th, 2005 at 9:34 pm

    The errors above should be corrected with the new version 0.99b. - RDL

  11. Response #11
    tim (IP) on January 16th, 2005 at 9:00 pm

    I’m using MT 3.14 and it looks like your plugin is working smoothly. Thanks

  12. Response #12
    Robert Andrews (IP) on February 4th, 2005 at 9:36 am

    Interesting. At present, I’m serving all header tags (ie. , ) apart from from an include module. Every page and archive type gets the same. So, if I add ” /> to my header include file, will it work on pages that are not individual entry archive pages? Will it mess anything up on those pages, or will it just not work? Thanks. Looks good.

  13. Response #13
    richard on February 4th, 2005 at 4:36 pm

    My guess is that it probably won’t do anything on non-individual-archive pages. Try it and let me know. - RDL

  14. Response #14
    Sascha Carlin (IP) on February 17th, 2005 at 12:38 pm

    Here is a list of german stop words, derived from the list of phpBB.de: http://www.itst.org/web/stuff/germanstop.txt

  15. Response #15
    user (IP) on February 24th, 2005 at 12:28 pm

    great plugin! thanks, but i think it is not a good idea to make it case-sensitive, there are some duplicated keywords (just in other cases) it generates and it is not good and have no point from the search engine (and human) point of view (imho)

  16. Response #16
    richard on February 26th, 2005 at 9:34 pm

    Unfortunately, case sensitivity is about as important as keyword metadata is today. While most major search engines no longer support either meta tag keywords or case sensitivity, it was not long ago that AltaVista, HotBot, MSN Search, and Northern Light (to name a few) relied on case sensitivity when used in combination with quotes or on advanced search pages, returning potentially different results for searches such as “mtkeywords” and “MTKeywords” (OK, a bad and contrived example!). Another possible benefit of case sensitivity is that heavily used mixed-case words appear in more than one form and so get weighted more in the list of keywords. Food for thought, though: maybe I’ll add case handling as an option in a later release. - RDL

  17. Response #17
    Steve (IP) on February 28th, 2005 at 8:28 am

    I installed the plugin ten minutes ago, and rebuilt a couple of entries and it looks like it’s working pretty well. However, looking over the keywords it selected, there are still quite a few common words I’d want excluded. I suspect I can go into the Keywords.pl file and add more common words (yes? no?), but my Perl syntax is a bit rusty and I’d worry about making a syntax error. Is there a simple way to get the plugin to read the “common words” from an external .txt list? That way I can maintain and update what is essentially an external dictionary of common words. I can probably even set up MT to write the .txt file from a template, so I can do all the updating from within MT. Any thoughts? (ps - my ultimate goal in this is to actually set up a separate MT template that collates only keywords from entries as they are posted. Then I was going to try to use the MT Zeitgeist plugin on that template to show a display of the most common themes mentioned on my blog.)

  18. Response #18
    Al-Muhajabah’s Movable Type Tips (IP) on February 28th, 2005 at 7:02 pm

    using keywords and tags to help visitors find related content

    Another way I’m using alternate search templates is to display lists of all entries that have the same keyword as the current entry. To do this, I’m making use of the KeyWordList plugin, as well as the Glue plugin for…

  19. Response #19
    John Swods (IP) on March 1st, 2005 at 4:05 pm

    Easy to install, works great — thanks for writing a great plug-in!!

  20. Response #20
    TIPS (IP) on March 6th, 2005 at 3:17 pm

    using keywords and tags to help visitors find related content

    Another way I’m using alternate search templates is to display lists of all entries that have the same keyword as the current entry. To do this, I’m making use…

  21. Response #21
    Ravensky’s Blog (IP) on March 23rd, 2005 at 9:31 pm

    Getting things done

    Alrighty, so now I have a nice template, made by Neil Turner. He’s made some very nice themes and I even used to use this once a long time ago when I was still new with Movable Type. I’ve also…

  22. Response #22
    bopuc/weblog (IP) on April 7th, 2005 at 3:11 am

    Yahoo! Term extraction for MT

    So Jonas has gotten this working for WordPress, but I have some ideas of how to use it, somewhat differently, with Movable Type. I couldn’t code Perl (or anything else really) to save my life so here is just the…

  23. Response #23
    Dan Wolfgang (IP) on April 18th, 2005 at 10:04 am

    This is a cool tag, but it doesn’t do exactly what I want. I like using the Keywords field for tags. As I “tag” more and more entries, I think more and more about the backlog of untagged entries I have. With that in mind, it’d be cool if this could somehow be used to populate any existing empty Keywords field to sort of play catch-up. Any ideas on the possibility of that?

  24. Response #24
    Chris Short (IP) on May 21st, 2005 at 6:50 pm

    The question is, can I change the separator and the number of keywords presented?

  25. Response #25
    richard on May 21st, 2005 at 8:13 pm

    The last two code sections gather the results into the $result variable. The first section includes up to the first 4 common bigrams (determined by $keywordcount < 5) and the second section pads the results to the 24th instance of a keyword (determined by $keywordcount < 25). Change the desired keywordcounts to be whatever you wish. Both sections stitch the results together with $result .= ", ". You can change the comma/space into whatever separator you desire. - RDL
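
    The two-pass assembly described in this reply can be sketched roughly as follows, in Python rather than the plugin's Perl. The function name and parameters are hypothetical, and starting the counter at 1 is an assumption inferred from "< 5" admitting four bigrams; the plugin's actual loop may differ.

```python
def join_keywords(bigrams, unigrams, bigram_cap=5, total_cap=25, separator=", "):
    """Join ranked bigrams, then ranked unigrams, under two running-count caps."""
    picked = []
    keywordcount = 1  # assumption: counting starts at 1, so "< 5" admits 4 items
    # First section: take bigrams while the running count stays below the cap.
    for phrase in bigrams:
        if keywordcount < bigram_cap:
            picked.append(phrase)
            keywordcount += 1
    # Second section: pad with unigrams up to the overall cap.
    for word in unigrams:
        if keywordcount < total_cap:
            picked.append(word)
            keywordcount += 1
    # Stitch everything together with the chosen separator.
    return separator.join(picked)
```

    Changing `bigram_cap`, `total_cap`, or `separator` in this sketch corresponds to editing the two $keywordcount comparisons and the ", " string mentioned above.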

  26. Response #26
    Chris Short (IP) on May 22nd, 2005 at 9:10 am

    Thanks. Next question, is there a way to keep the same keyword from repeating more than three times?

  27. Response #27
    richard on May 22nd, 2005 at 11:52 am

    No, not at this time. The plugin is intended to be case sensitive. - RDL

  28. Response #28
    Chris Short (IP) on May 22nd, 2005 at 12:41 pm

    Also, my Main Index page isn’t generating keywords.

  29. Response #29
    richard on May 23rd, 2005 at 10:29 am

    As designed. The MTKeywords plugin is for use within individual archive templates. - RDL

  30. Response #30
    Chris Short (IP) on May 23rd, 2005 at 2:29 pm

    Um… my question was about keyword repetition. Not case sensitivity.

  31. Response #31
    richard on May 25th, 2005 at 9:34 pm

    Since keyword selection is case sensitive, a blog about search engines might return AltaVista, Altavista, and altavista as separate keywords, which looks like repeated keywords but is not. If that’s not what you meant, then I need to see an example. - RDL

  32. Response #32
    Kai’s Weblog (IP) on August 6th, 2005 at 7:51 am

    New Look, More Protection

    Starting yesterday afternoon, I completed a set of upgrades and installations to this weblog. I upgraded the templates and decided to stick with a white background (much easier to read), and installed a bunch of plug-ins to protect the weblog…

  33. Response #33
    apryl (IP) on August 22nd, 2005 at 9:12 am

    Hi, I’m trying to get my pages other than the index page to hold meta tags for description and keywords. I think this plugin might work. Are there any other ways to do this?

  34. Response #34
    richard on August 24th, 2005 at 2:45 pm

    That’s what this plugin is designed for. I am not aware of any alternatives — the reason I built this in the first place :-) - RDL

  35. Response #35
    Tom Keating (IP) on March 16th, 2006 at 11:56 am

    I love the plugin. I was trying to use it for doing “related tags”, but because it is case insensitive, it repeats stuff. For example: Radio shack blog. Related tags: radio shack, Radio, radio, shack, Shack, computer. It should just have Radio Shack, not radio shack, radio, and shack. It seems to convert any word with at least 1 capital letter into all lowercase and count that as a keyword. So it often repeats any word capitalized. For instance, Yahoo Messenger might do this: Yahoo Messenger,yahoo messenger,Yahoo,Messenger,yahoo,messenger. Any way to make the script not do this?

  36. Response #36
    richard on April 8th, 2006 at 9:52 am

    Due to a vast number of requests to remove the case sensitivity, I’ve created a new v1.01 that allows just that, plus a few more features! See the new usage instructions above.

    By the way, don’t be alarmed by the fact that this particular page/blog is created with WordPress; I still have several MovableType blogs and maintain the MTKeywords plugin for use with them!

  37. Response #37
    Chris Short (IP) on August 19th, 2006 at 4:44 am

    Thanks for the upgrade, richard.

  38. Response #38
    Stuart (IP) on August 5th, 2007 at 4:23 pm

    Richard this has been a great plugin and I use it within my blog templates. Is there any way to limit the number of keywords that it generates?

    For example, can it limit the list to 10 keywords?

  39. Response #39
    richard on August 5th, 2007 at 7:33 pm

    Haven’t tested it, but you could try replacing the text "$keywordcount < 25" (do a search for it, minus the quotes) with "$keywordcount < 10". That might work…

  40. Response #40
    Richard Bird (IP) on October 25th, 2007 at 12:11 pm

    I really like this plugin, Richard. It’s a great time saver. I’ve noticed, however, that it is returning some keywords that don’t exist in the content. For example, I have a blog page about children’s cold medicines, and your plugin is returning, “viagra, cialis, bestmed,” and so on.

    Why? And is there a way to build an exclusion list into the script?

  41. Response #41
    richard on October 25th, 2007 at 1:00 pm

    @rbird: Check your comments for spam. The plugin “compiles a list of fairly relevant keywords from the aggregate of the body of an entry, its title, and its comments.” It includes keywords from comments so that the keyword list can properly evolve as your online conversation grows. If you want to completely prevent any comments from being included in the list, comment out the following lines:

    	my $comments = $entry->comments;
    	for my $comment (@$comments) {
    		$body .= " ".$comment->text;
    	}
    

    (I haven’t actually tested the above modification, but it should work with no problems.)

    I’m guessing you must have already figured out that comment spam was the cause of the problem, because the plugin seems to be producing a set of keywords that you should expect.

    I always am amused when I see people using my plugins in ways I never imagined. It never occurred to me to display the keywords on the page! I wonder if that will help the page’s Google ranking or hinder it…

  42. Response #42
    Richard Bird (IP) on October 25th, 2007 at 1:39 pm

    Excellent, Richard. That did the trick. Thanks a million. (the entry you referenced above is actually a static archive, not populated using your plugin.)

    It seems that your script was pulling content from ALL comments, even those that have been “junked” in MovableType.

    I understand the value of including comments as content, but not when it also includes junked comments. Is there a way to make the inclusion only *published* comments?

  43. Response #43
    richard on October 25th, 2007 at 1:57 pm

    I’m sure there is. Unfortunately, my old MovableType v2.63 didn’t have the ability to “junk” comments, one of the many reasons I jumped over to WordPress about a year and a half ago and never looked back. So, I cannot develop and properly test that additional functionality myself. If someone does come up with a solution, I will gladly post it here.

  44. Response #44
    richard on February 11th, 2008 at 1:52 pm

    With the release of Movable Type v4.0 about six months ago, and its retention of unpublished comment spam, this plugin no longer functions as-is out of the box. Since I transferred all of my blogs to WordPress about two years ago, and I no longer have access to a Movable Type blog installation of any version, I have decided to officially halt further development of the MTKeywords plugin.