MovableType Plugin: MTKeywords

MTKeywords is a Movable Type plugin that compiles a list of fairly relevant keywords from the aggregate of the body of an entry, its title, and its comments. While the default purpose of the entry’s Keywords field is to populate the page’s Keywords meta tag, I found that extra data entry field to be most advantageously used for other purposes. Plus, there hasn’t been a way to auto-generate a list of relevant keywords. Thus, MTKeywords was born!

If you’re curious how the plugin works, it gathers the text from the three basic sources, removes any HTML, filters out more than 100 short common words, counts every remaining word (unigram) and word pair (bigram), and reports back the most common n-grams. By default, unigrams are case-sensitive in both the input and the results. This version is designed for English-language blogs, but if you modify the code and substitute 100 or so of the most common words in your own language (including cardinal numbers, prepositions, articles, contractions, and to-be verb conjugations), it should still work pretty well. I doubt it will work for non-Western languages without major modifications, though. Sorry!
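For the curious, here is a rough Python sketch of the steps described above. It is not the plugin's actual Perl code. The function name extract_keywords, the abbreviated stop-word list, the rule of listing repeated word pairs before single words, and the limit of 25 keywords (inferred from the example output on this page) are all assumptions made for illustration.

```python
import re
from collections import Counter
from html import unescape

# Hypothetical, heavily abbreviated stop-word list (the plugin's own list has 100+ entries).
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "to", "is", "was", "it", "on", "at"}


def extract_keywords(title, body, comments, limit=25):
    """Return up to `limit` keywords: repeated bigrams first, then unigrams."""
    text = " ".join([title, body, *comments])
    # Crude HTML removal: drop anything that looks like a tag, then decode entities.
    text = unescape(re.sub(r"<[^>]+>", " ", text))
    # Tokenize, then filter stop words case-insensitively.
    words = [w for w in re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
             if w.lower() not in STOP_WORDS]
    # Unigram counts keep the original case; bigram counts are lowercased.
    unigrams = Counter(words)
    bigrams = Counter(f"{a} {b}".lower() for a, b in zip(words, words[1:]))
    # Assumed ranking: word pairs seen at least twice, then single words by count.
    ranked = [bg for bg, n in bigrams.most_common() if n >= 2]
    ranked += [w for w, _ in unigrams.most_common()]
    return ranked[:limit]


keywords = extract_keywords(
    "History of Wake Island",
    "<p>Wake Island is an atoll. Pan American built a base on Wake Island.</p>",
    ["The history of Wake Island is fascinating."],
)
print(", ".join(keywords))
```

Because the stop words are removed before the pairs are counted, a phrase like "history of Wake" contributes the pair "history wake", which is the kind of bigram that shows up in the sample results.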

Impact on rebuilding seems negligible; I rebuilt the individual archives of this entire blog (consisting at the time of publication of just over 400 entries) in 46 seconds — the same elapsed time whether I used the MTKeywords plugin or not.

To see an example of the plugin in action, my brief synopsis of A Connecticut Yankee in King Arthur’s Court yielded keywords of “king arthur, connecticut yankee, yankee, Connecticut, Yankee, long, connecticut, King, Arthur, only, myself, court, king, fun, classic, Twain, century, england, complete, twain, disbelief, gun, England, took, arthur”. Not perfect, but fairly accurate.

And my history of Wake Island produced “wake island, pan american, peale island, history wake, united states, island, wake, Wake, Island, japanese, Japanese, american, history, American, great, pan, Pan, years, water, war, prisoners, atoll, civilian, been, islands”. The longer the blog entry is (and the more on-topic the comments are), the more representative the results will be.

By the way, I fully recognize that few search engines use the Keywords meta tag anymore, but hope springs eternal for a comeback once the algorithms for filtering out keyword spam improve.

Download
You can get the latest version of MTKeywords by downloading Keywords.txt.

Installation
Save the file as Keywords.pl in your Movable Type plugins directory and set its permissions to 755.

Usage
Add the <$MTKeywords$> tag into the HEAD section of your Individual Archive Template.
<meta name="keywords" content="<$MTKeywords$>" />

<$MTKeywords delimiter="|"$>: specifies your own delimiter.
<$MTKeywords caseSensitive="false"$>: consolidates ‘Richard’ and ‘richard’ into just ‘richard’.
<$MTKeywords includeBigrams="false"$>: omits common word pairs from the output.
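To illustrate what the three attributes do, here is a small Python sketch, not the plugin's Perl code. The function name render_keywords, the sample counts, and the default delimiter of a comma plus a space (inferred from the example output on this page) are assumptions.

```python
from collections import Counter


def render_keywords(unigrams, bigrams, delimiter=", ",
                    case_sensitive=True, include_bigrams=True):
    """Join ranked keywords into one string, honoring the three options."""
    if not case_sensitive:
        # Merge counts for words that differ only in case.
        folded = Counter()
        for word, count in unigrams.items():
            folded[word.lower()] += count
        unigrams = folded
    # Word pairs come first unless they are switched off.
    ranked = [bg for bg, _ in bigrams.most_common()] if include_bigrams else []
    ranked += [w for w, _ in unigrams.most_common()]
    return delimiter.join(ranked)


# Made-up counts for illustration only.
unigrams = Counter({"Richard": 3, "richard": 2, "plugin": 4})
bigrams = Counter({"movable type": 2})

print(render_keywords(unigrams, bigrams))
print(render_keywords(unigrams, bigrams, delimiter="|",
                      case_sensitive=False, include_bigrams=False))
```

With case sensitivity turned off, the counts for ‘Richard’ and ‘richard’ are added together, so the merged word can move up the list as well as appearing only once.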

Version History
1.01: 04/08/2006; add parameters for delimiter, case sensitivity, and bigram usage
0.99b: 01/10/2005; fixed problem with Perl v5.8x
0.99a: 11/08/2004; included more basic file and web extensions to exclude
0.99: 10/19/2004; initial release

Responses

44 Responses to “MovableType Plugin: MTKeywords”

  1. Response #11
    tim on January 16th, 2005 at 9:00 pm

    I’m using MT 3.14 and it looks like your plugin is working smoothly. Thanks

  2. Response #12
    Robert Andrews on February 4th, 2005 at 9:36 am

    Interesting. At present, I’m serving all header tags (ie. , ) apart from from an include module. Every page and archive type gets the same. So, if I add <meta name="keywords" content="<$MTKeywords$>" /> to my header include file, will it work on pages that are not individual entry archive pages? Will it mess anything up on those pages, or will it just not work? Thanks. Looks good.

  3. Response #13
    richard on February 4th, 2005 at 4:36 pm

    My guess is that it probably won’t do anything on non-individual-archive pages. Try it and let me know. - RDL

  4. Response #14
    Sascha Carlin on February 17th, 2005 at 12:38 pm

    Here is a list of German stop words, derived from the phpBB.de list: http://www.itst.org/web/stuff/germanstop.txt

  5. Response #15
    user on February 24th, 2005 at 12:28 pm

    Great plugin, thanks! But I don’t think it is a good idea to make it case-sensitive: it generates some duplicated keywords (differing only in case), which is not good and serves no purpose from the search engine (or human) point of view (IMHO).

  6. Response #16
    richard on February 26th, 2005 at 9:34 pm

    Unfortunately, case sensitivity is about as important as keyword metadata is today. While most major search engines no longer support either meta tag keywords or case sensitivity, it was not long ago that AltaVista, HotBot, MSN Search, and Northern Light (to name a few) relied on case sensitivity when used in combination with quotes or on advanced search pages, returning potentially different results for searches such as “mtkeywords” and “MTKeywords” (OK, a bad and contrived example!) Another possible benefit of case insensitivity is that heavily used mixed-case words get weighted more in the list of keywords. But food for thought is that maybe I’ll add case usage as an option on a later release. - RDL

  7. Response #17
    Steve on February 28th, 2005 at 8:28 am

    I installed the plugin ten minutes ago, and rebuilt a couple of entries and it looks like it’s working pretty good. However, looking over the keywords it selected, there are still quite a few common words I’d want excluded. I suspect I can go into the Keywords.pl file and add more common words (yes? no?), but my Perl syntax is a bit rusty and I’d worry about making a syntax error. Is there a simple way to get the plugin to read the “common words” from an external .txt list? That way I can maintain and update what is essentially an external dictionary of common words. I can probably even set up MT to write the .txt file from a template, so I can do all the updating from within MT. Any thoughts? (ps - my ultimate goal in this is to actually set up a separate MT template that collates only keywords from entries as they are posted. Then I was going to try to use the MT Zeitgeist plugin on that template to show a display of the most common themes mentioned on my blog.)

  8. Response #18
    Al-Muhajabah’s Movable Type Tips on February 28th, 2005 at 7:02 pm

    using keywords and tags to help visitors find related content

    Another way I’m using alternate search templates is to display lists of all entries that have the same keyword as the current entry. To do this, I’m making use of the KeyWordList plugin, as well as the Glue plugin for…

  9. Response #19
    John Swods on March 1st, 2005 at 4:05 pm

    Easy to install, works great — thanks for writing a great plug-in!!

  10. Response #20
    TIPS on March 6th, 2005 at 3:17 pm

    using keywords and tags to help visitors find related content

    Another way I’m using alternate search templates is to display lists of all entries that have the same keyword as the current entry. To do this, I’m making use…
