PostNuke

Flexible Content Management System

News

Moving on: Better PostNuke ShortURLs

Contributed by on Jul 09, 2005 - 11:48 PM

Up until now, PostNuke URLs have consisted of an index file in the site's root with a long convoluted query string that is both user- and search-engine unfriendly, meaning the search engines choke on them and thus doesn't index PostNuke sites well, while making the URLs hard to post to people in email or in forums. For instance, a news link looks like this:

/index.php?name=News&file=article&sid=123&mode=thread&order=0&thold=0



For some time now, PostNuke users have cried out for better Search-Engine Friendly URLs, and for the past few years, the only thing available has been a theme hack first detailed by Karateka (possibly E. Soysal before that, the links in the article are dead) way back in 2002, since worked on by ColdRolledSteel (Craig Saunders), and consequently me.

The advent of the ShortURL hack has seen sites hosted on Apache servers with the URL Rewriting module (mod_rewrite) enabled get URLs like

/Article123.html



for the above link, where certain assumptions have been made about the default settings for mode, thread and threshhold. A big improvement, but not very descriptive, and it comes at the cost of heavy post-processing of the site's content for links. Also, Search Engines use link keyword relevance in their rankings, and Article123 doesn't say much about the link, except that it's an article with the id 123.

As Karateka pointed out at the time in his article, a problem in implementing friendlier URLs with virtual directories is that all paths in PostNuke are relative, ie relative to the site root folder where index.php is located, and fixing it then would have required extensive changes in the core. That is, a URL like /Example/view.html would result in the browser looking for all links relative to its present location, ie in the nonexistant subfolder called Example, and subsequently it would fail to find the linked stylesheets, images etc, and all links from the page would similarly fail.



Unfortunately this situation has not changed in the intervening years, but as PostNuke modules are becoming API-compliant, they reference the same system function to build their URLs, so fixing this function and other associated functions to use root-relative links(1) will fix all compliant module URLs. But that leaves all other links, like images, Javascript, and stylesheets. The move to templating with Xanthia (for themes) and pnRender (for modules) is also making it easier, since Xanthia templates use a Xanthia variable to reference the theme's image directory path. So fixing Xanthia and pnRender will fix most paths in Xanthia themes. The exception are stylesheet and Javascript link paths and any links in the theme header, for which new path variables need to be introduced, so some updating of Xanthia themes is required. This makes the transition period to PN 0.8 an ideal time to introduced these changes, since few Xanthia themes have been released so far, and core modules are only just being converted to pnRender.

I stopped work on ShortURLs some time ago (before pn0.75) on the advice that a core module was being developed; however I have seen no evidence of this to date, and there is no indication in the upcoming PN 0.76 or CVS that there is anything coming. I got curious a month or so ago, and was somewhat dismayed at what I found.



Since then no progress seems to have been made on PostNuke ShortURLs. In fact, the current Xanthia filter hack has regressed, becoming bloated with complex and wholly unnecessary Regular Expression rules, many badly written with duplication and a number of bugs, especially in the accompanying htaccess file, going from the 15 rules proposed by Karateka to a massive 89. So, I set out to try and fix it, but ended up revisiting the idea of a core implementation using virtual directories to more logically structure the URLs in a way that is not only Search-Engine Friendly, but more User-Friendly.



Along the way, I've also been sidetracked and made a direly-needed new themable tab system for the Administration area based on AlistApart.com's Sliding Doors technique and consequently overhauled most of the Admin templates and a few User templates too, partly out of necessity due to the new Adminpanel, partly because they badly needed it. Those of you who have tried the pn0.76 Release Candidates would know that the templated output in them leaves something to be desired, drab and somewhat unprofessional-looking due to all the styling and CSS-classes having been ripped out, leaving a basic grey and white look with overly large headings and no theme tables for backgrounds. Hardly what you would call of Release Candidate quality. So pnRender and its plugins have been fixed to allow the use of Xanthia-like theme-colour tags as well as a tag for root-relative paths needed for ShortURLs, and the opentable functions have been fixed so that proper themed borders can be used. In fact most of the changes are in fixed templates, plugins, and module files.



My proposed implementation still retain the Xanthia filter for backwards compatibility with older themes, modules and blocks, but has been wholly rewritten and pared down to 24 rules, including a rule to fix all links to be root-relative. As PostNuke is in transition to be fully pnAPI-compliant by PostNuke 0.8, the remaining ones can gradually be removed altogether as themes, modules and blocks are updated. There's also a version for AutoTheme.



This particular scheme is experimental and may be tweaked or improved upon. It seeks to reduce the reliance on the Regular Expression(2) post-processing for links and introduce more user-friendly URLs that have more relevance for people and search engines alike by using virtual directories to visually distinguish sections of the site by module and function, such as

/Example/View.html



and for the News articles introduce Category, Topic, and Title information in the link:

/Category/Topic/ArticleXXX-title-of-story.html



For instance for a news story in the category Computers and the topic Postnuke called "PostNuke Shorturls", you'd have the URL

/Computers/Postnuke/Article123-PostNuke-Shorturls.html



This is a clear, concise and informative link that tells the user and search engine alike something about the link before going there, while retaining backwards compatibility with links of the old ShortURL scheme. It more closely emulates the way we think and organise information, using the folder analogy where we have a clearly-labelled Computer category folder, under which we have the various sub-categories - Topics - with various articles. In this case, we're using a virtual file anchored by the word "Article", clearly identifying it as such, followed by the article number and title. There is backwards compatibility, so that older links for Article123.html will still work.



In this instance I've excluded the News keyword altogether for brevity in favour of the Category and Topic keywords which insinuate News anyway, though there is nothing against being consistent with all the other ShortURLs and having the Module appear first, as in

/News/Computers/Postnuke/Article123:-PostNuke-Shorturls.html



This is for the special case of the core News module though, a more generic method is needed overall for URLs with various unknown parameters passed in the query string. This implementation uses the scheme:

/Module/Function-Param1:Value1-Param2:Value2... -ParamN:ValueN.(p)htm(l)

where the Query string parameters are tagged onto the virtual filename grouped by colons and separated by hyphens, the idea being to use commonly-used characters we might normally use in a list to make it look as natural and readable as possible. It may be a less-commonly used character than the hyphen is needed, like the tilde ~ character, since some parameter values may use a hyphen, in particular usernames. This is not a problem if passed as the last parameter, where it may contain any character. So if the module developer kept this in mind, it might not be an issue. I'm not aware of it being one so far. The PostCalender ShortURL plugin deliberately places uname, if present, last.



The extension is not necessary, but used for convenience. The 3 types used are either one of html, htm, or phtml, the latter useful to distinguish when you want to link to real HTML files on the site. The extensions as well as the option to use ShortURLs or not is set in the Settings panel, though I've only offered the option of html and phtml, since frankly the MS DOS-holdover extension htm annoys me.

Older URLs are marked with a + before the Function name, as in

/PNphpBB2/+profile-mode:editprofile.html



so that the server can translate it correctly. If the directory doesn't actually exist, entering

/Example/



will redirect to the Example module main page (Apache only)

/Example/main.phtml



which in return gets rewritten invisibly to

/index.php?module=Example&func=main



Otherwise, if it does exist, the index file of the relevant directory will be opened.

Similarly, with

/HTML/filename.html



if the file exists, it will be opened, else PN will look for

/index.php?module=HTML&func=filename



It is still possible to tag on query strings like

/ModName/main.phtml?theme=seabreeze



or

/ModName/main-theme:seabreeze.phtml



will both be translated to

/index.php?module=ModName&func=main&theme=seabreeze



There are any number of possible ShortURL systems, the simplest being to simply chop the URL into virtual directories, like /News/123/ from the above News example as some do. Xaraya uses a variant of this for news, though it doesn't use mod_rewrite, so appears like

/index.php/news/123



Again, this is concise, but contains few meaningful keywords other than the module name News. You can combine the two methods for News and have

/News/Category/Topic/123/title-of-article



which works very well, but loses some of the elegance of the above philosophy, since the latter part breaks up the virtual file into 3 with no anchor words, which is not how we organise information.

For generic URLs, there are a number of methods; for instance Mambo, another CMS, use generic ShortURLs like

/component/option,com_newsfeeds/catid,5/Itemid,7/



for a News URL like

/index.php?option=com_newsfeeds&catid=5&Itemid=7



where the querystring values are grouped by commas and separated by forward slashes (virtual directories). It is a ShortURL, though in this case not shorter, and doesn't have any useful keywords, other than "newsfeed", and is not very human-readable. For a generic URL, this is somewhat unavoidable, but can be better than that.

This implementation also contain a way to customise ShortURLs on a per-module basis through a file called shorturls.php placed in the module folder (see the Example module), such as the News URLs, or 3rd party modules like PostCalendar, which instead of the full URL like

/index.php?module=PostCalendar&func=view&tplview=&viewtype=day&Date=20050405&pc_username=&pc_category=&pc_topic=&print=



with the above generic ShortURLs would be rendered as

/PostCalendar/view-viewtype:day-Date:20050405.html



but with customised URLs become

/Calendar/05-04-2005/day.html





The beauty is, though, once we've created the groundwork in the core of PostNuke, any implementation will be fairly easy.



1) Root-relative links: Links relative to the server site root (eg /nuke/filename.html), which stays static, as opposed to relative to the present file (eg filename.html).

2) Regular Expression (RegEx): A complex pattern-matching language that can look a bit like a mathematical formula, used in the Xanthia ShortURL filter at /modules/Xanthia/plugins/outputfilter.shorturls.php.





----------------------------------------------------------------

If this were Mambo, I'd charge you 80 Euros for all this (the price for SEF Advance), but because you're all such nice people (except that guy up the back, you know who you are :) ), I'll let you have it for free.



A PDF of the ReadMe included in the package, but with additional screenshots, is found here (570kb).



I've also written a more technical ReadMe on installing ShortURLs, included in the package under the docs folder, and also found here.



here's a test of the tab system using the Aqua theme. It also comes with an XP-styled theme and the default-CSS-based one. I hope you like it, because it took a lot of work to perfect.



OK, screenshots: Well, no point having screenshots of URLs, so here's some of the tab system and modified SeaBreeze and PostNukeBlue themes' Admin templates instead:



1. The main adminpanel in PostNukeBlue with the Aqua-themed tabs, hovering over the Settings panel.

2. Same as above, but with the Theme Override set under Modify Config and with a tabs.css stylesheets in the theme's style folder. The rounded corners are only visible in Mozilla/FireFox.

3. The Luna tab theme in SeaBreeze, hovering over the 3rd Party tab.

4. The Xanthia Admin tabs using Aqua tabs in PostNukeBlue, hovering on Theme Settings.





And finally, the downloads:

I started out fixing PN0.75, so there are 2 downloads: One for PN0.75, and one for PN0.76rc4. I'll update it once the PN0.76 final is released.

Please backup your site before installing these patches, since a lot of system files are replaced. The PostNuke 0.76rc4 ShortURL package is rather large, consisting of some 400 files in a 1Mb zip file. The PN0.75 package has some 170 files and is around 800kb. Most of the changes are drop-in changes that doesn't necessitate updating of modules, but there are some exceptions in the PN0.76 package, in particular the Settings and Polls modules, where you need to first go to the Module list, regenerate, and update. Specific patches for popular 3rd party templated modules like AutoTheme and PNphpBB2 are included, but only a limited number of 3rd party modules have been tested with this package. No changes are made to the database, but it is still a good idea to back that up as well. You have been warned.



PostNuke 0.75 ShortURL package (833kb)

PostNuke 0.76rc4 ShortURL package (1Mb)



Two of the updated core themes:

PostNukeBlue (249kb)

SeaBreeze (120kb)





Feel free to discuss this proposal in the forums.



Enjoy!





Martin Andersen 8/7/2005
7300