In non-CMS-powered websites, it’s common to create a bunch of HTML or PHP files that link to each other. To keep things organized, we usually name our files descriptively and bundle them into folders (aka directories). Once we upload them to our web server, they then become available at addresses that reflect these file and folder names.
So depending on the complexity of our site, we might end up with URLs that look like these:
http://example.com/about.html http://example.com/tutorials/making-a-web-page/ http://example.com/crazy-stuff/interactive-projects/project4/step2.php
The only tricky rule here is that when a URL ends in a slash (like the second example above), most web servers will load
index.html if either exists.
It’s important to recognize that these human-readable (or “clean”) URLs carry important information:
- Clean URLs can tell the user the name of the page they are on (e.g.
about.html). If a user bookmarks the URL, emails it to a friend, or saves it in a text file for later, they will still have a good idea of what the page contains without having to click on it.
- Clean URLs help the user understand the structure of the site: in the second and third examples of the site, the user might even try the shortcut of deleting part of the URL and hitting return, in order to view pages at
- Clean URLs give search engines like Google a better chance of understanding what content is located at each URL, and probably raising those pages’s rankings for given set of search keywords.
On big, database-driven sites, it’s common to see URLs that use unusual characters, coded language, and long strings of characters and numbers:
Here are some pretty bad real-world examples (my personal favorite is the LinkedIn one, which is what you get when you click on Joe Smith’s name after doing a search).
http://www.amazon.com/Making-Mirrors/dp/B006ZDU7QI/ref=sr_1_1?ie=UTF8&qid=1329151184&sr=8-1 http://search.yahoo.com/search;_ylt=A0oG7klsOzlPFWYApwBXNyoA;_ylc=X1MDMjc2NjY3OQRfcgMyBGFvAzAEZnIDeWZwLXQtNzAxBGhvc3RwdmlkA3Z3N3ZZa29HN3Y3dlVxbmNUczc3T3dBRFl5MHl2azg1TzJ3QUN4RXUEbl9ncHMDMTAEbl92cHMDMARvcmlnaW4Dc3JwBHF1ZXJ5A2pvZSBzbWl0aARzYW8DMQR2dGVzdGlkA0g0NjU-?p=joe+smith&fr2=sb-top&fr=yfp-t-701 http://www.linkedin.com/profile/view?id=34735695&authType=NAME_SEARCH&authToken=9hD5&locale=en_US&srchid=7a0a9838-fb41-4cf8-9a67-1d74433f4c37-0&srchindex=2&srchtotal=10883&goback=%2Efps_PBCK_joe+smith_*1_*1_*1_*1_*1_*1_*2_*1_Y_*1_*1_*1_false_1_R_*1_*51_*1_*51_true_*2_*2_*2_*2_*2_*2_*2_*2_*2_*2_*2_*2_*2_*2_*2_*2_*2_*2_*2_*2_*2&pvs=ps&trk=pp_profile_name_link
URLs like these are optimized for the databases that control the websites, not for the actual people who need to use those sites. They suffer from poorer Google rankings and aggravate users who have to work with them through common tasks like reading, storing, collecting, bookmarking, copying/pasting, etc.
There’s a reason for the ugliness
Wordpress uses a MYSQL database for storage and displays database content through PHP templates. This means that every time a post or page is displayed, Wordpress is assembling the HTML “on the fly.” There is no collection of static HTML files whose filenames would appear in the URLs — instead, page contents are retrieved through the same
GET requests that we used when developing forms last semester.
The URL for a
GET request begins with the address of a normal PHP file and is followed by a question mark, a variable name, the equals sign, and the variable contents. For example, a weather page might show different data for different zip codes, and the corresponding URLs might look like this:
As we saw with PHP forms, this is the way that a single PHP file (
showweather.php) can display different outputs. It receives the location variable as part of the URL, uses it in some what to retrieve weather data, and then outputs HTML that is appropriate for that URL.
The default (and “true” in some sense) URL of a typical Wordpress post looks like one of these:
This is in fact very similar to the weather example above:
index.php is the PHP file that can access a database and is set up to display different content depending on the variable it receives, while
p (which stands for Post) is the variable that will contain the ID number for each post. Because
index.php is always the default file shown in a directory, Wordpress usually drops it and just uses the
Clean URLs with Dynamic URL Rewriting
Whenever possible, we want to get rid of the ugly and noninformative default URLs and use “clean” URLs that look more like the simple, old-school, directory-path-leading-to-a-filename format.
Even though we won’t actually be creating directories and files, the web server software Apache has a module called
mod_rewrite module is installed on most (but not all) web servers, and it enables the server to generate and recognize human-readable URLS and map them to the correct posts in the Wordpress database.
Wordpress gives us a number of clean URL structures out of the box. Choose Settings > Permalinks and you’ll see them:
All except the first count as “clean”. On watkinswebdev.com, which is hosted by Dreamhost, security settings require that once we’ve chosen our structure, we need to edit or create our
.htaccess file in the root folder of our Wordpress installation. We’ll go over this in class, but note that if you ever change your Permalink Settings, you’ll need to update the
.htaccess file with the new code as well.
The Custom Structure option enables you might to choose your own format. One that I have used often is
/%category%/%postname%/. This would result in permalink URLs that looked something like this:
If you want to explore other possibilities, take a look at the Wordpress Codex’s page Using Permalinks for the rundown on all the possible variables.