Connectivity of the Stat/Math Website

The Web (and hypertext in general) brings the great advantage of connectivity to the world of information. Our websites may be organized in whatever fashion is most appropriate, be it Network, Database, Linear, or Tree (as in our case). Hypertext allows us to break out of that structure without damaging it: we can create cross-references, links to more information, tips for special cases, and so on.

When matched with a clear navigational scheme, hypertext allows users to choose their own level of content. They may wander wherever their interests lie while knowing exactly where they are.

In practice, hypertext can also obscure the content by providing too many side roads. If connectivity is too high, the structure starts to disappear and users get lost. In addition, lists of links can become overwhelming to use and maintain.

It has historically been difficult to visualize the connectivity of a website. This may be due to the non-technical (or at least atheoretical) background of many webmasters.

The notion of a "connectivity matrix" comes from applied mathematics. It is a matrix (a table of numbers) filled with ones and zeros. Each page on the website is assigned a number, usually by sorting the URLs alphabetically, and that number identifies both a row and a column of the matrix. For example, if your homepage is assigned the number 1, it owns row 1 and column 1.

The site is then traversed, starting with the homepage. For each page we visit, we fill in the row of the matrix associated with that URL. For example, suppose the homepage is assigned row 28. We scan the homepage and place a "1" in every column of row 28 whose page the homepage links to. If the homepage links to a page named "foo", with index 76, we place a "1" at row 28, column 76. All other positions in the matrix are filled with zeros.
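The fill-in procedure above can be sketched in a few lines of Python. This is only an illustration, not the site-mapping tool itself: the function name, the dictionary input format, and the three-page example site are all assumptions. Note also that Python counts rows and columns from 0, whereas the text counts from 1.

```python
def build_connectivity_matrix(links):
    """Return (urls, matrix) for a dict mapping each URL to the URLs it links to."""
    # Collect every page, including pages that only appear as link targets.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    # Sorting the URLs alphabetically fixes each page's row/column index.
    urls = sorted(pages)
    index = {url: i for i, url in enumerate(urls)}

    # Start with all zeros, then place a 1 for every link found.
    n = len(urls)
    matrix = [[0] * n for _ in range(n)]
    for source, targets in links.items():
        for target in targets:
            matrix[index[source]][index[target]] = 1
    return urls, matrix


# Hypothetical three-page site, used only for illustration.
example_links = {
    "/index.html": ["/foo.html", "/help.html"],
    "/foo.html": ["/index.html"],
    "/help.html": [],
}
urls, matrix = build_connectivity_matrix(example_links)
for url, row in zip(urls, matrix):
    print(row, url)
```

In this example the homepage sorts last, so its links appear in the last row, and the page with no outgoing links gets a row of zeros.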

The task is usually automated by a web robot, with the data being imported to Matlab for analysis.

Interpretation

It is best to view the matrix graphically. The images below were created using Matlab's spy command. Notice that if a page is a dead end (no outgoing links), its row will be blank. A sitemap page would have nearly all of its row filled. L-shaped areas that are symmetric along the diagonal are indicators of a section that has been cross-linked to itself.
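For readers without Matlab, the same checks can be approximated in Python. The sketch below is a rough text stand-in for a spy-style plot, plus two helpers that flag blank rows and measure how full each row is. The function names and the four-page example matrix are made up for illustration.

```python
def spy_text(matrix):
    """Return a text picture of the matrix: '*' marks a link, '.' marks none."""
    return "\n".join(
        "".join("*" if value else "." for value in row) for row in matrix
    )


def dead_ends(matrix):
    """Return the indices of pages whose rows are blank (no outgoing links)."""
    return [i for i, row in enumerate(matrix) if not any(row)]


def row_fill(matrix):
    """Return, for each page, the fraction of its row that is filled."""
    n = len(matrix)
    return [sum(1 for value in row if value) / n for row in matrix]


# Hypothetical four-page site, used only for illustration.
example = [
    [0, 1, 1, 1],  # page 0 links to every other page, like a sitemap
    [1, 0, 1, 0],
    [1, 1, 0, 0],  # pages 1 and 2 link to each other
    [0, 0, 0, 0],  # page 3 is a dead end
]
print(spy_text(example))
print("dead ends:", dead_ends(example))
print("row fill:", row_fill(example))
```

A row fill close to 1 suggests a sitemap-like page, while a row fill of 0 is a dead end.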

Note that the matrix represents the parts of your site you can reach with one click. If you square the matrix, the non-zero entries show what you can reach with two clicks, and cubing it shows three clicks. You can see the connectivity of your site unfold as you raise the matrix to higher powers.

Eventually the pattern of non-zero entries will stop changing. This is called the equilibrium point. If the pattern stops changing at 6 clicks, then you know that 7 clicks will give the same result.
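One way to watch this unfold is sketched below in Python. It rests on an assumption about the intended meaning: it tracks the pages reachable in at most n clicks (keeping everything reached earlier) and uses 0/1 arithmetic rather than true matrix powers, because that version is guaranteed to settle. The function names and the four-page chain are hypothetical.

```python
def boolean_product(a, b):
    """Multiply two 0/1 matrices, keeping only whether each entry is non-zero."""
    n = len(a)
    return [
        [1 if any(a[i][k] and b[k][j] for k in range(n)) else 0 for j in range(n)]
        for i in range(n)
    ]


def reach_within_clicks(matrix):
    """Yield (clicks, reach) until the pattern of non-zero entries stops changing.

    reach[i][j] is 1 when page j can be reached from page i in at most
    `clicks` clicks. The last pair yielded is the equilibrium point.
    """
    reach = [[1 if value else 0 for value in row] for row in matrix]
    clicks = 1
    yield clicks, reach
    while True:
        # Pages one further click away from anything already reached.
        step = boolean_product(reach, matrix)
        extended = [
            [1 if reach[i][j] or step[i][j] else 0 for j in range(len(matrix))]
            for i in range(len(matrix))
        ]
        if extended == reach:
            return  # nothing new: equilibrium was reached at `clicks`
        reach = extended
        clicks += 1
        yield clicks, reach


# Hypothetical chain of four pages: 0 -> 1 -> 2 -> 3, where page 3 is a dead end.
chain = [
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
history = list(reach_within_clicks(chain))
for clicks, reach in history:
    print(clicks, "clicks:", sum(map(sum, reach)), "non-zero entries")
```

For this chain the count of non-zero entries grows from 3 to 5 to 6 and then stays put, so the equilibrium point is 3 clicks.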

Another useful piece of information you can glean from your matrix is the percentage of page-to-page combinations you can reach in n clicks. For example, suppose your site has 300 pages. 100 percent coverage would mean that all 90,000 combinations are possible. The percentage is found by counting the non-zero entries of the matrix and dividing by the square of the number of pages. The Matlab command size(find(matrix)) is quite useful for counting the non-zero entries, as is nnz(matrix). Find the percentage for 1, 2, 3, ... n clicks, up to the equilibrium point. Below is a chart of the Stat/Math Center's percentages.
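The percentage calculation is simple enough to check by hand. As a minimal Python sketch (the function name and the four-page example are assumptions, not part of our tool):

```python
def coverage_percent(matrix):
    """Return the percentage of page-to-page combinations that are non-zero."""
    pages = len(matrix)
    nonzero = sum(1 for row in matrix for value in row if value)
    return 100.0 * nonzero / (pages * pages)


# Hypothetical four-page site: 7 of the 16 possible combinations are linked.
example = [
    [0, 1, 1, 1],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
]
print(coverage_percent(example))  # 7 / 16 gives 43.75

# For the 300-page site in the text, the denominator is 300 * 300.
print(300 * 300)  # 90000
```

Applying the same function to the matrix for each number of clicks gives the series of percentages that a coverage chart would plot.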

[Image: website chart]

Gallery of Matrices

Each of these matrices was generated using our site-mapping tool and Matlab.

[Images: 0310 matrix, 0614 matrix, 0811 matrix]