Stuart
Moulthrop and
The
Hypertext should be an ideal tool for the
association of ideas in professional discourse; but the simplistic concept of
linking in the present World Wide Web does not deliver on this promise. To
remedy this situation we propose the citescape, a mechanism for representing
within a document links leading to it from later documents. This representation
maps the discursive "space" in ways that are vital to knowledge
building in professional communities. We give a technical description of the
citescape function and discuss its implications, contrasting this solution with
more restrictive alternatives. In a concluding thought experiment, we take the
problems implicit in citescapes as indicative of problems facing the Web as a
whole.
World Wide Web, HTML, knowledge representation,
electronic publishing, collaboration.
It is well
understood that knowledge in professional communities is not simply discovered,
but rather evolves through systematic association of claims. In what Bolter
calls "the late age of print" [BOLTER], this process necessarily
involves documents. What biologists, sociologists, lawyers, engineers,
physicians, and other knowledge workers know is largely embodied in their
professional literatures. The advancement of disciplinary knowledge depends on
social interaction among writers of mutually referring documents [LATOUR]. The
value of new claims is established by patterns of support and dissent
manifested in papers, reports, reviews, opinions, and other public
communications. A claim survives and acquires influence or "reach"
only if it becomes attached to a network of references [KAUFER].
Proponents of hypertext
have portrayed it as a valuable tool for this social evolution of knowledge.
For Bush, automated retrieval and linking seemed a promising solution for the
great diversification of scientific information [BUSH]. For Nelson, hypertext
promised wider and more general access to written knowledge [NELSON]. Bolter
associated hypertext with a paradigm shift in literate culture, a transition
from fixed hierarchies to dynamic networks [BOLTER]. Forecasts concerning the
World Wide Web have been similarly enthusiastic, and with some reason [DECRAN].
If success is measured by numbers of users, documents, and links, then the Web
is an overwhelmingly successful implementation of hypertext. Some major bodies
of knowledge are significantly better connected than they were at the start of
this decade.
Success can also be
measured qualitatively, however, and by this index the Web may prove less
impressive. We might ask, what good is all this connection? As Bolter notes,
hypertext requires a multi-dimensional approach to information, introducing a
basic metaphor of "writing space." In recent years hypertext
designers and knowledge theorists have argued for greater richness and
sophistication in the conceptual, functional, and design space of hypertext
[KAPL94, KOLB, MARSHA]. Yet the writing space of today's Web is notably
limited, especially in encompassing those documentary relationships on which
communal knowledge-building depends.
In pre-Web days, hypertext
systems were often criticized for failing to afford the same functionality as
print [CARLSO]. Now we might reverse that complaint: hypertexts on the Web
generally resemble print documents far too closely. As Furuta and Marshall observe,
Web hypertexts are "passive" objects, representing information in
relatively fixed form, much as do periodicals and books [FURUTA]. Links among
these documents are one-way structures pointing away from the current location.
If the Web may be said to function as a language, that language has only one
verb (to go) and one predicate (go there). Because of this limitation, links in
Web hypertexts suffer the same temporal constraint that affects references in
print documents: they can only refer backward in time to prior sources. At
present, we have no way to provide for a "there" which isn't there
yet.
These limits can be
challenged, of course. The simplest expedient is to update Web documents
frequently -- the author changes her "Under Construction" sign to
read "Perpetually Under Construction" and resolves to write links to
later work as it comes along. Aside from being tedious, this strategy
reinforces another serious flaw that the Web has inherited from print: changes
in the document's links require the author's active engagement. If an author
chooses not to update, perhaps because she is distracted by other work, new
connections go unmapped.
A more interesting solution
has been proposed by the Digital Libraries group at
While the Stanford proposal
augments Web hypertext in important ways, it does not fully meet the
requirements of associative knowledge building. In the Stanford system, only
authenticated users may see annotations. This approach has its advantages,
which we discuss in the final part of this paper, but it also introduces
crucial limitations. To begin with, restricting access to specific groups
raises questions about group constitution and regulation. Unless membership can
be quickly and simply adjusted, groups will tend to be small, homogeneous, and
static. This is consistent with the earliest conception of hypertext (Bush's
Memex, which was meant for individuals and small teams). It does not conform to
more recent thinking, which stresses possibilities for "generalism"
and cross fertilization among disciplines [NELSON, BOLTER]. Indeed, the notion
of private annotations and pathways seems badly out of step with the open and
heterogeneous character of the Web.
As an alternative to restricted annotations, we propose a different device for associating Web documents. We call this mechanism a citescape, which may be defined generally as a dynamic, linked representation, within a document, of all the pages that contain hypertextual references (HREFs) to that document. The term is derived by analogy with "landscape" and connotes a visual survey or mapping of the document space that surrounds a given piece of writing. Figure 1 shows a prototype citescape.
Figure 1: Page-Specific
Citescape
The citescape has no exact
counterpart in print tradition, though it bears a family resemblance to a
citation index. As the illustration shows, however, there are crucial
differences between citescapes and citation indices.
Citation indices are large, complex works compiled laboriously and expensively. They often lag behind current research by several years. By contrast, the citescape is generated on demand (note the REFRESH button and dateline in Figure 1). Information in the citescape is as current as its source database. This database is continually and automatically updated (see the technical discussion in section 3). Entries in the citescape are live hypertext links, not passive records as in a printed index. Finally and perhaps most important, the citescape is integrated into its subject document: anyone who can see the document can see the existing citescape and generate a fresh one if desired. No special privileges are required.
Figure 2: Temporal
Relations Among Documents
By virtue of its
availability, integration and content, the citescape would add significant
value to a Web document. Since it contains live links, the citescape provides
quick access to other texts constituting the writing space surrounding the
subject document. These links may be valuable even if they are never followed.
Readers of a technical or professional communication could compare the number
of links with the date of the document's first appearance on the Web. This
could give a rough sense of the document's importance or suggest that it has
been neglected. Likewise, the names and origins of texts with links to the
subject document might also give crucial insights. In professional literatures,
an idea is known not so much by the company it keeps as by the company that
keeps it in circulation.
These benefits accrue
mainly to readers, but authors could derive value from the citescape as well.
Research workers would obviously benefit from an automated literature survey
and clipping service. These functions could also benefit creators of commercial
Web documents, who could use lists of links pointing to their documents to
quantify market impact. Using page-specific citescapes (described in section
3), authors could tell what parts of their documents were drawing the greatest
interest or generating the most controversy. This information could help
considerably with revision.
It is prudent to ask
whether the functions we propose can be carried out with resources currently
available on the Web. The Lycos spider retrieves and stores most of the
information citescapes would require, including the titles of documents and the
Uniform Resource Locators (URLs) they include. Its powerful search engine
enables users to query the database with considerable flexibility. Yet the
results of various combinations of search techniques, displayed in the Table,
show that for all its utility as a research tool, Lycos cannot currently
perform key citescape functions.
Although the Web is
relatively young, it is already a rich resource for research in a number of
knowledge domains, especially those concerned with electronic technologies and
communication. The documents we used for this test have been available for only
6-24 months, but each had already received a number of citations. These case
studies examine the reach of Schank's book-length hypertext, Engines for
Education [SCHANK], and two of Moulthrop's articles, "You Say You Want
a Revolution? Hypertext and the Laws of Media," published in Postmodern
Culture, a peer-reviewed electronic journal [MOUL91], and "It's Not
What You Think," a hypertextual letter to the editors of Newsweek
[MOUL95]. Although we cannot claim that this survey yields generalizable data,
the three publications represent a range of traditional intellectual genres,
the kinds of written records scholars and researchers use as the basis for
furthering intellectual work.
For each work, the Lycos
database was queried four times, each time with a different set of search
terms. Searches were conducted using the author's name, the title of the target
work, elements of the work's URL, and a combination either of author and title
or of author and URL element. Each search employed as many constraints as
possible to limit the number of irrelevant hits. Thus authors' names and
keywords from titles were constrained so that only exact matches would be
retrieved. Whenever a search employed more than one search term, only those
hits containing all terms with a high adjacency factor were returned.
Table: Lycos-based
Searches for Citations
All searches except one
yielded some results. The search using unique elements of the URL for Schank's Engines
for Education proved fruitless because the URL depends on punctuation marks
which Lycos strips out in its search algorithms.
The variations in hits by
search method suggest that authors of Web documents use heterogeneous styles for
citing other works on the Web. Such heterogeneity also characterizes works in
print and reflects rhetorical and stylistic differences between knowledge
domains. To be fully functional, however, a mechanism for aggregating all links
into a specific document should be as general as possible so that variations in
style do not thwart its purposes.
Although it is possible to
use Lycos and similar search mechanisms to track the reach of a document, the process
is difficult and the results uncertain. Search strategies need to be carefully
crafted to exploit the power of the database and search engine while avoiding
their limitations and prohibitions. The strategies must also take into account
some semantic features of the information that can be used to identify the
target document. The results obtained will certainly include a large number of
irrelevant hits, many of which will need to be investigated before they can be
eliminated from the list.
The most serious limitation
of Lycos, however, remains its distance from the target document. As we see it,
a key feature of citescapes is their incorporation within the target document.
Such inclusion maps the intellectual terrain of which the document is a part and
permits the document to grow in complexity as the various enterprises to which
it is important also grow and change.
The
mechanism we propose for implementing citescapes consists of three components:
a citescape database server, a Common Gateway Interface program, and a proposed
extension to Hypertext Markup Language (HTML). Each of these parts is explained
in detail below:
The citescape mechanism
requires a comprehensive, regularly updated database that contains the
destination URL of every hypertextual reference (HREF) in every page on every
publicly accessible server on the Web. The database also contains aggregation
information (arguments to PARTOF, see 3.3 below). The information is gathered
automatically by a survey daemon of the type used by current search databases
like Lycos and WebCrawler. The search mechanism seeks an exact match to the URL
in the query request.
This is obviously the most
elaborate part of the proposal. As the previous section shows, however, it is
clearly feasible.
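The core of such a database can be pictured as an inverted index from destination URLs to the pages that cite them. The following Python sketch is an illustration under stated assumptions, not a specification of the server: the names HrefCollector and CitescapeIndex are hypothetical, pages are handed to the index by some crawler that is not shown, and resolving relative links and dropping fragment identifiers before storage is our assumption rather than part of the proposal.

```python
from collections import defaultdict
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class HrefCollector(HTMLParser):
    """Collects the HREF value of every anchor tag in one page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


class CitescapeIndex:
    """Hypothetical index: destination URL -> set of citing page URLs."""

    def __init__(self):
        self._citing = defaultdict(set)

    def add_page(self, page_url, html_text):
        """Record every link found in one surveyed page."""
        collector = HrefCollector()
        collector.feed(html_text)
        for href in collector.hrefs:
            # Assumption: store absolute URLs without fragments.
            destination, _fragment = urldefrag(urljoin(page_url, href))
            if destination != page_url:
                self._citing[destination].add(page_url)

    def citing_pages(self, url):
        """Exact match on the query URL, as the text specifies."""
        return sorted(self._citing.get(url, set()))
```

A survey daemon would call add_page once per retrieved page; citing_pages then answers a query with a single dictionary lookup, so a near miss on the URL returns nothing.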
This program identifies the
type of citescape being requested, issues a query to the Citescape Server,
creates a citescape page in the current document if necessary, and places the
results of the query in that page.
Citescape queries may be of
two types. A page-specific query (the default) returns all URLs containing
links to the present Web page. Figure 1 shows results of a page-specific
citescape query.
It may often be desirable to generate a citescape for an aggregation of Web pages. This is done with a document-specific query, which returns all URLs having links to any page within the current hypertextual document, a document defined here as a coordinated set of pages (see 3.3 below). Figure 3 shows a document-specific citescape.
Figure 3:
Document-Specific Citescape
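The gateway program's four steps (identify the request type, query the server, create the citescape page, place the results in it) can be sketched as follows. This is a hedged illustration only: handle_request, the parameter names "url" and "type", and the query_server callable are all hypothetical, and the HTML layout is our assumption rather than the design shown in the figures.

```python
from datetime import datetime, timezone
from html import escape


def handle_request(params, query_server, now=None):
    """Build an HTML citescape fragment for one request.

    params: dict with "url" and an optional "type" ("page" or "document").
    query_server: callable (kind, url) -> list of (citing_url, title) pairs;
        it stands in for the Citescape Server, which is not modelled here.
    now: optional datetime used for the dateline (defaults to current UTC).
    """
    kind = params.get("type", "page")  # page-specific is the default
    if kind not in ("page", "document"):
        raise ValueError(f"unknown citescape type: {kind!r}")
    target = params["url"]
    results = query_server(kind, target)
    stamp = (now or datetime.now(timezone.utc)).strftime("%Y-%m-%d %H:%M UTC")
    lines = [
        f"<h2>Citescape for {escape(target)}</h2>",
        f"<p>Generated {stamp}</p>",
        "<ul>",
    ]
    for citing_url, title in results:
        # Each entry is a live link, not a passive record.
        lines.append(f'<li><a href="{escape(citing_url)}">{escape(title)}</a></li>')
    lines.append("</ul>")
    return "\n".join(lines)
```

Passing the server in as a callable keeps the sketch independent of any particular database; the dateline corresponds to the one that accompanies the REFRESH button in Figure 1.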
Many documents on the Web
consist of numerous pages connected by networks of hypertext links. Kaplan's
hypertext "E-Literacies," for example, comprises approximately 35
pages and 180 links, most of which refer to other pages within the document
[KAPL95]. Current implementations of HTML offer no way to identify a page as
part of an aggregate structure.
The attribute PARTOF,
appearing within the tag of an HTML page, fills this gap. The argument of
PARTOF is the name of the document to which the present page belongs. The usage
PARTOF="W3-95" indicates that the current page is a component of a
hypertext called W3-95. Since hypertextual linking allows authors to use a
single page within several documents, PARTOF accepts multiple arguments
separated by commas. PARTOF="W3-95, spaceProgram, Perisites2" indicates
that the present page is a component of two other documents, spaceProgram and
Perisites2, as well as W3-95.
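A survey daemon could recover these arguments with a few lines of parsing. The sketch below is illustrative only: PartOfParser and documents_for are hypothetical names, and because the sketch does not commit to a particular tag, it accepts a PARTOF attribute on any tag. The BODY tag in the usage note is likewise just an example placement.

```python
from html.parser import HTMLParser


class PartOfParser(HTMLParser):
    """Gathers document names from any PARTOF attribute in a page."""

    def __init__(self):
        super().__init__()
        self.documents = []

    def handle_starttag(self, tag, attrs):
        # Attribute names arrive lowercased; values keep their case.
        for name, value in attrs:
            if name == "partof" and value:
                for item in value.split(","):
                    item = item.strip()
                    if item and item not in self.documents:
                        self.documents.append(item)


def documents_for(html_text):
    """Return the document names a page declares, in order of appearance."""
    parser = PartOfParser()
    parser.feed(html_text)
    return parser.documents
```

Under these assumptions, documents_for('<BODY PARTOF="W3-95, spaceProgram, Perisites2">') yields the three names W3-95, spaceProgram, and Perisites2, and a page without the attribute yields an empty list.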
We prefer to let authors
decide whether a page is a legitimate part of a hypertextual document, as
opposed to simply being reachable by a link from that document. A good rule of
thumb might be whether or not the page in question contains links to other
pages in the subject document. Authors may disregard this rule if they are
interested in broader aggregations.
Pages with no PARTOF
attribute are treated as separate entities. A document-specific citescape
request on such a page defaults to a page-specific request.
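The two query types, and the fallback for pages without a PARTOF attribute, can be sketched as follows. The function names and data shapes are hypothetical. Two choices in the sketch are our assumptions rather than part of the proposal: a page that belongs to several documents is queried against all of them, and links that originate inside the document itself are skipped by default.

```python
def page_query(links, page_url):
    """Return the pages that link to page_url (exact match).

    links: iterable of (source_url, destination_url) pairs.
    """
    return sorted({source for source, dest in links if dest == page_url})


def document_query(links, partof, page_url, skip_internal=True):
    """Return the pages that link to any page of page_url's document(s).

    partof: dict mapping a page URL to its list of PARTOF document names.
    skip_internal: assumption; omit links whose source is itself a member.
    """
    names = set(partof.get(page_url, []))
    if not names:
        # No PARTOF attribute: default to a page-specific request.
        return page_query(links, page_url)
    members = {url for url, docs in partof.items() if names & set(docs)}
    return sorted({
        source
        for source, dest in links
        if dest in members and not (skip_internal and source in members)
    })
```

With skip_internal set to False the function follows the wording of the text literally, returning every URL that links to a member page. For a hypertext whose links are mostly internal, that list would largely repeat the document's own pages, which is why the sketch filters them by default.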
Critics of
electronic writing often complain that it confers too much anonymity [STOLL, TUMAN].
Suppose we find on the Web a lengthy technical paper about magnetohydrodynamics
(MHD), full of data and elaborate equations. As people unacquainted with the
field, we might assume this is the work of an engineer or physicist -- only to
be told that it was written by a very bright 12-year-old as an appendix to an
amateurish science fiction novel. The data are invented, the equations flawed
and fundamentally meaningless. The joke is on us. In print, the critics argue,
this mistake would never happen. Readers are protected by editors, reviewers,
publishers, and other gatekeepers absent from the Web.
A citescape might provide
partial protection from this trouble. If we find no subsequent links on the
citescape, we might view the paper in question more skeptically. If the links
we find all seem to be from science fiction writers, or from 12-year-olds, the
game would likely be up.
However, we can construct
equally plausible scenarios in which citescapes make it harder to separate
intellectual signal from noise. Suppose the paper on MHD is indeed the work of
a rigorously trained researcher in engineering physics. However, the researcher
has ventured outside his main specialty and is writing in a highly speculative
vein (thus the absence of co-authors). Let us suppose the citescape for this
paper contains three links. The first two are from hypertexts written by
mainstream academic researchers, containing comments sharply critical of the
author's ideas on MHD. When we follow the third link, we find a rambling, semi-coherent
tract about cattle mutilations. This author seems to think flying saucers are
powered by MHD. Given these results, how should we characterize the original
paper: as an interesting theoretical venture or as the sort of pseudo-science
that appeals to cranks?
This scenario suggests that
citescapes will not necessarily improve professional discourse on the Web. They
could even do the opposite, exposing serious work to intrusions from the
lunatic fringe. In the Web equivalent of "spamming," popular or
important work might be peppered with links from authors interested mainly in
self promotion. In one sense a citescape functions as a window on a surrounding
discursive space; but in another sense it is an open door, possibly inviting
unwanted guests. If our goal is to maintain strict control over knowledge
claims, then the private, restricted meta-links envisioned by the Stanford
Digital Libraries Group could be more appropriate.
But if strict control of
information is paramount, why trade print for hypertext in the first place?
From its inception, hypertext has been described as a powerful tool for
associating ideas. To illustrate uses of his Memex system, Bush speculated that
it would allow researchers to connect chains of diverse ideas, moving from Turkish
crossbows to the properties of various woods to the vagaries of strategic
doctrine [BUSH]. Bolter notes that in hypertext there is "no reason not to
include disparate materials in one electronic network" [BOLTER, p.7].
"An electronic book," he writes, "is a structure that reaches
out to other structures, not only metaphorically, as does a printed book, but
operationally" [p.87].
Much of the potential value
of hypertext stems from this facility for connection. Interdisciplinary
thinking represents a primary source of intellectual breakthrough and critique.
Kaufer and Carley note that "authors associated with the most authority
and change are not rooted within a single intellectual community. Instead, they
are authors on the move, the maverick, the eccentric, the outsider, the
intellectual migrant, trained in one community and rising to fame after finding
their way to another" [KAUFER, p.394]. Kekulé von Stradonitz discovered
the benzene ring largely because he trained in architecture and switched to chemistry
[ULMER]. Mandelbrot's mathematical insights on fractal geometry yielded
important crossovers in economics, population genetics, and biology [GLEICK].
If Penrose's recent speculations are correct, then artificial intelligence and
neuropsychology have much to learn from quantum physics [PENROS]. As Kaufer and
Carley show, print has facilitated these cross-fertilizations. Hypertext could
conceivably do much more; but only if we understand its difference from print.
The noise problem can be
dealt with simply enough. Once they have access to citescapes for their texts,
Web authors can create edited versions, screening out links they consider
inappropriate, malicious, or even embarrassing. These authorized citescapes
would coexist hypertextually with the unedited versions. Readers might be
encouraged to use the canonical citescape instead of the raw cut, but they
would be free to compare the two and draw their own conclusions. This scheme
lets authors filter out anything they find too noisy, but also preserves the
de-selected information in case it comes from an emerging genius and not a
hopeless crank.
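The editing scheme amounts to a filter that never discards anything. A minimal sketch, with hypothetical names and assuming the author supplies the set of links to be screened out:

```python
def authorized_citescape(raw_links, excluded):
    """Split a raw citescape into authorized and de-selected links.

    raw_links: list of citing URLs as returned by a citescape query.
    excluded: URLs the author has chosen to screen out.
    The raw list is left untouched so both versions can coexist.
    """
    excluded = set(excluded)
    authorized = [url for url in raw_links if url not in excluded]
    deselected = [url for url in raw_links if url in excluded]
    return authorized, deselected
```

Because the function returns the de-selected links alongside the authorized ones, a reader can always compare the edited citescape with the raw cut.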
We believe
the citescape mechanism is a viable technical proposal. At the same time,
questions about its implications also suggest an important thought experiment.
Suppose that the citescape function were already available. Would it be
perceived as an overall benefit or harm to the World Wide Web? What sorts of Web
users would be likely to adopt this function, and which would reject it? What
would rejection say about the Web and the uses for which we intend it?
Citescapes pose a clear
alternative to more localized structures of association. As we have indicated,
these restrictive structures promise more homogeneity, better noise
suppression, and tighter authorial control over electronic writing. Such
qualities may prove more desirable to future Web users than the flexibility,
heterogeneity, and noisiness that citescapes support.
Noise suppression always
has a cost, however. Restrictive mechanisms would likely inhibit intellectual
"migration," reinscribe strict disciplinary boundaries, and thus
deter innovation. The present Web offers a viable if unruly alternative to the
regimes of print. Citescapes augment this alternative, aiming to support the
wide circulation of knowledge implicit in both the Web and the hypertext
concept itself.
Is hypertext really what we
want? Or would we prefer "electronic books" and "digital
libraries" -- mechanisms that restore a pre-Internet social order? These
are not simply technical questions. Technologies have social implications, just
as social agendas inevitably shape technologies. The conceptual problems posed
by citescapes may well model issues about communication and control that are
salient for cyberspace in general.
Bolter, J.D. (1991) Writing space: the
computer, hypertext, and the history of writing. Erlbaum.
Bush, V. (1945) As we may think. The
Atlantic Monthly, July. URL: http://www.csi.uottawa.ca/~dduchier/misc/vbush/as-we-may-think.html.
Carlson, P. (1990) The rhetoric of hypertext, Hypermedia
2:109-31.
December, J. and N. Randall (1994) The World
Wide Web unleashed, SAMS.
Furuta, R. and
Gleick, J. (1987) Chaos: making a new
science. Viking.
Kaplan, N. and S. Moulthrop. (1994) Where no
mind has gone before: ontological design for virtual spaces. ECHT '94
proceedings. European Conference on Hypermedia Technology.
Kaplan, N. (1995) E-literacies: politexts,
hypertexts, and other cultural formations in the late age of print. Computer-mediated
communication magazine 2(3). URL: http://sunsite.unc.edu/cmc/mag/1995/mar/kaplan.html.
Kaufer, D. and K. Carley. (1993) Communication
at a distance: the influence of print on sociocultural organization and change.
Erlbaum.
Kolb, D. (1995) Socrates in the labyrinth:
hypertext, argument, philosophy. Eastgate systems.
Latour, B. (1987) Science in action: how to
follow scientists and engineers through society. Harvard UP.
Marshall, C., F. Shipman, and J. Coombs. (1994)
VIKI: spatial hypertext supporting emergent structure. ECHT '94 proceedings.
European Conference on Hypermedia Technology.
Moulthrop, S. (1995) It's not what you think:
Newsweek's tech-no mania. URL: http://www.charm.net/~sam/inwyt/inwyt.html.
Moulthrop, S. (1991) You say you want a
revolution? hypertext and the laws of media. Postmodern culture 1:3.
URL: http://jefferson.village.virginia.edu/pmc/issue.591/moulthro.591.
Nelson, T. (1987) Literary machines.
Mindful press.
Penrose, R. (1994) Shadows of the mind: the
search for the missing science of consciousness.
Röscheisen, M., C. Mogensen, and T. Winograd.
(1995) Beyond browsing: shared comments, SOAPs, trails, and on-line
communities. 1995 World Wide Web Conference. URL: http://www-diglib.stanford.edu/diglib/pub/reports/brio_www95.html.
Schank, R. (1994) Engines for education.
Erlbaum. URL: http://www.ils.nwu.edu/~e_for_e/.
Stoll, C. (1995) Silicon snake oil.
Doubleday.
Tuman, M. (1992) Word perfect: literacy in
the computer age. U.
Ulmer, G. (1990) Teletheory: grammatology
in the age of video. Routledge.
Thanks to
Christine Boese of Rensselaer Polytechnic Institute for pointing out the
temporal aspects of the citescape mechanism.
URL: http://iat.ubalt.edu/moulthrop/essays/citescapes/citescapes.html