Irony on the rocks with a twist...

3.30.2004

What Lies Beneath the Web?

Last year, I was given the opportunity to do a heck of a lot of research using the Web. As a result, I developed a keen sense of where to find things and which search engines suited which types of searches. Contrary to popular belief, Google is not always the best choice (although it must be said that few search engines spend as much time reinventing their own wheel as Google does).

Like anything else, the Web has layers. The deeper into those layers one delves, the more fascinating it becomes. It is a bit like the way geologists can look at the layers in rock and tell you how old they are. The Web constantly reinvents itself at the "top layer," compressing the layers underneath and pushing them downward.

One of the pivotal moments in the development of the Web was the addition of meta tags, or keywords, to web pages. This made it really easy for search engines to compile and reference lists of web pages by subject. Of course, it didn't take long for the spam and porn industries to catch on and start coming up with all sorts of ways to scam the search engines into giving their sites more "hits." Meta tags brought some websites "to the surface," but in so doing, they pushed others down.
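For the curious, here is a rough sketch in Python of how a keyword-based index might work. It is my own illustration, not the code of any real search engine, and the page addresses and contents are invented. It reads the keywords meta tag from each page and builds a list of pages per keyword.

```python
from collections import defaultdict
from html.parser import HTMLParser


class MetaKeywordParser(HTMLParser):
    """Collects the comma-separated terms from <meta name="keywords">."""

    def __init__(self):
        super().__init__()
        self.keywords = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "keywords":
            content = attr.get("content") or ""
            # Normalize each term: trim whitespace, lowercase, skip empties
            self.keywords.extend(
                term.strip().lower() for term in content.split(",") if term.strip()
            )


def extract_keywords(html):
    """Return the list of keywords a page declares about itself."""
    parser = MetaKeywordParser()
    parser.feed(html)
    parser.close()
    return parser.keywords


def build_index(pages):
    """Map each keyword to the sorted list of URLs that declare it."""
    index = defaultdict(set)
    for url, html in pages.items():
        for keyword in extract_keywords(html):
            index[keyword].add(url)
    return {keyword: sorted(urls) for keyword, urls in index.items()}


# Hypothetical pages, invented for illustration
PAGES = {
    "http://example.edu/geology": (
        '<html><head><meta name="keywords" '
        'content="geology, rock layers, strata"></head></html>'
    ),
    "http://example.com/shop": (
        '<html><head><meta name="Keywords" '
        'content="Geology, free, free, free"></head></html>'
    ),
}

INDEX = build_index(PAGES)
```

Notice that the index simply trusts whatever a page says about itself. The second page claims "geology" and gets listed right alongside the first, which is exactly the weakness the spammers exploited.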

Traditionally, search engines relied either on meta tags or on a predetermined list of websites chosen by humans. Soon the search engines evolved to combine both techniques and go beyond them. Then came the metasearch engines, such as Metacrawler, one of the first I ever used, which pass a single query along to several search engines at once and pool the results.
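To give a feel for the metasearch idea, here is a minimal sketch in Python. The two "engines" are stand-ins I made up, each returning a fixed ranked list, and the scoring rule (each page earns 1/rank from every engine that lists it) is just one simple way to merge results. I am not claiming it is how Metacrawler or any other real service did it.

```python
from collections import defaultdict


def keyword_engine(query):
    # Hypothetical engine, standing in for a keyword-based index
    return ["http://example.com/a", "http://example.edu/b", "http://example.org/c"]


def directory_engine(query):
    # Hypothetical engine, standing in for a human-edited directory
    return ["http://example.edu/b", "http://example.gov/d"]


def metasearch(query, engines):
    """Query every engine and merge the ranked lists into one.

    Each URL earns 1/rank from every engine that returns it, so a page
    listed by several engines rises above a page listed by only one.
    """
    scores = defaultdict(float)
    for engine in engines:
        for rank, url in enumerate(engine(query), start=1):
            scores[url] += 1.0 / rank
    # Highest score first; ties broken alphabetically for a stable order
    return sorted(scores, key=lambda url: (-scores[url], url))


RESULTS = metasearch("deep web", [keyword_engine, directory_engine])
```

In this made-up run, the university page comes out on top because both engines list it, even though only one of them ranked it first.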

The nature of the Web began to change. Sites that had commercial content and were updated more regularly stayed closer to the surface, while sites that were not so well maintained began to fall away. Many of the sites that were pushed down were maintained by more transient populations, such as universities, government agencies, and other non-commercial entities. Many of these sites contained massive amounts of useful information, but they could no longer be easily called up with a more "modern" search. Imagine being a non-swimmer in a pool for the first time. More than likely, you'll stay in the shallow end. Maybe you'll never feel the need to head to deeper waters, but the deep end is there, just the same.

The deep web consists of a wealth of sites that have sunk below the more "popular" layer of the Web. A lot of these sites contain huge databases of information that is extremely relevant to research. Rather than simply leave these poor sites to their demise, people developed new techniques to bring them back to the surface.

This led to the advent of a new kind of search engine: the trawler. Traditionally, trawlers are boats, primarily used in commercial fishing. The boats drag large nets behind them, scooping up schools of fish, shrimp, whatever. In the days before deep water diving, a trawler could be outfitted with a series of hooks on a chain to literally "drag" bodies and objects off the bottom in deep water. The Web Trawler is software that uses a number of different methods to search for web sites. A very good explanation is available from Turbo 10 at http://turbo10.com/trawler.html.

I discovered that the Web was really a series of webs, one built right on top of the other, the way some civilizations built their cities upon the ruins of previous civilizations. It was like discovering the World Wide Web for the first time all over again. I had started looking more closely into Internet-based research for my wife, who is pursuing her doctorate, and for my supervisor, who needed to compile some statistical data in a hurry. I ended up writing a very long guide to Internet research on an extremely broad range of topics. I began to distribute it to friends, colleagues, and fellow students. I found it very useful to group these sites loosely by their main objectives, and I gave each a rating based upon functionality, commercialism, and ease of use. One of these days, I might even get around to publishing it on the Web for everyone to use.

Some more places to learn about the Deep Web:

Virtual Private Library Deep Web Research Site

SUNY-Albany Library Site on the Deep Web

Salon Article on The Deep Web

About.Com site on the Deep Web

An excellent white paper on the Deep Web

Same white paper as above in Adobe Acrobat .pdf format

TANSTAAFL!
