The article by David Hawking is about how web search engines work. Search engines index the web and answer billions of queries per day. They aim to return high-quality answers and to exclude low-value content. The major search engines named in the article are Google, Yahoo, and Microsoft. Supporting a search engine requires a large, geographically distributed infrastructure. Search engines use crawling algorithms to compile lists of URLs. Crawlers follow the links in documents to discover high-quality websites, so documents that nothing links to are often never crawled. Besides missing unlinked documents, crawlers are vulnerable to system problems and failures and to spammers. The second part of the article concerns the methods used to index the crawled documents, usually by creating an inverted file that is stored, often in compressed form, in memory. Search engines also often maintain lists of common queries so that they can return results for them quickly.
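To make the inverted file idea concrete, here is a minimal sketch in Python. It is not the method from the Hawking article, only a toy illustration: the sample documents, the function names (`tokenize`, `build_inverted_index`, `search`, `cached_search`), and the plain dictionary used as a query cache are all assumptions made for this example, and real engines compress the postings and spread them over many machines.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every term in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings)


def cached_search(index, query, cache):
    """Answer repeated queries from a cache (a toy stand-in for a common-query list)."""
    key = " ".join(tokenize(query))
    if key not in cache:
        cache[key] = search(index, query)
    return cache[key]


# Hypothetical sample documents, invented for this sketch.
docs = {
    1: "Search engines crawl the surface web",
    2: "The deep web is hidden from crawlers",
    3: "Metadata harvesting exposes deep web archives",
}
index = build_inverted_index(docs)
cache = {}
print(sorted(cached_search(index, "deep web", cache)))
```

The point of the design is that a query only touches the posting sets for its own terms, so the engine never has to scan every document at query time.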
The next two articles concerned the deep, hidden, or invisible web, as opposed to the surface web. Search engines usually do a poor job of reaching quality content in the deep web, since that content often consists of HTML documents that no other page links to. The article about the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) discusses an open-access method for gaining federated access to eprint archives through metadata harvesting and aggregation. The OAI's goal is to develop and promote interoperability standards for the efficient dissemination of content. The OAI-PMH protocol is based on common standards, and its development was funded by grants. OAI-PMH attempts to improve communication between data providers, who build repositories and collections with important content, and service providers, who act as harvesters and build services on top of those collections. The final article, which was a bit dated, concerned a paid product called "Brightlight" that claimed to be a search engine capable of searching the entire web, both the surface and the deep web.
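As a rough illustration of what a harvester on the service provider side does, the sketch below builds an OAI-PMH `ListRecords` request and pulls record identifiers out of a response. The verb, the `metadataPrefix`, `set`, and `resumptionToken` arguments, and the XML namespace come from the OAI-PMH 2.0 specification; the base URL `https://example.org/oai`, the helper names, and the sample response are hypothetical and invented for this example. Nothing is fetched over the network here.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# XML namespace used by OAI-PMH 2.0 responses.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None,
                     resumption_token=None):
    """Build a ListRecords request URL for a repository's base URL."""
    if resumption_token:
        # resumptionToken is an exclusive argument: it replaces the others.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec:
            params["set"] = set_spec
    return base_url + "?" + urlencode(params)


def extract_identifiers(xml_text):
    """Return the record identifiers found in a ListRecords response."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.findall(".//oai:header/oai:identifier", OAI_NS)]


# Hypothetical, heavily trimmed response body, invented for this sketch.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:1</identifier></header></record>
    <record><header><identifier>oai:example.org:2</identifier></header></record>
  </ListRecords>
</OAI-PMH>"""

url = list_records_url("https://example.org/oai")
identifiers = extract_identifiers(SAMPLE_RESPONSE)
print(url)
print(identifiers)
```

A real harvester would send the request, read the response, and keep requesting with each returned resumption token until the repository reports no more records.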

1 comment:
The David Hawking article was interesting; a billion queries answered per day is a lot of questions. The open archives article and its discussion of federated access were also interesting.