Why is it difficult to find images on the Web?
Two basic reasons explain why images are difficult to find.
First, the Internet and the Web were originally created to share text-based, as opposed to visual, information. Although the Internet has existed since 1969 and the World Wide Web since 1990, the creators of both intended primarily to promote and share textual information via hypertext and hyperlinks. As a result, most Internet search tools (or automated agents) have focused on analyzing and indexing text, not images. As the Web has expanded over the past 17 years, however, more and more multimedia and visual content has come online.
Second, the World Wide Web is enormous. But how big is it? There are certainly millions, perhaps billions, of images available on the Web, and more are added every day. In fact, according to a landmark study of the Internet conducted by Inktomi and the NEC Research Institute in January 2000, there were more than 1 billion unique pages on the Web (Inktomi-NEC Web Study, currently unavailable online). In 2004, Google.com reported 4,285,199,774 web pages in its index. This represents a rough increase of about 1 billion pages added to the Web each year. (As of August 2005, Google.com stopped sharing this data.) Although the number of Web pages is difficult to estimate, the Netcraft Web Server Survey indicates that, as of October 2007, there were 142,805,398 sites. Considering that there were also over 200 million domain names as of January 2004 and nearly 490 million as of July 2007 (ISC Internet Domain Survey), searching for anything, whether images or text, can indeed be a time-consuming and frustrating task. You can spend hours searching for content unless you know where and how to search effectively!
What information will this presentation provide?
Within this context, this presentation will introduce users to the general issues, technologies, and resources associated with retrieving digital images on the Web, including the following:
- Standard image file formats
- "Image Bots" or automated image searching tools
- Meta image resources, archives, and databases
- Subject-specific resources
- Image copyright issues
For future reference, a bibliography of other noteworthy paper- and Web-based resources is also provided at the end of this presentation. The author always welcomes comments, corrections, and/or questions at heidi.abbey@psu.edu.
© 1999-2007 Heidi N. Abbey. All Rights Reserved.