What are search engines?
Brief History
The first Web search engine was Wandex, a now-defunct index collected by the World Wide Web Wanderer, a web crawler developed by Matthew Gray at MIT in 1993. Another very early search engine, Aliweb, also appeared in 1993. JumpStation (released in early 1994) used a crawler to find web pages for searching, but search was limited to the titles of web pages only. One of the first "full text" crawler-based search engines was WebCrawler, which came out in 1994. Unlike its predecessors, it let users search for any word in any web page, which has since become the standard for all major search engines. It was also the first one to be widely known by the public. Also in 1994, Lycos (which started at Carnegie Mellon University) was launched and became a major commercial endeavor.
Soon after, many search engines appeared and vied for popularity. These included Magellan, Excite, Infoseek, Inktomi, Northern Light, and AltaVista. Yahoo! was among the most popular ways for people to find web pages of interest, but its search function operated on its web directory, rather than full-text copies of web pages. Information seekers could also browse the directory instead of doing a keyword-based search.
In 1996, Netscape was looking to give a single search engine an exclusive deal to be its featured search engine. There was so much interest that Netscape instead struck a deal with five of the major search engines: for $5 million per year, each search engine would be in a rotation on the Netscape search engine page. These five engines were Yahoo!, Magellan, Lycos, Infoseek and Excite.
Search engines were also known as some of the brightest stars in the Internet investing frenzy that occurred in the late 1990s. Several companies entered the market spectacularly, receiving record gains during their initial public offerings. Some, such as Northern Light, have taken down their public search engine and now market enterprise-only editions. Many search engine companies were caught up in the dot-com bubble, a speculation-driven market boom that peaked in 1999 and ended in 2001.
Around 2000, the Google search engine rose to prominence. The company achieved better results for many searches with an innovation called PageRank. This iterative algorithm ranks web pages based on the number and PageRank of other web sites and pages that link there, on the premise that good or desirable pages are linked to more than others. Google also maintained a minimalist interface to its search engine. In contrast, many of its competitors embedded a search engine in a web portal.
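The iterative idea described above can be sketched in a few lines of Python. This is a minimal illustration and not Google's implementation: the toy link graph, the damping value of 0.85, the fixed iteration count, and the handling of pages without outgoing links are all assumptions made for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Rank pages given a dict that maps each page to the pages it links to.

    The damping value and iteration count are illustrative assumptions.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page starts each round with a small baseline share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its current rank evenly among its link targets.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: "c" is linked to by three other pages.
toy_web = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}
ranks = pagerank(toy_web)
best = max(ranks, key=ranks.get)
print(best, round(sum(ranks.values()), 6))
```

In this toy graph the page with the most incoming links ends up with the highest score, and a page's score also depends on the scores of the pages that link to it, which is the premise the paragraph above describes.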
By 2000, Yahoo! was providing search services based on Inktomi's search engine. Yahoo! acquired Inktomi in 2002, and Overture (which owned AlltheWeb and AltaVista) in 2003. Yahoo! used Google's search engine until 2004, when it launched its own search engine based on the combined technologies of its acquisitions.
Microsoft first launched MSN Search (since re-branded Live Search) in the fall of 1998, using search results from Inktomi. In early 1999, the site began to display listings from LookSmart blended with results from Inktomi, except for a short time in 1999 when results from AltaVista were used instead. In 2004, Microsoft began a transition to its own search technology, powered by its own web crawler (called msnbot).
As of late 2007, Google was by far the most popular Web search engine worldwide. A number of country-specific search engine companies have become prominent; for example, Baidu is the most popular search engine in the People's Republic of China, and guruji.com is a prominent one in India.
How Web search engines work
A search engine operates in the following order:
1. Web crawling
2. Indexing
3. Searching
Web search engines work by storing information about many web pages, which they retrieve from the WWW itself. These pages are retrieved by a Web crawler (sometimes also known as a spider) — an automated Web browser which follows every link it sees. Exclusions can be made by the use of robots.txt. The contents of each page are then analyzed to determine how it should be indexed (for example, words are extracted from the titles, headings, or special fields called meta tags). Data about web pages are stored in an index database for use in later queries. Some search engines, such as Google, store all or part of the source page (referred to as a cache) as well as information about the web pages, whereas others, such as AltaVista, store every word of every page they find.
This cached page always holds the text that was actually indexed, so it can be very useful when the current page has been updated and no longer contains the search terms. This problem might be considered a mild form of linkrot. Google's handling of it increases usability and satisfies the principle of least astonishment, since the user normally expects the search terms to be on the returned pages. Increased search relevance makes these cached pages very useful, even beyond the fact that they may contain data that is no longer available elsewhere.
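The crawling and indexing steps described above can be illustrated with a small Python sketch. It uses the standard library's robots.txt parser and builds a simple word-to-page index. The robots.txt rules, the page contents, the URLs, and the crawler name "ExampleBot" are all hypothetical, and a real crawler would download pages and follow links rather than read them from a dictionary.

```python
import re
from collections import defaultdict
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents for an imaginary site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

# Hypothetical pages, standing in for what a crawler would have downloaded.
PAGES = {
    "http://example.com/": "Welcome to the example home page",
    "http://example.com/about": "About the example search engine",
    "http://example.com/private/notes": "Secret notes",
}


def allowed_pages(pages, robots_txt, user_agent="ExampleBot"):
    """Keep only the pages that the robots.txt rules let this crawler fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {url: text for url, text in pages.items()
            if parser.can_fetch(user_agent, url)}


def build_index(pages):
    """Map each lowercased word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index


crawlable = allowed_pages(PAGES, ROBOTS_TXT)
index = build_index(crawlable)
print(sorted(index["example"]))
```

The excluded page never reaches the index, so none of its words can match a later query. Keeping a copy of each page's text alongside the index is what would make a cache of the kind described above possible.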
When a user enters a query into a search engine (typically by using keywords), the engine examines its index and provides a listing of best-matching web pages according to its criteria, usually with a short summary containing the document's title and sometimes parts of the text. Most search engines support the use of the Boolean operators AND, OR and NOT to further specify the search query. Some search engines provide an advanced feature called proximity search, which allows users to define the distance between keywords.
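As a rough illustration of how Boolean operators and proximity search can be evaluated against an index, the Python sketch below stores the position of every word in every document. The three documents and the helper names (`docs_with`, `within`) are invented for this example, and real engines use far more compact data structures.

```python
import re
from collections import defaultdict

# Hypothetical documents used only for illustration.
DOCS = {
    1: "fast web search engine",
    2: "web crawler for a search engine",
    3: "fast food near the web cafe",
}


def build_positional_index(docs):
    """Map each word to {doc_id: [positions where the word occurs]}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        words = re.findall(r"[a-z0-9]+", text.lower())
        for position, word in enumerate(words):
            index[word].setdefault(doc_id, []).append(position)
    return index


def docs_with(index, word):
    """Return the set of documents that contain the word."""
    return set(index.get(word, {}))


def within(index, first, second, max_distance):
    """Return documents where the two words occur at most max_distance words apart."""
    matches = set()
    for doc_id in docs_with(index, first) & docs_with(index, second):
        if any(abs(p - q) <= max_distance
               for p in index[first][doc_id]
               for q in index[second][doc_id]):
            matches.add(doc_id)
    return matches


index = build_positional_index(DOCS)

web_and_search = docs_with(index, "web") & docs_with(index, "search")     # AND
fast_or_crawler = docs_with(index, "fast") | docs_with(index, "crawler")  # OR
web_not_fast = docs_with(index, "web") - docs_with(index, "fast")         # NOT
fast_near_web = within(index, "fast", "web", max_distance=1)              # proximity

print(web_and_search, fast_or_crawler, web_not_fast, fast_near_web)
```

AND, OR and NOT map directly onto set intersection, union and difference, while the proximity query needs the stored word positions: documents 1 and 3 both contain "fast" and "web", but only in document 1 are the two words adjacent.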
The usefulness of a search engine depends on the relevance of the result set it gives back. While there may be millions of webpages that include a particular word or phrase, some pages may be more relevant, popular, or authoritative than others. Most search engines employ methods to rank the results to provide the "best" results first. How a search engine decides which pages are the best matches, and what order the results should be shown in, varies widely from one engine to another. The methods also change over time as Internet usage changes and new techniques evolve.
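To make the idea of ranking concrete, here is a small Python sketch of one classic textbook scheme, TF-IDF scoring, in which a page scores higher when it contains the query terms more often and when those terms are rare across the collection. It is only an assumption-laden example: the text above does not say which methods any particular engine uses, and the documents and scoring details are invented.

```python
import math
import re
from collections import Counter

# Hypothetical documents used only for illustration.
DOCS = {
    "page-a": "search engine ranking and search relevance",
    "page-b": "cooking recipes and kitchen tips",
    "page-c": "a search for the best engine oil",
}


def tokenize(text):
    """Split text into lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank(query, docs):
    """Return names of matching documents, highest TF-IDF score first."""
    term_counts = {name: Counter(tokenize(text)) for name, text in docs.items()}
    total_docs = len(docs)
    scores = {}
    for name, counts in term_counts.items():
        score = 0.0
        for term in tokenize(query):
            containing = sum(1 for c in term_counts.values() if term in c)
            if containing == 0:
                continue  # the term appears nowhere, so it cannot contribute
            # Rarer terms get a larger weight; the +1 keeps common terms above zero.
            idf = math.log(total_docs / containing) + 1.0
            score += counts[term] * idf
        scores[name] = score
    matching = [name for name in scores if scores[name] > 0]
    return sorted(matching, key=lambda name: scores[name], reverse=True)


results = rank("search engine", DOCS)
print(results)
```

Word counts alone are a crude signal: the page about engine oil still matches the query, which is one reason engines combine text matching with other evidence such as link analysis.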
Most Web search engines are commercial ventures supported by advertising revenue and, as a result, some allow advertisers to pay to have their listings ranked higher in search results. Search engines that do not accept money for their search results make money by running search-related ads alongside the regular results, earning revenue every time someone clicks on one of these ads.
Revenue in the web search portals industry is projected to grow in 2008 by 13.4 percent, with broadband connections expected to rise by 15.1 percent. Between 2008 and 2012, industry revenue is projected to rise by 56 percent as Internet penetration still has some way to go to reach full saturation in American households. Furthermore, broadband services are projected to account for an ever increasing share of domestic Internet users, rising to 118.7 million by 2012, with an increasing share accounted for by fiber-optic and high speed cable lines.
List of major search engines
The search engines below are all excellent choices to start with when searching for information.
Google
Google has a well-deserved reputation as the top choice for those searching the web. The crawler-based service provides both comprehensive coverage of the web and great relevancy. It's highly recommended as a first stop in your hunt for whatever you are looking for.
Google provides the option to find more than web pages, however. Using the links above the search box on the Google home page, you can easily seek out images from across the web, follow discussions that are taking place on Usenet newsgroups, locate news information or perform product searching. The More link provides access to human-compiled information from the Open Directory (see below), catalog searching and other services.
Google is also known for the wide range of features it offers, such as cached links that let you "resurrect" dead pages or see older versions of recently changed ones. It offers excellent spell checking, easy access to dictionary definitions, integration of stock quotes, street maps, telephone numbers and more. The Google Toolbar has also won a popular following for the easy access it provides to Google and its features directly from the Internet Explorer browser.
In addition to Google's unpaid editorial results, the company also operates its own advertising programs. The cost-per-click AdWords program places ads on Google as well as some of Google's partners. Similarly, Google is also a provider of unpaid editorial results to some other search engines.
Google was originally a Stanford University project called BackRub, created by students Larry Page and Sergey Brin. By 1998, the name had been changed to Google, and the project jumped off campus and became the private company Google. It remained privately held until its initial public offering in August 2004.
Yahoo
Launched in 1994, Yahoo is the web's oldest "directory," a place where human editors organize web sites into categories. However, in October 2002, Yahoo made a giant shift to crawler-based listings for its main results. These came from Google until February 2004. Now, Yahoo uses its own search technology.
In addition to excellent search results, you can use tabs above the search box on the Yahoo home page to seek images, Yellow Page listings or use Yahoo's excellent shopping search engine. Or visit the Yahoo Search home page, where even more specialized search options are offered.
The Yahoo Directory still survives. You'll notice "category" links below some of the sites listed in response to a keyword search. When offered, these will take you to a list of web sites that have been reviewed and approved by a human editor.
It's also possible to do a pure search of just the human-compiled Yahoo Directory, which is how the old or "classic" Yahoo used to work. To do this, search from the Yahoo Directory home page, as opposed to the regular Yahoo.com home page. Then you'll get both directory category links ("Related Directory Categories") and "Directory Results," which are the top web site matches drawn from all categories of the Yahoo Directory.
Sites pay a fee to be included in the Yahoo Directory's commercial listings, though they must meet editor approval before being accepted. Non-commercial content is accepted for free. Yahoo's content acquisition program (CAP) also offers paid inclusion, where sites can pay to be included in Yahoo's crawler-based results. This doesn't guarantee ranking, Yahoo promises. The CAP also brings in content from non-profit organizations for free.
Like Google, Yahoo sells paid placement advertising links that appear on its own site and which are distributed to others. Yahoo purchased Overture in October 2003.
Overture was called GoTo until late 2001. More about it can be found on the Paid Listings Search Engines page. Overture purchased AllTheWeb (below) in March 2003 and acquired AltaVista (below) in April 2003. Yahoo now owns both, having gained them through its purchase of Overture.
Technology from AltaVista and AllTheWeb was combined with that of Inktomi, a crawler-based search engine that grew out of UC Berkeley and then launched as its own company in 1996, to make the current Yahoo crawler. Yahoo purchased Inktomi in March 2003.
Live Search
Live Search (formerly Windows Live Search) is the name of Microsoft's web search engine, successor to MSN Search, designed to compete with the industry leaders Google and Yahoo. The search engine offers some innovative features, such as the ability to view additional search results on the same web page (instead of needing to click through to subsequent search result pages) and the ability to adjust the amount of information displayed for each search result (just the title, a short summary, or a longer summary). It also allows the user to save searches and see them updated automatically on Live.com.
The service was previously powered by LookSmart results and gained top marks for having its own team of editors that monitored the most popular searches being performed to hand-pick sites believed to be the most relevant. The system worked well.
Ask
Ask Jeeves initially gained fame in 1998 and 1999 as being the "natural language" search engine that let you search by asking questions and responded with what seemed to be the right answer to everything.
In reality, technology wasn't what made Ask Jeeves perform so well. Behind the scenes, the company at one point had about 100 editors who monitored search logs. They then went out onto the web and located what seemed to be the best sites to match the most popular queries.
In 1999, Ask acquired Direct Hit, which had developed the world's first "click popularity" search technology. Then, in 2001, Ask acquired Teoma's unique index and search relevancy technology. Teoma was based upon the clustering concept of subject-specific popularity.
Today, Ask depends on crawler-based technology to provide results to its users. These results come from the Teoma algorithm, now known as ExpertRank.
AllTheWeb.com
AllTheWeb is powered by Yahoo, and you may find it a lighter, more customizable and more pleasant "pure search" experience than you get at Yahoo itself. The focus is on web search, but news, picture, video, MP3 and FTP search are also offered.
AllTheWeb.com was previously owned by a company called FAST and used as a showcase for that company's web search technology. That's why you may sometimes hear AllTheWeb.com referred to as FAST or FAST Search. However, the search engine was purchased by search provider Overture in late April 2003, and later became Yahoo's property when Yahoo bought Overture.
AOL Search
AOL Search provides users with editorial listings that come from Google's crawler-based index. Indeed, the same search on Google and AOL Search will come up with very similar matches. So, why would you use AOL Search? Primarily because you are an AOL user. The "internal" version of AOL Search provides links to content only available within the AOL online service. In this way, you can search AOL and the entire web at the same time. The "external" version lacks these links. Why wouldn't you use AOL Search? If you like Google, many of Google's features, such as "cached" pages, are not offered by AOL Search.
HotBot
HotBot provides easy access to the web's three major crawler-based search engines: Yahoo, Google and Teoma. Unlike a meta search engine, it cannot blend the results from all of these crawlers together. Nevertheless, it's a fast, easy way to get different web search "opinions" in one place.
HotBot's "choose a search engine" interface was introduced in December 2002. However, HotBot has a long history as a search brand before this date.
HotBot debuted in May 1996 and gained a strong following among serious searchers for the quality and comprehensiveness of its crawler-based results, which at the time were provided by Inktomi. It also caught the attention of experienced web users and techies, especially for the unusual colors and interface it continues to sport today.
HotBot gained more notoriety when it switched over to using Direct Hit's "clickthrough" results for its main listings in 1999. Direct Hit was then one of the "hot" search engines that had recently appeared. Unfortunately, the quality of Direct Hit's results couldn't match those of another "hot" player that had debuted at the same time, Google. HotBot's popularity began to drop.
Even worse, HotBot also suffered from being owned by Lycos (now Terra Lycos). Lycos had acquired HotBot when it purchased Wired Digital in October 1998. Lycos failed to make search a priority on its flagship Lycos site, as well as on HotBot, through much of 1999 and 2000, as it focused instead on adding "portal" features. The company refocused on search in late 2001, making significant improvements to the Lycos site and, as noted, reworking the HotBot site at the end of 2002.
AltaVista
AltaVista opened in December 1995 and for several years was the "Google" of its day, in terms of providing relevant results and having a loyal group of users that loved the service.
Sadly, an attempt to turn AltaVista into a portal site in 1998 saw the company lose track of the importance of search. Over time, relevancy dropped, as did the freshness of AltaVista's listings and the crawler's coverage of the web.
Today, AltaVista is once again focused on search. Results come from Yahoo, and tabs above the search box let you go beyond web search to find images, MP3/Audio, Video, human category listings and news results. If you want a lighter feel than Yahoo but still want Yahoo's results, AltaVista is worth considering.
AltaVista was originally owned by Digital, then taken over by Compaq when that company purchased Digital in 1998. AltaVista was later spun off into a private company controlled by CMGI. Overture purchased the search engine in April 2003, and it later became part of Yahoo when Yahoo bought Overture.
Gigablast
Compared to Google, Yahoo or even Teoma, Gigablast has a tiny index of the web. However, the service is constantly gaining new and interesting features. Give it a whirl, if you want to try something experimental yet dependable.
LookSmart
LookSmart is primarily a human-compiled directory of web sites. It gathers its listings in two ways. Commercial sites pay to be listed in its commercial categories, making the service very much like an electronic "Yellow Pages." However, volunteer editors at the LookSmart-owned Zeal directory also catalog sites into non-commercial categories for free. Though Zeal is a separate web site, its listings are integrated into LookSmart's results.
LookSmart launched independently in October 1996, was backed by Reader's Digest for about a year, and then company executives bought back control of the service.
LookSmart also bought the WiseNut crawler-based search engine in April 2002. WiseNut's results are offered through LookSmart via the Web tab above the search box. Unlike its competitors, the WiseNut crawler has often been out of date, sometimes for months at a time.
Finally, the real gem at LookSmart can be found via its Articles tab. That provides access to content from thousands of periodicals.
Lycos
Lycos is one of the oldest search engines on the web, launched in 1994. It ceased crawling the web for its own listings in April 1999 and instead provides access to human-powered results from LookSmart for popular queries and crawler-based results from Yahoo for others.
"Fast Forward" lets you see search results in one side of your screen and the actual pages listed in another. Relevant categories of human-compiled information from the Open Directory appear at the bottom of the search results page.
Netscape Search http://search.netscape.com
Owned by AOL Time Warner, Netscape Search uses Google for its main listings, just as AOL's other major search site, AOL Search, does. So why use Netscape Search rather than Google? Unlike with AOL Search, there's no compelling reason to consider it. The main difference between Netscape Search and Google is that Netscape Search will list some of Netscape's own content at the top of its results. Netscape also has a completely different look and feel than Google. If either of these differences appeals to you, then try Netscape Search. Otherwise, you're probably better off just searching at Google.
Open Directory http://dmoz.org/
The Open Directory uses volunteer editors to catalog the web. Formerly known as NewHoo, it was launched in June 1998. It was acquired by AOL Time Warner-owned Netscape in November 1998, and the company pledged that anyone would be able to use information from the directory through an open license arrangement.
While you can search at the Open Directory site itself, this is not recommended. The site has no "backup" results that kick in should there not be a match in the human-compiled database. In addition, the ranking of sites during keyword searching is poor, while alphabetical ordering is used when you choose to "browse" categories by topic.
Instead, to scan the valuable information compiled by the Open Directory, consider using the version offered by Google, the Google Directory. Here, keyword searching uses Google's refined relevancy algorithms and makes use of link analysis to better propel good pages from the human database to the top. In addition, when viewing sites by category, they will be listed in PageRank order, which means the most popular sites based on analyzing links from across the web will be listed first.