Teoma merges with Ask.com

Greek39 asked 7 months ago
Teoma has just merged with Ask.com. Ask.com is claiming it’s one of the world’s most powerful search engines. Here is some info; I find the crawler part the most interesting:

“History

Teoma has been the heart of Ask search technology since 2001. The power of the Teoma algorithm, now known as ExpertRank, makes Ask one of the world’s most powerful and unique search engines. Two significant events helped develop Ask search technology. First, in 1999, Ask acquired Direct Hit, a Massachusetts company that had developed the world’s first “click popularity” search technology, which was licensed to MSN and Lycos, among others. Second, in 2001, Ask acquired Teoma, a 10-person start-up out of Rutgers University in New Brunswick, New Jersey, and its unique index and search relevancy technology. Teoma was the first, and is still the only, major search technology based upon the clustering concept of subject-specific popularity: ExpertRank. In fact, Teoma means “expert” in Gaelic.

How it works

Ask’s ExpertRank algorithm provides relevant search results by identifying the most authoritative sites on the Web. With Ask search technology, it’s not just about who’s biggest: it’s about who’s best. Our ExpertRank algorithm goes beyond mere link popularity (which ranks pages based on the sheer volume of links pointing to a particular page) to determine popularity among pages considered to be experts on the topic of your search. This is known as subject-specific popularity. Identifying topics (also known as “clusters”), the experts on those topics, and the popularity of millions of pages amongst those experts — at the exact moment your search query is conducted — requires many additional calculations that other search engines do not perform. The result is world-class relevance that often offers a unique editorial flavor compared to other search engines.

The Ask Web Crawler FAQ
Ask’s Web crawler is our Web-indexing robot (or crawler/spider). The crawler collects documents from the Web to build the ever-expanding index for our advanced search functionality at Ask and other Web sites that license the proprietary Ask search technology.

Ask search technology is unique from any other search technology because it analyzes the Web as it actually exists — in subject-specific communities. This process begins by creating a comprehensive and high-quality index. Web crawling is an essential tool for this approach, and it ensures that we have the most up-to-date search results.

On this page you’ll find answers to the most commonly asked questions about how the Ask Web crawler works.

Q: What is a Web crawler/Web spider?
A: A Web crawler (or, spider or robot) is a software program designed to follow hyperlinks throughout a Web site, retrieving and indexing pages to document the site for searching purposes. The crawlers are innocuous and cause no harm to an owner’s site or servers.

Q: Why does Ask use Web crawlers?
A: Ask utilizes Web crawlers to collect raw data and gather information that is used in building our ever-expanding search index. Crawling ensures that the information in our results is as up-to-date and relevant as it can possibly be. Our crawlers are well designed and professionally operated, providing an invaluable service that is in accordance with search industry standards.

Q: How does the crawler work?

The crawler goes to a Web address (URL) and downloads the HTML page.
The crawler follows hyperlinks from the page, which are URLs on the same site or on different sites.
The crawler adds new URLs to its list of URLs to be crawled. It continually repeats this function, discovering new URLs, following links, and downloading them.
The crawler excludes some URLs if it has downloaded a sufficient number from the Web site or if it appears that the URL might be a duplicate of another URL already downloaded.
The files of crawled URLs are then built into a search catalog. These URLs are displayed as part of search results on the site powered by Ask’s search technology when a relevant match is made.
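
The steps above can be sketched as a breadth-first loop over a queue of URLs. This is only an illustration, not Ask's actual crawler: the `fetch_links` callback, the `max_per_site` cap, and the in-memory link table are hypothetical, and duplicate detection is reduced to exact URL matching.

```python
from collections import deque
from urllib.parse import urlparse


def crawl(seed_urls, fetch_links, max_per_site=2):
    """Breadth-first crawl sketch.

    fetch_links(url) is a hypothetical callback that "downloads" a page
    and returns the URLs it links to.
    """
    frontier = deque(seed_urls)   # URLs waiting to be crawled
    seen = set(seed_urls)         # URLs already queued (duplicate check)
    per_site = {}                 # pages downloaded so far, per host
    downloaded = []

    while frontier:
        url = frontier.popleft()
        host = urlparse(url).netloc
        # Exclude the URL if enough pages were already taken from this site.
        if per_site.get(host, 0) >= max_per_site:
            continue
        per_site[host] = per_site.get(host, 0) + 1
        downloaded.append(url)
        # Follow hyperlinks and queue any URL not seen before.
        for link in fetch_links(url):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return downloaded


# A tiny in-memory "Web" standing in for real HTTP downloads.
LINKS = {
    "http://a.example/": ["http://a.example/1", "http://b.example/"],
    "http://a.example/1": ["http://a.example/2", "http://a.example/"],
    "http://b.example/": ["http://a.example/1"],
}

pages = crawl(["http://a.example/"], lambda u: LINKS.get(u, []))
print(pages)
```

With the cap set to two pages per host, the third URL on a.example is queued but never downloaded.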

Q: How frequently will the Ask Crawler download pages from my site?
A: The crawler will download only one page at a time from your site (specifically, from your IP address). After it receives a page, it will pause a certain amount of time before downloading the next page. This delay time may range from 0.1 second to hours. The quicker your site responds to the crawler when it asks for pages, the shorter the delay.
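
One way to picture a delay that shrinks as the site responds faster is a clamp on a multiple of the last response time. The FAQ gives only the 0.1 second lower bound and says the upper end is "hours"; the `factor` of 10 and the three-hour ceiling below are assumptions made for the sketch, not Ask's real settings.

```python
def next_delay(response_seconds, factor=10.0, min_delay=0.1, max_delay=3 * 3600):
    """Return the pause (in seconds) before the next request to the same IP.

    The pause grows with the server's last response time. `factor` and
    `max_delay` are assumed values; only the 0.1 s floor comes from the FAQ.
    """
    return max(min_delay, min(max_delay, response_seconds * factor))


print(next_delay(0.001))  # very fast server: clamped to the floor
print(next_delay(2.0))    # slower server: longer pause
```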

Q: Can I prevent Teoma/Ask search engine from showing a cached copy of my page?
A: Yes. We obey the “noarchive” meta tag. If you place the following command in your HTML page, we will not provide an archived copy of the document to the user.

<META NAME="ROBOTS" CONTENT="NOARCHIVE">
If you would like to specify this restriction just for Teoma/Ask, you may use “teoma” in place of “robots”.

Q: Does Ask observe the Robots Exclusion Standard?
A: Yes, we obey the 1994 Robots Exclusion Standard (RES), which is part of the Robots Exclusion Protocol. The Robots Exclusion Protocol is a method that allows Web site administrators to indicate to robots which parts of their site should not be visited. For more information on the RES and the Robots Exclusion Protocol, please visit http://www.robotstxt.org/wc/exclusion.html.

Q: Can I prevent the Ask crawler from indexing all or part of my site/URL?
A: Yes. The Ask crawler will respect and obey commands that “ask” it not to index all or part of a given URL. To specify that the Ask crawler visit only pages whose paths begin with /public, include the following lines in your robots.txt file:

# Allow only specific directories
User-agent: Teoma
Disallow: /
Allow: /public
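
To see how these two rules interact, here is a small matcher written for illustration. It assumes the longest matching path prefix wins, with Allow winning a tie; the FAQ does not say how Ask resolves conflicts, and a parser that applies the first matching rule in file order would read the same lines as blocking everything. The `is_allowed` helper and the `TEOMA_RULES` list are hypothetical names.

```python
def is_allowed(path, rules):
    """Decide whether `path` may be fetched under (directive, prefix) rules.

    Assumption: the longest matching prefix wins, and Allow wins a tie.
    A path that matches no rule is allowed.
    """
    best_len, allowed = -1, True
    for directive, prefix in rules:
        if prefix and path.startswith(prefix):
            longer = len(prefix) > best_len
            tie_for_allow = len(prefix) == best_len and directive == "allow"
            if longer or tie_for_allow:
                best_len, allowed = len(prefix), directive == "allow"
    return allowed


# The rules from the example above, in file order.
TEOMA_RULES = [("disallow", "/"), ("allow", "/public")]

print(is_allowed("/public/index.html", TEOMA_RULES))
print(is_allowed("/private/notes.html", TEOMA_RULES))
```

Because matching is by prefix, a path such as /publicity would also be allowed under these rules.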

Q: Where do I put my robots.txt file?
A: Your file must be at the top level of your Web site. For example, if http://www.mysite.com is the name of your Web site, then the robots.txt file must be at http://www.mysite.com/robots.txt.

Q: How can I tell if the Ask crawler has visited my site/URL?
A: To determine whether the Ask crawler has visited your site, check your server logs. Specifically, you should be looking for the following user-agent string:

User-Agent: Mozilla/2.0 (compatible; Ask/Teoma)
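
A quick way to check a log for that string is a short filter script. The sketch below assumes a log format in which the user agent is the last quoted field on each line (as in the common "combined" format); the sample lines are made up for the example.

```python
import re

# Last quoted field on the line, assumed to hold the user agent.
UA_FIELD = re.compile(r'"([^"]*)"\s*$')


def ask_crawler_hits(log_lines):
    """Return the log lines whose user-agent field mentions Ask/Teoma."""
    hits = []
    for line in log_lines:
        match = UA_FIELD.search(line)
        if match and "Ask/Teoma" in match.group(1):
            hits.append(line)
    return hits


# Made-up sample entries, not real traffic.
SAMPLE_LOG = [
    '192.0.2.1 - - [01/Jan/2006:00:00:01 +0000] "GET / HTTP/1.0" 200 512 "-" '
    '"Mozilla/2.0 (compatible; Ask/Teoma)"',
    '192.0.2.7 - - [01/Jan/2006:00:00:05 +0000] "GET /a HTTP/1.1" 200 900 "-" '
    '"Mozilla/5.0 (X11; Linux)"',
]

hits = ask_crawler_hits(SAMPLE_LOG)
print(len(hits))
```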

Q: How can I prevent the Ask crawler from indexing my page or following links from a particular page?
A: If you place the following command in the <HEAD> section of your HTML page, the Ask crawler will not index the document and, thus, it will not be placed in our search results:

<META NAME="ROBOTS" CONTENT="NOINDEX">
The following command tells the Ask crawler to index the document, but not follow hyperlinks from it:
<META NAME="ROBOTS" CONTENT="NOFOLLOW">
You may set all directives OFF (equivalent to NOINDEX, NOFOLLOW) by using the following:
<META NAME="ROBOTS" CONTENT="NONE">
See http://www.robotstxt.org/wc/exclusion.html#meta for more information.
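
For anyone who wants to check what their own pages declare, the meta directives mentioned in this FAQ can be read with Python's standard `html.parser`. This is a rough sketch rather than Ask's parser: `RobotsMetaParser` and `robots_policy` are hypothetical names, and treating NONE as leaving the cached copy available is an assumption.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots"> or name="teoma" tags."""

    def __init__(self, names=("robots", "teoma")):
        super().__init__()
        self.names = names
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute names arrive lowercased; values keep their case.
        attr = {key: (value or "") for key, value in attrs}
        if attr.get("name", "").lower() in self.names:
            for token in attr.get("content", "").split(","):
                if token.strip():
                    self.directives.add(token.strip().lower())


def robots_policy(html):
    """Return which actions the page permits, based on its meta tags."""
    parser = RobotsMetaParser()
    parser.feed(html)
    found = parser.directives
    return {
        "index": not ({"noindex", "none"} & found),
        "follow": not ({"nofollow", "none"} & found),
        "archive": "noarchive" not in found,  # assumption: NONE does not imply NOARCHIVE
    }


print(robots_policy('<head><META NAME="ROBOTS" CONTENT="NOARCHIVE"></head>'))
```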

Q: Why is the Ask crawler downloading the same page on my site multiple times?
A: Generally, the Ask crawler should only download one copy of each file from your site during a given crawl. There are two exceptions:

A URL may contain commands that “redirect” the crawler to a different URL. This may be done with the HTML command:
<META HTTP-EQUIV="REFRESH" CONTENT="0; URL=http://www.your page address here.html">
or with the HTTP status codes 301 or 302. In this case the crawler downloads the second page in place of the first one. If many URLs redirect to the same page, then this second page may be downloaded many times before the crawler realizes that all these pages are duplicates.”
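
The redirect case can be illustrated with a lookup table: once the redirect targets are known, many starting URLs collapse to one final page. This is only a sketch of the idea, not how Ask detects duplicates; the `redirects` table, the `max_hops` limit, and the example URLs are hypothetical.

```python
def resolve(url, redirects, max_hops=10):
    """Follow a redirect table until a URL that does not redirect is reached."""
    hops = 0
    while url in redirects and hops < max_hops:
        url = redirects[url]
        hops += 1
    return url


def unique_targets(urls, redirects):
    """Map each URL to its final target and keep each target once, in order."""
    seen, targets = set(), []
    for url in urls:
        final = resolve(url, redirects)
        if final not in seen:
            seen.add(final)
            targets.append(final)
    return targets


# Two old URLs that both redirect to the same new page.
REDIRECTS = {
    "http://site.example/old1": "http://site.example/new",
    "http://site.example/old2": "http://site.example/new",
}

targets = unique_targets(
    ["http://site.example/old1", "http://site.example/old2", "http://site.example/new"],
    REDIRECTS,
)
print(targets)
```

A crawler that has not yet learned these redirects would fetch the new page once per old URL, which matches the repeated downloads described in the answer.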

I wonder, is this a paid inclusion SE? greek39