On the Legal Issues a Search Engine May Face When Automatically Crawling Web Data, and How to Resolve Them

Date

2007-04-??

Authors

廖先志
陳鍾誠

Publisher

Graduate Institute of Library and Information Studies, National Taiwan Normal University

Abstract

A search engine uses a crawler (also called a spider) to retrieve and index web pages automatically. A "general crawler" traverses the web by following only the hyperlinks that pages provide; a "deep crawler", by contrast, can compute URLs on its own and retrieve all of a site's content whether or not hyperlinks to that content exist. A crawler's work can be divided into two major steps: "retrieval" and "storage and indexing". In the retrieval step, even though a deep crawler generates URLs through its own algorithms, this article argues that doing so should not constitute unauthorized access. However, a crawler, whether general or deep, may commit trespass to chattels, as discussed in the U.S. eBay case, if its retrieval consumes the website's resources and interferes with the site's normal operation. In the storage and indexing step, a crawler should in principle not infringe the reproduction right of page owners. Controversy does arise, however, when a search engine such as Google or Yahoo makes the retrieved content available to users as "cached" pages. This article takes the position that, because the main purpose of a search engine is to improve users' access to web pages, such reproduction and distribution should in principle qualify as fair use under copyright law and therefore not infringe copyright; nevertheless, factors such as whether the search engine competes with the original website, and what proportion of the site's content the crawler retrieves, must be weighed together in each case. To resolve the legal disputes that may arise between search engines and websites, the existing robots exclusion standard can be strengthened, and websites can also adopt mechanisms that automatically detect and filter crawlers.
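
To make the first of the two proposed remedies concrete, the sketch below shows how a crawler can honor the robots exclusion standard (robots.txt) before retrieving a page, using only the Python standard library. It is a minimal illustration, not the article's own implementation; the user-agent name "ExampleBot" and any URL passed to it are hypothetical placeholders.

```python
# Minimal sketch of a "polite" crawler that consults a site's robots.txt
# (the robots exclusion standard) before retrieving a page.
from urllib import robotparser, request
from urllib.parse import urlparse

USER_AGENT = "ExampleBot"  # hypothetical crawler name

def fetch_if_allowed(url):
    """Retrieve url only if the site's robots.txt permits our user agent."""
    parts = urlparse(url)
    robots_url = f"{parts.scheme}://{parts.netloc}/robots.txt"

    rp = robotparser.RobotFileParser()
    rp.set_url(robots_url)
    rp.read()  # download and parse the site's robots.txt

    if not rp.can_fetch(USER_AGENT, url):
        return None  # the site has opted out; do not retrieve the page

    req = request.Request(url, headers={"User-Agent": USER_AGENT})
    with request.urlopen(req) as resp:
        return resp.read()
```

Because compliance with robots.txt is voluntary on the crawler's side, it cannot by itself prevent the disputes described above, which is why the article also suggests server-side detection and filtering of crawlers as a complementary measure.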
