Scrapy

**Scrapy**
Developer	Scrapinghub, Ltd.
Initial release	26 June 2008
Stable release	2.11.1 (14 February 2024)
Repository	github.com/scrapy/scrapy
Written in	Python
Operating system	Windows, macOS, Linux
Type	Web crawler
License	BSD license
Website	scrapy.org

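Scrapy (/ˈskreɪpi/ SKRAY-pee)^[3] is a free and open-source web-crawling framework written in Python. It was originally designed for web scraping, but it can also be used to extract data through APIs or as a general-purpose web crawler.^[4] The framework is currently maintained by Scrapinghub Ltd., a web-scraping development and services company.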

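Scrapy projects are built around "spiders": self-contained crawlers that are given a set of instructions. Following the don't-repeat-yourself spirit of frameworks such as Django,^[5] Scrapy lets developers reuse their code, which makes large crawling projects easier to build and scale. Scrapy also provides a web-crawling shell that developers can use to test their assumptions about a site's behavior.^[6]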

Notable companies and products using Scrapy include Lyst,^[7]^[8] Parse.ly,^[9] Sayone Technologies,^[10] Sciences Po Medialab,^[11] and Data.gov.uk's World Government Data site.^[12]

History

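Scrapy originated at Mydeco, a web-aggregation and e-commerce company, where it was developed and maintained by employees of Mydeco and Insophia. It was first released publicly under the BSD license in August 2008, and the milestone 1.0 release followed in June 2015.^[13] In 2011, Scrapinghub became the new official maintainer.^[14]^[15]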

References

[1] Release 2.11.1. 2024-02-14. Retrieved 2024-02-20.
[2] Release notes — Scrapy documentation. doc.scrapy.org. Retrieved 2020-11-18. (Archived from the original on 2020-01-28.)
[3] How do you pronounce "Scrapy"? (Archived at the Internet Archive.)
[4] Scrapy at a glance. (Archived at the Internet Archive.)
[5] Frequently Asked Questions. Retrieved 2015-07-28. (Archived from the original on 2020-11-11.)
[6] Scrapy shell. Retrieved 2015-07-28. (Archived from the original on 2020-10-31.)
[7] Bell, Eddie; Heusser, Jonathan. Scalable Scraping Using Machine Learning. Retrieved 2015-07-28. (Archived from the original on 2016-10-09.)
[8] Scrapy | Companies using Scrapy. Retrieved 2020-12-08. (Archived from the original on 2020-11-12.)
[9] Montalenti, Andrew. Web Crawling & Metadata Extraction in Python. Retrieved 2020-12-08. (Archived from the original on 2020-09-19.)
[10] Scrapy Companies. Scrapy website. Retrieved 2020-12-08. (Archived from the original on 2020-11-12.)
[11] Hyphe v0.0.0: the first release of our new webcrawler is out!. Retrieved 2020-12-08. (Archived from the original on 2016-06-13.)
[12] Ben Firshman [@bfirsh]. World Govt Data site uses Django, Solr, Haystack, Scrapy and other exciting buzzwords bit.ly/5jU3La #opendata #datastore (Tweet). 2010-01-21. Via Twitter.
[13] Medina, Julia. Scrapy 1.0 official release out!. scrapy-users (mailing list). 2015-06-19. Retrieved 2018-09-13. (Archived from the original on 2011-01-22.)
[14] Pablo Hoffman. List of the primary authors & contributors. 2013. Retrieved 2013-11-18. (Archived from the original on 2017-05-29.)
[15] Interview Scraping Hub. (Archived at the Internet Archive.)

External links

Official website

See also

robots.txt: a file placed on a web server that tells web crawlers which pages they may or may not fetch.
Web crawler
