- Add the imports:
from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector
from tutorial.items import TutorialItem
- Choose a base class to derive from, such as BaseSpider or CrawlSpider
- Define the constructor
a) def __init__(self, *a, **kw):
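b) Call the base-class constructor, passing the class being defined to super(): super(MySpider, self).__init__(*a, **kw), where MySpider stands for your own spider class. Writing super(CrawlSpider, self) in a class derived from CrawlSpider would skip CrawlSpider's own __init__.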
- Override the default callback
a) def parse(self, response):
b) Return an item or a request
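
The steps above can be sketched as one small spider. This is a minimal illustration, not a definitive implementation: the class name TutorialSpider, the pages_seen counter, and the example.com URL are hypothetical, and a plain dict stands in for TutorialItem so the snippet does not depend on the tutorial project. It assumes a recent Scrapy release, where the base class is exposed as scrapy.Spider; a tiny stand-in class is defined only so the sketch still runs when Scrapy is not installed.

```python
try:
    # Recent Scrapy releases expose the base spider class as scrapy.Spider.
    from scrapy import Spider
except ImportError:
    # Stand-in (assumption) so the sketch runs without Scrapy installed.
    class Spider(object):
        name = None

        def __init__(self, name=None, **kwargs):
            if name is not None:
                self.name = name
            # Keyword arguments become attributes on the spider.
            self.__dict__.update(kwargs)


class TutorialSpider(Spider):
    name = "tutorial"                     # hypothetical spider name
    start_urls = ["http://example.com/"]  # placeholder start URL

    def __init__(self, *a, **kw):
        # Pass the class being defined to super(), not its base class,
        # so that the base class's own __init__ is actually run.
        super(TutorialSpider, self).__init__(*a, **kw)
        self.pages_seen = 0

    def parse(self, response):
        # Default callback: invoked with the response for each start URL.
        self.pages_seen += 1
        # A plain dict stands in for TutorialItem here (assumption).
        yield {"url": response.url}
```

Inside a Scrapy project, a spider like this is typically started with `scrapy crawl tutorial`. Using `yield` makes parse a generator, so the same callback can hand back any mix of items and follow-up requests.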