How can I set Scrapy rules dynamically?


Question

I have a spider class that runs some code when it is initialized:

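# Import paths are an assumption (the original post omits them); these are the
# locations used by older Scrapy releases that still ship SgmlLinkExtractor
from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor


class NoFollowSpider(CrawlSpider):
    # Follow every link and hand each fetched page to parse_items
    rules = (
        Rule(SgmlLinkExtractor(allow=("",)), callback="parse_items", follow=True),
    )

    def __init__(self, moreparams=None, *args, **kwargs):
        super(NoFollowSpider, self).__init__(*args, **kwargs)
        self.moreparams = moreparams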

I run this Scrapy code with the following command:

> scrapy runspider my_spider.py -a moreparams="more parameters" -o output.txt

Now I would like the static variable named rules to be configurable from the command line:

> scrapy runspider my_spider.py -a crawl_pages=True -a moreparams="more parameters" -o output.txt
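A note on the -a flag: Scrapy hands each -a name=value pair to the spider's __init__ as a keyword argument, and the value always arrives as a string. A flag such as -a crawl_pages=True therefore delivers the text "True", which an identity check like `crawl_pages is True` will not match. Below is a minimal sketch of one way to convert such a value. The helper name as_bool and the spellings it accepts are my own assumptions, not part of Scrapy.

```python
def as_bool(value):
    """Interpret a spider argument as a boolean.

    Hypothetical helper: accepts a real bool (programmatic use) or a
    string such as "True" (what the command line delivers).
    """
    if isinstance(value, bool):
        return value
    return str(value).strip().lower() in ("1", "true", "yes", "on")


# What __init__ would receive for "-a crawl_pages=True"
cli_value = "True"

# The identity check fails for the string, the helper handles it
matches_identity = cli_value is True
matches_helper = as_bool(cli_value)
```

Inside __init__ the check would then read `if as_bool(crawl_pages):` instead of comparing against True directly.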

The __init__ method is changed to:

def __init__(self, crawl_pages=False, moreparams=None, *args, **kwargs):
    if crawl_pages is True:
        self.rules = (
            Rule(SgmlLinkExtractor(allow=("",)), callback="parse_items", follow=True),
        )
    self.moreparams = moreparams

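However, if I reassign the static variable rules inside __init__, Scrapy no longer takes it into account: the spider runs, but it only crawls the given start_urls instead of the whole domain. It seems that rules has to be a static class variable.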

So, how can I set a static variable dynamically?


Answer:

So, here is how I solved the problem, with a lot of help from @Not_a_Golfer and @nramirezuy. I simply combined both of their suggestions:

class NoFollowSpider(CrawlSpider):

    def __init__(self, crawl_pages=False, moreparams=None, *args, **kwargs):
        super(NoFollowSpider, self).__init__(*args, **kwargs)
        # Values passed with -a arrive as strings, so accept "True" as well as True
        if str(crawl_pages).lower() == "true":
            # Set the class member from here
            NoFollowSpider.rules = (
                Rule(SgmlLinkExtractor(allow=("",)), callback="parse_items", follow=True),
            )
            # Then recompile the rules so the spider picks up the new value
            super(NoFollowSpider, self)._compile_rules()

        # Keep going as before
        self.moreparams = moreparams
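Why the recompile step matters: as far as I understand it, CrawlSpider reads rules once during its own __init__ and keeps a compiled copy, so a later assignment to rules is ignored unless the rules are compiled again. The following is a toy model in plain Python, not Scrapy's actual code. ToyCrawlSpider, LateRules, RecompiledRules, EarlyRules and the string standing in for a Rule are all made-up names for illustration.

```python
import copy


class ToyCrawlSpider:
    """Stand-in for CrawlSpider: snapshots `rules` once, during __init__."""

    rules = ()

    def __init__(self):
        self._compile_rules()

    def _compile_rules(self):
        # Work from a copy, so later changes to `rules` are not seen
        self._rules = [copy.copy(rule) for rule in self.rules]


class LateRules(ToyCrawlSpider):
    def __init__(self, crawl_pages=False):
        super().__init__()
        if crawl_pages:
            # Too late: the base class has already compiled the empty tuple
            self.rules = ("follow-all-links",)


class RecompiledRules(ToyCrawlSpider):
    def __init__(self, crawl_pages=False):
        super().__init__()
        if crawl_pages:
            self.rules = ("follow-all-links",)
            # Refresh the compiled copy, as the answer above does
            self._compile_rules()


class EarlyRules(ToyCrawlSpider):
    def __init__(self, crawl_pages=False):
        if crawl_pages:
            # Set before the base class compiles, so no recompile is needed
            self.rules = ("follow-all-links",)
        super().__init__()


late = LateRules(crawl_pages=True)
recompiled = RecompiledRules(crawl_pages=True)
early = EarlyRules(crawl_pages=True)
```

In this model only `late` ends up with an empty compiled rule list. The EarlyRules variant suggests an alternative to the approach above: assigning rules before calling the base __init__ should avoid touching the private _compile_rules method, although I have only checked that against this toy model.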

Thanks, everyone, for your help!