Wednesday, 15 January 2014

Python scraping using Scrapy


So, I have followed the tutorial on how to use Scrapy, and now I can follow the links on a given page. But what I want to do is: starting from a page, collect my data (metadata and a summary) by going to the links on that page. This is my code so far (it doesn't collect any data yet):

    from scrapy.contrib.spiders import CrawlSpider, Rule
    from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
    from scrapy.selector import HtmlXPathSelector
    from scrapy.http import Request
    #from scrapy.item import SpideyItem

    class spidey(CrawlSpider):
        name = "spidey"
        allowed_domains = ["wikipedia.org"]
        start_urls = ["http://en.wikipedia.org/wiki/Game_of_Thrones"]

        rules = (
            Rule(SgmlLinkExtractor(restrict_xpaths=('//div[@class="mw-body"]//a/@href',),
                                   allow=("http://en.wikipedia.org/wiki/",)),
                 callback='parse_item'),
        )

        def parse_item(self, response):
            sel = HtmlXPathSelector(response)
            print(sel.xpath('//h1[@class="firstHeading"]/span/text()').extract())

After this I want to collect the data from the initial pages and from the links I find. I'm new to web spiders, so any pointers are welcome.

I'm not quite sure what your question is, but if you're asking how to collect data from several pages and save it into one item... this is your answer:
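Here is a minimal sketch of that inline way. The WikiItem with its title and summary fields, the spider names, and the XPaths are all illustrative assumptions on my part, not from the original post: the first callback fills in what it can, then follows one link and finishes the item in an inline callback (a closure over the item).

    from scrapy.contrib.spiders import CrawlSpider, Rule
    from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
    from scrapy.selector import Selector
    from scrapy.http import Request
    from scrapy.item import Item, Field
    from urlparse import urljoin

    class WikiItem(Item):
        # Hypothetical fields; use whatever metadata you actually need.
        title = Field()
        summary = Field()

    class InlineSpider(CrawlSpider):
        name = "wiki-inline"
        allowed_domains = ["wikipedia.org"]
        start_urls = ["http://en.wikipedia.org/wiki/Game_of_Thrones"]
        rules = (
            Rule(SgmlLinkExtractor(allow=(r"http://en\.wikipedia\.org/wiki/",)),
                 callback='parse_item'),
        )

        def parse_item(self, response):
            sel = Selector(response)
            item = WikiItem()
            item['title'] = sel.xpath(
                '//h1[@class="firstHeading"]/span/text()').extract()
            # Follow the first link in the article body (purely illustrative).
            href = sel.xpath('//div[@id="mw-content-text"]//a/@href').extract()[0]

            # Inline callback: a closure that finishes the item with data
            # from the linked page, then returns the completed item.
            def parse_linked(linked_response):
                item['summary'] = Selector(linked_response).xpath(
                    '//div[@id="mw-content-text"]/p[1]//text()').extract()
                return item

            yield Request(urljoin(response.url, href), callback=parse_linked)

One caveat with the closure style: an inline callback can't be pickled, so this breaks if you pause and resume crawls with JOBDIR; the request.meta approach below doesn't have that problem.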

In addition to this, if you don't want to do it in an inline way, you can always store your item in request.meta and send it along in a Request whose callback extracts the remaining data from that page.
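Building on the sketch above (same scaffolding and hypothetical WikiItem, still only an illustration), the meta version stashes the half-built item on the request, and the second callback pulls it back out:

    from scrapy.http import Request
    from scrapy.selector import Selector
    from urlparse import urljoin

    # Reuses the rules, start_urls and WikiItem from the sketch above;
    # only the callbacks change.
    class MetaSpider(InlineSpider):
        name = "wiki-meta"

        def parse_item(self, response):
            sel = Selector(response)
            item = WikiItem()
            item['title'] = sel.xpath(
                '//h1[@class="firstHeading"]/span/text()').extract()
            href = sel.xpath('//div[@id="mw-content-text"]//a/@href').extract()[0]
            request = Request(urljoin(response.url, href),
                              callback=self.parse_linked)
            request.meta['item'] = item   # stash the half-built item
            yield request

        def parse_linked(self, response):
            item = response.meta['item']  # pull it back out
            item['summary'] = Selector(response).xpath(
                '//div[@id="mw-content-text"]/p[1]//text()').extract()
            yield item

Either way, the item is only yielded once it is complete, so it goes through the item pipelines as a single record.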

Check this answer:

