Tuesday, 15 September 2015

php - Using Goutte with Symfony2 in Controller


I am trying to scrape a page and I am not very familiar with PHP frameworks, so I am learning Symfony2. I have it up and running, and now I am trying to use Goutte. It is installed in the vendor folder, and I have a bundle that I am using for my scraping project.

The question is: is it good practice to do the scraping from a controller? And if so, how? I have searched everywhere and cannot figure out how to use Goutte from the bundle, since it is buried so deep in the file structure.

    <?php

    namespace Ontf\ScraperBundle\Controller;

    use Symfony\Bundle\FrameworkBundle\Controller\Controller;
    use Goutte\Client;

    class ThingsController extends Controller
    {
        public function someAction($some)
        {
            // Fetch the page and print its text content
            $client = new Client();
            $crawler = $client->request('GET', 'http://www.symfony.com/blog/');
            echo $crawler->text();

            return $this->render('ScraperBundle:Thing:index.html.twig');

            // return $this->render('ScraperBundle:Thing:index.html.twig', array(
            //     'some' => $some
            // ));
        }
    }

I'm not sure I've come across any established best practices as far as scraping goes, though you may find some in the book.

These are some guidelines I have used in my projects:

  1. Scraping is a slow process; consider delegating that work to a background process.
  2. A background process usually runs either as a cron job that executes a CLI application or as a worker that is constantly running.
  3. Use a process control system to manage your workers.
  4. Save every scraped file (the "raw" version) and log every error. This will let you detect problems. Use Rackspace Cloud Files or AWS S3 to archive these files.
  5. Use Symfony2 console commands to run your scraper. You can save the commands in your bundle under the Command directory.
  6. To avoid running out of memory, run your Symfony2 command with the following flags: php app/console scraper:run example.com --env=prod --no-debug. Here app/console is where the Symfony2 console application lives, scraper:run is the name of your command, example.com is an argument indicating the page you want to scrape, and the --env=prod --no-debug flags are the ones to use when running in production. See the code below for an example.
  7. Inject your Goutte client into your command:

Ontf/ScraperBundle/Resources/services.yml

    services:
        goutte_client:
            class: Goutte\Client

        scraperCommand:
            class: Ontf\ScraperBundle\Command\ScraperCommand
            arguments: ["@goutte_client"]
            tags:
                - { name: console.command }

And your command should look something like this:

    <?php
    // Ontf/ScraperBundle/Command/ScraperCommand.php
    namespace Ontf\ScraperBundle\Command;

    use Symfony\Component\Console\Command\Command;
    use Symfony\Component\Console\Input\InputArgument;
    use Symfony\Component\Console\Input\InputInterface;
    use Symfony\Component\Console\Output\OutputInterface;
    use Goutte\Client;

    // Must be a concrete class: the container cannot instantiate an abstract one.
    class ScraperCommand extends Command
    {
        private $client;

        // The Goutte client is injected through the service definition.
        public function __construct(Client $client)
        {
            $this->client = $client;

            parent::__construct();
        }

        protected function configure()
        {
            $this
                ->setName('scraper:run')
                ->setDescription('Run Goutte Scraper.')
                ->addArgument('url', InputArgument::REQUIRED, 'The URL you want to scrape.');
        }

        protected function execute(InputInterface $input, OutputInterface $output)
        {
            $url = $input->getArgument('url');

            // Fetch the page and print its text content
            $crawler = $this->client->request('GET', $url);
            echo $crawler->text();
        }
    }
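Guideline 4 (keep the raw version of every scraped page and log every error) could be sketched roughly as follows. This is only a minimal illustration in plain PHP: the function name scrapeAndArchive, the $fetch callable, and the file naming scheme are all hypothetical and not part of Goutte or Symfony, and the local directory is an assumed stand-in for Rackspace Cloud Files or AWS S3.

```php
<?php
// Hypothetical helper (not part of Goutte or Symfony): fetches a page through
// an injected callable, archives the raw body on disk, and logs any failure.
function scrapeAndArchive(callable $fetch, $url, $archiveDir, $logFile)
{
    try {
        // $fetch is assumed to return the raw page body as a string.
        $body = $fetch($url);
    } catch (Exception $e) {
        // Log every error with a timestamp so problems can be traced later.
        $line = sprintf("[%s] %s: %s\n", date('c'), $url, $e->getMessage());
        file_put_contents($logFile, $line, FILE_APPEND);

        return null;
    }

    // Create the archive directory on first use.
    if (!is_dir($archiveDir)) {
        mkdir($archiveDir, 0777, true);
    }

    // Name the file after a hash of the URL plus a timestamp so repeated runs
    // keep separate copies (two fetches within the same second would collide).
    $path = sprintf(
        '%s/%s-%s.html',
        rtrim($archiveDir, '/'),
        sha1($url),
        date('Ymd-His')
    );
    file_put_contents($path, $body);

    return $path;
}
```

Inside a command like the one above, the callable passed as $fetch would wrap the injected Goutte client and return the response body. Passing the fetcher in as a callable also means the archiving and logging can be exercised with a stub, without any network access.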
