Friday, 15 August 2014

symfony - Goutte: Scrape/Log In to an HTTPS Secure Website


So I'm trying to use Goutte to log in to an HTTPS website, and I get the following error:

    cURL error 60: SSL certificate problem: unable to get local issuer certificate
    500 Internal Server Error - RequestException
    1 linked Exception: RingException

This is the code from the creator of Goutte:

    use Goutte\Client;

    $client = new Client();
    $crawler = $client->request('GET', 'http://github.com/');
    $crawler = $client->click($crawler->selectLink('Sign in')->link());
    $form = $crawler->selectButton('Sign in')->form();
    // the password value is omitted in the original post
    $crawler = $client->submit($form, array('login' => 'search', 'password' => '********'));
    $crawler->filter('.flash-error')->each(function ($node) {
        print $node->text()."\n";
    });

And here is the code recommended in the Symfony documentation:

    use Goutte\Client;

    // make a real request to an external site
    $client = new Client();
    $crawler = $client->request('GET', 'https://github.com/login');

    // select the form and fill in some values
    $form = $crawler->selectButton('Sign In')->form();
    $form['login'] = 'symphonyphon';
    $form['password'] = 'anypass';

    // submit that form
    $crawler = $client->submit($form);

The thing is that neither of them works: I get the error posted above. I can, however, log in with the code I wrote for a previous question.

I just want to use Symfony/Goutte to log in so that scraping the data I need is easy. Any help or suggestions, please?

Adding the following curl configuration to the code fixes the error:

    use Goutte\Client;

    // make a real request to an external site
    $client = new Client();
    // WARNING: these two options switch off TLS certificate verification
    $client->getClient()->setDefaultOption('config/curl/'.CURLOPT_SSL_VERIFYHOST, FALSE);
    $client->getClient()->setDefaultOption('config/curl/'.CURLOPT_SSL_VERIFYPEER, FALSE);
    $crawler = $client->request('GET', 'https://github.com/login');
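Error 60 ("unable to get local issuer certificate") means curl could not find a trusted root certificate to verify the server's chain. Turning verification off, as above, hides the error but also removes protection against forged certificates. The sketch below is an alternative that the original post does not use: it keeps verification on and hands curl an explicit CA bundle through CURLOPT_CAINFO. The helper name makeVerifiedCurlHandle and the bundle path are hypothetical; the bundle is assumed to be a PEM file of trusted root certificates.

```php
<?php
// Sketch (assumption, not from the original post): keep TLS verification ON
// and point curl at a CA bundle instead of disabling
// CURLOPT_SSL_VERIFYPEER / CURLOPT_SSL_VERIFYHOST.

/**
 * Build a curl handle that verifies the server certificate against $caBundle.
 * $caBundle is a hypothetical path to a PEM file of trusted root certificates.
 */
function makeVerifiedCurlHandle(string $url, string $caBundle)
{
    // Fail early with a clear message instead of a later error 60/77.
    if (!is_readable($caBundle)) {
        throw new InvalidArgumentException("CA bundle not readable: $caBundle");
    }

    $ch = curl_init($url);
    if ($ch === false) {
        throw new RuntimeException('curl_init() failed');
    }

    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,      // return the body instead of printing it
        CURLOPT_SSL_VERIFYPEER => true,      // verify the certificate chain...
        CURLOPT_SSL_VERIFYHOST => 2,         // ...and that it matches the host name
        CURLOPT_CAINFO         => $caBundle, // trust anchors used for the check
    ]);

    return $ch;
}
```

For example, curl_exec(makeVerifiedCurlHandle('https://github.com/login', '/path/to/cacert.pem')) should no longer fail with error 60, provided the bundle contains the root that issued the site's certificate. The same effect can usually be had for every request by setting curl.cainfo in php.ini. If your Goutte version wraps a Guzzle 4/5 client, as the setDefaultOption() calls above suggest, Guzzle's 'verify' option (which accepts a bundle path) is the analogous setting; treat that as an assumption to check against your Guzzle version.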

But then another error occurs:

    The current node list is empty.
    500 Internal Server Error - InvalidArgumentException

Once again, I'm using Goutte with Symfony and the default example code to perform a test task: logging in to GitHub over HTTPS.

The "node list is empty" error turned out to be caused by the button text: the button on the GitHub login page is actually labelled "Sign in", and that exact label is what selects the form to submit. Unfortunately, the Goutte documentation does not make clear whether the argument in $form = $crawler->selectButton('Sign in')->form(); is the button's HTML name attribute or its visible text. Here it was the visible text (for a submit input, its value attribute) that matched, which is a little confusing. After more digging through the poorly documented API, I ended up with the following code, which works:

    use Goutte\Client;

    // make a real request to an external site
    $client = new Client();
    // WARNING: these two options switch off TLS certificate verification
    $client->getClient()->setDefaultOption('config/curl/'.CURLOPT_SSL_VERIFYHOST, FALSE);
    $client->getClient()->setDefaultOption('config/curl/'.CURLOPT_SSL_VERIFYPEER, FALSE);
    $crawler = $client->request('GET', 'https://github.com/login');

    // select the form by its button label and fill in some values
    $form = $crawler->selectButton('Sign in')->form();
    $form['login'] = 'symphonyphon';
    $form['password'] = 'anypass';

    // submit that form and dump the resulting page
    $crawler = $client->submit($form);
    echo $crawler->html();
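To see why a label that is only slightly off produces "The current node list is empty.", here is a simplified, hypothetical stand-in for the button lookup, written with PHP's bundled DOM extension. It is not Symfony's implementation: it uses exact, case-sensitive matches on a submit input's value, a button's text, or an id/name attribute, whereas DomCrawler's own XPath is, per its documentation, more permissive (for example, it also matches image inputs by alt text). The function name findButtons and the sample form are illustrative assumptions, not GitHub's real markup.

```php
<?php
// Simplified, hypothetical stand-in for DomCrawler's button lookup.
// Exact, case-sensitive matching only; NOT Symfony's actual XPath.

function findButtons(string $html, string $label): DOMNodeList
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // tolerate sloppy real-world HTML
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Assumes $label contains no double quote, so it can be quoted directly.
    $q = '"' . $label . '"';
    $query = sprintf(
        '//input[(@type="submit" or @type="button") and (normalize-space(@value)=%1$s or @id=%1$s or @name=%1$s)]'
        . ' | //button[normalize-space(.)=%1$s or @id=%1$s or @name=%1$s]',
        $q
    );

    return (new DOMXPath($doc))->query($query);
}

// Hypothetical login form; not GitHub's actual markup.
$html = '<form action="/session" method="post">'
      . '<input type="text" name="login">'
      . '<input type="password" name="password">'
      . '<input type="submit" name="commit" value="Sign in">'
      . '</form>';

foreach (['Sign in', 'Sign In', 'commit'] as $label) {
    // A zero count corresponds to the situation where calling ->form()
    // on an empty DomCrawler selection throws "The current node list is empty."
    echo $label, ': ', findButtons($html, $label)->length, PHP_EOL;
}
// Sign in: 1
// Sign In: 0
// commit: 1
```

The point of the sketch is the zero result for 'Sign In': one capital letter is enough for the lookup to come back empty, so checking count($crawler->selectButton(...)) before calling ->form() gives a clearer failure than the exception.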
