musician87, posted October 15, 2013:
Hey guys, I'm trying to scrape emails and names in this format: testemail@gmail.com , Brown

However, it seems I can only scrape: mailto:testemail@gmail.com

That's not a big problem, since I can easily remove all the "mailto:" prefixes with a text editor, but I can't scrape the name as well. I don't know what's going wrong, but I always get a blank document. Here is a screenshot of the code where I get the error: http://i.imgur.com/3E8qx1N.png

If anyone needs the website I'm scraping, I'll send you the link via PM. Thank you so much!
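For anyone doing the same outside UBot, here is a minimal Python sketch of pulling both pieces from mailto links with the standard library's `html.parser`. It assumes (the actual site isn't shown in the thread) that each name is the visible link text of its `<a href="mailto:...">` anchor, which is what an "innertext" scrape would return. The class name `MailtoLinkParser` and the sample HTML are hypothetical.

```python
from html.parser import HTMLParser


class MailtoLinkParser(HTMLParser):
    """Hypothetical helper: collects (href, link text) pairs for mailto anchors."""

    def __init__(self):
        super().__init__()
        self.links = []           # list of (raw href, link text) tuples
        self._current_href = None
        self._text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # attrs is a list of (name, value) pairs; value may be None.
            href = dict(attrs).get("href") or ""
            if href.lower().startswith("mailto:"):
                self._current_href = href
                self._text_parts = []

    def handle_data(self, data):
        # Only collect text while inside a mailto anchor (the "innertext").
        if self._current_href is not None:
            self._text_parts.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._current_href is not None:
            text = "".join(self._text_parts).strip()
            self.links.append((self._current_href, text))
            self._current_href = None


# Hypothetical sample markup standing in for the real page.
html = '<p><a href="mailto:testemail@gmail.com">Brown</a> <a href="/about">About</a></p>'
parser = MailtoLinkParser()
parser.feed(html)
print(parser.links)  # [('mailto:testemail@gmail.com', 'Brown')]
```

Note that the href still carries the "mailto:" prefix; that is the raw attribute value, which is why the scrape in the question came back as mailto:testemail@gmail.com.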
k1lv9h, posted October 15, 2013:
Hi,

You could use $replace to remove "mailto:" from the email value as it is saved to the table. For the Name value, "attribute to scrape" should be "outertext", not "name".

Kevin
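The same idea as the $replace suggestion, sketched in Python rather than UBot: strip the prefix at the moment each row is built, so the saved file never contains "mailto:". The function names `clean_email` and `format_row` are hypothetical, and the "email , name" layout is taken from the original question.

```python
PREFIX = "mailto:"


def clean_email(href: str) -> str:
    """Remove a leading 'mailto:' (case-insensitive) from a scraped href."""
    href = href.strip()
    if href.lower().startswith(PREFIX):
        href = href[len(PREFIX):]
    return href


def format_row(href: str, name: str) -> str:
    """Build one output line in the 'email , name' layout from the question."""
    return f"{clean_email(href)} , {name.strip()}"


print(format_row("mailto:testemail@gmail.com", "Brown"))
# testemail@gmail.com , Brown
```

Doing the cleanup while saving, rather than afterwards in a text editor, means every run produces a file that is ready to use.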
musician87, posted October 15, 2013:
I'm trying the $replace now. For the name, it was "innertext". Anyway, I don't understand why it works with just one link, but when I try multiple links it doesn't save any data (or at least, the .txt file is blank).
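The thread never says why the multi-link run produced a blank file (it was sorted out over PM). One pattern that avoids losing earlier pages' results, assuming the problem was the output being written or reset per page, is to accumulate rows across every page and write the file once at the end. This Python sketch uses a hypothetical `pages` dict in place of real fetched HTML, and a rough regex that only handles simple `<a href="mailto:...">Name</a>` markup.

```python
import re
import tempfile
from pathlib import Path

# Rough pattern for simple markup: <a ... href="mailto:EMAIL" ...>NAME</a>
MAILTO_LINK = re.compile(
    r'<a\b[^>]*href="mailto:([^"]+)"[^>]*>(.*?)</a>', re.IGNORECASE | re.DOTALL
)


def scrape_page(html: str) -> list[str]:
    """Return 'email , name' lines for every mailto link on one page."""
    return [f"{email.strip()} , {name.strip()}" for email, name in MAILTO_LINK.findall(html)]


def scrape_all(pages: dict[str, str]) -> list[str]:
    """Accumulate rows from every page instead of keeping only the last one."""
    rows: list[str] = []
    for html in pages.values():
        rows.extend(scrape_page(html))
    return rows


def save_rows(rows: list[str], path: Path) -> None:
    """Write all rows once, after the loop, so later pages can't overwrite earlier ones."""
    path.write_text("\n".join(rows) + "\n", encoding="utf-8")


# Hypothetical stand-ins for the HTML of several fetched pages.
pages = {
    "page1": '<a href="mailto:testemail@gmail.com">Brown</a>',
    "page2": '<a class="x" href="mailto:other@example.com">Smith</a>',
}

out = Path(tempfile.gettempdir()) / "emails.txt"
save_rows(scrape_all(pages), out)
print(out.read_text(encoding="utf-8"))
```

If a name contains nested tags, this regex keeps them in the name; an HTML parser handles that case better.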
k1lv9h, posted October 15, 2013:
Hi,

Without looking at your code, it's hard to say why the data isn't being saved.

Kevin
musician87, posted October 15, 2013:
Can I send it through PM?
k1lv9h, posted October 15, 2013:
Yes.
Big Jay, posted October 16, 2013:
Please let me know if the code I sent you helped. To anyone else looking to grab emails from a page, this regex works great:

[\\.\\-_A-Za-z0-9]+?@[\\.\\-A-Za-z0-9]+?[\\.A-Za-z0-9]\{2,\}
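To try Big Jay's regex outside UBot, here is a Python version. It assumes the doubled backslashes and the escaped braces in the post are escaping required by the tool it was pasted into, so the raw string below uses single backslashes and plain `{2,}`. Because ":" is not in any of the character classes, the match starts after "mailto:", so this pattern also drops the prefix.

```python
import re

# Big Jay's pattern rewritten as a Python raw string (assumption: the doubled
# backslashes and \{ \} in the post are tool escaping, not part of the regex).
EMAIL_RE = re.compile(r"[.\-_A-Za-z0-9]+?@[.\-A-Za-z0-9]+?[.A-Za-z0-9]{2,}")

text = 'Contact <a href="mailto:testemail@gmail.com">Brown</a> or sales@example.co.uk'
print(EMAIL_RE.findall(text))
# ['testemail@gmail.com', 'sales@example.co.uk']
```

One quirk: the last character class includes ".", so an address at the end of a sentence keeps the trailing period ("a.b@x.org." matches as "a.b@x.org."). Strip trailing dots from the results if that matters.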