veeco, posted May 24, 2011:
Hello, I watched the tutorials and can't find a clue about scraping data from a page. I did find a tutorial that scrapes data from a table, but I don't have the "choose ancestor" feature. I tried to use regex with page scrape, but it didn't work. Can the Standard version do data mining by scraping a page, or do I have to use the Professional version? So far I have found a faster solution using cURL and preg_match in PHP. I really hope UBot can make it faster. Any idea or tutorial is appreciated. Thanks

UBotBuddy, posted May 24, 2011:
Standard can scrape websites. What site are you trying to scrape?

UBotBuddy, posted May 24, 2011:
Also, update your profile to show us which version of UBot you have, as well as your computing environment. Often the right person who sees your setup can identify the problem, if not provide the correct answer for you.

UBotBuddy, posted May 24, 2011:
So did you watch all of these videos? http://ubotstudio.com/tutorials.aspx

veeco, posted May 24, 2011:
Quoted: "So did you watch all of these videos? http://ubotstudio.com/tutorials.aspx"
Yup, I watched them all. This is the page that I am trying to scrape (I got the info from ubottutorials.com): http://www.ip-adress.com/proxy_list/?k=time&d=desc
Any idea how to scrape the data, at least to get the proxy, type, and country?
In PHP I scraped it with something like this:
\<TR class\=\".*\"\>\<TD\>(.*)\<\/TD\>\<TD\>(.*)\<\/TD\>\<TD\>(.*)\<\/TD\>
and I could get the proxy, type, and country.
Also, the page has rows with class=odd and class=even. That's why I must use $page_scrape twice, once for the odd list and once for the even list. I would like to know if I could do these steps:
1. Choose the innerhtml attribute of the table.
2. Set a variable to hold the HTML.
3. Use $replace to change class=odd and class=even to class=data.
4. Set the innerhtml back on the page.
The idea is to remove class=odd/even and have only one class=data for easy scraping. Is it possible?

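Editor's sketch: the odd/even problem described above can also be handled by one pattern that accepts either class, which avoids rewriting the page HTML first. The example below is in Python rather than UBot or PHP, and the sample rows, the `ROW_PATTERN` name, and the `scrape_rows` helper are hypothetical; the real page's markup may differ.

```python
import re

# Hypothetical sample rows standing in for the proxy table;
# the real page's markup and values may differ.
SAMPLE_HTML = (
    '<tr class="odd"><td>118.138.50.126:8080</td>'
    '<td>Anonymous</td><td>Australia</td></tr>\n'
    '<tr class="even"><td>10.0.0.1:3128</td>'
    '<td>Transparent</td><td>Germany</td></tr>\n'
)

# (?:odd|even) accepts either row class, so one pass covers both lists.
# (.*?) is non-greedy, so each group stops at the nearest closing </td>
# instead of running on to the last one in the document.
ROW_PATTERN = re.compile(
    r'<tr class="(?:odd|even)">\s*'
    r'<td>(.*?)</td>\s*<td>(.*?)</td>\s*<td>(.*?)</td>',
    re.IGNORECASE | re.DOTALL,
)


def scrape_rows(html):
    """Return a list of (proxy, type, country) tuples found in html."""
    return ROW_PATTERN.findall(html)


rows = scrape_rows(SAMPLE_HTML)
for proxy, kind, country in rows:
    print(proxy, kind, country)
```

The same alternation should carry over to any PCRE-compatible engine, though whether it fits a particular UBot node is not something this sketch can confirm.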
JohnB, posted May 24, 2011:
Here is a quick example of how you can scrape the proxies: proxy scraper.ubot
John

UBotBuddy, posted May 24, 2011:
Thanks John! That is a great example of scraping. Also, thanks for filling out your profile info.

veeco, posted May 25, 2011:
Wow, nice approach. Time to turn on my RegexBuddy and try to decode your regex. So, if I would like to have the type and country as well, I should use 3 lists, am I right? Or can I build a table? Which regex flavor does UBot implement? PCRE? POSIX? Java? Perl?

UBotBuddy, posted May 25, 2011:
I would rather see you focus on your UBot skills than on regex. Regex is good for difficult things, but it is overkill for the plain scraping that you will likely be doing. I hardly ever use regex because the native UBot nodes are great at what they do.

veeco, posted May 26, 2011:
@UBotBuddy, thanks for your suggestion, but I think having regex skills will complement the way we build bots. Unfortunately, there is no single regex standard. For example: I built a page to check my IP. Let's say it returns 118.138.50.126 (no HTML, just plain text). Then I chose attribute -> outertext -> search string:
([0-9]{2,3}\.){2}
I was hoping to get 118.138. (I tested the pattern in RegexBuddy), but unfortunately it returned 118.138.50.126.
I also tested the pattern from the previous bot:
^\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){2}\b
It worked in RegexBuddy but failed in UBot, so I wonder if it has something to do with the regex flavor that UBot supports.

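Editor's sketch: both patterns quoted above can be checked outside UBot to confirm what a plain regex search returns. This uses Python's `re` module as a stand-in; it is not UBot's engine, and the `OCTET` name is only for illustration.

```python
import re

text = "118.138.50.126"

# Simple pattern from the post: two groups of 2-3 digits, each followed by a dot.
simple = re.search(r"([0-9]{2,3}\.){2}", text)
print(simple.group(0))

# Stricter pattern from the post: each octet is limited to 0-255.
OCTET = r"(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)"
strict = re.search(r"^\b(?:" + OCTET + r"\.){2}\b", text)
print(strict.group(0))
```

Both searches return only the matched span, `118.138.`, so the patterns themselves behave as expected in a Perl-style engine. That suggests the difference lies in how the match is being used rather than in the regex flavor.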
Eddie Waller, posted May 26, 2011:
That pattern works fine for me in UBot. The only issue might be that UBot uses {1}, {2}, etc. to reference things that you want to put into a string, so it may be getting confused by that. http://screencast.com/t/9LQkp04q8
As far as what we support, we are compatible with PCRE. Let me know if that helps.

veeco, posted May 26, 2011:
Here's the sample bot I made. I might be missing something.

veeco, posted May 26, 2011:
Forgot to attach the bot: IP_Regex.ubot

Eddie Waller, posted May 26, 2011:
It looks like you're misunderstanding the choosing system. When you choose something by an attribute and then scrape that same attribute, you get the entire attribute's content, not just the part you matched. The choosing system finds the right element; to modify the attribute value, you have to scrape that attribute and then change it. Hope that makes more sense. Here's the modified version that seems to work for me. http://screencast.com/t/Bi0ZiytsMk

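Editor's sketch: the choose-then-scrape distinction described above can be modelled in a few lines. This is Python, not UBot; the `elements` list, the `choose` helper, and the `PREFIX` name are hypothetical stand-ins for page elements and the choosing system.

```python
import re

# Hypothetical stand-in for elements on the page; each has an "outertext" value.
elements = [
    {"outertext": "Your IP address"},
    {"outertext": "118.138.50.126"},
]

PREFIX = re.compile(r"([0-9]{2,3}\.){2}")


def choose(candidates, attribute, pattern):
    """Return the first element whose attribute matches the pattern, or None."""
    for element in candidates:
        if pattern.search(element[attribute]):
            return element
    return None


# Step 1: the pattern only decides WHICH element is chosen.
chosen = choose(elements, "outertext", PREFIX)

# Step 2: scraping the attribute yields its whole value.
scraped = chosen["outertext"]
print(scraped)

# Step 3: narrowing to the matched part is a separate operation on that value.
prefix = PREFIX.search(scraped).group(0)
print(prefix)
```

Under this model, the full address coming back from the scrape is the expected result of step 2, and the regex only has an effect on the output once it is applied again in step 3.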
veeco, posted May 26, 2011:
@Eddie Waller, I tried to find the "Find Regular Expression" constant you used. Is it available in the Standard edition? If not, can I accomplish the same thing with the Standard version?

Eddie Waller, posted May 26, 2011:
Ooh, sorry, I haven't looked much at what's available in each version of UBot, haha. Here's a version using $replace regular expression, where I replace the end of the IP address with an empty parameter. http://screencast.com/t/vZ7QpPUFmH7F

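Editor's sketch: the replace-based workaround can be illustrated with a regex substitution that deletes the tail of the address. This is Python's `re.sub`, not UBot's $replace regular expression, and the exact pattern used in the screencast is not known; the one below and the `strip_last_two_octets` name are assumptions chosen to leave the `118.138.` prefix asked for earlier in the thread.

```python
import re


def strip_last_two_octets(ip):
    """Replace the final 'NNN.NNN' at the end of the string with nothing."""
    # $ anchors the match to the end, so only the last two octets are removed
    # and the dot that follows the second octet is kept.
    return re.sub(r"[0-9]{1,3}\.[0-9]{1,3}$", "", ip)


print(strip_last_two_octets("118.138.50.126"))
```

Removing the unwanted part instead of extracting the wanted part gives the same result here, which is why a replace function can stand in when a find function is not available.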
veeco, posted May 26, 2011:
It works. Thanks!

tresehabas, posted June 3, 2011:
veeco, can I ask for an example? I've actually been trying to figure out how this works for some time, and I want to see if I'm on the right track.