veeco 0 Posted May 25, 2011
Hello guys, I'm not an expert (yet) in regex and have been searching for a way to solve this problem. I have a page to scrape that contains this sub-text:
"the Level0 means..."<td>Invitation-Level0</td><td>Level0</td><td>Level1</td><td>Level0-Gold</td>
The page has lots of data like the above. I want to grab only the Level0 that sits alone inside a "<td>" (filtering out the Invitation and Gold variants) so I can count how many times Level0 is displayed. So far I use the pattern "\>Level0\<", but I can't use it to save the scraped attribute because it returns the string ">Level0<". Hopefully there's a pattern that grabs "Level0" only. Any ideas are appreciated. Thanks.
Kreatus (Ubot Ninja) 422 Posted May 25, 2011
Hi, this is not regex, but it is how I have the bot count keywords inside a page. Check this file: count keywords.ubot
Note: I actually got this idea from botbuddy.
Kreatus (Ubot Ninja) 422 Posted May 25, 2011
With regex you can use the pattern "\w+0" or the literal "Level0". Here is the sample: count keywords.ubot
UBotBuddy 331 Posted May 25, 2011
@Kreatus Interesting implementation.
veeco 0 (Author) Posted May 25, 2011
@Kreatus your idea for counting the keyword is great, but my problem is filtering the keyword. For example:
"the Level0 means..."<td>Invitation-Level0</td><td>Level0</td><td>Level1</td><td>Level0-Gold1</td><td>Level0-Gold2</td><td>Level0-Gold3</td><td>Level0</td><td>Level0-Gold4</td><td>Level0-Gold5</td><td>Level0-Gold6</td>
I want to get only the keyword that appears as exactly Level0 in a table cell. That means filtering out "Invitation-Level0", "Level0-Gold1", and so on. So if I scrape this text, I should get "Level0" only two times. I hope this makes my problem clear.
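One way to get "Level0" without the surrounding ">" and "<" is to match the whole cell but capture only the text, or to use lookarounds so the tags are checked but not returned. The following is a minimal sketch in Python, not UBot script; it assumes the cells appear exactly as `<td>Level0</td>` with no attributes or whitespace, and the lookaround variant assumes the regex engine in use supports lookbehind.

```python
import re

# Sample text copied from the post above.
html = (
    '"the Level0 means..."'
    "<td>Invitation-Level0</td><td>Level0</td><td>Level1</td>"
    "<td>Level0-Gold1</td><td>Level0-Gold2</td><td>Level0-Gold3</td>"
    "<td>Level0</td><td>Level0-Gold4</td><td>Level0-Gold5</td>"
    "<td>Level0-Gold6</td>"
)

# Option 1: match the full cell, but capture only the text inside it.
# findall returns the captured group, so the tags are not included.
cells = re.findall(r"<td>(Level0)</td>", html)

# Option 2: lookbehind/lookahead assert the tags without consuming them,
# so the match itself is just "Level0".
cells_lookaround = re.findall(r"(?<=<td>)Level0(?=</td>)", html)

print(cells)       # ['Level0', 'Level0']
print(len(cells))  # 2
```

Both patterns skip "Invitation-Level0" and the "Level0-Gold" cells because the opening and closing tags must touch "Level0" directly. The quoted sentence "the Level0 means..." is skipped for the same reason.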
veeco 0 (Author) Posted May 25, 2011
I face another, similar scenario. This is the URL of the page to scrape:
http://www.ip-adress.com/proxy_list/?k=time&d=desc
If I scrape with the pattern "Elite", the IP from the link (which is 119.57.7.118:80:Elite) is also selected. If I use the pattern [^:]Elite, I get the list of keywords I targeted, but each entry comes back as ">Elite".
Basically I would like to scrape the proxy's IP, type, and country. I hope there's a solution for the Standard Edition. I saw the tutorial that does this in the Professional Edition using choose ancestor from a cell. Someday I might get the Pro Edition, but for now I must limit myself to the Standard Edition.
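The ">Elite" result comes from `[^:]` consuming one character, which then becomes part of the match. A capture group or a negative lookbehind keeps that character out of the result. This is a sketch in Python under assumptions: the row markup and the link path below are hypothetical, since the page's actual HTML is not shown in the thread.

```python
import re

# Hypothetical row markup (an assumption, not the real page source):
# the link target contains "119.57.7.118:80:Elite" and the type has its own cell.
row = (
    '<tr><td><a href="/proxy/119.57.7.118:80:Elite">119.57.7.118:80</a></td>'
    "<td>Elite</td></tr>"
)

# Plain literal: also matches the "Elite" inside the link.
plain = re.findall(r"Elite", row)

# [^:] must match one real character, so that character is returned too.
with_prefix = re.findall(r"[^:]Elite", row)

# Capture group: the extra character is still matched but not returned.
captured = re.findall(r"[^:](Elite)", row)

# Negative lookbehind: only asserts that the previous character is not ":".
lookbehind = re.findall(r"(?<!:)Elite", row)

print(plain)        # ['Elite', 'Elite']
print(with_prefix)  # ['>Elite']
print(captured)     # ['Elite']
print(lookbehind)   # ['Elite']
```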
Pete 121 Posted May 25, 2011
Quoting veeco: "Basically I would like to scrape the proxy's IP, type, and country..."
Maybe this will help: ProxieScraper.ubot
veeco 0 (Author) Posted May 26, 2011
@zap, I got your idea, nice. But there's a small problem: some IPs have an unknown country, so the scraped country list ends up with fewer entries than the other lists.
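The mismatch described above tends to appear when each column is scraped into its own list, because an empty country cell shifts every later entry. One possible workaround is to split the table into rows first and read the cells per row, so a missing country stays attached to its proxy. The sketch below is Python, not UBot script, and everything in the sample table (three-column layout, addresses, type names, country placeholders) is hypothetical.

```python
import re

# Hypothetical markup: address, type, country in each row.
# The second row has an empty country cell.
table = (
    "<tr><td>192.0.2.10:80</td><td>Elite</td><td>CountryA</td></tr>"
    "<tr><td>192.0.2.11:8080</td><td>Anonymous</td><td></td></tr>"
    "<tr><td>192.0.2.12:3128</td><td>Elite</td><td>CountryB</td></tr>"
)

row_re = re.compile(r"<tr>(.*?)</tr>", re.S)   # one match per table row
cell_re = re.compile(r"<td>(.*?)</td>", re.S)  # one match per cell in a row

proxies = []
for row in row_re.findall(table):
    cells = [c.strip() for c in cell_re.findall(row)]
    if len(cells) != 3:
        continue  # skip rows that do not have the assumed layout
    address, kind, country = cells
    # Keep a placeholder so every proxy still has three fields.
    proxies.append((address, kind, country or "unknown"))

for proxy in proxies:
    print(proxy)
```

Because each tuple is built from a single row, the second proxy keeps "unknown" as its country instead of borrowing the next row's value.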
Frank 177 Posted June 24, 2011
Guys, it can be so much easier:
http://ubotstudio.com/forum/index.php?/topic/7162-using-regex-to-catch-text-between-sections/#entry36055
Frank
lazlink 0 Posted August 8, 2011
That's great. Thank you for sharing, dude. I'm a newcomer here.