Jaro, posted November 6, 2015

Please help, I'm trying to scrape hehehe from the text:

name=bla value=o>bla - bla <hehehe></td>

The regex (?<=value=o>.*?<).*?(?=>) works just fine in EditPad Pro, exactly how I want, but of course uBot has problems with my regex again! Please tell me how to make it work in uBot as well. THANKS!
Bill, posted November 7, 2015

These all work:

(?<=\).*?(?=\&)(?<=\<\).*?(?=\>\)
Jaro, posted November 7, 2015

Thank you Bill, but no, it doesn't work for what I need: it matches more characters. That's why I really need the value=o... etc. condition in there.

Buddy S. from uBot support advised ^.*value=o>.*<(.*)>.* but although it works on his favorite Rubular website, it doesn't work in uBot either.
Jaro, posted November 7, 2015

I've just found out that (?<=value\=o\>.*\<\.*?(?=\>\ finally works in the uBot regex editor, but the Rubular website says it's an 'Invalid pattern in look-behind.' Although it works in the regex editor, it doesn't work in uBot scripts and, again, adds empty values, unfortunately.

UPDATE 1: And here seems to be the reason: https://www.ruby-forum.com/topic/4483308

UPDATE 2: I've just solved the problem by adjusting the text on both sides of the pattern to be selected:

(?<=<\.*?(?=\>\;\<\/td\>\<td width)
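The 'Invalid pattern in look-behind' message above comes down to engine differences: some regex engines accept only fixed-width lookbehind, so a lookbehind containing .*? is rejected. A minimal sketch in Python (not uBot; Python's re module also rejects variable-width lookbehind) shows the rejection and one common workaround, a capture group. The helper names are made up for this illustration, and nothing in this thread confirms whether uBot's $find regular expression can return a capture group.

```python
import re

SAMPLE = "name=bla value=o>bla - bla <hehehe></td>"


def lookbehind_is_accepted(pattern):
    """Return True if this engine (Python's re) can compile the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        # Python raises here for variable-width lookbehind such as (?<=a.*?b)
        return False


def extract_tag_text(text):
    """Hypothetical helper: capture the text between '<' and '>' after 'value=o>'."""
    match = re.search(r"value=o>.*?<(.*?)>", text)
    return match.group(1) if match else None


print(lookbehind_is_accepted(r"(?<=value=o>.*?<).*?(?=>)"))
print(extract_tag_text(SAMPLE))
```

The capture group moves the "must come after value=o>" condition out of the lookbehind, so the pattern no longer depends on variable-width lookbehind support.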
maBOT, posted December 9, 2015

Hello guys,

I'm struggling with something pretty similar and I'd like someone to help me get past it. I can scrape the value I want from either of two sources, but I simply can't get it to work. Here are both:

<a href="javascript:void(0);" onclick="checkclosed('EcqodmiMOWM');" class="likebutton">Like Video</a>
<img src="http://i.ytimg.com/vi/EcqodmiMOWM/default.jpg">

I'm trying to scrape the "EcqodmiMOWM" part (from either of these two), put it into a variable, and use the "navigate" function to open it in uBot as a normal YouTube URL.

Here is the entire code I'm using:

}
set(#youtube,$scrape attribute(<src="http://i.ytimg.com/vi/EcqodmiMOWM/default.jpg">,$find regular expression("","(?<=<img src=\\\"http://i.ytimg.com/vi/).*?(?=/default.jpg\\\">)")),"Global")
in shared browser {
    navigate("https://www.youtube.com/watch?v={#youtube}","Wait")
    wait for browser event("Everything Loaded","")
    wait(10)
}

With this setup the debugger returns nothing.

I've tried various methods I read about here, other than regex, and none of them worked for me. In addition, the page contains only this single a href / img src; there are no multiple attributes on the page.

If you have any suggestions for plugins, whether free or commercial, just mention them here.

Thanks a lot in advance.

P.S. This is my first forum post ever. I wanted to ask for help only once I really needed it ;-)
Pete, posted December 9, 2015

Try this:

set(#youtube,$find regular expression("<src=\"http://i.ytimg.com/vi/EcqodmiMOWM/default.jpg\">","(?<=ytimg\\.com\\/vi\\/).*?(?=\\/default\\.jpg)"),"Global")
navigate("https://www.youtube.com/watch?v={#youtube}","Wait")
wait for browser event("Everything Loaded","")
wait(10)

I think you don't see it in the debugger because you failed to click the quotation marks for the text in your regex.
maBOT, posted December 9, 2015

Hi Zap, and thanks very much. That really did the trick.

However, what if I get the same src format to scrape from each time, but with a different tag? Basically, EcqodmiMOWM will change to a new value on each scrape. It seems that a "wildcard" cannot be combined with "regex". Do you have any good suggestion for this? I probably need to use some wider selection like "outerhtml" or another selector.
Pete, posted December 9, 2015

If I knew what URLs you needed it would be easier.

navigate("https://www.youtube.com/channel/UCgkY2u5AprRNiIX4JHdIuHA/videos","Wait")
clear list(%urls)
add list to list(%urls,$list from text($scrape attribute(<href=w"/watch?v=*">,"fullhref"),$new line),"Delete","Global")
loop while($comparison($list total(%urls),"> Greater than",0)) {
    navigate($list item(%urls,0),"Wait")
    wait for browser event("Everything Loaded",15)
    wait(10)
    remove from list(%urls,0)
}

Maybe this is what you need.
maBOT, posted December 9, 2015

Hello,

Thanks, Zap! Now I've got what I wanted.

I needed the old code below inside a loop, where the part of the src ('EcqodmiMOWM') is a unique value on each pass. I did it with wildcard + regex. Look at the code:

loop($rand(2,5)) {
    set(#youtube,$find regular expression($scrape attribute(<src=w"http://i.ytimg.com/vi/*/default.jpg">,"src"),"(?<=ytimg\\.com\\/vi\\/).*?(?=\\/default\\.jpg)"),"Global")

Thanks a lot for your support!!
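As a rough illustration of what the regex in the code above does once the wildcard scrape has returned a src value, here is the same lookaround pattern in Python rather than uBot. The helper names and the second sample ID are invented for the example; only the thumbnail URL format and the watch URL format come from the thread.

```python
import re

# Same lookaround idea as in the thread, written as a Python raw string:
# match whatever sits between "ytimg.com/vi/" and "/default.jpg".
VIDEO_ID = re.compile(r"(?<=ytimg\.com/vi/).*?(?=/default\.jpg)")


def video_id_from_thumbnail(src):
    """Hypothetical helper: pull the video ID out of a thumbnail URL."""
    match = VIDEO_ID.search(src)
    return match.group(0) if match else None


def watch_url(src):
    """Hypothetical helper: build the watch URL, or None if no ID is found."""
    video_id = video_id_from_thumbnail(src)
    return f"https://www.youtube.com/watch?v={video_id}" if video_id else None


# The ID changes from one thumbnail to the next; the pattern stays the same.
for src in (
    "http://i.ytimg.com/vi/EcqodmiMOWM/default.jpg",
    "http://i.ytimg.com/vi/abc123XYZ_0/default.jpg",  # made-up ID
):
    print(watch_url(src))
```

Because the pattern anchors on the fixed text around the ID, it does not need to know the ID in advance, which is why it combines well with a wildcard selector.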
BlackHatMon3yMaker, posted March 29, 2016

I'm in the same position, any input is greatly appreciated!
maBOT, posted March 29, 2016

Hi, Mon3yMaker,

Have you noticed my latest reply in this thread, which did the trick? Please see below:

loop($rand(2,5)) {
    set(#youtube,$find regular expression($scrape attribute(<src=w"http://i.ytimg.com/vi/*/default.jpg">,"src"),"(?<=ytimg\\.com\\/vi\\/).*?(?=\\/default\\.jpg)"),"Global")

Hope it helps.
BlackHatMon3yMaker, posted March 29, 2016

Hey, thanks for the response! I tried plugging in the info I need and still can't get it to extract what I want. Am I missing something obvious? I changed some things, but nothing that should cause it not to work.

loop(1) {
    set(#picture,$find regular expression($scrape attribute(<src=w"http://ecx.images-amazon.com/images/I/*">,"src"),"(?<=http://ecx.images-amazon.com/images/I/).*?(?<=._SL1500_.jpg)"),"Global")
}

This is the code with the full link I'm trying to extract:

<img src="http://ecx.images-amazon.com/images/I/919sFzge2iL._SL1500_.jpg" class="fullScreen" style="height: 471px; width: 818.333px; margin-top: 10px; margin-left: 93px;">
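One thing that stands out in the pattern above is that it ends with a lookbehind, (?<=...), where the earlier YouTube example ended with a lookahead, (?=...). The sketch below, in Python rather than uBot, compares the two on the posted img tag. It assumes the goal is the image ID alone, which the post does not state; if the scrape attribute call itself returns nothing, the cause lies elsewhere. The variable and function names are illustrative only.

```python
import re

IMG_TAG = (
    '<img src="http://ecx.images-amazon.com/images/I/919sFzge2iL._SL1500_.jpg" '
    'class="fullScreen">'
)

# Pattern as posted: the trailing (?<=...) is a lookbehind, so the suffix
# has to be part of the matched text before the assertion can succeed.
AS_POSTED = r"(?<=http://ecx.images-amazon.com/images/I/).*?(?<=._SL1500_.jpg)"

# Variant with a trailing lookahead (and escaped dots): the suffix must
# follow the match but is not included in it.
WITH_LOOKAHEAD = r"(?<=http://ecx\.images-amazon\.com/images/I/).*?(?=\._SL1500_\.jpg)"


def first_match(pattern, text):
    """Return the first match of pattern in text, or None."""
    match = re.search(pattern, text)
    return match.group(0) if match else None


print(first_match(AS_POSTED, IMG_TAG))
print(first_match(WITH_LOOKAHEAD, IMG_TAG))
```

In this sketch the posted pattern returns the ID together with the ._SL1500_.jpg suffix, while the lookahead variant returns the ID on its own.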