UBot Underground

Using Regex To catch text between sections



Please help. I'm trying to scrape hehehe from this text: name=bla value=o>bla - bla <hehehe></td>

 

The regex (?<=value=o>.*?<).*?(?=>) works just fine in EditPad Pro, exactly how I want, but of course uBot has problems with my regex again!

Please tell me how to make it work in uBot as well, THANKS!


Thank you Bill, but no, it doesn't work for what I need. It matches more characters than I want, which is why I really need the value=o... condition in there.

Buddy S. from uBot support suggested ^.*value=o>.*<(.*)&gt.* but although it works on his favorite site, Rubular, it doesn't work in uBot either.
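
The capture-group idea behind that suggestion can be sketched outside uBot. The snippet below is only an illustration in Python, using the sample text from the first post. Whether uBot's $find regular expression can return a capture group rather than the whole match is not established in this thread, so treat this as a way to check the pattern itself, not as a uBot fix.

```python
import re

# Sample text copied from the first post.
text = "name=bla value=o>bla - bla <hehehe></td>"

# Anchor on "value=o>", skip lazily to the next "<", then capture up to ">".
# The capture group does the job the variable-width lookbehind was doing.
pattern = re.compile(r"value=o>.*?<(.*?)>")

match = pattern.search(text)
captured = match.group(1) if match else ""
print(captured)
```

The whole match still includes the value=o> prefix; only group 1 holds the wanted text, which is why a tool that returns the full match would appear to "match more characters".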


I've just found out that 

(?<=value\=o\>.*\&lt\;).*?(?=\&gt\;)

finally works in the uBot Regex editor, but the Rubular website says it's an 'Invalid pattern in look-behind.'

Although it works in the Regex editor, it doesn't work in uBot scripts and, again, adds empty values, unfortunately.

 

UPDATE 1: And here seems to be the reason:
 https://www.ruby-forum.com/topic/4483308
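
Regex engines differ in what they accept inside a look-behind, which would explain a pattern working in one tool and being rejected in another. As a rough illustration only: Python's built-in re module (not the engine used by uBot, EditPad Pro, or Rubular) also refuses a look-behind that contains .*?, while a fixed-width look-behind compiles. The compiles helper is a hypothetical name made up for this sketch.

```python
import re

variable_width = r"(?<=value=o>.*?<).*?(?=>)"  # look-behind contains .*?
fixed_width = r"(?<=<).*?(?=>)"                # look-behind is a single character


def compiles(pattern: str) -> bool:
    """Return True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


variable_ok = compiles(variable_width)
fixed_ok = compiles(fixed_width)
print(variable_ok, fixed_ok)
```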

 

UPDATE 2: I've just solved the problem by adjusting the text on both sides of the pattern to be selected:

(?<=&lt\;).*?(?=\&gt\;\<\/td\>\<td width)
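
The idea in Update 2 (keep the look-behind short and fixed, and make the look-ahead more specific instead) can be tried in Python as a sketch. The sample string is an assumption: it is the first post's text with the angle brackets HTML-encoded and a made-up <td width cell appended, since the real page source is not shown in the thread.

```python
import re

# Hypothetical sample: entity-encoded brackets, followed by an assumed next cell.
text = 'name=bla value=o>bla - bla &lt;hehehe&gt;</td><td width="10">'

# Fixed-width look-behind (&lt;) plus a longer, more specific look-ahead.
pattern = re.compile(r"(?<=&lt;).*?(?=&gt;</td><td width)")

match = pattern.search(text)
result = match.group(0) if match else ""
print(result)
```

This only stays unambiguous while the page contains a single &lt;...&gt; pair before that cell; with several, the first one wins.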
  • 1 month later...

Hello guys,

I'm struggling with something pretty similar and would like some help with it. I can scrape the value I need from either of two sources, but I simply can't get it to work.

 

Here are both:

<a href="javascript:void(0);" onclick="checkclosed('EcqodmiMOWM');" class="likebutton">Like Video</a>
<img src="http://i.ytimg.com/vi/EcqodmiMOWM/default.jpg">

I'm trying to scrape the "EcqodmiMOWM" part (from either of these two), put it into a variable, and use the "navigate" function to open it in uBot as a normal YouTube URL. Here is the entire code I'm using:

}
set(#youtube,$scrape attribute(<src="http://i.ytimg.com/vi/EcqodmiMOWM/default.jpg">,$find regular expression("","(?<=<img src=\\\"http://i.ytimg.com/vi/).*?(?=/default.jpg\\\">)")),"Global")
in shared browser {
    navigate("https://www.youtube.com/watch?v={#youtube}","Wait")
    wait for browser event("Everything Loaded","")
    wait(10)
}

With this setup the debugger returns nothing. I've been trying various methods I read about here, apart from regex, and none of them worked for me. In addition, the page contains only this single a href / img src; there are no multiple matching attributes on the page.

 

Any plugin suggestions, whether free or commercial, are welcome.

Thanks a lot in advance.

P.S. This is my first forum post ever. I wanted to ask for help only once I really needed it ;-)


try this

set(#youtube,$find regular expression("<src=\"http://i.ytimg.com/vi/EcqodmiMOWM/default.jpg\">","(?<=ytimg\\.com\\/vi\\/).*?(?=\\/default\\.jpg)"),"Global")
navigate("https://www.youtube.com/watch?v={#youtube}","Wait")
wait for browser event("Everything Loaded","")
wait(10)

I think you don't see it in the debugger because you failed to click the quotation marks for the text in your regex, so the text to search was left empty
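
For checking the pattern outside uBot, here is a small Python sketch of the same extraction from both snippets in the question. The second pattern (for the onclick attribute) is not from the thread; it is an assumed alternative added for comparison.

```python
import re

# Both snippets are copied from the question above.
img_tag = '<img src="http://i.ytimg.com/vi/EcqodmiMOWM/default.jpg">'
link_tag = (
    '<a href="javascript:void(0);" '
    'onclick="checkclosed(\'EcqodmiMOWM\');" class="likebutton">Like Video</a>'
)

# Same look-around pattern as in the reply, without the uBot string escaping.
id_from_img = re.search(r"(?<=ytimg\.com/vi/).*?(?=/default\.jpg)", img_tag)

# Assumed alternative: capture the argument of checkclosed('...').
id_from_link = re.search(r"checkclosed\('([^']+)'\)", link_tag)

video_id = id_from_img.group(0) if id_from_img else ""
link_id = id_from_link.group(1) if id_from_link else ""
watch_url = f"https://www.youtube.com/watch?v={video_id}"
print(watch_url)
```

The key point matches the reply: the first argument must be the text to search, and the second the pattern. With an empty first argument there is nothing to match.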



Hi Zap, and thanks very much. That really did the trick.

However, what if the src I scrape from has the same format each time but a different ID? In other words, EcqodmiMOWM changes to a new value on every scrape.

It seems that a "wildcard" can't be combined with "regex".

Do you have any good suggestions for this? I probably need a wider selection, like "outerhtml" or another selector.


If I knew which URLs you needed, it would be easier

navigate("https://www.youtube.com/channel/UCgkY2u5AprRNiIX4JHdIuHA/videos","Wait")
clear list(%urls)
add list to list(%urls,$list from text($scrape attribute(<href=w"/watch?v=*">,"fullhref"),$new line),"Delete","Global")
loop while($comparison($list total(%urls),"> Greater than",0)) {
    navigate($list item(%urls,0),"Wait")
    wait for browser event("Everything Loaded",15)
    wait(10)
    remove from list(%urls,0)
}

maybe this is what you need
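
The script above collects every matching link into a list and then works through the list until it is empty. The same flow can be sketched in Python to make the logic easier to follow. The HTML fragment is made up for illustration, the base URL is prepended by hand, and the assumption that the "Delete" option drops duplicate entries is mine, not something the thread confirms.

```python
import re
from collections import deque

# Hypothetical page fragment; the real channel page is not shown in the thread.
html = """
<a href="/watch?v=AAAAAAAAAAA">one</a>
<a href="/about">about</a>
<a href="/watch?v=BBBBBBBBBBB">two</a>
<a href="/watch?v=AAAAAAAAAAA">one again</a>
"""

# Roughly what the wildcard <href=w"/watch?v=*"> selects.
hrefs = re.findall(r'href="(/watch\?v=[^"]+)"', html)

# dict.fromkeys keeps first-seen order while dropping duplicates.
queue = deque(dict.fromkeys("https://www.youtube.com" + h for h in hrefs))

visited = []
while queue:                 # loop while the list still has items
    url = queue.popleft()    # take item 0 and remove it from the list
    visited.append(url)      # placeholder for navigating to the URL

print(visited)
```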


Hello,

Thanks, Zap! Now I've got what I wanted.

I needed the old code below inside a loop, where the ID part of the src ('EcqodmiMOWM') is a unique value on each pass. I did it with wildcard + regex. Look at the code:

 

 

loop($rand(2,5)) {
    set(#youtube,$find regular expression($scrape attribute(<src=w"http://i.ytimg.com/vi/*/default.jpg">,"src"),"(?<=ytimg\\.com\\/vi\\/).*?(?=\\/default\\.jpg)"),"Global")
}

 

 

Thanks a lot for your support!!
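
The wildcard + regex combination is a two-step process: the wildcard picks the elements whose src has the right shape, and the regex then cuts the changing ID out of each one. A minimal Python sketch of those two steps follows, using fnmatchcase as a stand-in for uBot's w"..." wildcard. Every entry in the list except the first is made up for illustration.

```python
import re
from fnmatch import fnmatchcase

# Only the first src comes from the thread; the others are hypothetical.
srcs = [
    "http://i.ytimg.com/vi/EcqodmiMOWM/default.jpg",
    "http://example.com/logo.png",
    "http://i.ytimg.com/vi/AAAAAAAAAAA/default.jpg",
]

wildcard = "http://i.ytimg.com/vi/*/default.jpg"
id_pattern = re.compile(r"(?<=ytimg\.com/vi/).*?(?=/default\.jpg)")

video_ids = []
for src in srcs:
    if not fnmatchcase(src, wildcard):   # step 1: wildcard selects candidates
        continue
    found = id_pattern.search(src)       # step 2: regex extracts the ID
    if found:
        video_ids.append(found.group(0))

print(video_ids)
```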

  • 3 months later...

I'm in the same position, any input is greatly appreciated!


Hi, Mony3Maker

Have you noticed my latest reply in this thread, which did the trick? Please see below:

loop($rand(2,5)) {
    set(#youtube,$find regular expression($scrape attribute(<src=w"http://i.ytimg.com/vi/*/default.jpg">,"src"),"(?<=ytimg\\.com\\/vi\\/).*?(?=\\/default\\.jpg)"),"Global")
}

Hope it helped.


Hey, thanks for the response! I tried plugging in the info I needed and still can't get it to extract what I want. Am I missing something obvious? I changed some things, but nothing that should stop it from working.

loop(1) {
    set(#picture,$find regular expression($scrape attribute(<src=w"http://ecx.images-amazon.com/images/I/*">,"src"),"(?<=http://ecx.images-amazon.com/images/I/).*?(?<=._SL1500_.jpg)"),"Global")
}

This is the HTML with the full link I'm trying to extract from:

<img src="http://ecx.images-amazon.com/images/I/919sFzge2iL._SL1500_.jpg" class="fullScreen" style="height: 471px; width: 818.333px; margin-top: 10px; margin-left: 93px;">
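
One difference from the earlier YouTube pattern is that this one ends with a look-behind, (?<=...), where the earlier one ended with a look-ahead, (?=...). The thread does not say which part of the link is wanted, so the sketch below simply assumes the goal is the image ID alone and compares the two endings in Python. The dots are escaped here, unlike in the post, and the img tag is shortened to the attributes that matter.

```python
import re

# Shortened copy of the img tag from the post.
img_tag = (
    '<img src="http://ecx.images-amazon.com/images/I/919sFzge2iL._SL1500_.jpg" '
    'class="fullScreen">'
)

prefix = r"(?<=http://ecx\.images-amazon\.com/images/I/)"

# Trailing look-behind: the match must END with the suffix, so it includes it.
with_lookbehind = re.search(prefix + r".*?(?<=\._SL1500_\.jpg)", img_tag)

# Trailing look-ahead: the match stops right BEFORE the suffix.
with_lookahead = re.search(prefix + r".*?(?=\._SL1500_\.jpg)", img_tag)

including_suffix = with_lookbehind.group(0) if with_lookbehind else ""
id_only = with_lookahead.group(0) if with_lookahead else ""
print(including_suffix)
print(id_only)
```

If the full file name is what's wanted, the look-behind version already returns it in this test, in which case the problem may lie in the $scrape attribute wildcard rather than in the regex.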