UBot Underground

Trying To Use Regex To Scrape Url Between Shortcodes [kenplayer][/kenplayer]



When I try to scrape a URL between these two shortcodes using page scrape, I get an error after a while. I guess it's because UBot converts the two shortcodes into a math function.

 

The URL appears like this in the page:

[kenplayer]https://www.websiteurl.com/3456782[/kenplayer]

What code should I use to scrape the URL, or at least the number at the end of the URL? Can regex be entered directly in the UBot editor, or do I have to download a plugin?


URL:

alert($find regular expression("[kenplayer]https://www.websiteurl.com/3456782[/kenplayer]","(?<=\\]).*?(?=\\[)"))

Number:

alert($find regular expression("[kenplayer]https://www.websiteurl.com/3456782[/kenplayer]","\\d+"))
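The first pattern above matches anything that sits between any `]` and the next `[`, so it would also pick up text between unrelated brackets on a real page. A stricter variant names the shortcode explicitly and escapes the brackets, since an unescaped `[...]` is a character class in regex. The following is a minimal Python sketch of that idea, not UBot code; the function names `extract_shortcode_urls` and `trailing_number` are hypothetical, and it assumes the shortcode always appears literally as `[kenplayer]...[/kenplayer]`.

```python
import re

# Assumption: the shortcode is written literally as [kenplayer]...[/kenplayer].
# The brackets are escaped because an unescaped [...] is a regex character class.
SHORTCODE = re.compile(r"\[kenplayer\](.*?)\[/kenplayer\]", re.IGNORECASE | re.DOTALL)


def extract_shortcode_urls(page_text):
    """Return every value wrapped in [kenplayer]...[/kenplayer], in page order."""
    return [match.strip() for match in SHORTCODE.findall(page_text)]


def trailing_number(url):
    """Return the digits at the end of the URL, or None if it does not end in digits."""
    found = re.search(r"(\d+)/?$", url)
    return found.group(1) if found else None


# Sample text built from the URL quoted in the thread.
sample = "intro [kenplayer]https://www.websiteurl.com/3456782[/kenplayer] outro"
urls = extract_shortcode_urls(sample)
print(urls)
print(trailing_number(urls[0]))
```

The same escaped pattern should carry over to `$find regular expression`, though the backslash doubling shown in the UBot examples above would still apply there.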

I tried this, but the bot crashes when it tries to gather [kenplayer]https://www.websiteurl.com/3456782[/kenplayer]. How do I use regex to directly scrape the URL that sits between [kenplayer] and [/kenplayer]? I tried page scrape, but the bot crashes when it scrapes this text.


I found a workaround for my issue. I scrape a URL like this:

https://ci.mydomain.com/m=eGJF8f/media/videos/201401/08/639987/original/12.jpg

But I do not know what code to use to scrape the 639987 from this URL, as the number changes on every page. It seems there is always a two-digit number between slashes before it, and a slash after it.


Try this

define $Item number(#URL) {
    clear list(%id)
    add list to list(%id,$list from text(#URL,"/"),"Delete","Global")
    comment("Should work as long as the position is the same.")
    return($list item(%id,7))
}
alert($Item number("https://ci.mydomain.com/m=eGJF8f/media/videos/201401/08/639987/original/12.jpg"))

Regards,

CD
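The split-by-position approach above depends on the ID always landing in the same slot. The rule described in the question (a two-digit segment between slashes, then the number, then a slash) can also be written as a single regex. This is a Python sketch rather than UBot code; `extract_media_id` is a hypothetical name, and the URL shape is assumed from the one example in the thread.

```python
import re

# Assumed URL shape, based on the single example in the thread:
#   .../<two digits>/<id>/...
# The capture group holds the digits that follow the two-digit segment.
ID_AFTER_TWO_DIGITS = re.compile(r"/\d{2}/(\d+)/")


def extract_media_id(url):
    """Return the numeric ID that follows a two-digit path segment, or None."""
    found = ID_AFTER_TWO_DIGITS.search(url)
    return found.group(1) if found else None


url = "https://ci.mydomain.com/m=eGJF8f/media/videos/201401/08/639987/original/12.jpg"
print(extract_media_id(url))
```

Because `201401` is six digits, it does not satisfy the two-digit segment, so the first match is `/08/639987/`. If other pages contain an earlier two-digit segment followed by a numeric one, the pattern would need tightening.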


While Nick and Pash gave you some great information, it still did not address how you would scrape the information.

Since I did not know the website that you are using, I took the liberty of using my own test website: http://ubotsandbox.website/page-scrape-example.php

navigate("http://ubotsandbox.website/page-scrape-example.php","Wait")
wait for browser event("Everything Loaded","")
clear list(%ItemstoKeep)
set(#HowManyToScrape,$scrape attribute(<(tagname="p" AND innertext=r"\\(LEFTSIDE TEXT\\)(.\{1,\})\\(RIGHTSIDE TEXT\\)")>,"tagname"),"Global")
set(#HowManyToScrape,"{$text length($replace(#HowManyToScrape,$new line,$nothing))}","Global")
set(#index,"-1","Global")
loop(#HowManyToScrape) {
    increment(#index)
    set(#var,$scrape attribute($element offset(<(tagname="p" AND innertext=r"\\(LEFTSIDE TEXT\\)(.\{1,\})\\(RIGHTSIDE TEXT\\)")>,#index),"innertext"),"Global")
    set(#var,$replace($replace(#var,"(LEFTSIDE TEXT)",$nothing),"(RIGHTSIDE TEXT)",$nothing),"Global")
    add item to list(%ItemstoKeep,#var,"Don\'t Delete","Global")
}
set(#var,"","Global")

There are probably some shortcuts I could have taken to shorten my code, but I wanted to show you and others learning UBot Studio how things work at a basic level.

 

Notice how I counted the number of items I wanted to scrape. I then used $element offset to scrape each item one by one in a loop, and stripped out the text that wrapped my text (which could have been your shortcodes). While my example shows the innertext of my HTML, it could have been embedded as well.
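For readers who want to see the same count, loop, and strip logic outside UBot, here is a rough Python sketch. It is not a translation of the script above: the paragraphs are passed in as a plain list instead of being scraped from a page, `strip_wrappers` is a hypothetical name, and the sample strings are made up for illustration.

```python
import re

# Same wrapper markers as in the UBot example; the parentheses are escaped
# so they are matched literally rather than treated as a capture group.
WRAPPED = re.compile(r"\(LEFTSIDE TEXT\)(.+)\(RIGHTSIDE TEXT\)", re.DOTALL)


def strip_wrappers(paragraphs):
    """Keep only the wrapped paragraphs and return their inner text."""
    kept = []
    for text in paragraphs:
        found = WRAPPED.search(text)
        if found:
            # group(1) is the text between the two markers
            kept.append(found.group(1).strip())
    return kept


# Made-up sample input standing in for the <p> elements on the page.
paragraphs = [
    "(LEFTSIDE TEXT)first item(RIGHTSIDE TEXT)",
    "an unrelated paragraph",
    "(LEFTSIDE TEXT) second item (RIGHTSIDE TEXT)",
]
print(strip_wrappers(paragraphs))
```

Using a capture group removes both markers in one step, where the UBot version uses two nested $replace calls.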

 

I hope that helps you.

 

Buddy

