UBot Underground


Hello.

I have already read the Offset tutorial, but I still can't get it.

I want to scrape the href link inside every <li> tag and add them to a new list.

 

<li>
<a href="/alfaromeo3455534/">
....
</li>
 
So the whole code looks like this:
 
<body>
<div id="main">
<div class="content">
<div class="c-1 endless_page_template">
<ul class="list">
<li> <ahref> </li>
<li> <ahref> </li>
<li> <ahref> </li>
<li> <ahref> </li>
<li> <ahref> </li>
...
</ul>
 

 


Based on the code you provided, it should only scrape links inside class="list", which is the unordered list. There could be other elements with that class name, which might be the problem. You can try this instead, which is a bit more specific. Keep in mind that without seeing the page, this is all I can go on:

add list to list(%links,$find regular expression($scrape attribute(<(class="list" AND tagname="ul")>,"innerhtml"),"(?<=href=\\\").*?(?=\\\")"),"Delete","Global")
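For anyone who wants to see what the regular expression does on its own, here is a minimal Python sketch (Python is used only for illustration and is not part of the UBot script). It assumes the escaped UBot string reduces to the pattern `(?<=href=").*?(?=")`. The function name `extract_hrefs` and the sample fragment are made up for this example.

```python
import re

# Assumed plain form of the escaped UBot pattern: the shortest run of text
# that is preceded by href=" and followed by the next double quote.
HREF_PATTERN = re.compile(r'(?<=href=").*?(?=")')


def extract_hrefs(inner_html):
    """Return every href value found in an HTML fragment (hypothetical helper)."""
    return HREF_PATTERN.findall(inner_html)


# Made-up inner HTML of a <ul class="list">, modelled on the snippet in this thread.
sample = (
    '<li><a href="/alfaromeo3455534/">room</a></li>'
    '<li><a href="/lettali/">room</a></li>'
)

print(extract_hrefs(sample))
```

The lookbehind and lookahead keep `href="` and the closing quote out of the match, so only the link value itself is returned. The pattern has no idea which element it is looking at, so whatever HTML is passed in decides which links come back.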

With this code it should scrape only from the <li> elements inside the <ul> tag, but it doesn't work. UBot is still scraping all a href tags, including the ones from the divs shown below (hashtags).

 

<div class="c-1 endless_page_template">
<div id="hashtag_ticker">
<a id="more_hashtags_link" href="/tags/">(more tags)</a>
<a href="/tag/fire/">#fire</a>
<a href="/tag/knife/">#knife</a>   - i don't want to scrape 
<a href="/tag/fish/">#fish</a>
 
</div>
<div>
<h2></h2>
</div>
<div class="searching-note" style="display:none">
<p>Searching for items matching your preferences...</p>
</div>
<div class="searching-keyword" style="display:none">
<p>Search results for "None"</p>
</div>
<ul class="list">
<li>
<a href="/lettali/"> - i want to scrape
...
<li>

I see what you're saying. If you show me the page I can take a look, but for now this is what I can do, and it works fine for me:

load html("<div class=\"c-1 endless_page_template\">
<div id=\"hashtag_ticker\">
<a id=\"more_hashtags_link\" href=\"/tags/\">(more tags)</a>
<a href=\"/tag/fire/\">#fire</a>
<a href=\"/tag/knife/\">#knife</a>   - i don\'t want to scrape 
<a href=\"/tag/fish/\">#fish</a>
 
</div>
<div>
<h2></h2>
</div>
<div class=\"searching-note\" style=\"display:none\">
<p>Searching for items matching your preferences...</p>
</div>
<div class=\"searching-keyword\" style=\"display:none\">
<p>Search results for \"None\"</p>
</div>
<ul class=\"list\">
<li>
<a href=\"/lettali/\">iwant to scrape</a>
</li>
</ul>")
clear list(%links)
add list to list(%links,$find regular expression($scrape attribute(<(class="list" AND tagname="ul")>,"innerhtml"),"(?<=href=\\\").*?(?=\\\")"),"Delete","Global")
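As a rough cross-check outside UBot, the same scoping idea (only links inside the ul with class "list") can be sketched in Python with the standard library parser. This is an illustration under assumptions, not how UBot evaluates its selector; `ListLinkParser` and `links_in_list` are hypothetical names, and the page is a shortened copy of the test HTML above.

```python
from html.parser import HTMLParser


class ListLinkParser(HTMLParser):
    """Collect href values of <a> tags nested inside <ul class="list">."""

    def __init__(self):
        super().__init__()
        self.depth = 0   # greater than 0 while inside a matching <ul>
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "ul" and (self.depth or "list" in classes):
            # Entering the target list, or a <ul> nested inside it.
            self.depth += 1
        elif tag == "a" and self.depth and attrs.get("href"):
            self.links.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "ul" and self.depth:
            self.depth -= 1


def links_in_list(html):
    """Return hrefs found inside <ul class="list"> only (hypothetical helper)."""
    parser = ListLinkParser()
    parser.feed(html)
    return parser.links


page = """<div id="hashtag_ticker">
<a href="/tag/fire/">#fire</a>
<a href="/tag/knife/">#knife</a>
</div>
<ul class="list">
<li><a href="/lettali/">room</a></li>
</ul>"""

print(links_in_list(page))
```

On this reduced page the hashtag links sit outside the list, so they are skipped. If the live page behaves differently, the likely cause is that its real markup differs from the pasted snippet.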

Many thanks for your help, but for me it's still scraping tags.

The website is chaturbate .c*m  (+18), and I want to scrape all room URLs from one page.

With your code it also scrapes the tags above the list.

I see why that's happening now: there are tags in there as well, which get picked up. Anyway, try this:

clear list(%links)
add list to list(%links,$find regular expression($scrape attribute(<class="title">,"innerhtml"),"(?<=href=\\\")\\/.*?(?=\\\")"),"Delete","Global")
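To show what changes with this version, here is a small Python sketch of the pattern, again only as an illustration. It assumes the escaped UBot string reduces to `(?<=href=")/.*?(?=")`, that "Delete" means duplicates are dropped, and that each class="title" element wraps one room link. The fragments and the name `collect_links` are made up, since the real markup of the page is not shown in this thread.

```python
import re

# Assumed plain form of the escaped UBot pattern: an href value that
# starts with "/" and runs up to the closing double quote.
RELATIVE_HREF = re.compile(r'(?<=href=")/.*?(?=")')


def collect_links(fragments):
    """Collect relative hrefs from HTML fragments, skipping duplicates."""
    links = []
    for fragment in fragments:
        for href in RELATIVE_HREF.findall(fragment):
            if href not in links:  # assumed equivalent of the "Delete" option
                links.append(href)
    return links


# Made-up inner HTML of elements with class="title".
titles = [
    '<a href="/lettali/">lettali</a>',
    '<a href="/lettali/">lettali</a>',
    '<a href="https://example.com/ad">ad</a>',
]

print(collect_links(titles))
```

Note that the leading slash only filters out absolute URLs. A hashtag link such as `/tag/fire/` would still match the pattern, so the tags are most likely excluded by the narrower class="title" selector.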
Awesome, it works perfectly. I appreciate your help. Thank you once more!
