aggin23, posted June 21, 2017:

Hello. I have already read the Offset tutorial, but I still can't get it. I want to scrape the href link inside every <li> tag and add the links to a new list. Each item looks like this:

<li><a href="/alfaromeo3455534/">....</li>

The whole page is structured like this:

<body><div id="main"><div class="content"><div class="c-1 endless_page_template"><ul class="list">
<li> <a href> </li>
<li> <a href> </li>
<li> <a href> </li>
<li> <a href> </li>
<li> <a href> </li>
...
</ul>
HelloInsomnia, posted June 21, 2017:

Try this:

add list to list(%links,$find regular expression($scrape attribute(<class="list">,"innerhtml"),"(?<=href=\\\").*?(?=\\\")"),"Delete","Global")
aggin23, posted June 21, 2017:

It works, but it also scrapes links from the div above. I only want the links that are under the ul tag.
HelloInsomnia, posted June 21, 2017:

Based on the code you provided, it will only scrape links inside class="list", which is the unordered list. There could be other elements on the page with that class name, which might be the problem. You can try the version below, which is a bit more specific because it also requires the element to be a ul. Keep in mind that without seeing the page, this is all I have to go on:

add list to list(%links,$find regular expression($scrape attribute(<(class="list" AND tagname="ul")>,"innerhtml"),"(?<=href=\\\").*?(?=\\\")"),"Delete","Global")
aggin23, posted June 21, 2017:

So with this code it should scrape only from the <li> elements inside the <ul> tag, but it doesn't work. UBot is still scraping all a href tags, including the hashtag links from the div shown below.

Links I don't want to scrape:

<div class="c-1 endless_page_template">
<div id="hashtag_ticker"><a id="more_hashtags_link" href="/tags/">(more tags)</a><a href="/tag/fire/">#fire</a><a href="/tag/knife/">#knife</a><a href="/tag/fish/">#fish</a></div>
<div><h2></h2></div>
<div class="searching-note" style="display:none"><p>Searching for items matching your preferences...</p></div>
<div class="searching-keyword" style="display:none"><p>Search results for "None"</p></div>

Links I want to scrape:

<ul class="list"><li><a href="/lettali/">...<li>
HelloInsomnia, posted June 21, 2017:

I see what you're saying. If you show me the page, I can take a look. For now, this is what I can do, and it works fine for me against the HTML you posted:

load html("<div class=\"c-1 endless_page_template\"> <div id=\"hashtag_ticker\"> <a id=\"more_hashtags_link\" href=\"/tags/\">(more tags)</a> <a href=\"/tag/fire/\">#fire</a> <a href=\"/tag/knife/\">#knife</a> - i don\'t want to scrape <a href=\"/tag/fish/\">#fish</a> </div> <div> <h2></h2> </div> <div class=\"searching-note\" style=\"display:none\"> <p>Searching for items matching your preferences...</p> </div> <div class=\"searching-keyword\" style=\"display:none\"> <p>Search results for \"None\"</p> </div> <ul class=\"list\"> <li> <a href=\"/lettali/\">i want to scrape</a> </li> </ul>")
clear list(%links)
add list to list(%links,$find regular expression($scrape attribute(<(class="list" AND tagname="ul")>,"innerhtml"),"(?<=href=\\\").*?(?=\\\")"),"Delete","Global")
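For anyone who wants to check the scoping idea outside UBot, here is a minimal Python sketch of the same test. It is not UBot code and not how UBot resolves selectors internally. The class name ListLinkParser and the helper links_in_list are hypothetical names made up for this illustration, and the sample HTML is the snippet posted above. It collects href values only from <a> tags nested inside a <ul> whose class is "list", so the hashtag links should be ignored.

```python
from html.parser import HTMLParser

# Sample markup copied from the thread: hashtag links first, then the list.
SAMPLE = """
<div class="c-1 endless_page_template">
  <div id="hashtag_ticker">
    <a id="more_hashtags_link" href="/tags/">(more tags)</a>
    <a href="/tag/fire/">#fire</a>
    <a href="/tag/knife/">#knife</a>
    <a href="/tag/fish/">#fish</a>
  </div>
  <div class="searching-keyword" style="display:none">
    <p>Search results for "None"</p>
  </div>
  <ul class="list">
    <li><a href="/lettali/">i want to scrape</a></li>
  </ul>
"""


class ListLinkParser(HTMLParser):
    """Hypothetical helper: collect hrefs of <a> tags inside <ul class="list">."""

    def __init__(self):
        super().__init__()
        self.ul_depth = 0  # greater than 0 while inside a matching <ul>
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "ul":
            classes = (attrs.get("class") or "").split()
            # Enter on <ul class="list">; also count nested <ul>s once inside.
            if self.ul_depth > 0 or "list" in classes:
                self.ul_depth += 1
        elif tag == "a" and self.ul_depth > 0 and attrs.get("href"):
            self.links.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "ul" and self.ul_depth > 0:
            self.ul_depth -= 1


def links_in_list(html):
    """Return hrefs found inside <ul class="list"> in document order."""
    parser = ListLinkParser()
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == "__main__":
    print(links_in_list(SAMPLE))
```

If this returns only the room link for the posted snippet but the live page still yields tag links, the extra links are most likely inside the list items themselves rather than in the hashtag div.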
aggin23, posted June 21, 2017:

Many thanks for your help, but for me it's still scraping tags. The website is chaturbate .c*m (+18), and I want to scrape all room URLs from one page. With your code it also scrapes the tags, which are above.
HelloInsomnia, posted June 23, 2017:

I see why that's happening now: there are tags inside the list as well, and those get picked up. Anyway, try this, which scrapes from the class="title" elements instead and only matches links that start with a slash:

clear list(%links)
add list to list(%links,$find regular expression($scrape attribute(<class="title">,"innerhtml"),"(?<=href=\\\")\\/.*?(?=\\\")"),"Delete","Global")
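To see what the extra slash in the second pattern changes, here is a small Python sketch. It is an illustration only, not UBot code. After UBot's string escaping is removed, the two patterns appear to reduce to the regular expressions below. The names ANY_HREF and ROOTED_HREF and the example.com link are assumptions made up for the demonstration.

```python
import re

# First pattern from the thread: any text between href=" and the next quote.
ANY_HREF = re.compile(r'(?<=href=").*?(?=")')

# Second pattern: same lookarounds, but the value must start with a slash.
ROOTED_HREF = re.compile(r'(?<=href=")/.*?(?=")')

# Made-up snippet mixing a site-relative link, an absolute link and an anchor.
snippet = (
    '<a href="/lettali/">room</a> '
    '<a href="https://example.com/x">external</a> '
    '<a href="#">top</a>'
)

if __name__ == "__main__":
    print(ANY_HREF.findall(snippet))
    print(ROOTED_HREF.findall(snippet))
```

Note that the slash only filters out links that do not begin with "/". The tag links on this page (/tag/fire/ and so on) also begin with a slash, so it is the narrower class="title" selector that keeps them out of the list.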
aggin23, posted June 23, 2017:

Awesome, it works perfectly. I appreciate your help. Thank you once more!