UBot Underground

How Can I Scrape All Titles On Every Page On Site?



I'm trying to learn how to keep scraping until the bot has no more pages to scrape, or at least until it is 'x' pages deep.

 

Here is the code I have so far, but I don't yet understand how to make it click through the remaining pages (or 'x' pages) and keep adding titles.

clear cookies
navigate("http://www.copyblogger.com/blog/", "Wait")
wait for browser event("Everything Loaded", "")
clear list(%titles)
add list to list(%titles, $scrape attribute(<h2,class="entry-title">, "innertext"), "Delete", "Global")

Any suggestions?


Here's one way:

 

clear cookies
ui text box("# of Pages:", #pages)
navigate("http://www.copyblogger.com/blog/", "Wait")
wait for browser event("Everything Loaded", "")
clear list(%titles)
loop(#pages) {
    add list to list(%titles, $scrape attribute(<h2,class="entry-title">, "innertext"), "Delete", "Global")
    if($exists(<innertext="Next Page»">)) {
        then {
            click(<innertext="Next Page»">, "Left Click", "No")
            wait(2)
        }
    }
}


Solution nr. 1:

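navigate("http://www.copyblogger.com/blog/", "Wait")
wait for browser event("Everything Loaded", "")
clear list(%titles)
loop while($exists(<innertext="Next Page»">)) {
    add list to list(%titles, $scrape attribute(<class="entry-title">, "innertext"), "Delete", "Global")
    click(<innertext="Next Page»">, "Left Click", "No")
    wait for browser event("Everything Loaded", "")
    wait(3)
}
add list to list(%titles, $scrape attribute(<class="entry-title">, "innertext"), "Delete", "Global")

The loop scrapes each page and clicks Next Page» for as long as that link exists. The final add list to list after the loop picks up the last page, which has no Next Page» link and would otherwise be skipped.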


Solution nr 2:

ui text box("Nr. of pages:", #nr_of_pages)
clear list(%titles)
set(#i, 1, "Global")
loop(#nr_of_pages) {
    navigate("http://www.copyblogger.com/blog/page/{#i}/", "Wait")
    wait for browser event("Everything Loaded", "")
    wait(3)
    add list to list(%titles, $scrape attribute(<class="entry-title">, "innertext"), "Delete", "Global")
    increment(#i)
}

This approach also works for this particular site, because its page URLs follow a predictable /page/N/ pattern.

If you want to scrape every page without entering a number, scrape the highest page number from the site (193 in this case) and use it as #nr_of_pages.
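
That last idea could be sketched roughly as follows. It is untested and makes two assumptions: that the highest page number sits in a single element on the first page, and that the comment command is available in your version. The selector <class="last-page"> is a made-up placeholder, not the site's actual markup, so inspect the page and substitute the real one.

```
navigate("http://www.copyblogger.com/blog/", "Wait")
wait for browser event("Everything Loaded", "")
comment("Placeholder selector below: swap in the element that holds the highest page number")
set(#nr_of_pages, $scrape attribute(<class="last-page">, "innertext"), "Global")
clear list(%titles)
set(#i, 1, "Global")
loop(#nr_of_pages) {
    navigate("http://www.copyblogger.com/blog/page/{#i}/", "Wait")
    wait for browser event("Everything Loaded", "")
    wait(3)
    add list to list(%titles, $scrape attribute(<class="entry-title">, "innertext"), "Delete", "Global")
    increment(#i)
}
```

If the placeholder selector matches more than one element, #nr_of_pages will not be a clean number, so narrow the selector until it returns only the last page number.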
