UBot Underground

[NEED] Newb's Guide To Scraping & Best Practices



OK, so I'm sick of writing new articles and have moved on from posting press releases; I'm officially over it. Now along come web 2.0 sites, and I'm already fed up with writing. I downloaded Meatro's gracious bot (http://ubotstudio.com/forum/index.php?/topic/9383-get-archived-content-generator/), and it has answered a lot of questions and raised many more. All I want to do is scrape an article from the search results on any of the article or press release sites and then spin it with TBS.

 

This is what I came up with:

 

ui text box("Keyword: ", #kw)
clear list(%scrapeTITLE)
clear list(%scrapeBODY)
ui stat monitor("Total Scraped", $list total(%urls))
reset account("Any")
navigate("http://goarticles.com/", "Wait")
type text(<name="q">, #kw, "Standard")
click(<class="site-search-button">, "Left Click", "No")
wait(2)
loop(10) {
   add list to list(%scrapeTITLE, $scrape attribute(<outerhtml=w"<a href=\"/article/*\" target=\"newarticle\" class=\"article_title_link\">*</a>">, "outertext"), "Delete", "Global")
   click(<outerhtml=w"<a href=\"/article/*/\" target=\"newarticle\" class=\"article_title_link\">*</a>">, "Left Click", "No")
   wait($rand(12, 20))
   add list to list(%scrapeBODY, $scrape attribute(<innertext=w"*">, "outertext"), "Delete", "Global")
   click(<id="uscript-close-button">, "Left Click", "No")
}
save to file("{$special folder("Desktop")}/goarticlescrape-messy.txt", "{%scrapeTITLE},{%scrapeBODY}")

 

It should look familiar to you, John. Thanks.

 

However, it's just not working as desired, and the output includes a ton of crap inserted by goarticles. Before I waste another night on this, maybe some pros could chime in on best practices for scraping articles.

 

Thanks in advance


Try this out:

 

ui text box("Keyword: ", #kw)
clear list(%scrapeTITLE)
ui stat monitor("Total Scraped", $list total(%scrapeTITLE))
reset account("Any")
navigate("http://goarticles.com/", "Wait")
type text(<name="q">, #kw, "Standard")
click(<class="site-search-button">, "Left Click", "No")
wait(2)
comment("Pass 1: collect the article URLs from 10 pages of search results")
loop(10) {
    add list to list(%scrapeTITLE, $scrape attribute(<outerhtml=w"<a href=\"/article/*\" target=\"newarticle\" class=\"article_title_link\">*</a>">, "fullhref"), "Delete", "Global")
    click(<innertext="Next">, "Left Click", "No")
    wait(5)
}
comment("Pass 2: visit each collected URL and scrape its title and body")
clear list(%ArtTitle)
clear list(%scrapeBODY)
set(#position, 0, "Global")
loop($list total(%scrapeTITLE)) {
    navigate($list item(%scrapeTITLE, #position), "Wait")
    add item to list(%ArtTitle, $scrape attribute(<class="art_head">, "innertext"), "Delete", "Global")
    add item to list(%scrapeBODY, $scrape attribute(<class="KonaBody">, "innertext"), "Delete", "Global")
    wait($rand(12, 20))
    increment(#position)
}
comment("Write titles and bodies side by side as a two-column CSV")
clear table(&articles)
add list to table as column(&articles, 0, 0, %ArtTitle)
add list to table as column(&articles, 0, 1, %scrapeBODY)
save to file("{$special folder("Desktop")}/goarticlescrape.csv", &articles)

John
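
For anyone who wants to see the shape of that script outside UBot Studio, here is a rough Python sketch of the same two-pass idea: collect the article links first, then visit each one and write title and body out as CSV columns. It is only an illustration under stated assumptions, not UBot code. The names collect_links, scrape_articles, to_csv and the fetch callback are all made up for this example, and the demo uses fake data instead of hitting goarticles.

```python
import csv
import io


def collect_links(result_pages):
    """Pass 1: gather article URLs from each page of results, skipping duplicates."""
    links = []
    for page in result_pages:
        for url in page:
            if url not in links:
                links.append(url)
    return links


def scrape_articles(urls, fetch):
    """Pass 2: visit each URL. fetch(url) is a hypothetical callback returning (title, body)."""
    return [fetch(url) for url in urls]


def to_csv(rows):
    """Write (title, body) pairs as CSV text, quoting fields that contain commas."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    # Fake search results and a fake fetcher stand in for the real site.
    pages = [["/article/1", "/article/2"], ["/article/2", "/article/3"]]
    fake_fetch = lambda url: ("Title for " + url, "Body text, with a comma")
    print(to_csv(scrape_articles(collect_links(pages), fake_fetch)))
```

Keeping each title and body together as one row means the two columns cannot drift out of step if one scrape comes back empty or duplicated, and letting a CSV writer do the quoting avoids the jumbled output you get from joining two lists with a bare comma.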
