rocket976, posted February 11, 2012

OK, so I'm sick of writing new articles and have moved on from posting press releases; I'm officially over it. Now come the web 2.0 sites, and I'm already fed up with writing. After downloading Meatro's gracious bot http://ubotstudio.com/forum/index.php?/topic/9383-get-archived-content-generator/ I found it answered a lot of questions and raised many more. All I want to do is scrape an article from a search engine on any of the article or press release sites and then spin it with TBS. This is what I came up with:

ui text box("Keyword: ", #kw)
clear list(%scrapeTITLE)
clear list(%scrapeBODY)
ui stat monitor("Total Scraped", $list total(%urls))
reset account("Any")
navigate("http://goarticles.com/", "Wait")
type text(<name="q">, #kw, "Standard")
click(<class="site-search-button">, "Left Click", "No")
wait(2)
loop(10) {
    add list to list(%scrapeTITLE, $scrape attribute(<outerhtml=w"<a href=\"/article/*\" target=\"newarticle\" class=\"article_title_link\">*</a>">, "outertext"), "Delete", "Global")
    click(<outerhtml=w"<a href=\"/article/*/\" target=\"newarticle\" class=\"article_title_link\">*</a>">, "Left Click", "No")
    wait($rand(12, 20))
    add list to list(%scrapeBODY, $scrape attribute(<innertext=w"*">, "outertext"), "Delete", "Global")
    click(<id="uscript-close-button">, "Left Click", "No")
}
save to file("{$special folder("Desktop")}/goarticlescrape-messy.txt", "{%scrapeTITLE},{%scrapeBODY}")

It should look familiar to you, John. Thanks.

However, it's just not working as desired, and the output includes a ton of crap inserted by goarticles. Before I waste another night on this, maybe some pros could chime in on best practices for scraping articles.

Thanks in advance.
JohnB, posted February 11, 2012

Try this out:

ui text box("Keyword: ", #kw)
clear list(%scrapeTITLE)
clear list(%scrapeBODY)
ui stat monitor("Total Scraped", $list total(%urls))
reset account("Any")
navigate("http://goarticles.com/", "Wait")
type text(<name="q">, #kw, "Standard")
click(<class="site-search-button">, "Left Click", "No")
wait(2)
loop(10) {
    add list to list(%scrapeTITLE, $scrape attribute(<outerhtml=w"<a href=\"/article/*\" target=\"newarticle\" class=\"article_title_link\">*</a>">, "fullhref"), "Delete", "Global")
    click(<innertext="Next">, "Left Click", "No")
    wait(5)
}
clear list(%ArtTitle)
clear list(%scrapeBODY)
set(#position, 0, "Global")
loop($list total(%scrapeTITLE)) {
    navigate($list item(%scrapeTITLE, #position), "Wait")
    stop script
    add item to list(%ArtTitle, $scrape attribute(<class="art_head">, "innertext"), "Delete", "Global")
    add item to list(%scrapeBODY, $scrape attribute(<class="KonaBody">, "innertext"), "Delete", "Global")
    wait($rand(12, 20))
    increment(#position)
}
clear table(&articles)
add list to table as column(&articles, 0, 0, %ArtTitle)
add list to table as column(&articles, 0, 1, %scrapeBODY)
save to file("{$special folder("Desktop")}/goarticlescrape.csv", &articles)

John
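For readers who don't use uBot Studio, the key idea in the script above is to stop scraping the whole page and instead pull only the title and body containers from each article page, which is what keeps the site's extra markup out of the result. Below is a minimal Python sketch of that targeted extraction, not a translation of the uBot script. It uses only the standard library. The class names `art_head` and `KonaBody` are taken from the script above; the sample page, the `ClassTextExtractor` helper, and the function names are hypothetical and exist only for illustration. Fetching the search results and article pages is left out.

```python
import csv
import io
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not change nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class ClassTextExtractor(HTMLParser):
    """Collects the text inside the first element that carries a given class."""

    def __init__(self, wanted_class):
        super().__init__()
        self.wanted_class = wanted_class
        self.depth = 0      # > 0 while the parser is inside the wanted element
        self.done = False   # True once the wanted element has been closed
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth > 0:
            self.depth += 1
        elif not self.done:
            classes = (dict(attrs).get("class") or "").split()
            if self.wanted_class in classes:
                self.depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth > 0:
            self.parts.append(data)

    def text(self):
        # Collapse runs of whitespace left over from the page layout.
        return " ".join("".join(self.parts).split())


def extract_by_class(html, wanted_class):
    parser = ClassTextExtractor(wanted_class)
    parser.feed(html)
    return parser.text()


def extract_article(html):
    """Returns (title, body) using the class names from the thread."""
    return extract_by_class(html, "art_head"), extract_by_class(html, "KonaBody")


def to_csv(rows):
    """Serialises (title, body) rows as CSV text, one article per line."""
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    return buf.getvalue()


# Hypothetical article page, made up for this example.
SAMPLE_PAGE = """
<html><body>
  <div class="ad">Buy now!</div>
  <h1 class="art_head">How to Train a Puppy</h1>
  <div class="KonaBody">
    <p>Start with <b>short</b> sessions.</p>
    <p>Reward good behaviour.</p>
  </div>
  <div class="footer">Related articles</div>
</body></html>
"""

title, body = extract_article(SAMPLE_PAGE)
print(to_csv([(title, body)]))
```

The ad and footer blocks never reach the output because only text nested inside the two named containers is collected. If a site changes its class names, only the two strings in `extract_article` need updating.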
rocket976 (author), posted February 11, 2012

Awesome. I left it last night at the step of navigating to the list, and this clarifies it. Thanks again, John.
JohnB, posted February 11, 2012

np :)

John