$10 to the first person to finish this article bot

semjuice · May 4, 2010

>> $10 Paypal to the first person who can finish the last part of this script!

------------------------------------------------------------------------

On one webpage, there are 3 different blocks of content, each having a TITLE and an ARTICLE. I've written a bot that sets the KEYWORD, TITLE 1, and ARTICLE 1, but for whatever reason I can't figure out how to properly set TITLE 2 & 3 and ARTICLE 2 & 3.

* I've attached a bot which has the first TITLE and ARTICLE successfully working

* The sample page that I'm scraping is:

http://www.viligent.com/ubot_sample.htm

* Here's a screenshot that shows the layout of the page (also attached):

http://viligent.com/images/ubot-sample.jpg

Let me know if you have any questions that I can clarify.

Otherwise, if you're able to finish this bot, please post it here or send it to me at semjuice @ gmail .com (along with your Paypal so I can pay you).

Thanks in advance!

UBotBuddy · May 4, 2010

Where's your bot?

semjuice · May 4, 2010

Sorry about that...see attached on this post

articlescrape.ubot

semjuice · May 12, 2010

Ok, I'll up the ante to $20 if someone can finish this by Friday

AKprogrammer · May 14, 2010

In my opinion, your code to scrape title2 should work and this is a UBot bug. Either that, or I don't understand the implementation of $page_scrape.

One time I was so frustrated with $page_scrape, I wrote my own function in javascript. Unfortunately I can't help you using this function, because your page doesn't have javascript on it... and pages without existing javascript don't run inserted javascript (btw - my utility SpeedyBot can get around this).

So I can't claim your $20. Still - I attached a bot that would work if you had javascript on the page.

I'd also like to see how a UBot expert would scrape this page with $page_scrape...

ArticleScrapeJavascript.ubot

Net66 · May 14, 2010

> In my opinion, your code to scrape title2 should work and this is a UBot bug. Either that, or I don't understand the implementation of $page_scrape.

I agree. I had a mess about with this and you just cannot make that scrape work when it should.

Net66 · May 14, 2010

OK I solved it for you :-)

Will email you the working bot.

Andy

semjuice · May 14, 2010

> OK I solved it for you :-)
>
> Will email you the working bot.
>
> Andy

Awesome Andy! Thanks! I got your email, tested it out and everything worked perfectly (just had to update the location of the file). Let me know if you didn't get the paypal...I sent it about 2 mins ago.

Net66 · May 14, 2010

> Awesome Andy! Thanks! I got your email, tested it out and everything worked perfectly (just had to update the location of the file). Let me know if you didn't get the paypal...I sent it about 2 mins ago.

Excellent. I was meant to change the file location to use the document folder before I sent it to you. Thanks for the paypal, I shall be donating it to a local school.

Best wishes,

Andy

AKprogrammer · May 16, 2010

> OK I solved it for you :-)
>
> Will email you the working bot.

Do you mind sharing how you did it without javascript?

Did you use the page_scrape function?

Net66 · May 16, 2010

> Do you mind sharing how you did it without javascript?
>
> Did you use the page_scrape function?

Yes I used page scrape, some loops and some lists...

Basically I used the scrape to get the titles, then scraped the whole page, wrote it out to a temp txt file, and read it back into a list so that each line of the page was a separate list item. I had title2 and title3 stored in variables, so I looped through the new list and recorded the positions in the list where title2 and title3 fell. Finally I built the article2 and article3 variables by looping through the list again: if the line number was greater than title2pos and less than title3pos, I added the line to the article2 variable; if it was greater than title3pos, I added it to article3.
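For anyone reading along outside UBot, the same idea can be sketched in Python. This is only an illustration of the position-based split described above, not the bot that was emailed: the function name `split_articles` and the sample lines are made up, and `lines` stands in for the list that the bot gets by reading the temp file back in (in Python, `page_text.splitlines()` would give the same one-line-per-item list without the file round trip).

```python
def split_articles(lines, title2, title3):
    """Split page lines into article 2 and article 3 using the title positions.

    Hypothetical helper: `lines` is the scraped page, one line per list item;
    `title2` and `title3` are the titles already scraped into variables.
    """
    title2_pos = None
    title3_pos = None

    # First pass: record the list position where each title falls.
    for i, line in enumerate(lines):
        text = line.strip()
        if title2_pos is None and text == title2:
            title2_pos = i
        elif title3_pos is None and text == title3:
            title3_pos = i

    if title2_pos is None or title3_pos is None:
        raise ValueError("could not find both titles in the page lines")

    # Second pass: lines strictly between the two titles belong to article 2,
    # lines after title 3 belong to article 3.
    article2 = []
    article3 = []
    for i, line in enumerate(lines):
        if title2_pos < i < title3_pos:
            article2.append(line)
        elif i > title3_pos:
            article3.append(line)

    return "\n".join(article2).strip(), "\n".join(article3).strip()


# Made-up sample data standing in for the scraped page.
sample_lines = [
    "KEYWORD",
    "Title One",
    "Body one.",
    "Title Two",
    "Body two, line a.",
    "Body two, line b.",
    "Title Three",
    "Body three.",
]

article2, article3 = split_articles(sample_lines, "Title Two", "Title Three")
print(article2)
print(article3)
```

Note that this relies on each title sitting on its own line and on title 3 coming after title 2, which matches the layout described for the sample page.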

I don't know if that makes any sense at all?

Andy

AKprogrammer · May 17, 2010

> Basically I used the scrape to get the titles, then scraped the whole page, wrote it out to a temp txt file, and read it back into a list so that each line of the page was a separate list item. I had title2 and title3 stored in variables, so I looped through the new list and recorded the positions in the list where title2 and title3 fell. Finally I built the article2 and article3 variables by looping through the list again: if the line number was greater than title2pos and less than title3pos, I added the line to the article2 variable; if it was greater than title3pos, I added it to article3.

Nice workaround. Never thought to save to a file and parse it line-by-line... I'll keep that in mind if I have to make a scrape-bot without javascript.

Thanks for sharing.
