groneg (Posted January 29, 2010)

I'm trying to scrape Wikipedia and am getting hung up under certain conditions. Take these two search terms:

- Menudo
- Ricky Martin

If you search for Menudo, you land on an intermediary page that lists different options. Searching for Ricky Martin brings you directly to the artist's page. I want to be able to evaluate the intermediary page and click through to the final destination.

I have the issue partly solved: I wrote a conditional statement that searches for the "This disambiguation page..." text on the page. But now I need to click the first bullet point on the page to get to the correct URL, and I haven't been able to figure out a way to code this...
Aaron Nimocks (Posted January 29, 2010)

You would have to scrape all the links on the page that contain that keyword and add them to a list. Then just navigate to the next list item, which should be the first one on the list.
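For anyone who wants the same idea outside uBot, here is a rough Python sketch of the approach described above: collect every link whose text contains the keyword, then take the first one. It is not the attached bot. The names `KeywordLinkCollector`, `first_keyword_link`, and `SAMPLE_HTML` are made up for illustration, and the sample markup is a simplified stand-in for a disambiguation page, not the real Wikipedia HTML. On a live page, navigation or header links may also contain the keyword, so you would probably want to restrict the search to the bulleted list.

```python
from html.parser import HTMLParser
from typing import List, Optional
from urllib.parse import urljoin


class KeywordLinkCollector(HTMLParser):
    """Collect hrefs of <a> tags whose link text contains a keyword."""

    def __init__(self, keyword: str) -> None:
        super().__init__()
        self.keyword = keyword.lower()
        self.links: List[str] = []          # matching hrefs, in page order
        self._href: Optional[str] = None    # href of the <a> currently open
        self._text: List[str] = []          # text pieces inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only record text while inside an <a> that has an href.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            if self.keyword in "".join(self._text).lower():
                self.links.append(self._href)
            self._href = None


def first_keyword_link(html: str, keyword: str, base_url: str) -> Optional[str]:
    """Return the absolute URL of the first link whose text contains keyword."""
    collector = KeywordLinkCollector(keyword)
    collector.feed(html)
    collector.close()
    if not collector.links:
        return None
    return urljoin(base_url, collector.links[0])


# Hypothetical, simplified stand-in for a disambiguation page.
SAMPLE_HTML = """
<p>This disambiguation page lists articles associated with the title Menudo.</p>
<ul>
  <li><a href="/wiki/Menudo_(band)">Menudo (band)</a>, a Puerto Rican boy band</li>
  <li><a href="/wiki/Menudo_(soup)">Menudo (soup)</a>, a Mexican soup</li>
</ul>
"""

if __name__ == "__main__":
    print(first_keyword_link(SAMPLE_HTML, "Menudo",
                             "http://en.wikipedia.org/wiki/Menudo"))
```

Matching on the link text rather than the whole page keeps the list in page order, so the first entry corresponds to the first matching bullet. If no link matches, the function returns `None` and the caller can stay on the current page.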
Aaron Nimocks (Posted January 29, 2010)

Attached is a bot to help out. This one is built from the page below, but I tested it on several others and it works.

http://en.wikipedia.org/wiki/Hamlet_(disambiguation)

I'm assuming you will have the search keyword saved; use it wherever the variable keyword appears.

Attachment: groneg.ubot
groneg (Author, Posted January 29, 2010)

I gotcha. The example is perfect. Thank you very much!