Cary Duke, posted May 27, 2010:
Hi guys, I'm looking for a little direction. Not only am I new to UBot, I'm totally new to programming. I want to make a bot that scrapes information from membership sites. Some of these sites include PDFs of the membership directory, and those directories include email addresses. That's the info I want to scrape and put into a .csv file for use in Outlook. With that said, I'm watching the tutorials, practicing, and learning. I have the 3.3 beta version, and the tutorials are somewhat different. Should I go back to the previous version? Can UBot scrape a PDF file? I ask because when I'm on the PDF page, the right-click drop-down options aren't there. Suggestions? Thanks in advance, Cary
TommyTx, posted May 27, 2010:
I would guess UBot won't scrape a PDF. However, it can download it, and if you need just certain content from it, there are programs that can convert it to a Word-type document so that any data can be retrieved.
UBotBuddy, posted May 27, 2010:
No, I'm pretty sure it can't scrape from a PDF. Now THAT, my friends, would be a VERY cool trick. It also can't scrape from a Flash form; I tried. I thought they were regular forms. They sure looked like regular web forms.
The_Brit, posted May 27, 2010:
You can do it in a roundabout way. If the PDF can be downloaded via UBot, you can run one of the programs TommyTx mentioned to convert it to plain text. Then use the Navigate option to load that text file back into UBot, where you can scrape the contents. Instead of http:// you use file://. I used this to get the shortened URL from lil.io for somebody; that wasn't translating a PDF, just loading a text file to scrape the data. Hope this helps, Dave
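For the file:// step, here's a minimal Python sketch that builds the URL for a local file, assuming the converted text file is already on disk. The file name directory.txt is hypothetical, and this only produces the URL you would give to UBot's Navigate option; it does not drive UBot itself.

```python
from pathlib import Path


def to_file_url(path: str) -> str:
    """Turn a local path into a file:// URL.

    resolve() makes the path absolute (as_uri() rejects relative paths),
    and as_uri() percent-encodes characters such as spaces.
    """
    return Path(path).resolve().as_uri()


if __name__ == "__main__":
    # Hypothetical output file of the PDF-to-text conversion step.
    print(to_file_url("directory.txt"))
```

On Windows this gives something like file:///C:/.../directory.txt, and on Linux or macOS file:///home/.../directory.txt, so you don't have to hand-build the slashes or escape spaces yourself.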
UBotBuddy, posted May 27, 2010:
Hmmm... interesting. That's a clever solution. I wonder if there's a way to snapshot a Flash form to a PDF and then use the method you suggested. That would solve a problem I need to address.
Net66, posted May 27, 2010:
You could grab the URL of the PDF and feed it into Adobe's online conversion tool to convert it to an HTML or text file: http://www.adobe.com/products/acrobat/access_onlinetools.html If you convert to HTML, the content can be scraped :-) I'd use proxies if you're converting more than one file in succession. Andy P.S. Welcome to UBot Underground, Cary!
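Once the directory is plain text or HTML, the last part of Cary's goal (emails into a .csv for Outlook) can be sketched in Python. This is a minimal sketch under a few assumptions: the regex is deliberately simple and will miss some unusual but valid addresses, the output file name contacts.csv is hypothetical, and the "E-mail Address" header is an assumption (Outlook's import wizard lets you map any CSV column to a contact field).

```python
import csv
import io
import re
from typing import Iterable, List

# Simple email pattern for text or HTML from a converted PDF; not RFC-complete.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails(text: str) -> List[str]:
    """Return unique email addresses in order of first appearance."""
    seen = set()
    emails = []
    for match in EMAIL_RE.findall(text):
        key = match.lower()  # treat case variants as the same address
        if key not in seen:
            seen.add(key)
            emails.append(match)
    return emails


def outlook_csv(emails: Iterable[str]) -> str:
    """Build CSV text with one address per row under an assumed header."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["E-mail Address"])
    for email in emails:
        writer.writerow([email])
    return buf.getvalue()


if __name__ == "__main__":
    sample = "Jane Doe, jane.doe@example.com; <a href='mailto:bob@example.org'>Bob</a>"
    # newline="" keeps the csv module's \r\n line endings intact on disk.
    with open("contacts.csv", "w", newline="", encoding="utf-8") as f:
        f.write(outlook_csv(extract_emails(sample)))
```

Because the regex runs over raw text, addresses inside mailto: links in converted HTML are picked up too, without needing to parse the markup.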
BizWebCoach, posted September 30, 2010:
I'm trying to download a PDF, but without success. I can use the 'download file' command to save a file that ends up being a PDF, but when I try to open it, I get an error saying the file is not a valid PDF. In the save dialog for the download file command, I can name the file and choose where to save it, but I can't select PDF as the file type in the file type drop-down. Is there a problem with MIME types, or am I doing something wrong? I'd appreciate instructions on how to save a PDF from within the Adobe helper window that controls the browser when viewing a PDF. Thanks to anyone who can help!
MiriamMB, posted September 30, 2010:
Hmm... there seems to be no reason it shouldn't work. I just tested this with the download file command (I've attached a picture), and it saved fine when I browsed to the folder I wanted and typed in "neruda.pdf". Try typing the name and the extension the way I did, e.g. file.pdf.
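Both MiriamMB's fix (typing the .pdf extension yourself) and the "not a valid PDF" error can be checked outside UBot. A minimal Python sketch, assuming the file has already been saved to disk: PDF files begin with a %PDF- header, so the check below reads only the first few bytes. Requiring the header at byte 0 is a strict simplification, since some readers tolerate leading junk. If the check fails, one plausible cause (an assumption, not confirmed in this thread) is that the server sent back an HTML page, such as a login page, instead of the PDF.

```python
PDF_MAGIC = b"%PDF-"  # PDF files start with a header like %PDF-1.4


def ensure_pdf_extension(name: str) -> str:
    """Append .pdf unless the name already ends with it (any case)."""
    return name if name.lower().endswith(".pdf") else name + ".pdf"


def is_pdf_bytes(data: bytes) -> bool:
    """True if the data starts with the PDF header."""
    return data.startswith(PDF_MAGIC)


def looks_like_pdf(path: str) -> bool:
    """Read just the first few bytes of a saved file and check the header."""
    with open(path, "rb") as f:
        return is_pdf_bytes(f.read(len(PDF_MAGIC)))
```

For example, looks_like_pdf("neruda.pdf") returning False suggests opening the file in a text editor to see what was actually downloaded.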
jimbourekas@yahoo.gr, posted June 12, 2013:
Hello, I know I'm waking this thread up over two years later. Is it possible with version 4 to display a PDF in the browser area, like in your screenshot? jim. Update: OK, support answered this. Version 3 was based on the Internet Explorer core, while version 4 is not, so the answer is that version 4 cannot display a PDF. (Edited June 13, 2013 by jimbourekas@yahoo.gr)