Scrape .pdf content

mdc101 · June 26, 2011

Is it possible to scrape PDF content via the browser?

This would be a great feature.

LoWrIdErTJ - BotGuru · June 26, 2011

PDFs are downloadable from IE, if I am correct. Have you tried selecting any text, or using scrape page, while viewing one?

JohnB · June 26, 2011

This should be a good indicator of why it's not possible:

http://screencast.com/t/BiAuogPDB5U

John

Guerrilla · June 29, 2011

There is a programming library I came across once (can't remember the name) that let you feed in a PDF and would export it as HTML. I even remember finding a few websites where you could upload a PDF and get an HTML version of it.

This would be the route to go down to scrape the text via ubot.
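
To illustrate the second half of that route, here is a minimal Python sketch of pulling the visible text out of HTML once a PDF has already been converted. It uses only the standard library `html.parser` module. The conversion step itself is not shown, and the names `TextExtractor`, `html_to_text`, and `sample_html` are made up for this example; the sample markup is a stand-in, not the output of any particular converter.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text from HTML, skipping <script> and <style> bodies."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into characters.
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            # Collapse runs of whitespace and drop whitespace-only chunks.
            text = " ".join(data.split())
            if text:
                self.chunks.append(text)


def html_to_text(html):
    """Return the visible text of an HTML string, one text chunk per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


# Hypothetical stand-in for the HTML a PDF-to-HTML converter might produce.
sample_html = """<html><head><style>p { color: red; }</style></head>
<body><h1>Invoice 42</h1><p>Terms &amp; conditions</p></body></html>"""

print(html_to_text(sample_html))
```

Real converter output tends to be messier (absolutely positioned spans, one element per line of text), so the extracted chunks would likely need some regrouping before they are useful.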

LoWrIdErTJ - BotGuru · June 29, 2011

It is possible to render the PDF as plain text using C++ or C#,

but we will see whether this becomes an option or not.

Good idea on downloading the PDF and uploading it to a site to convert it, though. +1

mdc101 · December 1, 2011

Thanks for the feedback, guys.
