There is a programming library I came across once (can't remember the name) that let you feed in a PDF and export it as HTML. I also remember finding a few websites where you could upload a PDF and get an HTML version of it. That would be the route to go down: convert the PDF to HTML first, then scrape the text from the HTML via ubot.
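
If it helps, here is a rough sketch of the second half of that idea, assuming you already have the HTML export saved somewhere. It doesn't cover the PDF-to-HTML step (I can't name the library, so I won't guess at its API) and it isn't ubot code, just plain Python with the standard library `html.parser` module. The names `TextExtractor` and `html_to_text` are my own, made up for illustration.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    SKIP_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a tag we want to ignore
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is not inside a skipped tag.
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        return "\n".join(self._chunks)


def html_to_text(html):
    """Return the visible text of an HTML string, one text chunk per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


if __name__ == "__main__":
    # Hypothetical stand-in for whatever HTML the PDF converter produces.
    sample = "<html><body><p>Hello</p><p>World</p></body></html>"
    print(html_to_text(sample))
```

The HTML that converters produce from PDFs tends to be messy (lots of positioned spans), so expect to do some cleanup on the output, but the text itself should come through.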