MrGeezer, posted May 16, 2013

Hi everyone,

I have had this issue since I purchased UbotStudio and finally decided to ask after many, many hours trying to figure it out (if it is even possible!).

I am trying to scrape the content on a page that sits inside this code:

MISC HTML</abbr></span></span></h3><p class="description">SCRAPETHISCONTENT</p><dl class="demographic"><dt>Location</dt><dd>MISC HTML

I can already scrape it with Scrape Attribute and a wildcard between <p class="description">*</p>. But this is not viable, as there are 3 other irrelevant pieces of content on the web page that also use the tags above.

Therefore I want to regex the page between this code:

</h3><p class="description"></p><dl class="demographic">

using

(?<=</h3><p class="description">)(.*?)(?=</p><dl class="demographic">)

However, I am having trouble figuring out how to set this up. I tried using Add To List with Scrape Attribute, switching to regular expressions and pasting the above expression, but this does not seem to work.

Is there any way to regex across multiple attributes within the HTML source code (where the match breaks out of a single HTML tag)?

Thanks guys
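For anyone wanting to check the expression itself outside UBot Studio, here is a minimal sketch in Python. It says nothing about how Scrape Attribute treats regular expressions; it only shows that the lookbehind/lookahead pattern from the post picks out the wanted paragraphs from raw page source. The sample HTML, including the "Unrelated text" paragraph and the names, is made up for illustration.

```python
import re

# Sample page source modelled on the snippet in the post. Everything except
# the tag structure around the description paragraph is invented.
html = (
    '<h3><span><span><abbr>Name A</abbr></span></span></h3>'
    '<p class="description">First wanted text</p>'
    '<dl class="demographic"><dt>Location</dt><dd>Somewhere</dd></dl>'
    '<div><p class="description">Unrelated text</p></div>'
    '<h3><span><span><abbr>Name B</abbr></span></span></h3>'
    '<p class="description">Second wanted text</p>'
    '<dl class="demographic"><dt>Location</dt><dd>Elsewhere</dd></dl>'
)

# The expression from the post, unchanged. The lookbehind is a fixed-length
# literal, which Python's re module requires. DOTALL lets the captured text
# span line breaks if the real page has any.
pattern = re.compile(
    r'(?<=</h3><p class="description">)(.*?)(?=</p><dl class="demographic">)',
    re.DOTALL,
)

# findall returns the contents of the single capture group for each match.
matches = pattern.findall(html)
print(matches)
```

The paragraph inside the div is skipped because it is not directly preceded by </h3>, which is the whole point of anchoring on the surrounding tags rather than on the class alone. Note that the pattern depends on there being no whitespace between the tags; if the real source has line breaks or spaces there, the literals would need \s* between them, and a variable-length lookbehind would then have to become an ordinary group in Python.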
MrGeezer (Author), posted May 16, 2013

EDITED
VaultBoss, posted May 16, 2013

Why don't you scrape all the data (the 3 different sets) into a list and then clean the list afterwards, with regex for instance, to keep only what you need?

Usually, when the page you scrape is poorly coded in terms of classes and ids, it is best to just take as much as you can and clean things up within UBS.

Hope this helps you...
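The scrape-everything-then-clean approach can be sketched in Python as follows. This is only an illustration of the idea, not UBot Studio code. The sample HTML and the cleaning rule (dropping entries that start with "Sponsored" or "Advert") are hypothetical; the real rule depends on what distinguishes the unwanted items on the actual page.

```python
import re

# Invented sample source: two wanted descriptions and one unwanted one,
# all using the same <p class="description"> tag.
html = (
    '<h3>Name A</h3><p class="description">First wanted text</p>'
    '<dl class="demographic"></dl>'
    '<div><p class="description">Sponsored: ignore me</p></div>'
    '<h3>Name B</h3><p class="description"> Second wanted text </p>'
    '<dl class="demographic"></dl>'
)

# Step 1: scrape every description paragraph into a list, wanted or not.
scraped = re.findall(r'<p class="description">(.*?)</p>', html, re.DOTALL)

# Step 2: clean the list. This filter is a hypothetical rule chosen for the
# sample data above; replace it with whatever marks the irrelevant entries.
unwanted = re.compile(r'^\s*(Advert|Sponsored)', re.IGNORECASE)
cleaned = [item.strip() for item in scraped if not unwanted.search(item)]

print(cleaned)
```

The trade-off is that the cleaning step only sees the scraped text, not the tags around it, so it works best when the unwanted items are recognisable from their own content.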
MrGeezer (Author), posted May 16, 2013

Thanks for the tip, VaultBoss! I went with your suggestion and got it to work.

I am still curious, though, as to how regular expressions work from within Scrape Attribute. I have searched ubotstudio but cannot seem to find a guide.