Scraping multiple attributes with REGEX

MrGeezer · May 16, 2013

Hi everyone,

I have had this issue since I purchased UbotStudio and, after many hours of trying to figure it out, have finally decided to ask whether it is even possible.

I am trying to scrape the content on a page that sits between this code:

MISC HTML</abbr></span></span></h3><p class="description">SCRAPETHISCONTENT</p><dl class="demographic"><dt>Location</dt><dd>MISC HTML
 
MISC HTML</abbr></span></span></h3><p class="description">SCRAPETHISCONTENT</p><dl class="demographic"><dt>Location</dt><dd>MISC HTML

Now, I can use scrape attribute with a wildcard to grab the content between <p class="description">*</p>. But this is not viable, as there are 3 other irrelevant pieces of content on the web page that also use the tags above.

Therefore I want to regex the page between this code:

</h3><p class="description"></p><dl class="demographic">

using

 (?<=</h3><p class="description">)(.*?)(?=</p><dl class="demographic">)

However, I am having trouble figuring out how to set this up. I tried using add to list with scrape attribute, switching to regular expressions and pasting in the above expression, but this does not seem to work.

Is there any way to regex across multiple attributes within the HTML source code (where the match breaks out of a single HTML tag)?

Thanks guys
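
For anyone who wants to check the expression itself outside UBot, here is a minimal sketch in Python. It only shows what the lookbehind/lookahead pattern from the post matches against raw page source; it says nothing about how scrape attribute handles it. The sample HTML and the "Unrelated blurb" paragraph are made up for illustration.

```python
import re

# Hypothetical page source modelled on the snippet in the post. The middle
# line stands in for one of the irrelevant <p class="description"> blocks.
html = (
    'MISC HTML</abbr></span></span></h3><p class="description">First description</p>'
    '<dl class="demographic"><dt>Location</dt><dd>MISC HTML\n'
    '<div><p class="description">Unrelated blurb</p></div>\n'
    'MISC HTML</abbr></span></span></h3><p class="description">Second description</p>'
    '<dl class="demographic"><dt>Location</dt><dd>MISC HTML'
)

# The expression from the post, unchanged. DOTALL lets .*? span line breaks
# in case a description wraps onto a second line.
pattern = re.compile(
    r'(?<=</h3><p class="description">)(.*?)(?=</p><dl class="demographic">)',
    re.DOTALL,
)

# findall returns the captured group for every non-overlapping match.
descriptions = pattern.findall(html)
print(descriptions)
```

Because the lookbehind requires </h3> immediately before the paragraph and the lookahead requires the demographic list right after it, the unrelated paragraph is skipped. So the pattern does what it is meant to do when run over the whole page source; the open question is only where UBot applies it.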

MrGeezer · May 16, 2013

EDITED

VaultBoss · May 16, 2013

Why don't you scrape all the data (the 3 different sets) into a list and then clean the list afterwards, with regex for instance, to keep only what you need?

Usually, when the page you scrape is coded poorly, class/id-wise, it is best to just take as much as you can and clean things within UBS.

Hope this helps you...
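
A rough sketch of the scrape-everything-then-clean idea, written in Python rather than UBot script. The sample HTML, the variable names and the filter rule (keep only items followed by the demographic list) are assumptions for illustration; the right rule depends on the actual page.

```python
import re

# Hypothetical page source: two wanted descriptions and one irrelevant one.
html = (
    'MISC HTML</abbr></span></span></h3><p class="description">First description</p>'
    '<dl class="demographic"><dt>Location</dt><dd>MISC HTML\n'
    '<div><p class="description">Unrelated blurb</p></div>\n'
    'MISC HTML</abbr></span></span></h3><p class="description">Second description</p>'
    '<dl class="demographic"><dt>Location</dt><dd>MISC HTML'
)

# Step 1: scrape broadly. Grab every description paragraph together with
# the single tag that follows it, so there is something to filter on later.
broad = re.findall(r'<p class="description">.*?</p><[^>]*>', html, re.DOTALL)

# Step 2: clean the list. Keep only items followed by the demographic list,
# then strip the remaining markup from each kept item.
wanted = [
    re.sub(r'<[^>]+>', '', item).strip()
    for item in broad
    if item.endswith('<dl class="demographic">')
]

print(len(broad))
print(wanted)
```

The broad pass picks up all three paragraphs, and the cleaning pass drops the one that is not followed by the demographic list.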

MrGeezer · May 16, 2013

Thanks for the tip, VaultBoss! I went with your suggestion and got it working.

I am still curious, though, how regular expressions work from within scrape attribute. I have searched ubotstudio but cannot seem to find a guide.
