UBot Underground

Scraping multiple attributes with REGEX



Hi everyone,

 

I have had this issue since I purchased UbotStudio and have finally decided to ask, after many hours of trying to figure it out (if it is even possible!).

 

I am trying to scrape the content on a page that sits between this code:

MISC HTML</abbr></span></span></h3><p class="description">SCRAPETHISCONTENT</p><dl class="demographic"><dt>Location</dt><dd>MISC HTML
 
MISC HTML</abbr></span></span></h3><p class="description">SCRAPETHISCONTENT</p><dl class="demographic"><dt>Location</dt><dd>MISC HTML

Now, I can use scrape attribute with a wildcard to grab the content between <p class="description">*</p>. But this is not viable, as there are 3 other irrelevant pieces of content on the web page that use the same tags.

 

Therefore I want to regex the page between this code:

</h3><p class="description"></p><dl class="demographic">

using

 (?<=</h3><p class="description">)(.*?)(?=</p><dl class="demographic">)

However, I am having trouble figuring out how to set this up. I tried using add to list with scrape attribute, switching to regular expressions and pasting in the above expression, but this does not seem to work.

 

Is there any way to regex across multiple attributes within the HTML source code (i.e. where the match breaks out of a single HTML tag)?
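
For reference, the expression itself can be checked outside UBot. The following is only a minimal Python sketch, not UBot script: the sample markup is modelled on the snippet above, the item texts ("First item", "Second item", "Irrelevant") are made up for illustration, and the `re.DOTALL` flag is an assumption added so the match can span line breaks. It says nothing about how scrape attribute handles the expression internally.

```python
import re

# Sample markup modelled on the snippet in the post.
# "MISC HTML" stands in for unknown surrounding markup; item texts are invented.
html = (
    'MISC HTML</abbr></span></span></h3><p class="description">First item</p>'
    '<dl class="demographic"><dt>Location</dt><dd>MISC HTML'
    '<div><p class="description">Irrelevant</p></div>'
    'MISC HTML</abbr></span></span></h3><p class="description">Second item</p>'
    '<dl class="demographic"><dt>Location</dt><dd>MISC HTML'
)

# The lookbehind anchors on the closing </h3> plus the opening <p>,
# the lookahead anchors on the closing </p> plus the opening <dl>.
pattern = re.compile(
    r'(?<=</h3><p class="description">)(.*?)(?=</p><dl class="demographic">)',
    re.DOTALL,  # assumption: let "." cross newlines in real page source
)

matches = pattern.findall(html)
print(matches)
```

The paragraph that is not preceded by `</h3>` is skipped, so the pattern does span more than one tag. If it fails inside UBot, whitespace or line breaks between the tags in the real page source would be the first thing to check.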

 

Thanks guys :)


Why don't you scrape all the data (the 3 different sets) into a list and apply various data cleaning after that on the list with regex, for instance, to keep only what you need?

 

Usually, when the page you scrape is coded poorly, class/id-wise, it is best to just take as much as you can and clean things within UBS.
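
A rough illustration of that "scrape broadly, then clean the list" approach, written as a Python sketch rather than UBot script. The page markup, the item texts and the variable names (`page`, `scraped`, `cleaned`) are all hypothetical; the idea is only that each scraped entry keeps a little trailing context so the list can be filtered afterwards.

```python
import re

# Hypothetical page: two wanted descriptions and one unrelated one
# that uses the same <p class="description"> tag.
page = (
    '<h3>Name A</h3><p class="description">Wanted A</p>'
    '<dl class="demographic"><dt>Location</dt></dl>'
    '<div><p class="description">Sidebar text</p></div>'
    '<h3>Name B</h3><p class="description">Wanted B</p>'
    '<dl class="demographic"><dt>Location</dt></dl>'
)

# Step 1: broad scrape. Each entry is a description paragraph
# plus the single tag that immediately follows it.
scraped = re.findall(r'<p class="description">.*?</p><[^>]+>', page, re.DOTALL)

# Step 2: clean the list. Keep only entries followed by the
# demographic list, then strip the remaining markup.
keep = re.compile(r'<dl class="demographic">$')
cleaned = [
    re.sub(r'<[^>]+>', '', item)
    for item in scraped
    if keep.search(item)
]
print(cleaned)
```

Here all three paragraphs are scraped first, and the sidebar entry is dropped in the cleaning step because it is not followed by the demographic list.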

 

Hope this helps you...


Thanks for the tip, Vaultboss! I went with your suggestion and got it working :)

I am still curious, though, how regular expressions work from within scrape attribute. I have searched the UbotStudio site but cannot seem to find a guide.
