UBot Underground

Any Way to Scrape Font Size?


Recommended Posts

I've been using ubot for ~4 days now and I am trying to make a bot that can go to a product page on just about any ecommerce site and grab the price of that product.

 

So far I have created a very simple and somewhat crude (but working) regular expression, \$\d?\d?\d?,?\d?\d?\d?\.\d\d, that will grab prices from the HTML source of a page. Unfortunately, almost all product pages have multiple prices on them, whether it be a compare-at price, a shipping price, or prices for related products.
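To see the multiple-price problem in action outside UBot, here is a quick Python sketch of that same expression (the sample HTML snippet is made up for illustration):

```python
import re

# The pattern from the post: up to three optional digits, an optional
# thousands comma, up to three more optional digits, then required cents.
price_pattern = re.compile(r"\$\d?\d?\d?,?\d?\d?\d?\.\d\d")

# Made-up product-page snippet with several prices on it.
html = ('<span class="price">$1,299.99</span> '
        '<span class="compare-at">$1,499.99</span> '
        'Shipping: $9.95')

# Picks up every dollar amount, not just the one we want.
print(price_pattern.findall(html))
```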

 

After looking over a few different sites and thinking about it, the one thing they all had in common was that the product price (the price I want) always has the largest font size when I look at the computed styles in Firefox or Chrome.

 

Unfortunately I can't figure out how to scrape the font size in ubot, or whether it even has that functionality. Does ubot have its own version of computed styles? (I imagine even just being able to access a page's CSS via ubot would work, but I suspect that would be a little tougher to get working for my purpose.)
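Whether or not UBot exposes computed styles, the largest-font heuristic itself is easy to sketch. Here is a rough Python illustration (the HTML is made up, and it only sees inline font-size styles, not sizes set in an external stylesheet, so it is a sketch of the idea rather than a drop-in solution):

```python
import re

def price_with_largest_font(html):
    """Among tags that carry an inline font-size and contain a $ price,
    return the price rendered with the largest font."""
    pattern = re.compile(
        r'font-size:\s*(\d+(?:\.\d+)?)px[^>]*>\s*(\$[\d,]+\.\d{2})',
        re.IGNORECASE)
    candidates = [(float(size), price) for size, price in pattern.findall(html)]
    return max(candidates)[1] if candidates else None

# Made-up snippet: the main price is the big one, matching the observation
# that the wanted price always has the largest font.
html = ('<del style="font-size: 12px">$1,499.99</del>'
        '<span style="font-size: 28px">$1,299.99</span>'
        '<small style="font-size: 10px">$9.95</small>')
print(price_with_largest_font(html))
```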

 

Any help or resources about css with ubot would be much appreciated.

 

 

 


Since you are scraping HTML, font size is usually set via an inline style or CSS (when you scrape, you don't usually get any info about font size), so that is where you would have to look for it.

 

However, it might be sufficient to just find out the CSS class of the product price and always scrape that one.
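The scrape-by-class idea can be shown with Python's standard library; this is a minimal sketch using made-up HTML (note that real pages may reuse a class like "price" on several elements, as the made-up snippet does):

```python
from html.parser import HTMLParser

class ClassTextScraper(HTMLParser):
    """Collect the text inside every tag whose class attribute
    contains the target class name."""
    def __init__(self, target):
        super().__init__()
        self.target = target
        self.depth = 0      # > 0 while inside a matching tag
        self.texts = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self.depth:
            self.depth += 1           # nested tag inside a match
        elif self.target in classes:
            self.depth = 1
            self.texts.append("")     # start collecting a new match

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.texts[-1] += data

# Made-up snippet: two elements share the "price" class.
scraper = ClassTextScraper("price")
scraper.feed('<div class="price">$1,299.99</div>'
             '<div class="price old">$1,499.99</div>')
print(scraper.texts)
```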


There is probably a better way to scrape the price you need. If you post the website or an example of the HTML you need to scrape from, it would be easier to help.

 

By the way, here is a better regex for scraping the price: \$([0-9]{1,3},|)[0-9]{1,3}\.[0-9]{2,2}
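To see what that tightened pattern buys you, here is a quick Python check (the sample text is made up): unlike the original expression, it requires at least one digit before the decimal point, so a stray "$.99" no longer matches.

```python
import re

better = re.compile(r"\$([0-9]{1,3},|)[0-9]{1,3}\.[0-9]{2,2}")

text = "Was $1,499.99, now $999.00 -- but a stray $.99 is ignored"
# finditer + group(0) because the pattern contains a capturing group,
# which would make findall return only the group contents.
prices = [m.group(0) for m in better.finditer(text)]
print(prices)
```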


Thanks for the responses. I have set up a script that will scrape specific sites, but having to manually put in waits and clicks and find the element to scrape was getting quite tedious after the ~20th site, even though the majority of the code was copy and paste. So I was looking for some way to make a universal scraper that would work on almost any site, though it appears there may not be an easy way to do it.

 

Here's more of an example: I can easily scrape the prices from these pages individually by selecting the element I want in scrape attribute.

 
But the idea is to come up with one script that I could use to grab the price data not only from those 3 sites but theoretically from any product page I throw in there.
 
The 2 unifying factors that seem to be universal are that the data I want is always preceded by a $ sign and ends with a period and 2 digits (the regex I made), and that of the multiple prices on a page I might grab, the one I always want is the one with the largest font.
 
All in all, this is looking like it's going to be quite the project to make work, so I'm thinking I might stick with the monotonous way I've been doing it, one site at a time. I was more curious to see if there would be a fairly easy way to do it.

The best I could come up with was this:

define price scrape {
    if($contains($url, "appliancesconnection.com")) {
        then {
            set(#price, $find regular expression($scrape attribute(<id="product-page-price">, "innertext"), "\\$([0-9]\{1,3\},|)[0-9]\{1,3\}\\.[0-9]\{2,2\}"), "Global")
        }
        else {
        }
    }
    if($contains($url, "allegroshops.com")) {
        then {
            set(#price, $find regular expression($scrape attribute($element offset(<class="price">, 2), "innertext"), "\\$([0-9]\{1,3\},|)[0-9]\{1,3\}\\.[0-9]\{2,2\}"), "Global")
        }
        else {
        }
    }
    if($contains($url, "decoruniverse.com")) {
        then {
            set(#price, $find regular expression($scrape attribute(<id="divPrice">, "innertext"), "\\$([0-9]\{1,3\},|)[0-9]\{1,3\}\\.[0-9]\{2,2\}"), "Global")
        }
        else {
        }
    }
}

If you already have a list of specific sites, all you need to do is copy the if command and replace the domain in the contains function and the element to scrape in the scrape attribute. I did this with these 3 sites very quickly, so I'm sure you could have a bunch of sites done in less than 5-10 minutes.
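The if-chain in that define is essentially a lookup table from domain to scrape rule. Outside UBot, the same structure could be sketched in Python like this (the marker strings mirror the selectors in the UBot script above; the HTML snippet is made up):

```python
import re

PRICE_RE = re.compile(r"\$([0-9]{1,3},|)[0-9]{1,3}\.[0-9]{2,2}")

# Per-site rules: which marker to look near on each domain. In UBot these
# are the element selectors passed to "scrape attribute".
SITE_MARKERS = {
    "appliancesconnection.com": 'id="product-page-price"',
    "allegroshops.com": 'class="price"',
    "decoruniverse.com": 'id="divPrice"',
}

def scrape_price(url, html):
    """Return the first price found after the site's known marker,
    or None for a site the scraper hasn't been taught yet."""
    for domain, marker in SITE_MARKERS.items():
        if domain in url:                     # mirrors $contains($url, ...)
            start = html.find(marker)
            if start != -1:
                m = PRICE_RE.search(html, start)
                if m:
                    return m.group(0)
    return None

print(scrape_price("https://www.decoruniverse.com/product/123",
                   '<div id="divPrice">$1,299.99</div>'))
```

Adding a site is then one dictionary entry instead of another copied if block, and the None case is exactly where you would log unknown sites for later.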

 

Making a universal scraper of this type is a bit difficult, since most of the time when you scrape you look for patterns, and the few patterns you mention could basically match anything on any page.


I appreciate you taking the time to work out an alternative solution like that.

 

Looking at your code, I'm definitely seeing how I could put together a bot that searches Google for, say, "product name" + "SKU code", grabs the result pages, navigates to the ones I have taught the bot to handle in that little group of if/then statements and grabs the price, then throws in a little else statement that saves all the sites I don't currently handle to a list to add in later. (Rough idea, but the wheels are turning.)

 

Eventually I would hope to be able to grab prices from just about every site I come across (the initial goal).

 

I almost feel like a lot of the bot would be quicker to write in code view than in the ubot GUI. Is it possible to use code in the Standard license version of ubot, or would I need to upgrade?

Edited by Alex1800

You can't use code view on the Standard version, but the workaround for copying anything on Standard is to put everything in a define or a loop, copy the whole define or loop, and then move it back out of the command.

