UBot Underground

Scrape Only First Line Of Result



Hello,

This is my first post. I have UBot Pro Edition and I am trying to get things running. I use another program to scrape, but I know UBot can do much more. I need to learn, though, as I know nothing yet.

I am trying to scrape only the first result of a search.

I use the element selector for the whole block and I have tried many things, but my output is always two lines (or more).

For example, I tried this:

add list to list(%Related_ads,$scrape attribute(<class="category-suggestions list expand">,"innertext"),"Delete","Global")

I am also looking into regex and I can get a result, but because the output in both classes is the same, I can't see how to get only the first result.
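The regex side of this can be illustrated outside UBot. Below is a minimal Python sketch using the standard-library `re` module (not UBot's `$find regular expression`). The `HTML` string is trimmed from the snippet posted in this question, and targeting the `l2` span is an assumption. `findall` returns every match, like scraping the whole block, while `search` stops at the first one.

```python
import re

# Trimmed copy of the suggestion list posted in this question.
HTML = (
    '<ul class="category-suggestions list expand">'
    '<li class="suggestion item" data-l1="537" data-l2="561" data-bucket="282">'
    '<span class="l1">Witgoed en Apparatuur</span><span class="mp-svg-arrow-right"></span>'
    '<span class="l2">Ventilatoren en Airco\'s</span></li>'
    '<li class="suggestion item" data-l1="537" data-l2="553" data-bucket="286">'
    '<span class="l1">Witgoed en Apparatuur</span><span class="mp-svg-arrow-right"></span>'
    '<span class="l2">Overige Witgoed en Apparatuur</span></li></ul>'
)

# Capture the text of each <span class="l2">; [^<]+ cannot run past the next tag.
L2_PATTERN = re.compile(r'<span class="l2">([^<]+)</span>')

# findall: one entry per suggestion, which is why whole-block output has several lines.
all_results = L2_PATTERN.findall(HTML)

# search: only the first match in document order, or None if nothing matches.
match = L2_PATTERN.search(HTML)
first_result = match.group(1) if match else None

print(all_results)
print(first_result)
```

Whatever the tool, the idea is the same: match everything, then keep only the first item rather than the whole list.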

I hope someone can help. I'm posting the HTML because the search box is on a password-protected site.

<ul class="category-suggestions list expand">
<li class="suggestion item" data-l1="537" data-l2="561" data-bucket="282">
<span class="l1">Witgoed en Apparatuur</span><span class="mp-svg-arrow-right"></span>
<span class="l2">Ventilatoren en Airco's</span></li>
<li class="suggestion item" data-l1="537" data-l2="553" data-bucket="286">
<span class="l1">Witgoed en Apparatuur</span><span class="mp-svg-arrow-right"></span>
<span class="l2">Overige Witgoed en Apparatuur</span></li></ul>
</div>



HelloInsomnia's answer is best for a beginner, but here is another method: UBot's element offset. Using the same code he uses, but wrapped in an element offset node, you can select the specific element to scrape, in this case index 0 (the first match):

add item to list(%Related_ads,$scrape attribute($element offset(<(tagname="li" AND class="suggestion item")>,0),"innertext"),"Don\'t Delete","Global")

I posted a quick tutorial on element offsets here:

http://network.ubotstudio.com/forum/index.php/topic/19057-tutorial-what-are-element-offsets/
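An element offset is index-based selection: collect every element that matches the selector, in page order, and keep the one at the given position. The Python sketch below mimics that with the standard-library `html.parser`. It is not UBot's implementation: `SuggestionCollector` and `element_offset` are hypothetical names, it assumes `<li>` elements are not nested, and its text extraction is only a rough stand-in for `innertext`. The sample HTML is the first two items of an outerHTML dump from this thread.

```python
from html.parser import HTMLParser


class SuggestionCollector(HTMLParser):
    """Collects the text of every <li class="suggestion item">, in document order."""

    def __init__(self):
        super().__init__()
        self.items = []       # one text string per matching <li>
        self._inside = False  # True while inside a matching <li> (assumes no nesting)

    def handle_starttag(self, tag, attrs):
        if tag == "li" and dict(attrs).get("class") == "suggestion item":
            self._inside = True
            self.items.append("")

    def handle_endtag(self, tag):
        if tag == "li":
            self._inside = False

    def handle_data(self, data):
        # Text nodes are appended back to back, with no separator between spans.
        if self._inside:
            self.items[-1] += data


def element_offset(html, index):
    """Hypothetical helper: text of the matching <li> at position `index` (0 = first)."""
    parser = SuggestionCollector()
    parser.feed(html)
    parser.close()
    return parser.items[index]


HTML = (
    '<ul class="category-suggestions list expand">'
    '<li class="suggestion item" data-l1="2600" data-l2="2906" data-bucket="20">'
    '<span class="l1">Auto-onderdelen</span><span class="mp-svg-arrow-right"></span>'
    '<span class="l2">Airco en Verwarming</span></li>'
    '<li class="suggestion item" data-l1="91" data-l2="114" data-bucket="">'
    '<span class="l1">Auto\'s</span><span class="mp-svg-arrow-right"></span>'
    '<span class="l2">Honda</span></li></ul>'
)

print(element_offset(HTML, 0))  # first suggestion only
```

Offset 1 would return the second suggestion, and so on. Because the span texts are simply concatenated, the two labels run together with no space.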


I think there's an error on the site I'm trying to scrape.

Between <span class="l1"> and <span class="l2"> there's this: <span class="mp-svg-arrow-right"></span>

That doesn't exist; I think it should be <span class="mp-Icon mp-svg-arrow-right">. When I make that replacement in Firefox, the divider appears.

Now in my output, l1 and l2 come out as one result without a divider or space.

Is there a workaround?
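One workaround that does not depend on the arrow icon rendering is to read the `l1` and `l2` spans separately and join them with a separator of your choice. Below is a minimal Python sketch using the standard-library `html.parser`, run on the HTML from the first post. It is not UBot code, and `CategoryPairParser`, `first_category`, and the `" > "` separator are hypothetical choices.

```python
from html.parser import HTMLParser


class CategoryPairParser(HTMLParser):
    """Collects the l1/l2 span texts of each <li class="suggestion item">."""

    def __init__(self):
        super().__init__()
        self.pairs = []      # one {"l1": ..., "l2": ...} dict per suggestion
        self._field = None   # "l1" or "l2" while inside that span, else None

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if tag == "li" and cls == "suggestion item":
            self.pairs.append({"l1": "", "l2": ""})
        elif tag == "span" and cls in ("l1", "l2") and self.pairs:
            self._field = cls

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None

    def handle_data(self, data):
        # Only text inside an l1/l2 span is kept; the empty arrow span adds nothing.
        if self._field:
            self.pairs[-1][self._field] += data


def first_category(html, separator=" > "):
    """Hypothetical helper: 'l1<separator>l2' for the first suggestion, or None."""
    parser = CategoryPairParser()
    parser.feed(html)
    parser.close()
    if not parser.pairs:
        return None
    first = parser.pairs[0]
    return first["l1"] + separator + first["l2"]


HTML = (
    '<ul class="category-suggestions list expand">'
    '<li class="suggestion item" data-l1="537" data-l2="561" data-bucket="282">'
    '<span class="l1">Witgoed en Apparatuur</span><span class="mp-svg-arrow-right"></span>'
    '<span class="l2">Ventilatoren en Airco\'s</span></li>'
    '<li class="suggestion item" data-l1="537" data-l2="553" data-bucket="286">'
    '<span class="l1">Witgoed en Apparatuur</span><span class="mp-svg-arrow-right"></span>'
    '<span class="l2">Overige Witgoed en Apparatuur</span></li></ul>'
)

print(first_category(HTML))
```

Keeping l1 and l2 apart also lets you store them in separate columns instead of one joined string.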


If you can, go into UBot, navigate to the page, and click View -> Web Inspector in the UBot menu bar. Then find the parent element of the list by hovering over different sections until the list element is highlighted. Right-click it, select Copy -> Copy outerHTML, and paste that here so we can see what's going on.

Viewing the page in another browser can give you different HTML from what UBot's own browser sees.


<ul class="category-suggestions list expand">
<li class="suggestion item" data-l1="2600" data-l2="2906" data-bucket="20"><span class="l1">Auto-onderdelen</span><span class="mp-svg-arrow-right"></span><span class="l2">Airco en Verwarming</span></li>
<li class="suggestion item" data-l1="91" data-l2="114" data-bucket=""><span class="l1">Auto's</span><span class="mp-svg-arrow-right"></span><span class="l2">Honda</span></li>
<li class="suggestion item" data-l1="91" data-l2="112" data-bucket=""><span class="l1">Auto's</span><span class="mp-svg-arrow-right"></span><span class="l2">Ford</span></li>
<li class="suggestion item" data-l1="91" data-l2="129" data-bucket=""><span class="l1">Auto's</span><span class="mp-svg-arrow-right"></span><span class="l2">Mazda</span></li>
<li class="suggestion item" data-l1="91" data-l2="133" data-bucket=""><span class="l1">Auto's</span><span class="mp-svg-arrow-right"></span><span class="l2">Mini</span></li>
</ul>


Okay, I think I see the problem: what we gave you mashed the words together. This version splits them into two list items, so you can put them back together if you need to or use them separately:

set(#firstItem,$scrape attribute($element offset(<(tagname="li" AND class="suggestion item")>,0),"innerhtml"),"Global")
clear list(%ad)
add list to list(%ad,$find regular expression(#firstItem,"(?<=\\\">)[^<]+(?=<\\/)"),"Delete","Global")
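The lookbehind/lookahead pattern can be checked outside UBot. Below is a minimal Python sketch using the standard-library `re` module, under the assumption that Python's regex flavor behaves like UBot's for these simple lookarounds. It runs the pattern over the first item's innerHTML, taken from the outerHTML above. With `.+?`, the match that starts after the empty arrow span must consume at least one character, so it runs on through `</span><span class="l2">` before reaching the next `</`. `[^<]+` cannot cross a tag, so it returns just the two labels.

```python
import re

# innerHTML of the first <li class="suggestion item"> from the outerHTML above.
FIRST_ITEM = (
    '<span class="l1">Auto-onderdelen</span>'
    '<span class="mp-svg-arrow-right"></span>'
    '<span class="l2">Airco en Verwarming</span>'
)

# Text preceded by '">' and followed by '</', matched with "any character, lazily".
LAZY_ANY = re.compile(r'(?<=">).+?(?=</)')
# Same lookarounds, but the body may not contain '<', so it never spans a tag.
NO_TAGS = re.compile(r'(?<=">)[^<]+(?=</)')

print(LAZY_ANY.findall(FIRST_ITEM))  # second item includes '</span><span class="l2">'
print(NO_TAGS.findall(FIRST_ITEM))   # ['Auto-onderdelen', 'Airco en Verwarming']
```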
