UBot Underground


Hi guys, I need help with this. How can I isolate the company name in these samples:

 

<SPAN class=pp-place-title><SPAN>Company Name</SPAN></SPAN>
<SPAN class=pp-place-title><SPAN>Company XYZ Inc,</SPAN></SPAN>
<SPAN class=pp-place-title><SPAN>Company Name - Inc</SPAN></SPAN>
<SPAN class=pp-place-title><SPAN>Company Name, ABC</SPAN></SPAN>

 

Thanks in advance


Kreatus, what function are you looking to use? The reason I ask is that the process will be the same with or without regex. (You will most likely still need to use $replace or $replace regular expression.) And since the text within the tags is constant, it will simply be a literal match in regex.
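
As a rough illustration of that literal match, here is a minimal sketch in Python rather than UBot script (Python is used only to show the pattern). The tags from the samples are matched literally and a lazy capture group takes whatever sits between them. The function name `extract_company_names` is hypothetical, and whether the same pattern behaves identically in UBot's regex functions is not confirmed here.

```python
import re
from typing import List

# Literal tags from the samples, with a lazy capture group for the name.
# The optional quotes around the class value are an assumption, in case
# the page source quotes the attribute.
PLACE_TITLE = re.compile(
    r'<span\s+class="?pp-place-title"?\s*>\s*<span>(.*?)</span>\s*</span>',
    re.IGNORECASE | re.DOTALL,
)


def extract_company_names(html: str) -> List[str]:
    """Return every company name wrapped in the pp-place-title spans."""
    return [name.strip() for name in PLACE_TITLE.findall(html)]


samples = """
<SPAN class=pp-place-title><SPAN>Company Name</SPAN></SPAN>
<SPAN class=pp-place-title><SPAN>Company XYZ Inc,</SPAN></SPAN>
<SPAN class=pp-place-title><SPAN>Company Name - Inc</SPAN></SPAN>
<SPAN class=pp-place-title><SPAN>Company Name, ABC</SPAN></SPAN>
"""

for name in extract_company_names(samples):
    print(name)
```

The lazy `(.*?)` stops at the first closing `</SPAN>`, so commas and dashes inside the name are kept as they appear.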

 

John


I want to scrape the company title, which varies from page to page.

 

Here are a couple of samples:

http://maps.google.com/maps/place?cid=7378256458188814955&q=beer&hl=en&ved=0CF8Q-QswAA&sa=X&ei=9HaLTauDA5TyvgPCufi0CQ

http://maps.google.com/maps/place?cid=7378256458188814955&q=beer&hl=en&ved=0CF8Q-QswAA&sa=X&ei=9HaLTauDA5TyvgPCufi0CQ

 

I am scraping it inside the socket command, and regex is my only hope of getting it: I have made several attempts to scrape it, but still no luck. Outside the socket command I can scrape this easily, but inside the socket I can't. Regex will surely work for this, I just don't know the right pattern.


Kreatus,

 

The problem, I think, is that "choose by attribute" is not working reliably. I am hoping for a fix for this soon.

 

I will try to get an update for you.



Yes, choose by attribute is really not working properly inside the socket. But regex is an alternative to use while waiting for the enhancement.

 

 

Ahhh...no problem...Let me work something up for you.

 

John

Thanks John! Looking forward..


wow, that's amazing...clearly regex works in this bot...HOWEVER...it will not work for the title and I think it may have something to do with the fact that the attribute only includes the simple span tags (it does not include...<SPAN class=pp-place-title>). You aren't using the attributes in any of the other variables which is why they work. You have specific text you are searching for with those.

I can't even grab it by position.

I'll keep on trying.

John



 

Hi John, I put specific text in there just to test whether it would grab the exact same text, but it didn't.

 

Still trying also.


Hi Kreatus,

 

I haven't got a clue how you managed to find <title>*</title> for scraping the listing name. It would be great if you could share how you found it. I must have spent more than 30 minutes trying all sorts of variations without joy.
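
For anyone following along, a wildcard match like <title>*</title> corresponds roughly to a lazy capture in regex. Below is a minimal Python sketch, assuming the listing name sits in the page's <title> tag as described. The sample page string and the function name `extract_title` are made up for illustration, and this is not necessarily the pattern used in Kreatus's bot.

```python
import re
from typing import Optional

# Lazy capture of whatever sits between the opening and closing title tags.
TITLE = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def extract_title(html: str) -> Optional[str]:
    """Return the text of the first <title> tag, or None if there is none."""
    match = TITLE.search(html)
    return match.group(1).strip() if match else None


# Hypothetical page source, only to exercise the pattern.
page = "<html><head><TITLE> Example Listing Name </TITLE></head><body></body></html>"
print(extract_title(page))
```

Because this works on the raw response text, it does not depend on "choose by attribute".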

 

The address, of course, was another area I couldn't get past.

 

There is a workaround, but I'm sure many will not like it, as you need to keep writing to the browser.

 

I've amended your bot and added a scrape page command outside of the socket compartment so that it works after a write to browser. I also show how this can be used to run through lists: I have added a small block of text which is line-separated. You can place your gmap URLs inside it and the software will go through up to 5. Of course, you can change this.

 

It looks like we are very limited in what we can do inside of sockets, but using write to browser and then scraping outside of the sockets works a treat. After the scraping has been done, you can continue to use sockets.

 

You may have already thought of this, but I thought I would make notes just in case. Also, the regex for the telephone isn't working for a few UK-based listings. Try the following to see what I mean:

 

http://maps.google.co.uk/maps/place?client=firefox-a&rls=org.mozilla:en-GB:official&channel=s&hl=en&biw=1280&bih=746&um=1&ie=UTF-8&q=solicitors+peterborough&fb=1&gl=uk&hq=solicitors&hnear=Peterborough&cid=15462276534218036938&ei=hsGRTeHTC5yqhAeMwuSYDw&sa=X&oi=local_result&ct=placepage-link&resnum=5&ved=0CFQQ4gkwBA

http://maps.google.co.uk/maps/place?client=firefox-a&rls=org.mozilla:en-GB:official&channel=s&hl=en&biw=1280&bih=746&um=1&ie=UTF-8&q=solicitors+peterborough&fb=1&gl=uk&hq=solicitors&hnear=Peterborough&cid=14080472938216458285&ei=hsGRTeHTC5yqhAeMwuSYDw&sa=X&oi=local_result&ct=placepage-link&resnum=2&ved=0CC8Q4gkwAQ

 

Bot attached

 

thanks

Gmaps.ubot
