UBot Underground

Can't scrape "href" links because they are fetched dynamically



Hi all,

In a bot, I am trying to fetch all the hyperlinks from a page of search results.

But all the links I fetch are the same. A friend told me that these href links are fetched dynamically from a database when the onclick event fires.

Here is the link, check for yourself:

http://delhi.justdial.com/

Put anything in the search box, say "TV". In the result set I get back, I am not able to fetch the hrefs; all I get is the same link for every result.

Please help.


I just took a look, and I'm not sure what it is you're trying to do.

 

After searching "TV", I clicked on a listing. It didn't take me to any website, it just took me to a more comprehensive listing of the link I clicked on.

 

Here is an example. The following was one of my search results:

 

Tanya Electronics   1 review 

Location - Y-348,C/2,Sec-12, Subzi Mandi, Noida, Noida - 201301  

Call  - +(91)-(11)-66361349  

Also See  - Tv Dealers, AC Dealers-LG, AC Repair & Services 

 

After clicking it, this is what it showed me:

 

 Mr Sunil Chauhan  

 +(91)-(11)-66361349  

 +(91)-9310084459 

 Send Enquiry By Email  

 Y-348,C/2,Sec-12, Subzi Mandi, Noida, Noida - 201301 

 

So what are you trying to get from this site? If you're looking to just get the search results themselves, you could scrape the names of the search results.

 

If you're trying to get to each listing's "more comprehensive" page, you can do this:

 

(I decided it was too difficult to explain, so just download the attached ubot.) Inside the ubot, navigate to the page you mentioned in your original post, search for something, and then run the bot.

 

What it does is scrape the JavaScript "onclick" command for each listing, then navigate to each of them individually (in essence, the bot is "clicking" on each and every listing, one at a time).

searchjustdialcom.ubot

Yes, you got it right.

After searching "TV", I want to go into each link in the results and scrape all the comprehensive listings one by one.

But if you don't mind, could you explain the bot you attached a little?

In particular, why is there a delay of 30 seconds?


That "delay 30 seconds" was actually supposed to be a "stop script" command, as it was only set up to navigate to one of the listings. I basically just threw it together as an example.

 

I've attached a fully functional version to this post, and there is a picture below of the changes.

 

Here is a basic rundown of what the bot is doing:

"Choose by attribute" -> This selects only the listing titles of the search results.

"Scrape Page" -> This was originally going to be "scrape chosen attribute", but I ended up using a "scrape page" command because I realized I needed to scrape JavaScript actions and not URLs.

 

So, the "scrape page" command is scraping the javascript action for each listing title. Normally when you click on something, you're navigating to a URL. But these listings are all using "onclick" javascript commands, so you need to scrape all of those and then run a "javascript" of the "onclick" commands you've scraped.
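For anyone who wants to see the same idea outside of UBot, here is a minimal Python sketch of the "scrape the onclick, not the href" step. It is only an illustration: the sample markup, the shop name "Example Shop", and the `openListing(...)` function are made up, and the real site's markup and JavaScript will differ.

```python
from html.parser import HTMLParser


class OnclickCollector(HTMLParser):
    """Collect the onclick attribute of every <a> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.handlers = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        # attrs is a list of (name, value) pairs; names arrive lowercased.
        onclick = dict(attrs).get("onclick")
        if onclick:
            self.handlers.append(onclick)


def scrape_onclick(html):
    """Return the onclick strings of all links found in the given HTML."""
    collector = OnclickCollector()
    collector.feed(html)
    return collector.handlers


# Hypothetical markup: every listing shares one href and differs only in onclick.
SAMPLE = """
<a href="#" onclick="openListing(101)">Tanya Electronics</a>
<a href="#" onclick="openListing(102)">Example Shop</a>
<a href="/about">About</a>
"""

handlers = scrape_onclick(SAMPLE)
print(handlers)  # ['openListing(101)', 'openListing(102)']
```

Scraping the href here would return "#" twice, which matches the "all links are the same" symptom; the onclick strings are what actually tell the listings apart.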

 

Here are the changes I made (compare to your current bot):

 

http://img188.imageshack.us/img188/7434/javascript.gif

 

The "set: #javascript" command is taking the next "onclick" javascript command that we scraped with the page scrape command above, and is setting it to a variable for use later.

 

Then, "run: #javascript" is running that "onclick" command, which is essentially causing the browser to think you clicked on a listing.

 

Then a "wait finish" command, which waits until the page has finished loading.

 

Then a "run javascript: history.go(-1)" command, which makes the browser go back one page (just as if you hit the back button). We need to go back because the next "run javascript: #javascript" command won't work unless we are on the page that we originally scraped that value from.

 

Then another "wait finish" command.

 

Then the whole thing loops all over again.

 

If you're looking to scrape the "comprehensive listings" page, insert all the nodes that scrape that listing right after the first "wait finish" command, and keep the two nodes that are currently last (the "history.go(-1)" and the second "wait finish") as the final two nodes after whatever you insert.
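The loop described above can be sketched roughly like this in Python. It is not the UBot script itself, just the same order of steps under a few assumptions: `browser` is any object with an `execute_script(script)` method (the name is borrowed from Selenium WebDriver), and `scrape_listing` and `wait_finish` are hypothetical callables you would supply. A recording stand-in is used so the sketch runs without a real browser.

```python
def visit_each(browser, handlers, scrape_listing, wait_finish):
    """Run each scraped onclick string, scrape the page it opens, then go back."""
    results = []
    for js in handlers:
        browser.execute_script(js)                # "run: #javascript"
        wait_finish()                             # first "wait finish"
        results.append(scrape_listing())          # your scraping nodes go here
        browser.execute_script("history.go(-1)")  # back to the results page
        wait_finish()                             # second "wait finish"
    return results


class FakeBrowser:
    """Stand-in that only records the scripts it is asked to run."""

    def __init__(self):
        self.log = []

    def execute_script(self, script):
        self.log.append(script)


browser = FakeBrowser()
scraped = visit_each(
    browser,
    ["openListing(101)", "openListing(102)"],  # hypothetical onclick strings
    scrape_listing=lambda: browser.log[-1],    # pretend "scrape": last script run
    wait_finish=lambda: None,                  # nothing to wait for in the fake
)
print(browser.log)
```

The point of the ordering is that the scrape happens between the first wait and the `history.go(-1)` call, so every onclick string is always run from the results page it was scraped from.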
