UBot Underground

[Solved] Scrape data challenge. Who can crack this one? Only for the Brave!!



Hi Gurus, I need to pick your brains!!

Scraping data from G+

 

To see this data:

Go to G+, click on Circles, and then search for someone's name.

Click on their profile and then click on "view all" under the person's circles in the left navigation.

 

The ajax pop up loads with all the data.

 

What I have noticed is that the data is only exposed as the CSS style top: 0px; changes.

 

example

 

<div class="RP PY" style="top: 0px;"> shows first 18 names

<div class="RP PY" style="top: 68px;"> shows next lot

<div class="RP PY" style="top: 204px;"> shows next lot

 

The top value moves in multiples of 68px, and each step exposes the next 18 names.

It is a very clever way of hiding the data.

 

How do you scrape this data?

Do you use the change attribute function?

 

Or is there a way to scroll the scroll bar down to expose the data?

 

Anyone know how to do this?

 

Regards

 

Matt


Hi Guys

Here is the code I want to scrape:

<a oid="116663504982888286028" href="./116663504982888286029">John Wright</a>

I am after:

The URL in the href, i.e. ./116663504982888286029

The inner text (the name), for example "John Wright"

 

So if the list contains 1000 names, I want to scrape all of them.
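For anyone who wants to check the extraction outside UBot, here is a minimal Python sketch of what is being asked for: pull the href and the inner text out of every anchor that carries an oid attribute. It assumes you already have the pop-up's HTML as a string; the class and function names (ProfileLinkParser, extract_profiles) are made up for this example and are not part of UBot or G+.

```python
from html.parser import HTMLParser


class ProfileLinkParser(HTMLParser):
    """Collect (href, name) pairs from <a> tags that carry an oid attribute."""

    def __init__(self):
        super().__init__()
        self.links = []     # finished (href, name) pairs
        self._href = None   # href of the anchor currently open, if any
        self._text = []     # text fragments seen inside that anchor

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # Only anchors with an oid attribute are profile links (assumption
        # based on the sample markup in the post).
        if tag == "a" and "oid" in attrs:
            self._href = attrs.get("href", "")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_profiles(html):
    """Return a list of (href, inner text) tuples found in the given HTML."""
    parser = ProfileLinkParser()
    parser.feed(html)
    parser.close()
    return parser.links


sample = '<a oid="116663504982888286028" href="./116663504982888286029">John Wright</a>'
print(extract_profiles(sample))
```

Run against the sample anchor above, this gives one pair: the relative href and the name.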


Hi Guys

 

How do you change the HTML to make it load the next set of data?

 

<div class="RP PY" style="top: 0px;"> shows first 18 names

<div class="RP PY" style="top: 68px;"> shows next lot

<div class="RP PY" style="top: 204px;"> shows next lot

 

When you first look at the HTML you get:

<div class="RP PY" style="top: 0px;">

<div class="RP PY" style="top: 0px;">

<div class="RP PY" style="top: 0px;">

<div class="RP PY" style="top: 0px;">

<div class="RP PY" style="top: 0px;">

 

As you scroll down the scroll bar the HTML changes to:

<div class="RP PY" style="top: 68px;">

<div class="RP PY" style="top: 68px;">

<div class="RP PY" style="top: 68px;">

<div class="RP PY" style="top: 68px;">

<div class="RP PY" style="top: 68px;">

<div class="RP PY" style="top: 68px;">

 

And so forth until you get to the last entry, which has class="RP PY c-fba".

 

The name is nested within the div tags.

 

Would you use a loop to do this?

I was thinking of scraping the total number of people in a group and then dividing it by 18, as this is how many names you get per style="top: 0px" batch. That would give us the number of loops required.
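As a quick check of that arithmetic, here is a small Python sketch (not UBot script). The only thing to watch is rounding up, because a partly filled last batch still needs its own pass. The batch size of 18 comes from the observation above; the function name loops_needed is just for illustration.

```python
import math

NAMES_PER_BATCH = 18  # batch size observed in the pop-up, per the post above


def loops_needed(total_people, per_batch=NAMES_PER_BATCH):
    """Number of scroll passes needed to expose every name."""
    # Round up: 19 people need 2 passes, not 1.
    return math.ceil(total_people / per_batch)


print(loops_needed(1000))  # 56
```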

 

How do you get the div to change? Would it be a change attribute?


Use the focus option to focus on the footer copyright, which will page down to the bottom for you.

Use a loop to keep focusing on the footer every 2 seconds or so (however long the jQuery takes to load), then scrape it all at once using the class as the identifier, and scrape the inner text or whatever it is you need.
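The same loop idea, sketched in Python rather than UBot script so the logic can be tried on its own. Everything here is an assumption for illustration: scroll_once and scrape_names are placeholder callables standing in for "focus on the element" and "scrape by class", and the stop rule (quit when a pass adds nothing new) is one way to avoid running far more cycles than the list needs.

```python
import time


def collect_until_stable(scroll_once, scrape_names, delay=2.0, max_rounds=60):
    """Scroll, wait, scrape, and repeat until a pass finds no new names.

    scroll_once  -- hypothetical callable that triggers the next batch to load
    scrape_names -- hypothetical callable returning the names currently visible
    """
    seen = []      # names in the order they were first found
    known = set()  # same names, for fast duplicate checks
    for _ in range(max_rounds):
        scroll_once()
        time.sleep(delay)  # give the page time to load the next batch
        added = 0
        for name in scrape_names():
            if name not in known:
                known.add(name)
                seen.append(name)
                added += 1
        if added == 0:
            break  # nothing new appeared, so assume the list is complete
    return seen


# Tiny stand-in for a page that reveals more names after each scroll.
pages = [["a", "b"], ["a", "b", "c"], ["a", "b", "c"]]
state = {"i": -1}


def fake_scroll():
    state["i"] = min(state["i"] + 1, len(pages) - 1)


def fake_scrape():
    return pages[state["i"]]


print(collect_until_stable(fake_scroll, fake_scrape, delay=0))
```

The max_rounds cap is a safety net in case the page keeps returning new items.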

 

 

 

Also, it is much easier to provide examples if we have the actual page you're referring to.

You can change attributes on a page, but why would you need to? It's only a visual thing, and I'm not even sure you would need to do that.


Hi LoWrIdErTJ

 

You need to be logged into G+

 

See https://plus.google.com/u/0/111294201325870406922/posts

Click on "view all" under "Have Rand in circles" on the left-hand side.

 

A div box will load

 

It's not the primary page but the pop-up box that holds the names we are after.

I don't see any footer data in the box to focus on.


clear list(%names)
loop(25) {
   focus(<class=w"c-vU *">)
   add list to list(%names, $scrape attribute(<oid=w"*">, "innertext"), "Delete", "Global")
}

 

 

Be careful how many loop cycles you use, or you'll freeze it up like I did.

 

It also seems Google Plus limits the people it shows in a circle to 1000.

 

Example Running: http://screencast.com/t/hWnme09yaPD

  • 1 year later...

I have to say I'm having the same issue but can't get it to work. I need to scroll down through all the values and scrape their hrefs. Is there any easy way to do this?

But I was thinking, is there a way we could add people to circles from the same page? For example, we take

https://plus.google.com/u/0/people/school/New+York+City+College+of+Technology

and go through the members one by one on the same page without having to visit every member's page?

If you guys could guide me on how to do that, that would be awesome.

