UBot Underground

Scraping A Crazy Amount Of Data



I have never gotten UBot to scrape beyond a certain point. Once I hit around 42,000 entries, the whole thing collapses. This just happened twice on the same site. I'm guessing I'm running out of memory. I'm currently using 16 GB; will doubling my memory help?

 

I've recently been grabbing followers on a few websites that require you to keep loading a new batch of users as you scroll down the page (I trigger the loading with a JavaScript command). There's no way of stopping, saving, and continuing beyond a certain point; the page just offers me an endless list.

 

As an example: the Spotify Twitter account has 2.5 million followers. How the hell would I scrape 2.5 million entries with UBot? Are there any other places or services that could do this?


Have you tried using this plugin? http://network.ubotstudio.com/forum/index.php/topic/16308-free-plugin-large-data/

You can also take a look at this plugin as well: http://network.ubotstudio.com/forum/index.php/topic/13088-ubot-xml-plugin-ubot-discount/

 

You can also scrape the data to a file, append a number to the file name, and clear the list from within UBot. Then load the files back in as needed.
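
The save-to-numbered-files idea can be sketched outside UBot as well. The Python below is only an illustration of the technique, not UBot code: the class name `ChunkedListWriter`, the `followers` file prefix, and the chunk size of 10,000 are all assumptions made up for this example.

```python
import os


class ChunkedListWriter:
    """Buffer scraped entries and flush them to numbered files.

    Hypothetical helper: the names and the default chunk size are assumptions.
    """

    def __init__(self, directory, prefix="followers", chunk_size=10000):
        self.directory = directory
        self.prefix = prefix
        self.chunk_size = chunk_size
        self.buffer = []      # entries not yet written to disk
        self.file_index = 0   # number appended to the next file name
        self.paths = []       # files written so far, in order

    def add(self, entries):
        """Add new entries; write out every full chunk and drop it from memory."""
        self.buffer.extend(entries)
        while len(self.buffer) >= self.chunk_size:
            self._write(self.buffer[:self.chunk_size])
            self.buffer = self.buffer[self.chunk_size:]

    def flush(self):
        """Write whatever is left in the buffer (call once at the end)."""
        if self.buffer:
            self._write(self.buffer)
            self.buffer = []

    def _write(self, entries):
        self.file_index += 1
        name = f"{self.prefix}_{self.file_index:04d}.txt"
        path = os.path.join(self.directory, name)
        with open(path, "w", encoding="utf-8") as handle:
            handle.write("\n".join(entries) + "\n")
        self.paths.append(path)


def load_chunks(paths):
    """Read the numbered files back one line at a time."""
    for path in paths:
        with open(path, encoding="utf-8") as handle:
            for line in handle:
                line = line.rstrip("\n")
                if line:
                    yield line
```

The point of the design is that the in-memory list never holds more than one chunk plus the latest batch. Note that this only bounds the list; it does nothing about memory used by the browser page itself.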


 

From what I can see, these deal with data once you've acquired it. The problem is that I need to hold 2.5 million entries in memory (before the scrape finishes) before I can do anything with them. I can't split the INPUT into smaller, manageable sections.

 

Giganut, how many Twitter followers can you scrape at one time?


"I'm currently using 16 GB; will doubling my memory help?"

 

No. UBot is 32-bit, so there is a limit to how much memory it can use; I believe it's 2 GB max.

 

But try the large data plugin like Giganut suggested, or maybe try saving the data into a database as you go.
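
As a rough sketch of the "save into a database as you go" idea, here is a minimal Python example using the standard `sqlite3` module. It is not UBot code, and the `followers` table, the `url` column, and the function names are assumptions chosen for illustration. Each scraped batch is written immediately, and `INSERT OR IGNORE` skips URLs that are already stored, so re-scraping the same growing page does not create duplicates.

```python
import sqlite3


def open_store(path):
    """Open (or create) the database; table and column names are assumptions."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS followers (url TEXT PRIMARY KEY)")
    return conn


def save_batch(conn, urls):
    """Store one scraped batch and return how many URLs were actually new."""
    before = conn.total_changes
    conn.executemany(
        "INSERT OR IGNORE INTO followers (url) VALUES (?)",
        [(url,) for url in urls],
    )
    conn.commit()  # data is on disk even if the scraper crashes afterwards
    return conn.total_changes - before
```

Calling `save_batch` after every page load means nothing has to accumulate in a list, and a crash loses at most the current batch.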

clear list(%followers)
navigate("https://soundcloud.com/random-house-audio/followers","Wait")
wait for browser event("Everything Loaded","")
loop(9999) {
    add list to list(%followers,$scrape attribute(<class="userBadgeListItem__heading sc-type-small sc-link-dark sc-truncate">,"href"),"Delete","Global")
    run javascript("window.setTimeout(function() \{
window.scrollTo(0, document.body.scrollHeight)
 \}, 500)")
    wait(3)
}
save to file("C:\\Users\\Public\\Ubot\\Soundcloud\\SCRAPED USERS.txt",%followers)

I'm basically doing this. Each JavaScript page load gives me 25 new profiles at the end of the column. Once I hit 42,000+ (approx. loop #1680), it's game over: the system locks up.
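
For a sense of scale, the numbers above can be checked with a few lines of Python. This is back-of-the-envelope arithmetic only: it assumes every load returns exactly 25 profiles and that the `wait(3)` in the loop is the only time cost, and applying the same batch size to a 2.5 million follower list is a further assumption. The function names are made up for this example.

```python
import math


def loads_needed(total_entries, batch_size=25):
    """Number of scroll/load cycles needed to reveal total_entries profiles."""
    return math.ceil(total_entries / batch_size)


def hours_needed(total_entries, batch_size=25, seconds_per_load=3.0):
    """Lower bound on run time, counting only the fixed wait per load."""
    return loads_needed(total_entries, batch_size) * seconds_per_load / 3600


print(loads_needed(42_000))                # 1680 loads, matching the loop count above
print(hours_needed(42_000))                # 1.4 hours
print(loads_needed(2_500_000))             # 100000 loads
print(round(hours_needed(2_500_000), 1))   # 83.3 hours
```

So under these assumptions, 42,000 entries is about 1.4 hours of scrolling, while 2.5 million would take roughly 60 times as many loads on a single ever-growing page.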

 

What I would like to do is save, then somehow delete everything from memory up to 42,000, then continue. But I can't: it's one single long page of results. From what I remember, Twitter does the same thing. What I've done is move the SAVE TO FILE command into the loop, so I catch everything before the crash. But I'm still stuck at 42,000.

 

Elsewhere, I HAVE scraped long sequences where it loads page 1, 2, 3 etc. I save a bunch, create a new browser and reset, then continue. Not here.


So what other software can I learn or use to do this (that won't crash)?

 

or

 

I would really love to have this scraped: https://soundcloud.com/harperaudio_us/followers

And maybe this too: https://soundcloud.com/audible/followers

 

Anyone willing to run my code on their machine? Does anyone know of anyone who could/would do this? How much do you/they need?

  • 11 months later...

An alternative is to make a custom UBot bot that follows Spotify's followers daily. The daily maximum is between 1,200 and 1,500, which I believe is the API limit, unless you're a verified Twitter account, in which case you can follow up to 10,000 a day. I followed over 2 million on my @mixmastaking account doing so. It's a great way to build your Twitter account.
