Netflix genre scraping

ipattis · March 9, 2014

Hi All,

I am new to UBot and do not have a programming background.

I've taken an interest in the work done here with the UBOT platform and Netflix content. I am looking to recreate and expand on it in a few directions.

I have made it this far (netflixGenres.ubot, attached), but I am now in need of a few pointers. Grateful for anything anyone can contribute!

How can I go about the following:

1: Isolate the numeric ID within a URL and increment it so that the URL changes (e.g., WiAltGenre?agid=1, WiAltGenre?agid=2, WiAltGenre?agid=3, WiAltGenre?agid=4). Can I place a loop in the navigate function that does this until no further values produce a valid page?

2: Scrape the name associated with each Netflix genre ID as the increment advances. (I should just place the above loop in my 'add list to list', no?)

Alternatively, I guess I could build a file in Excel with concatenated columns that produce each incrementally different URL, then import the file and navigate to each of those URLs. I am unsure precisely how to go about this method either.
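The URL file could also be generated with a few lines of code instead of a spreadsheet. This is a minimal sketch in Python, not UBot; it assumes the genre pages follow the WiAltGenre?agid=N pattern, uses the base URL that appears in the script later in this thread, and treats the ID range and the output file name as placeholders:

```python
# Base URL as used in the script later in this thread; adjust if it differs.
BASE_URL = "http://movies.netflix.com/WiAltGenre?agid="


def build_genre_urls(first_id, last_id, base_url=BASE_URL):
    """Return one URL per genre ID in the inclusive range first_id..last_id."""
    return [f"{base_url}{genre_id}" for genre_id in range(first_id, last_id + 1)]


def write_url_file(urls, path):
    """Write the URLs to a text file, one per line, ready to import as a list."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(urls) + "\n")


if __name__ == "__main__":
    # Print the first few URLs; call write_url_file(urls, "genre_urls.txt")
    # (hypothetical file name) to save them for import.
    for url in build_genre_urls(1, 4):
        print(url)
```

The resulting text file has one URL per line, which is the same shape the Excel approach would produce.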

Thanks for your help!

Edited March 9, 2014 by ipattis

blumi40 · March 9, 2014

I would like to help you, but I'm from Germany and the service is not reachable here.
So please scrape the list and send it here as a text file.

I think it will not be a big issue to do.

ds062692 · March 9, 2014

This should help.

navigate("http://www.netflix.com", "Wait")
click(<login link>, "Left Click", "No")
type text(<email field>, #username, "Standard")
type text(<password field>, #password, "Standard")
click(<login button>, "Left Click", "No")
wait for browser event("Everything Loaded", "")
wait(2)
click(<ng-click="switchProfile(profile)">, "Left Click", "No")
set(#inc, 1, "Global")
clear list(%scraped URLs)
loop while($true) {
    set(#url, "http://movies.netflix.com/WiAltGenre?agid={#inc}", "Global")
    navigate(#url, "Wait")
    wait(2)
    if($exists(<class="err-empty">)) {
        then {
        }
        else {
            add item to list(%scraped URLs, $scrape attribute($element offset(<tagname="a">, 60), "innertext"), "Delete", "Global")
        }
    }
    increment(#inc)
}
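
Note that loop while($true) never ends on its own, so the script keeps incrementing until it is stopped by hand. One possible way to get the "until there are no further values" behaviour is to stop after a run of consecutive IDs that return nothing. The sketch below shows only that control flow, in Python rather than UBot; fetch_genre_name is a hypothetical stand-in for the navigate-and-scrape step, and the miss threshold is an arbitrary assumption:

```python
def scrape_genres(fetch_genre_name, start_id=1, max_consecutive_misses=50):
    """Collect (id, name) pairs until too many IDs in a row return nothing.

    fetch_genre_name is a hypothetical callable: it takes a genre ID and
    returns the genre name, or None when the page has no name to scrape.
    """
    results = []
    misses = 0
    genre_id = start_id
    while misses < max_consecutive_misses:
        name = fetch_genre_name(genre_id)
        if name is None:
            misses += 1  # empty page: count it toward the stop condition
        else:
            misses = 0  # found a genre: reset the miss counter
            results.append((genre_id, name))
        genre_id += 1
    return results


if __name__ == "__main__":
    # Fake data standing in for real pages, just to exercise the loop.
    fake_pages = {1: "Action", 2: "Comedy", 5: "Drama"}
    print(scrape_genres(fake_pages.get, max_consecutive_misses=3))
```

Resetting the counter on every hit means gaps in the ID sequence do not end the run early; only a long stretch of empty IDs does.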

Edited March 9, 2014 by ds062692

ipattis · March 9, 2014

Thanks so much, ds062692! That is working perfectly!

Edit: this script now scrapes the text from categories that currently have an assortment (which is relative to an individual account). Watching the debugger as the script progresses, I can see that it skips category names where no assortment is present.

If there are no videos presented in the category, it does not scrape the title and place it in the list. I'm guessing the lack of a video assortment causes the 'if exists' clause to register as 'does not exist'. Is there a way to focus the if clause solely on the element offset field that I want to scrape?

Edited March 9, 2014 by ipattis

ipattis · March 9, 2014

bump edit

ds062692 · March 9, 2014

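Just replace the <class="err-empty"> in the $exists check with $element offset(<tagname="a">, 60). The check is then true when the genre name is present, so move the add item to list command from the else branch into the then branch.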
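
Once the check tests for the genre-name element itself, it is true when the name is present, so the scrape belongs in the then branch rather than the else branch. A small Python sketch of the two versions of the logic (not UBot code; the page dictionaries and key names are made up for illustration):

```python
def scrape_if_no_error(page):
    """Original logic: skip the page when the empty-category marker is present."""
    if page.get("err_empty"):
        return None
    return page.get("genre_link")


def scrape_if_name_present(page):
    """Revised logic: scrape whenever the genre-name element itself exists."""
    name = page.get("genre_link")
    if name is not None:
        return name
    return None


if __name__ == "__main__":
    # A category that has a name but no videos for this account.
    empty_category = {"err_empty": True, "genre_link": "Westerns"}
    print(scrape_if_no_error(empty_category))
    print(scrape_if_name_present(empty_category))
```

The first function drops the category because the empty marker is present; the second keeps it because the name element exists, which is the behaviour asked for above.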

ipattis · March 9, 2014

Thanks for the ultra-prompt pointers, ds062692.

Edited March 10, 2014 by ipattis
