HelloInsomnia, posted July 12, 2018

In this tutorial we are going to build an Instagram hashtag scraper. The idea is to enter a hashtag and have the scraper find other hashtags that appear in the descriptions of the photos. The scraper counts the number of times each hashtag appears. You can then take the top related hashtags and use them for scraping users, posts, etc.

First we need a query. Mine is going to be lunch:

https://www.instagram.com/explore/tags/lunch/

We can accept a hashtag using a ui text box, and then when we run the program we can remove any spaces by replacing them with nothing. From what I can tell, IG does not like spaces, periods, etc. in tag URLs, and users probably don't use underscores if they don't have to.

    ui text box("Hashtag",#hashtag)
    set(#inputHashtag,$replace(#hashtag," ",""),"Global")

Next we navigate to the tag page and load more results. IG uses lazy loading, so we can use a bit of JavaScript to scroll to the bottom of the page, which triggers loading of more posts. That is a handy bit of JS to save somewhere, or just Google it when you need it.

    navigate("https://www.instagram.com/explore/tags/{#inputHashtag}","Wait")
    loop(10) {
        run javascript("window.scrollTo(0,document.body.scrollHeight);")
        wait(2)
        wait for browser event("Everything Loaded","")
    }

Now we need to scrape the post descriptions. It seems that IG stores these in the alt attribute of each image. We only care about descriptions that contain a hashtag, so we can target those with a wildcard.

    clear list(%descriptions)
    add list to list(%descriptions,$scrape attribute(<alt=w"*#*">,"alt"),"Delete","Global")

The next step is to get the hashtags out of all of the descriptions and into a list. There are several things to consider when we do this, so let's break it down bit by bit. To actually grab the hashtags we can use a regular expression.
I played around with this a little bit before deciding to use this regex:

    \#\w+

Basically, that says to find something that starts with a # and then grab all word characters after it. Word characters are letters, numbers and underscores, but not things like periods, # signs, spaces, new lines, etc.

Next, we need to change the text casing to all lowercase, because we don't want separate entries for #food and #Food. We can use the change text casing function for this job.

Finally, we need to be sure that we are not deleting duplicates in this list. We want to count the number of times each hashtag shows up, and if we remove duplicates we won't be able to do that.

So this is how we end up grabbing the hashtags, changing them to lowercase and allowing duplicates in the list.

    clear list(%hashtags)
    add list to list(%hashtags,$find regular expression($change text casing(%descriptions,"Lower Case"),"\\#\\w+"),"Don\'t Delete","Global")

We need to count all of the hashtags in the list, and there are a variety of ways to do this. I wanted to keep this super simple, using only free plugins if necessary and no bot bank. So I decided that the best way to do this was the following:

1. Create a table to store each hashtag and its number of occurrences
2. For each hashtag in the list, check to see if it is already in the table
3. If the hashtag is in the table, skip it
4. If the hashtag is not in the table, count the number of occurrences

We need to do a bit of setup before the loop. Because we are working with a table, we need to clear it and set a row variable.

    clear table(&hashtagPopularity)
    set(#row,0,"Global")

Next we loop once for each item in the hashtags list:

    loop($list total(%hashtags)) {
    }

The rest of the code goes inside this loop. First we set a variable to the next hashtag in the list. This lets us use the variable in multiple places without calling next list item more than once.

    set(#nextHashtag,$next list item(%hashtags),"Global")

Next we check to see if the hashtag is already in the table. We can use the Table Commands plugin (TableCommands.dll), which comes with UBot, to get a list of all the items in column 0 (the first column, which contains all the hashtags).

    set(#hashtagExists,$find regular expression($plugin function("TableCommands.dll", "$list from table", &hashtagPopularity, "Column", 0),"{#nextHashtag}(?=\\W|$)"),"Global")

We are using another regular expression, and this time it's a bit different. It starts with the #nextHashtag variable, so this could be #food for example. Then we want to be sure that the hashtag is followed by a non-word character, or that it sits at the end of the list. That is what (?=\W|$) means: \W matches a non-word character such as a space or new line, $ matches the end of the text, and the lookahead (?=...) checks for either one without including it in the match. The reason we want this is that #food should match #food and only #food. We don't want it to match #foodfriday or something like that.

We can now check whether the hashtag already exists in the table by dropping this comparison into an if command:

    $comparison(#hashtagExists,"=","")

Basically, if #hashtagExists is empty then the hashtag was not found in the table, so we go on to count its occurrences in the list. Otherwise we already counted this hashtag and can skip it.

At this point you may be wondering why we don't just remove the hashtags that we have already counted. The reason is that a simple check is enough, and it can be much faster than methods that involve list manipulation.

Inside of our if statement (in the then command) we can use another list to easily count the number of occurrences of our hashtag.
So we clear a list and reuse the same regex as before:

    clear list(%hashtag)
    add list to list(%hashtag,$find regular expression(%hashtags,"{#nextHashtag}(?=\\W|$)"),"Don\'t Delete","Global")

Oh, and don't delete duplicates of course, or all your hashtags will have a count of 1.

Now that the hard part is over, we just need to add our hashtag and its number of occurrences to the table.

    set table cell(&hashtagPopularity,#row,0,#nextHashtag)
    set table cell(&hashtagPopularity,#row,1,$list total(%hashtag))

Don't forget to increment the row variable afterwards so that the next loop doesn't overwrite the same row.

    increment(#row)

And that is pretty much it. We can save the table as a CSV file to our desktop for now. This is done outside of the loop so we don't save on each iteration.

    save to file("{$special folder("Desktop")}\\hashtags.csv",&hashtagPopularity)

I did a few test runs, and here are the top 10 results for each (excluding the input hashtag).

Lunch
#food - 22
#instafood - 15
#yummy - 14
#dinner - 14
#tasty - 13
#delicious - 12
#foodporn - 11
#fresh - 9
#foodie - 9
#breakfast - 9

Dog
#puppy - 13
#dogsofinstagram - 11
#love - 9
#cute - 9
#dogs - 8
#instagood - 6
#instadog - 6
#pet - 5
#corgifeed - 4
#corgiaddict - 4

Travel
#photography - 12
#love - 10
#nature - 9
#adventure - 8
#photooftheday - 8
#travelgram - 7
#travelphotography - 6
#fun - 6
#happy - 6
#travelling - 6

There are loads of ways to improve this basic example or build upon it.
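If you want to sanity-check the counting logic outside UBot, here is a rough Python sketch of the same idea: lowercase each description, pull out every `#\w+` match, keep duplicates, and count them. This is not part of the bot. The function names and the sample descriptions are made up for illustration, and it uses Python's `Counter` in place of the table-and-row approach from the tutorial.

```python
import re
from collections import Counter

# Same pattern as the tutorial: a '#' followed by one or more word characters.
HASHTAG_RE = re.compile(r"#\w+")


def count_hashtags(descriptions):
    """Count every hashtag across all descriptions, ignoring case."""
    counts = Counter()
    for text in descriptions:
        # Lowercase first so '#Food' and '#food' land in the same bucket.
        counts.update(HASHTAG_RE.findall(text.lower()))
    return counts


def top_related(descriptions, input_hashtag, n=10):
    """Return the n most frequent hashtags, excluding the input hashtag."""
    counts = count_hashtags(descriptions)
    # Mirror the tutorial's input cleanup: strip spaces, then lowercase.
    counts.pop("#" + input_hashtag.replace(" ", "").lower(), None)
    return counts.most_common(n)


# Hypothetical descriptions, standing in for the scraped alt text.
sample = [
    "Best #Lunch ever #food #yummy",
    "Quick #lunch break #Food",
    "No tags here",
    "#foodfriday special #food",
]

print(top_related(sample, "lunch"))
```

With the sample above, #food comes out on top with a count of 3, and #foodfriday is counted as its own tag rather than being folded into #food.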
Here's the full UBot code:

    ui text box("Hashtag",#hashtag)
    set(#inputHashtag,$replace(#hashtag," ",""),"Global")
    navigate("https://www.instagram.com/explore/tags/{#inputHashtag}","Wait")
    loop(10) {
        run javascript("window.scrollTo(0,document.body.scrollHeight);")
        wait(2)
        wait for browser event("Everything Loaded","")
    }
    clear list(%descriptions)
    add list to list(%descriptions,$scrape attribute(<alt=w"*#*">,"alt"),"Delete","Global")
    clear list(%hashtags)
    add list to list(%hashtags,$find regular expression($change text casing(%descriptions,"Lower Case"),"\\#\\w+"),"Don\'t Delete","Global")
    clear table(&hashtagPopularity)
    set(#row,0,"Global")
    loop($list total(%hashtags)) {
        set(#nextHashtag,$next list item(%hashtags),"Global")
        set(#hashtagExists,$find regular expression($plugin function("TableCommands.dll", "$list from table", &hashtagPopularity, "Column", 0),"{#nextHashtag}(?=\\W|$)"),"Global")
        if($comparison(#hashtagExists,"=","")) {
            then {
                clear list(%hashtag)
                add list to list(%hashtag,$find regular expression(%hashtags,"{#nextHashtag}(?=\\W|$)"),"Don\'t Delete","Global")
                set table cell(&hashtagPopularity,#row,0,#nextHashtag)
                set table cell(&hashtagPopularity,#row,1,$list total(%hashtag))
                increment(#row)
            }
            else {
            }
        }
    }
    save to file("{$special folder("Desktop")}\\hashtags.csv",&hashtagPopularity)
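To see why the `(?=\W|$)` lookahead matters, here is a small Python check of the same pattern against a newline-joined list, which is roughly how the list text looks to the regex. The helper name and the sample tags are hypothetical, and `re.escape` is an extra precaution added in this sketch; the UBot code inserts the hashtag into the pattern as-is.

```python
import re


def whole_tag_pattern(hashtag):
    """Build a pattern that matches the hashtag only as a whole tag."""
    # re.escape is an addition in this sketch, not something the UBot code does.
    return re.compile(re.escape(hashtag) + r"(?=\W|$)")


# Hypothetical list of scraped hashtags, joined one per line.
hashtags = ["#food", "#foodfriday", "#food", "#foodie", "#seafood"]
joined = "\n".join(hashtags)

# Without the lookahead, '#food' also matches the start of longer tags.
naive_matches = re.findall(re.escape("#food"), joined)

# With the lookahead, only exact '#food' entries match.
exact_matches = whole_tag_pattern("#food").findall(joined)

print(len(naive_matches))  # 4: '#food' twice, plus '#foodfriday' and '#foodie'
print(len(exact_matches))  # 2: only the two exact '#food' entries
```

The new line after each entry counts as a non-word character, and `$` covers the last entry in the list, so every exact entry is matched once.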
BigEfromDaBX, posted July 13, 2018

You Rock
HelloInsomnia (author), posted July 14, 2018

If you guys want to see anything else like this, feel free to suggest.
cob007, posted August 17, 2018

Thanks Nick. We need to be logged in to Instagram first for this to work, right?
daverawcus, posted September 11, 2018

It would be cool if you could enter, say, 10 top hashtags and have it return the total post volume for each, then get 25 to 50 hashtags related to each of those top 10 and return their post volume too. That way you would have a master list of 250 to 500 hashtags (dependent on niche size) along with their post volume.
HelloInsomnia (author), posted September 12, 2018

cob007 wrote: "Thanks Nick. We need to be logged in to Instagram first for this to work, right?"

I don't think I was logged in when I did this.

daverawcus wrote: "It would be cool if you could enter, say, 10 top hashtags and have it return the total post volume for each, then get 25 to 50 hashtags related to each of those top 10 and return their post volume too. That way you would have a master list of 250 to 500 hashtags (dependent on niche size) along with their post volume."

Sounds like a cool idea.
daverawcus, posted October 3, 2018

Stuna is selling this for $30 lol
Vladislav, posted January 5, 2019

It's not working on my end, but anyway, many thanks for the share. It is always useful to see how the pros do things! Cheers!
HelloInsomnia (author), posted January 5, 2019

Vladislav wrote: "It's not working on my end, but anyway, many thanks for the share. It is always useful to see how the pros do things! Cheers!"

I'm sure IG has changed something by now, but you can still follow the steps, and when you see something different, update that part.
hare ram, posted September 30, 2019

I found an error in this code: it can't scrape the descriptions. If you could update it, sir, that would help us a lot in learning.