UBot Underground

[Help] Some Guidance On Scraping Tweets And Filtering That List?



Hello UBot community!

 

I've been using UBot for about 3 weeks now, and I'm more excited every day. So far, I've managed to put together some little bots that are actually helping me, and I'm thrilled about that and about what I think I'll be able to do in the future.

 

Now I'm starting my first scraping venture and I feel a little lost. I've been searching, googling, and watching YouTube, but I can't figure out which commands, or set thereof, I need to be using. If someone could guide me in the right direction, I'd really appreciate it. I'm not expecting anyone to spell it all out for me; just point me in the right direction and I'm eager to put the work in.

 

Here's what I've done so far:

I can log into Twitter, find a user whose tweets I want to scrape, and scroll down as far as I'd like depending on how many tweets I want. So far so good.

 

I'm scraping with SCRAPE ATTRIBUTE / ADD LIST TO LIST. I can scrape the whole tweet and put it in a list. Example below:

 

"Lisa @lasergirl70 1hr1 hour ago

My emokis are all over the place today.

 

2 retweets 9 likes

Reply Retweet 2

Like 9

More"

 

Here's where I'm stuck:

Now I need to trim down the list items and separate the data into a table for sorting. I need to be able to sort by number of likes and/or retweets, whether it was a retweet itself, etc.

 

 

Could someone please steer me towards how to separate something like that into different parts? To make matters (slightly?) more complicated, sometimes there's an extra line on top that indicates it was a retweet, and sometimes the tweet itself might take up two lines. Of course, I still don't know how to separate a list item even if those things didn't change.

 

I'd really appreciate any direction, and I'm sure that I'll be able to apply the lesson to so much more stuff in the future.

 

Thanks!

Edited by Sebastian Rooks

Ideally you would separate the data at the scrape itself, not after the fact when it's just plain text like that. So while you are scraping, each piece of data should be pulled separately and put into the proper table column. I don't really use Twitter, but if you can post the HTML of one of those tweets, I or somebody else will likely be able to help.


You've already helped me tremendously by telling me to focus on scraping separately instead of trying to separate afterwards. I've spent a fair amount of time and frustration trying to do it the wrong way. 

 

Forgive me if I copied too much. I'm sure that not knowing exactly which parts I need is a big part of my current problem.

<div class="tweet original-tweet js-original-tweet js-stream-tweet js-actionable-tweet js-profile-popup-actionable" data-tweet-id="695704208687894530" data-disclosure-type="" data-item-id="695704208687894530" data-permalink-path="/Susanjmccann/status/695704208687894530" data-retweet-id="695704319836839936" data-screen-name="Susanjmccann" data-name="Susan McCann" data-user-id="313199716" data-expanded-footer="<div class="js-tweet-details-fixer tweet-details-fixer">

  <div class="js-machine-translated-tweet-container"></div>
    <div class="js-tweet-stats-container tweet-stats-container ">
    </div>

  <div class="client-and-actions">
  <span class="metadata">
    <span>12:23 p.m. - 5 Feb 2016</span>
       &middot; <a class="permalink-link js-permalink js-nav" href="/Susanjmccann/status/695704208687894530"  tabindex="-1">Details</a>
  </span>
</div>


</div>
" data-mentions="KnowledgeBishop" data-retweeter="MeemsterB" data-you-follow="false" data-follows-you="false" data-you-block="false">


    <div class="context">
      
          <div class="tweet-context with-icn
    
    ">

      <span class="Icon Icon--small Icon--retweeted"></span>






            <span class="js-retweet-text"><a class="pretty-link js-user-profile-link" href="/MeemsterB" data-user-id="2835431720"><b>MeemsterB</b></a> Retweeted</span>


      

    </div>


    </div>

    <div class="content">
      

      
      <div class="stream-item-header">
          <a class="account-group js-account-group js-action-profile js-user-profile-link js-nav" href="/Susanjmccann" data-user-id="313199716">
    <img class="avatar js-action-profile-avatar" src="https://pbs.twimg.com/profile_images/428117971139981312/9AWbrFsZ_bigger.jpeg" alt="">
    <strong class="fullname js-action-profile-name show-popup-with-id" data-aria-label-part="">Susan McCann</strong>
    <span>‏</span><span class="username js-action-profile-name" data-aria-label-part=""><s>@</s><b>Susanjmccann</b></span>
    
  </a>

        <small class="time">
  <a href="/Susanjmccann/status/695704208687894530" class="tweet-timestamp js-permalink js-nav js-tooltip" data-original-title="12:23 p.m. - 5 Feb 2016"><span class="_timestamp js-short-timestamp js-relative-timestamp" data-time="1454703783" data-time-ms="1454703783000" data-long-form="true" aria-hidden="true">1 day</span><span class="u-hiddenVisually" data-aria-label-part="last">1 day ago</span></a>
</small>

          
          
      </div>

      
        <p class="TweetTextSize TweetTextSize--16px js-tweet-text tweet-text" lang="en" data-aria-label-part="0">May your faith be unshakeable and your will, unbreakable. - <a href="/KnowledgeBishop" class="twitter-atreply pretty-link js-nav" dir="ltr" data-mentioned-user-id="117477071"><s>@</s><b>KnowledgeBishop</b></a> <a href="/hashtag/quote?src=hash" data-query-source="hashtag_click" class="twitter-hashtag pretty-link js-nav" dir="ltr"><s>#</s><b>quote</b></a></p>




      
        

      
      
  <div class="expanded-content js-tweet-details-dropdown">
    
      
  </div>


      
      <div class="stream-item-footer">
  

  
        <div class="ProfileTweet-actionCountList u-hiddenVisually">
    
    <span class="ProfileTweet-action--reply u-hiddenVisually"></span>
    <span class="ProfileTweet-action--retweet u-hiddenVisually">
      
      <span class="ProfileTweet-actionCount" data-tweet-stat-count="2">
        <span class="ProfileTweet-actionCountForAria" data-aria-label-part="">2 retweets</span>
      </span>
    </span>
    <span class="ProfileTweet-action--favorite u-hiddenVisually">
      <span class="ProfileTweet-actionCount" data-tweet-stat-count="2">
        <span class="ProfileTweet-actionCountForAria" data-aria-label-part="">2 likes</span>
      </span>
    </span>
  </div>

    <div class="ProfileTweet-actionList js-actions" role="group" aria-label="Tweet actions">
      <div class="ProfileTweet-action ProfileTweet-action--reply">
  <button class="ProfileTweet-actionButton u-textUserColorHover js-actionButton js-actionReply" data-modal="ProfileTweet-reply" type="button">
    <div class="IconContainer js-tooltip" title="Reply">
      <span class="Icon Icon--reply"></span>
      <span class="u-hiddenVisually">Reply</span>
    </div>
  </button>
</div>
      <div class="ProfileTweet-action ProfileTweet-action--retweet js-toggleState js-toggleRt">
  <button class="ProfileTweet-actionButton  js-actionButton js-actionRetweet" data-modal="ProfileTweet-retweet" type="button">
    <div class="IconContainer js-tooltip" title="Retweet">
      <span class="Icon Icon--retweet"></span>
      <span class="u-hiddenVisually">Retweet</span>
    </div>
      <div class="IconTextContainer">
        <span class="ProfileTweet-actionCount">
          <span class="ProfileTweet-actionCountForPresentation" aria-hidden="true">2</span>
        </span>
      </div>
  </button><button class="ProfileTweet-actionButtonUndo js-actionButton js-actionRetweet" data-modal="ProfileTweet-retweet" type="button">
    <div class="IconContainer js-tooltip" title="Undo retweet">
      <span class="Icon Icon--retweet"></span>
      <span class="u-hiddenVisually">Retweeted</span>
    </div>
      <div class="IconTextContainer">
        <span class="ProfileTweet-actionCount">
          <span class="ProfileTweet-actionCountForPresentation" aria-hidden="true">2</span>
        </span>
      </div>
  </button>
</div>
      <div class="ProfileTweet-action ProfileTweet-action--favorite js-toggleState">
  <button class="ProfileTweet-actionButton js-actionButton js-actionFavorite" type="button">
    <div class="IconContainer js-tooltip" title="Like">
      <div class="HeartAnimationContainer">
        <div class="HeartAnimation"></div>
      </div>
      <span class="u-hiddenVisually">Like</span>
    </div>
      <div class="IconTextContainer">
        <span class="ProfileTweet-actionCount">
          <span class="ProfileTweet-actionCountForPresentation" aria-hidden="true">2</span>
        </span>
      </div>
  </button><button class="ProfileTweet-actionButtonUndo u-linkClean js-actionButton js-actionFavorite" type="button">
    <div class="IconContainer js-tooltip" title="Undo like">
      <div class="HeartAnimationContainer">
        <div class="HeartAnimation"></div>
      </div>
      <span class="u-hiddenVisually">Liked</span>
    </div>
      <div class="IconTextContainer">
        <span class="ProfileTweet-actionCount">
          <span class="ProfileTweet-actionCountForPresentation" aria-hidden="true">2</span>
        </span>
      </div>
  </button>
</div>
      

        <div class="ProfileTweet-action ProfileTweet-action--more js-more-ProfileTweet-actions">
    <div class="dropdown">
  <button class="ProfileTweet-actionButton u-textUserColorHover dropdown-toggle js-dropdown-toggle" type="button">
      <div class="IconContainer js-tooltip" title="More">
        <span class="Icon Icon--dots"></span>
        <span class="u-hiddenVisually">More</span>
      </div>
  </button>
  <div class="dropdown-menu">
  <div class="dropdown-caret">
    <div class="caret-outer"></div>
    <div class="caret-inner"></div>
  </div>
  <ul>
      <li class="share-via-dm js-actionShareViaDM" data-nav="share_tweet_dm">
        <button type="button" class="dropdown-link">Share via Direct Message</button>
      </li>
    
      <li class="copy-link-to-tweet js-actionCopyLinkToTweet">
        <button type="button" class="dropdown-link">Copy link to Tweet</button>
      </li>
      <li class="embed-link js-actionEmbedTweet" data-nav="embed_tweet">
        <button type="button" class="dropdown-link">Embed Tweet</button>
      </li>
          <li class="mute-user-item pretty-link"><button type="button" class="dropdown-link">Mute</button></li>
  <li class="unmute-user-item pretty-link"><button type="button" class="dropdown-link">Unmute</button></li>

        <li class="block-link js-actionBlock" data-nav="block">
          <button type="button" class="dropdown-link">Block</button>
        </li>
        <li class="unblock-link js-actionUnblock" data-nav="unblock">
          <button type="button" class="dropdown-link">Unblock</button>
        </li>
        <li class="report-link js-actionReport" data-nav="report">
          <button type="button" class="dropdown-link">
            
            Report
          </button>
        </li>
  </ul>
</div>

</div>

  </div>

    </div>

</div>
  



      
      

    </div>
  </div>

Well over 12 hours into this particular problem and all I've been able to accomplish is scraping tweets with no additional data:

add list to table as column(&tweettable,0,0,$scrape attribute(<outerhtml=w"<p class=\"TweetTextSize TweetTextSize--16px js-tweet-text tweet-text\" lang=\"en\" data-aria-label-part=\"0\">*</p>">,"innertext"))

or 

add list to table as column(&tweettable,0,1,$scrape attribute(<data-tweet-id=w"*">,"innertext"))

The second one returns everything together, leaving me with the problem of separation. I haven't been able to reliably scrape any one thing separately, other than the tweet itself.

 

I just want to grab number of likes and retweets, along with whether or not the tweet itself was retweeted and put it all into a table.
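For what it's worth, the HTML pasted above already seems to carry all three values as attributes: the counts sit in data-tweet-stat-count, and the outer div has data-retweeter on this retweeted tweet. Here is a minimal sketch of reading them per tweet. It is Python, not UBot script, and it assumes (I haven't confirmed this) that data-retweeter only appears on retweets. The names TweetFields and tweet_row are made up for the illustration.

```python
from html.parser import HTMLParser


class TweetFields(HTMLParser):
    """Collects retweet/like counts and the retweeter from one tweet's HTML."""

    def __init__(self):
        super().__init__()
        self.row = {"retweeter": "", "is_retweet": False, "retweets": 0, "likes": 0}
        self._action = None  # which count the next data-tweet-stat-count belongs to

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        classes = (a.get("class") or "").split()
        # Assumption: only retweets carry a non-empty data-retweeter attribute.
        if a.get("data-retweeter"):
            self.row["retweeter"] = a["data-retweeter"]
            self.row["is_retweet"] = True
        if "ProfileTweet-action--retweet" in classes:
            self._action = "retweets"
        elif "ProfileTweet-action--favorite" in classes:
            self._action = "likes"
        count = a.get("data-tweet-stat-count")
        if count and count.isdigit() and self._action:
            self.row[self._action] = int(count)
            self._action = None


def tweet_row(tweet_html):
    """Return one table row (as a dict) for a single tweet's HTML."""
    parser = TweetFields()
    parser.feed(tweet_html)
    parser.close()
    return parser.row
```

Because every value comes from the same tweet element, the columns can't drift out of step the way separately scraped lists can.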

 

Someone please put me out of my misery.


Here's what I've got so far. It's sloppy, I'm sure there must be a better way.

add list to list(%scrapedtweets,$scrape attribute(<(outerhtml=w"<p class=\"TweetTextSize TweetTextSize--*6px js-tweet-text tweet-text\" lang=\"en\" data-aria-label-part=\"0\">*</p>" OR outerhtml=w"<a class=\"QuoteTweet-link js-nav\"*")>,"innertext"),"Delete","Global")
wait(3)
add list to list(%likes,$scrape attribute(<innertext=w"Like *">,"innertext"),"Don\'t Delete","Global")
wait(3)
add list to list(%retweets,$scrape attribute(<innertext=w"Retweet  *">,"innertext"),"Don\'t Delete","Global")
set(#delete,1,"Global")
wait(3)
loop($list total(%retweets)) {
    if($comparison(#delete,"=",1)) {
        then {
            add item to list(%retweetcleaned,$next list item(%retweets),"Don\'t Delete","Global")
            set(#delete,0,"Global")
        }
        else {
            add item to list(%retweetduplicate,$next list item(%retweets),"Don\'t Delete","Global")
            set(#delete,1,"Global")
        }
    }
}
wait(3)
set list position(%retweetcleaned,0)
set list position(%scrapedtweets,0)
loop($list total(%scrapedtweets)) {
    if($comparison($next list item(%retweetcleaned),"=",$next list item(%scrapedtweets))) {
        then {
            remove from list(%retweetcleaned,$next list item(%retweetcleaned))
        }
    }
}

The only selector I could get to scrape the retweet counts returned each count twice, so I split the results into separate lists. Also, if a tweet starts with the word "like" or "retweet", the tweet text gets scraped into those lists too, which throws off the list synchronization. That's what the other two loops are for.

wait(3)
set list position(%scrapedtweets,0)
set list position(%likes,0)
loop($list total(%scrapedtweets)) {
    if($comparison($next list item(%likes),"=",$next list item(%scrapedtweets))) {
        then {
            remove from list(%likes,$next list item(%likes))
        }
    }
}
wait(3)
add list to table as column(&tweettable,0,0,%scrapedtweets)
wait(3)
add list to table as column(&tweettable,0,1,%retweetcleaned)
wait(3)
add list to table as column(&tweettable,0,2,%likes)

This seems awfully crude, and more importantly, it's still not able to distinguish between an original tweet and a retweet, and it doesn't recognize if a tweet had a picture attached. But it's my best effort so far.
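A guess at the doubled retweet counts: in the HTML posted earlier, both the "Retweet" button and the "Undo retweet" button contain the count, so a text match can hit each tweet twice. If that is what's happening, the cleanup is just "keep every other item", and it helps to check that the lists are the same length before building rows. This is a rough Python sketch of that logic, not UBot script; build_rows is a made-up name.

```python
def build_rows(tweets, retweets_doubled, likes):
    """Pair up parallel lists into rows, failing loudly if they are out of sync.

    Assumption: retweets_doubled holds every count twice in a row
    (e.g. ["2", "2", "162", "162"]), so every other item is kept.
    """
    retweets = retweets_doubled[::2]
    if not (len(tweets) == len(retweets) == len(likes)):
        raise ValueError(
            f"lists out of sync: {len(tweets)} tweets, "
            f"{len(retweets)} retweet counts, {len(likes)} like counts"
        )
    return list(zip(tweets, retweets, likes))
```

The length check doesn't fix a desync, but it catches one before misaligned rows land in the table.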

 

I'd still, really, really appreciate any guidance anyone can offer me. This is just the beginning of the scraping I want to do, and I want to make sure I learn best practices to apply to the next one, once I finally get this working right.

 

Is this a job better suited to regular expressions? 

 

Thanks


Ok, still working on this thing. I couldn't get separate scrapes to match up data consistently across differing circumstances. So, without knowing what else to do, I turned towards regular expressions (which I know even less about).

 

After googling, watching tutorials, etc, I came up with three regular expressions that highlight what I want in EditPad Pro, but don't work right when I try to implement them in my UBot code. It would sure be swell if someone could take a look at this and help me out. It really would.

 

Let's start with the format of the list item I'm scraping things from:

Tartlandia Retweeted
Stabbatha Christy @LoveNLunchmeat Feb 5
I don't understand the gravity of the situation because I don't understand physics.
162 retweets 272 likes
Reply Retweet 162
Like 272
More

Every list item looks like that, with the following exceptions:

 

1. Sometimes the first line isn't present.

2. Sometimes the tweet portion takes up more than one line, or includes blank lines.

3. Sometimes the bottom two numbers end with "K" (only the bottom two, not the two that share a line).

4. Sometimes the numbers have fewer digits, or more digits, or are absent entirely.

5. Sometimes the numbers I want, the ones on the 4th line from the bottom, contain a comma.
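One way to cope with the variations listed above, without one large regular expression, is to go line by line: drop the optional "Retweeted" line, find where the "Reply ..." footer starts, and treat whatever lies between the header and the stats line as the tweet. This is only a sketch of the idea in Python (not UBot script). parse_item and the dictionary keys are made-up names, and it assumes every item has a header line and a footer line starting with "Reply".

```python
import re

# Matches "162 retweets 272 likes"; either half may be missing, commas allowed.
STATS = re.compile(r"(?:(\d[\d,]*)\s+retweets?)?\s*(?:(\d[\d,]*)\s+likes?)?")


def to_int(text):
    """'1,204' -> 1204; None or '' -> 0."""
    return int(text.replace(",", "")) if text else 0


def parse_item(item_text):
    # Blank lines are dropped, so a tweet with gaps is kept as consecutive lines.
    lines = [ln.strip() for ln in item_text.splitlines() if ln.strip()]
    is_retweet = lines[0].endswith("Retweeted")
    if is_retweet:
        lines = lines[1:]
    header = lines[0]
    # The footer starts at the last line that begins with "Reply".
    footer = max(i for i, ln in enumerate(lines) if ln.startswith("Reply"))
    body = lines[1:footer]
    retweets = likes = 0
    if body:
        m = STATS.fullmatch(body[-1])
        if m and (m.group(1) or m.group(2)):
            retweets, likes = to_int(m.group(1)), to_int(m.group(2))
            body = body[:-1]
    return {
        "is_retweet": is_retweet,
        "header": header,
        "tweet": "\n".join(body),
        "retweets": retweets,
        "likes": likes,
    }
```

A tweet whose own last line happens to look like "5 likes" would be misread as the stats line, so treat this as a starting point rather than a finished parser.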

 

Here's the Regex that I used to highlight the tweet portion:

(?<=@.{1,100})(\s\n.*\s\n?.*\s\n.*\s\n.*\s\n.*)(?=\s\n\d{0,6}\s+retweets\s+\d{0,}\s+likes\s+\nReply\s+Retweet\s+\d{0,}\w{0,1})

In EditPad Pro, it works with 1-5 lines of tweet text and/or space. In my bot, it populates a column with blank spaces when used like this:

set(#count,0,"Global")
set list position(%scrapedtweets,0)
loop($list total(%scrapedtweets)) {
    set table cell(&tweettable,#count,0,$find regular expression($next list item(%scrapedtweets),"(?<=@.\{1,100\})(\\s\\n.*\\s\\n?.*\\s\\n.*\\s\\n.*\\s\\n.*)(?=\\s\\n\\d\{0,6\}\\s+retweets\\s+\\d\{0,\}\\s+likes\\s+\\nReply\\s+Retweet\\s+\\d\{0,\}\\w\{0,1\})"))
    increment(#count)
}

To pull the number of likes from the top line of likes (though I realize now it probably won't work if there's a comma in that number), this works in EditPad:

(?<=\d{0,}\s+retweets\s+)\d{0,}(.*)(?=\s+likes\s+\nReply\s+Retweet\s+\d{0,}\w{0,1})

This is what it looks like in my bot. Just like with the tweets, I get an empty column.

set(#count,0,"Global")
set list position(%scrapedtweets,0)
loop($list total(%scrapedtweets)) {
    set table cell(&tweettable,#count,2,$find regular expression($next list item(%scrapedtweets),"(?<=\\d\{0,\}\\s+retweets\\s+)\\d\{0,\}(.*)(?=\\s+likes\\s+\\nReply\\s+Retweet\\s+\\d\{0,\}\\w\{0,1\})"))
    increment(#count)
}
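On the comma and "K" worries mentioned above: it may be simpler to capture the count as text and clean it up in a separate step, rather than teaching the regex every number format. This is a small Python sketch (not UBot script). parse_count is a made-up name, and it assumes "K" means thousands, so a value like "1.2K" only comes back approximately.

```python
from decimal import Decimal


def parse_count(text):
    """Turn a displayed count such as '272', '1,204' or '1.2K' into an int.

    An empty string (no count shown) is treated as 0.
    """
    s = text.strip().replace(",", "")
    if not s:
        return 0
    if s[-1] in "Kk":
        # Decimal avoids float rounding surprises when scaling by 1000.
        return int(Decimal(s[:-1]) * 1000)
    return int(s)
```

Having real numbers in the table also means sorting by likes or retweets is numeric rather than alphabetical.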

The last part is the number of retweets:

\d+(.*)(?=\s+retweets\s+\d+\s+likes\s+\nReply\s+Retweet\s+\d+\w{0,1})

In my bot, this almost works!

It populates my column with the correct numbers, except that it separates two or more digit numbers, stacking them on top of each other. Someone please help me correct this.

set(#count,0,"Global")
set list position(%scrapedtweets,0)
loop($list total(%scrapedtweets)) {
    set table cell(&tweettable,#count,1,$find regular expression($next list item(%scrapedtweets),"(?<=Reply\\s\{0,3\}Retweet .*)[0-9]"))
    increment(#count)
}
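The stacked digits look like what a single-character class produces: [0-9] matches exactly one digit, so "162" comes back as three separate matches. Adding a quantifier ([0-9]+) makes the whole number one match. The Python sketch below shows the difference. Python's re module doesn't allow the variable-length lookbehind used in the bot code, so the sketch uses a capture group instead. retweet_count is a made-up name.

```python
import re

# One match per digit versus one match for the whole number.
print(re.findall(r"[0-9]", "Reply Retweet 162"))
print(re.findall(r"[0-9]+", "Reply Retweet 162"))

# Capture the count that follows "Reply Retweet" on the same line.
# [ \t]+ (rather than \s+) keeps the match from running onto the "Like" line.
RETWEET_COUNT = re.compile(r"Reply\s{0,3}Retweet[ \t]+([0-9][0-9.,]*K?)")


def retweet_count(item_text):
    """Return the retweet count as displayed, or '' when none is shown."""
    m = RETWEET_COUNT.search(item_text)
    return m.group(1) if m else ""
```

In the bot's own pattern, the equivalent change would presumably be [0-9]+ in place of [0-9], though I haven't run that in UBot.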
 

This is my first ever attempt at regular expressions, and I feel like I'm in over my head. I guessed and tested my way through making it work in the editor, but I don't know how to even begin debugging the problem when it runs in UBot.

 

This probably belongs in the regex forum, but since it's a continuation of my original problem, I'm adding it on here. 

 

Seriously, a little help would be really appreciated. I mean really appreciated.

Edited by Sebastian Rooks

I had the same problem: I could get it to work in EditPad but not in UBot.

 

One thing that worked for me was taking the expression that worked in EditPad and pasting it into UBot's built-in regex editor. From there it told me what UBot didn't like, and I could fix that and get it working in my bots.

 

Additionally, do you have Regex Cheater?  One of the guys here was gracious and put together a free bot that helps with Regex.

 

Sorry that I can't help you more, I'm learning as well but if I can help even a little I try, I know what you are going through!

 

Peace,

LJ


Learjet, thank you so much.

 

Being as far out of my element as I feel, that's a tremendous help. I didn't even realize that UBot had a regex editor. (I didn't even know what regex was until yesterday. This is all still very new to me.)

 

I don't have Regex Cheater. Thanks to you, I will in a few minutes.

 

It's just good to know that I'm not the only one who's experienced this situation. It's encouraging. Sometimes that's enough.

 

Thanks..


Forgive me for not reading the whole thread before posting this, but here is an example of how to scrape from the HTML you provided. It works on that sample, though it may not work as-is on Twitter, where several elements can share the same class name; to scrape multiple tweets there you would use an offset. You can put the info into variables, as I did, or straight into a table.

set(#tweet,"<div class=\"tweet original-tweet js-original-tweet js-stream-tweet js-actionable-tweet js-profile-popup-actionable\" data-tweet-id=\"695704208687894530\" data-disclosure-type=\"\" data-item-id=\"695704208687894530\" data-permalink-path=\"/Susanjmccann/status/695704208687894530\" data-retweet-id=\"695704319836839936\" data-screen-name=\"Susanjmccann\" data-name=\"Susan McCann\" data-user-id=\"313199716\" data-expanded-footer=\"<div class="js-tweet-details-fixer tweet-details-fixer">

  <div class="js-machine-translated-tweet-container"></div>
    <div class="js-tweet-stats-container tweet-stats-container ">
    </div>

  <div class="client-and-actions">
  <span class="metadata">
    <span>12:23 p.m. - 5 Feb 2016</span>
       &middot; <a class="permalink-link js-permalink js-nav" href="/Susanjmccann/status/695704208687894530"  tabindex="-1">Details</a>
  </span>
</div>


</div>
\" data-mentions=\"KnowledgeBishop\" data-retweeter=\"MeemsterB\" data-you-follow=\"false\" data-follows-you=\"false\" data-you-block=\"false\">


    <div class=\"context\">
      
          <div class=\"tweet-context with-icn
    
    \">

      <span class=\"Icon Icon--small Icon--retweeted\"></span>






            <span class=\"js-retweet-text\"><a class=\"pretty-link js-user-profile-link\" href=\"/MeemsterB\" data-user-id=\"2835431720\"><b>MeemsterB</b></a> Retweeted</span>


      

    </div>


    </div>

    <div class=\"content\">
      

      
      <div class=\"stream-item-header\">
          <a class=\"account-group js-account-group js-action-profile js-user-profile-link js-nav\" href=\"/Susanjmccann\" data-user-id=\"313199716\">
    <img class=\"avatar js-action-profile-avatar\" src=\"https://pbs.twimg.com/profile_images/428117971139981312/9AWbrFsZ_bigger.jpeg\" alt=\"\">
    <strong class=\"fullname js-action-profile-name show-popup-with-id\" data-aria-label-part=\"\">Susan McCann</strong>
    <span>‏</span><span class=\"username js-action-profile-name\" data-aria-label-part=\"\"><s>@</s><b>Susanjmccann</b></span>
    
  </a>

        <small class=\"time\">
  <a href=\"/Susanjmccann/status/695704208687894530\" class=\"tweet-timestamp js-permalink js-nav js-tooltip\" data-original-title=\"12:23 p.m. - 5 Feb 2016\"><span class=\"_timestamp js-short-timestamp js-relative-timestamp\" data-time=\"1454703783\" data-time-ms=\"1454703783000\" data-long-form=\"true\" aria-hidden=\"true\">1 day</span><span class=\"u-hiddenVisually\" data-aria-label-part=\"last\">1 day ago</span></a>
</small>

          
          
      </div>

      
        <p class=\"TweetTextSize TweetTextSize--16px js-tweet-text tweet-text\" lang=\"en\" data-aria-label-part=\"0\">May your faith be unshakeable and your will, unbreakable. - <a href=\"/KnowledgeBishop\" class=\"twitter-atreply pretty-link js-nav\" dir=\"ltr\" data-mentioned-user-id=\"117477071\"><s>@</s><b>KnowledgeBishop</b></a> <a href=\"/hashtag/quote?src=hash\" data-query-source=\"hashtag_click\" class=\"twitter-hashtag pretty-link js-nav\" dir=\"ltr\"><s>#</s><b>quote</b></a></p>




      
        

      
      
  <div class=\"expanded-content js-tweet-details-dropdown\">
    
      
  </div>


      
      <div class=\"stream-item-footer\">
  

  
        <div class=\"ProfileTweet-actionCountList u-hiddenVisually\">
    
    <span class=\"ProfileTweet-action--reply u-hiddenVisually\"></span>
    <span class=\"ProfileTweet-action--retweet u-hiddenVisually\">
      
      <span class=\"ProfileTweet-actionCount\" data-tweet-stat-count=\"2\">
        <span class=\"ProfileTweet-actionCountForAria\" data-aria-label-part=\"\">2 retweets</span>
      </span>
    </span>
    <span class=\"ProfileTweet-action--favorite u-hiddenVisually\">
      <span class=\"ProfileTweet-actionCount\" data-tweet-stat-count=\"2\">
        <span class=\"ProfileTweet-actionCountForAria\" data-aria-label-part=\"\">2 likes</span>
      </span>
    </span>
  </div>

    <div class=\"ProfileTweet-actionList js-actions\" role=\"group\" aria-label=\"Tweet actions\">
      <div class=\"ProfileTweet-action ProfileTweet-action--reply\">
  <button class=\"ProfileTweet-actionButton u-textUserColorHover js-actionButton js-actionReply\" data-modal=\"ProfileTweet-reply\" type=\"button\">
    <div class=\"IconContainer js-tooltip\" title=\"Reply\">
      <span class=\"Icon Icon--reply\"></span>
      <span class=\"u-hiddenVisually\">Reply</span>
    </div>
  </button>
</div>
      <div class=\"ProfileTweet-action ProfileTweet-action--retweet js-toggleState js-toggleRt\">
  <button class=\"ProfileTweet-actionButton  js-actionButton js-actionRetweet\" data-modal=\"ProfileTweet-retweet\" type=\"button\">
    <div class=\"IconContainer js-tooltip\" title=\"Retweet\">
      <span class=\"Icon Icon--retweet\"></span>
      <span class=\"u-hiddenVisually\">Retweet</span>
    </div>
      <div class=\"IconTextContainer\">
        <span class=\"ProfileTweet-actionCount\">
          <span class=\"ProfileTweet-actionCountForPresentation\" aria-hidden=\"true\">2</span>
        </span>
      </div>
  </button><button class=\"ProfileTweet-actionButtonUndo js-actionButton js-actionRetweet\" data-modal=\"ProfileTweet-retweet\" type=\"button\">
    <div class=\"IconContainer js-tooltip\" title=\"Undo retweet\">
      <span class=\"Icon Icon--retweet\"></span>
      <span class=\"u-hiddenVisually\">Retweeted</span>
    </div>
      <div class=\"IconTextContainer\">
        <span class=\"ProfileTweet-actionCount\">
          <span class=\"ProfileTweet-actionCountForPresentation\" aria-hidden=\"true\">2</span>
        </span>
      </div>
  </button>
</div>
      <div class=\"ProfileTweet-action ProfileTweet-action--favorite js-toggleState\">
  <button class=\"ProfileTweet-actionButton js-actionButton js-actionFavorite\" type=\"button\">
    <div class=\"IconContainer js-tooltip\" title=\"Like\">
      <div class=\"HeartAnimationContainer\">
        <div class=\"HeartAnimation\"></div>
      </div>
      <span class=\"u-hiddenVisually\">Like</span>
    </div>
      <div class=\"IconTextContainer\">
        <span class=\"ProfileTweet-actionCount\">
          <span class=\"ProfileTweet-actionCountForPresentation\" aria-hidden=\"true\">2</span>
        </span>
      </div>
  </button><button class=\"ProfileTweet-actionButtonUndo u-linkClean js-actionButton js-actionFavorite\" type=\"button\">
    <div class=\"IconContainer js-tooltip\" title=\"Undo like\">
      <div class=\"HeartAnimationContainer\">
        <div class=\"HeartAnimation\"></div>
      </div>
      <span class=\"u-hiddenVisually\">Liked</span>
    </div>
      <div class=\"IconTextContainer\">
        <span class=\"ProfileTweet-actionCount\">
          <span class=\"ProfileTweet-actionCountForPresentation\" aria-hidden=\"true\">2</span>
        </span>
      </div>
  </button>
</div>
      

        <div class=\"ProfileTweet-action ProfileTweet-action--more js-more-ProfileTweet-actions\">
    <div class=\"dropdown\">
  <button class=\"ProfileTweet-actionButton u-textUserColorHover dropdown-toggle js-dropdown-toggle\" type=\"button\">
      <div class=\"IconContainer js-tooltip\" title=\"More\">
        <span class=\"Icon Icon--dots\"></span>
        <span class=\"u-hiddenVisually\">More</span>
      </div>
  </button>
  <div class=\"dropdown-menu\">
  <div class=\"dropdown-caret\">
    <div class=\"caret-outer\"></div>
    <div class=\"caret-inner\"></div>
  </div>
  <ul>
      <li class=\"share-via-dm js-actionShareViaDM\" data-nav=\"share_tweet_dm\">
        <button type=\"button\" class=\"dropdown-link\">Share via Direct Message</button>
      </li>
    
      <li class=\"copy-link-to-tweet js-actionCopyLinkToTweet\">
        <button type=\"button\" class=\"dropdown-link\">Copy link to Tweet</button>
      </li>
      <li class=\"embed-link js-actionEmbedTweet\" data-nav=\"embed_tweet\">
        <button type=\"button\" class=\"dropdown-link\">Embed Tweet</button>
      </li>
          <li class=\"mute-user-item pretty-link\"><button type=\"button\" class=\"dropdown-link\">Mute</button></li>
  <li class=\"unmute-user-item pretty-link\"><button type=\"button\" class=\"dropdown-link\">Unmute</button></li>

        <li class=\"block-link js-actionBlock\" data-nav=\"block\">
          <button type=\"button\" class=\"dropdown-link\">Block</button>
        </li>
        <li class=\"unblock-link js-actionUnblock\" data-nav=\"unblock\">
          <button type=\"button\" class=\"dropdown-link\">Unblock</button>
        </li>
        <li class=\"report-link js-actionReport\" data-nav=\"report\">
          <button type=\"button\" class=\"dropdown-link\">
            
            Report
          </button>
        </li>
  </ul>
</div>

</div>

  </div>

    </div>

</div>
  



      
      

    </div>
  </div>","Global")
load html(#tweet)
set(#name,$scrape attribute(<class="pretty-link js-user-profile-link">,"innertext"),"Global")
set(#profile_link,$scrape attribute(<class="pretty-link js-user-profile-link">,"fullhref"),"Global")
set(#tweet_text,$scrape attribute(<class="TweetTextSize TweetTextSize--16px js-tweet-text tweet-text">,"innertext"),"Global")
set(#retweets,$scrape attribute(<class="ProfileTweet-action--retweet u-hiddenVisually">,"innertext"),"Global")
set(#likes,$scrape attribute(<class="ProfileTweet-action--favorite u-hiddenVisually">,"innertext"),"Global")

Wow, thank you. I've spent a week attacking this project from different angles, none of which have quite gotten me 100% of the way there.

 

This is how I'm implementing what you gave me in my bot. I've only got two problems left before I can finally stop working on this thing and start working with it.

 

1. I can't figure out how to scrape a tweet picture. Even just a way to indicate in the table that a picture was attached to the tweet would be good enough.

 

Without knowing that, I'll be left with text tweets that don't make sense without context, but have higher like and retweet numbers, making the data not worth much. So far I haven't been able to scrape any picture info at all, but even if I could, not all tweets have pictures, so how could I keep the data straight in my table?

 

Do you have any idea what I could do about that?

 

UPDATE: I fixed problem 2 by scraping the innertext of the tweet header (instead of trying for links), then using a regular expression to replace the inner text with the @\w* match. I'm learning!

 

2. Not as big a problem as number 1, but setting the variables #profile_link and #name by scraping different attributes of the same element returns inconsistent results. It reads the name properly, but always pulls the href of the account whose page I'm on, rather than the author of that particular tweet. Again, it gets the name right. I'm confused.

set(#offset,1,"Global")
set(#keepgoing,0,"Global")
loop while($comparison(#keepgoing,"= Equals",0)) {
    set(#picturetweet,$scrape attribute($element offset(<class="Grid Grid--withGutter">,1),"fullhref"),"Global")
    set(#name,$scrape attribute($element offset(<class="fullname js-action-profile-name show-popup-with-id">,#offset),"innertext"),"Global")
    set(#profile_link,$scrape attribute($element offset(<class="fullname js-action-profile-name show-popup-with-id">,#offset),"fullhref"),"Global")
    set(#tweet_text,$scrape attribute($element offset(<class=w"TweetTextSize TweetTextSize--*6px js-tweet-text tweet-text">,#offset),"innertext"),"Global")
    set(#retweets,$scrape attribute($element offset(<class="ProfileTweet-action--retweet u-hiddenVisually">,#offset),"innertext"),"Global")
    set(#likes,$scrape attribute($element offset(<class="ProfileTweet-action--favorite u-hiddenVisually">,#offset),"innertext"),"Global")
    add list to list(%tweetswithdeets,$list from text("{#tweet_text},,,{#retweets},,,{#likes},,,{#name},,,{#profile_link}",",,,"),"Delete","Global")
    if($comparison($list total(%tweetswithdeets),">= Greater than or equal to",4)) {
        then {
            add list to table as row(&tweettable,#offset,0,%tweetswithdeets)
        }
        else {
            clear list(%tweetswithdeets)
            set(#keepgoing,1,"Global")
        }
    }
    clear list(%tweetswithdeets)
    increment(#offset)
}
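A possible explanation for problem 2, going by the HTML posted earlier: the element with class "fullname js-action-profile-name show-popup-with-id" is a strong tag with no href of its own. The href sits on the a tag with class "account-group ..." that wraps it, so asking the strong for "fullhref" may fall back to something else. I can't say what UBot does in that case. The Python sketch below (not UBot script) shows the pairing the markup suggests, taking the name from the strong and the link from its enclosing anchor. AuthorLinks and author_links are made-up names.

```python
from html.parser import HTMLParser


class AuthorLinks(HTMLParser):
    """Collects (display name, href) pairs from account-group anchors."""

    def __init__(self):
        super().__init__()
        self.authors = []
        self._href = None      # href of the account-group anchor we are inside
        self._in_name = False  # True while inside the fullname <strong>
        self._name = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        classes = (a.get("class") or "").split()
        if tag == "a" and "account-group" in classes:
            self._href = a.get("href")
        elif tag == "strong" and "fullname" in classes and self._href is not None:
            self._in_name = True
            self._name = []

    def handle_data(self, data):
        if self._in_name:
            self._name.append(data)

    def handle_endtag(self, tag):
        if tag == "strong" and self._in_name:
            self.authors.append(("".join(self._name).strip(), self._href))
            self._in_name = False
        elif tag == "a":
            self._href = None


def author_links(page_html):
    parser = AuthorLinks()
    parser.feed(page_html)
    parser.close()
    return parser.authors
```

In selector terms, that would mean pointing the link scrape at the account-group anchor rather than at the fullname element.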

Thanks

Edited by Sebastian Rooks

Also, when I click on the advanced element selector from $scrape attribute, I often get an error popup like this:

 

Cannot deserialize the current JSON array (e.g. [1,2,3]) into type 'System.Boolean'
because the type requires a JSON primitive value (e.g. string, number, boolean, null)
to deserialize correctly.
To fix this error either change the JSON to a JSON primitive value (e.g. string,
number, boolean, null) or change the deserialized type to an array or a type
that implements a collection interface (e.g. ICollection, IList) like List<T> that can
be deserialized from a JSON array.
JsonArrayAttribute can also be added to the type to force it to deserialize from a
JSON array.
Path '', line 1, position 1.
 
Then the advanced selector window opens and says that no element was found.
 
Immediately after that, "UBot 5 has stopped working". 
 
This sequence of events has taken place around 20 times since the start of trying to figure out scraping.
 
Am I doing something wrong?

Now I'm really close to being finished.

I finally figured out how to scrape a tweet picture. I'm using:

set(#didtweethaveattachment,$scrape attribute($element offset(<outerhtml=w"<img src=\"https://pbs.twimg.com/media/***.jpg\"alt=\"\" style=\"***;\">">,#offset),"fullsrc"),"Global")
 
It works, and it pulls the url of the image that was in a tweet.  The only problem I'm still having is that because not all tweets have pictures, I don't know how to match the image urls to the tweets in my table. Does that make sense?

 

Let's say that there are 10 tweets on a page and 5 of them have pictures. What I get looks like this:

 

tweet 1                 image 1

tweet 2                 image 2

tweet 3                 image 3

tweet 4                 image 4

tweet 5                 image 5

tweet 6

tweet 7

tweet 8

tweet 9

tweet 10

 

I've been working on this thing for a week now, and I believe this is the last obstacle in my path. 

If anyone can help me understand how to keep my table aligned, I'd finally be finished, and I'd certainly be thankful.
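The usual way to keep a column like this aligned is to look for the image inside each tweet's own element and write an empty cell when nothing is found, instead of scraping all images into one list and all tweets into another. Here is a Python sketch of that idea (not UBot script). It assumes each tweet's HTML is available on its own, and the names MEDIA_SRC, image_url and rows_with_images are made up.

```python
import re

# Tweet pictures live under /media/; avatars (/profile_images/) are ignored.
MEDIA_SRC = re.compile(r'<img[^>]*\bsrc="(https://pbs\.twimg\.com/media/[^"]+)"')


def image_url(tweet_html):
    """Return the first media image URL in this tweet's HTML, or '' if none."""
    m = MEDIA_SRC.search(tweet_html)
    return m.group(1) if m else ""


def rows_with_images(tweets):
    """tweets: list of (tweet_text, tweet_html) pairs, one per tweet element."""
    return [(text, image_url(html)) for text, html in tweets]
```

With this shape, tweet 6 gets its own (possibly empty) image cell and image 2 can never slide up next to the wrong tweet.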

Edited by Sebastian Rooks
