UBot Underground

Need Help Catching Text Between Items On Different Lines.


Solved by Code Docta (Nick C.)


Hi, this may actually drive me insane. I've been working on what should have been a simple project for days now, and I've come to a sticking point I'd really appreciate some help getting through.

 

I have a list populated with entries like this:

 Penelope Retweeted
 I'm yer Freckleberry ‏@PicklesnPickles  46 mins46 minutes ago
He adores my freckles.  I think I'll let him count them.
22 retweets 32 likes
Reply   Retweet  22   
Like 32  
More

This is my first project featuring regex. So far, I've been able to select the numbers of retweets and likes from the fourth line from the bottom and add them to a table. I'm pretty happy about that.
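For reference, the counts step described above can be sketched in Python, the same language as the IronPython option given later in this thread. This is only an illustration, not the poster's actual UBot code: `SAMPLE`, `COUNTS`, and `extract_counts` are made-up names, and the sketch assumes the counts line is always the fourth line from the bottom.

```python
import re

# Sample entry copied from the post above.
SAMPLE = (
    "Penelope Retweeted\n"
    " I'm yer Freckleberry @PicklesnPickles  46 mins46 minutes ago\n"
    "He adores my freckles.  I think I'll let him count them.\n"
    "22 retweets 32 likes\n"
    "Reply   Retweet  22   \n"
    "Like 32  \n"
    "More"
)

# Hypothetical pattern: "<n> retweet(s) <n> like(s)".
COUNTS = re.compile(r"(\d+)\s+retweets?\s+(\d+)\s+likes?")


def extract_counts(entry):
    """Return (retweets, likes) read from the fourth line from the bottom."""
    lines = entry.splitlines()
    if len(lines) < 4:
        return None
    match = COUNTS.search(lines[-4])
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


print(extract_counts(SAMPLE))
```

Entries whose counts line is missing or shaped differently return `None` instead of raising an error, so a loop over many scraped entries can skip them.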

 

But I can't figure out how to get the tweet itself.

I need something like "everything after the first line containing an @, but before the fourth line from the bottom", in some way or another.

 

This worked in testing in EditPadPro, but I can't get anything to select an item after a line break in UBot.

(?<=@.{1,100})(\s\n.*\s\n?.*)(?=\s\n\d{0,6}\s+retweets?\s+\d{0,}\s+likes?\s+\nReply\s+Retweet?\s+\d{0,}\w{0,1}) 

I've tried shortening it down and cutting everything out of it, but I just can't get it to cross a \n in UBot. My head hurts and I believe that my brains might be melting, so please help a brother out here.
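As an illustration of the "everything after the @ line, before the counts line" idea, here is a minimal sketch in Python. It is not UBot code, and it cannot confirm how UBot's regex engine treats line breaks. Python's `re` module does not accept a variable-length lookbehind such as `(?<=@.{1,100})`, so the sketch uses a named capture group instead. `SAMPLE`, `TWEET_BODY`, and `extract_body` are hypothetical names, and the optional `\r` is an assumption that the scraped text may use Windows line endings.

```python
import re

# Sample entry copied from the post above.
SAMPLE = (
    "Penelope Retweeted\n"
    " I'm yer Freckleberry @PicklesnPickles  46 mins46 minutes ago\n"
    "He adores my freckles.  I think I'll let him count them.\n"
    "22 retweets 32 likes\n"
    "Reply   Retweet  22   \n"
    "Like 32  \n"
    "More"
)

TWEET_BODY = re.compile(
    r"@[^\r\n]*\r?\n"                        # rest of the first line containing '@'
    r"(?P<body>.*?)"                         # tweet text, which may span several lines
    r"\r?\n\d+\s+retweets?\s+\d+\s+likes?",  # the "<n> retweets <n> likes" line
    re.DOTALL,                               # let '.' match line breaks too
)


def extract_body(entry):
    """Return the text between the first '@' line and the counts line."""
    match = TWEET_BODY.search(entry)
    return match.group("body") if match else None


print(extract_body(SAMPLE))
```

The lazy `.*?` stops at the first counts line it meets, so a tweet that itself contains text shaped like "3 retweets 4 likes" on its own line would be cut short.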

 

Thanks guys.


Thank you SOOO much!

 

This is a big advancement in the right direction. That got me something, which is a lot more than I had before, which was nothing.

 

I just need to figure out how to modify the beginning and the end. The first line doesn't always contain "ago"; the only real constant I see is that it contains an @ symbol somewhere towards the middle of that line.

 

The tweet also doesn't always end with ".", but it is always followed by (?=\[d*\s]retweets?) or what I imagine looks something like that (but probably doesn't).

 

One weird thing, though: your code doesn't catch anything for me in the UBot regex editor, but when I ran it in my bot it worked perfectly.

It's going to be hard to learn this if I can't trust the editor. When I got something to work in EditPadPro, it didn't work in my bot.

 

Is this actually a thing that happens or am I doing something wrong? 

 

Thanks again!

  • Solution

Hi,

 

By replace regex

comment("Keep in mind that this regex also runs against the tweet text itself. If words such as Like, More, Reply or retweets appear in the tweet, they will be removed too.")
set(#tweat,"Penelope Retweeted
 I\'m yer Freckleberry ‏@PicklesnPickles  46 mins46 minutes ago
He adores my freckles.  I think I\'ll let him count them.
22 retweets 32 likes
Reply   Retweet  22   
Like 32  
More","Global")
set(#clean regex,$replace regular expression(#tweat,".*Retweeted|Reply.*|Like.*|More|.*retweets.*",$nothing),"Global")
alert(#clean regex)
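The caveat in the comment above can be reduced by matching whole lines rather than fragments. The following is a rough Python sketch of that idea, not a drop-in UBot replacement. `SAMPLE`, `FOOTER_LINE`, and `strip_chrome` are hypothetical names, and the line shapes are assumed from the single sample entry in this thread. Like the regex above, it leaves the author line (the one with the @) in place.

```python
import re

# Sample entry copied from the post above.
SAMPLE = (
    "Penelope Retweeted\n"
    " I'm yer Freckleberry @PicklesnPickles  46 mins46 minutes ago\n"
    "He adores my freckles.  I think I'll let him count them.\n"
    "22 retweets 32 likes\n"
    "Reply   Retweet  22   \n"
    "Like 32  \n"
    "More"
)

# A line is dropped only if the WHOLE line looks like page chrome.
FOOTER_LINE = re.compile(
    r"^(?:"
    r".*\bRetweeted"                      # "<name> Retweeted"
    r"|\d+\s+retweets?\s+\d+\s+likes?"    # "22 retweets 32 likes"
    r"|Reply\s+Retweet.*"                 # "Reply   Retweet  22"
    r"|Like(?:\s+\d+)?"                   # "Like 32"
    r"|More"                              # "More"
    r")\s*$"
)


def strip_chrome(entry):
    """Drop lines that consist only of retweet/reply/like/more chrome."""
    kept = [line for line in entry.splitlines() if not FOOTER_LINE.match(line)]
    return "\n".join(kept)


print(strip_chrome(SAMPLE))
```

A tweet line such as "Like I said, More is better" survives here, although a tweet line consisting solely of the word "More" would still be removed.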

by remove from list

clear list(%clean)
set(#tweat,"Penelope Retweeted
 I\'m yer Freckleberry ‏@PicklesnPickles  46 mins46 minutes ago
He adores my freckles.  I think I\'ll let him count them.
22 retweets 32 likes
Reply   Retweet  22   
Like 32  
More","Global")
add list to list(%clean,$list from text(#tweat,$new line),"Delete","Global")
comment("remove last 4 items")
loop(4) {
    remove from list(%clean,$subtract($list total(%clean),1))
}
comment("remove first")
remove from list(%clean,0)
alert(%clean)
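stop script
comment("Leftover from the replace regex version above. Never reached, because stop script ends the run first.")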
set(#new tweet,$replace regular expression(#tweat,".*Retweeted|Reply.*|Like.*|More",$nothing),"Global")
alert(#new tweet)

same as above in a function

set(#tweat,"Penelope Retweeted
 I\'m yer Freckleberry ‏@PicklesnPickles  46 mins46 minutes ago
He adores my freckles.  I think I\'ll let him count them.
22 retweets 32 likes
Reply   Retweet  22   
Like 32  
More","Global")
alert($clean retweet(#tweat))
define $clean retweet(#TWEET TO CLEAN) {
    clear list(%clean)
    add list to list(%clean,$list from text(#TWEET TO CLEAN,$new line),"Delete","Global")
    comment("remove last 4 items")
    loop(4) {
        remove from list(%clean,$subtract($list total(%clean),1))
    }
    comment("remove first")
    remove from list(%clean,0)
    return(%clean)
}

in ironpython

 

set(#tweat,"Penelope Retweeted
 I\'m yer Freckleberry ‏@PicklesnPickles  46 mins46 minutes ago
He adores my freckles.  I think I\'ll let him count them.
22 retweets 32 likes
Reply   Retweet  22   
Like 32  
More","Global")
alert($run python with result("tweet = \'\'\'{#tweat}\'\'\'

split_tweet = tweet.split(\'\\n\')
clean = split_tweet[1:-4]
joined = \'\\n\'.join(clean)

joined"))
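The same slicing can be written as a small standalone Python function, which is convenient for trying the logic outside UBot. This is a sketch under assumptions: `clean_tweet`, `header_lines`, and `footer_lines` are illustrative names, and the defaults mirror the `[1:-4]` slice above (drop one line at the top and four at the bottom).

```python
# Sample entry copied from the post above.
SAMPLE = (
    "Penelope Retweeted\n"
    " I'm yer Freckleberry @PicklesnPickles  46 mins46 minutes ago\n"
    "He adores my freckles.  I think I'll let him count them.\n"
    "22 retweets 32 likes\n"
    "Reply   Retweet  22   \n"
    "Like 32  \n"
    "More"
)


def clean_tweet(entry, header_lines=1, footer_lines=4):
    """Drop a fixed number of lines from the top and bottom of an entry."""
    lines = entry.splitlines()  # handles both \n and \r\n line endings
    end = max(len(lines) - footer_lines, 0)
    return "\n".join(lines[header_lines:end])


print(clean_tweet(SAMPLE))                  # author line plus tweet text
print(clean_tweet(SAMPLE, header_lines=2))  # tweet text only
```

Computing the end index explicitly, rather than slicing with a negative number, keeps the function correct when `footer_lines` is 0 and returns an empty string for entries that are too short.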

take your pick

The two list approaches were consistently slow when I ran them more than once, so I will report this in the tracker.

They should generally run about as fast as the Python script.

Hope this helps,

 

CD

parse tweet-example-bug.ubot


I honestly don't know how I could thank you enough for this. This has been sucking the life and productivity out of me for days now.

You gave me options that I'm going to save and study, and surely apply to different things at different times in the future.

 

I promise that when I get to the point that I've got more answers than questions, I'll take time to help people out too.

 

I went with the remove from list option, and I made a couple of changes to meet my exact needs. Removing the last 4 lines is a consistent win. But the first two lines can vary. I made some changes in how they're handled, that seem to be working every time, providing me with additional lists of data for my table like whether the tweet was a retweet, and if so, who was it retweeted from. None of which would have been possible without your help.

clear list(%clean)
clear list(%retweet or not)
set(#tweat,$next list item(%scrapedtweets),"Global")
add list to list(%clean,$list from text(#tweat,$new line),"Delete","Global")
comment("remove last 4 items")
loop(4) {
    remove from list(%clean,$subtract($list total(%clean),1))
}
comment("remove first")
if($contains($list item(%clean,0),"Retweeted")) {
    then {
        add item to list(%retweeted from,$find regular expression($list item(%clean,1),"@\\w*"),"Don\'t Delete","Global")
        add item to list(%retweet or not,"retweet","Don\'t Delete","Global")
        remove from list(%clean,0)
        remove from list(%clean,0)
    }
    else {
        add item to list(%retweet or not,$nothing,"Don\'t Delete","Global")
        add item to list(%retweeted from,$nothing,"Don\'t Delete","Global")
    }
}
set(#new tweet,%clean,"Global")
add item to list(%final tweet,#new tweet,"Don\'t Delete","Global")
alert(#new tweet)
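For comparison, the branching in the script above can be sketched in Python. This mirrors the logic as posted, including the fact that nothing is removed from the top when the entry is not a retweet. It is not a tested replacement for the UBot code: `parse_entry`, `HANDLE`, `SAMPLE`, and the dictionary keys are hypothetical names.

```python
import re

# Sample entry copied from earlier in the thread.
SAMPLE = (
    "Penelope Retweeted\n"
    " I'm yer Freckleberry @PicklesnPickles  46 mins46 minutes ago\n"
    "He adores my freckles.  I think I'll let him count them.\n"
    "22 retweets 32 likes\n"
    "Reply   Retweet  22   \n"
    "Like 32  \n"
    "More"
)

HANDLE = re.compile(r"@\w*")  # same pattern as in the UBot script


def parse_entry(entry):
    """Split one scraped entry into retweet flag, source handle, and text."""
    lines = entry.splitlines()
    lines = lines[:max(len(lines) - 4, 0)]  # remove the last 4 lines
    is_retweet = bool(lines) and "Retweeted" in lines[0]
    retweeted_from = ""
    if is_retweet:
        if len(lines) > 1:
            match = HANDLE.search(lines[1])
            retweeted_from = match.group(0) if match else ""
        lines = lines[2:]  # drop the "Retweeted" line and the author line
    return {
        "is_retweet": is_retweet,
        "retweeted_from": retweeted_from,
        "text": "\n".join(lines),
    }


print(parse_entry(SAMPLE))
```

Returning one dictionary per entry keeps the three values together, which avoids having to keep separate lists the same length by adding empty placeholders.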

Also, I use this to play with regex:

 

http://regexhero.net/tester/

 

It uses .NET-style regex, which is the same flavor UBot uses.

Thank you, that is tremendously helpful. I read that there are different dialects of regex, but I didn't know which was which.

 

Here is the tracker issue. If you can confirm this, please +1 it:

 

http://tracker.ubotstudio.com/issues/966

 

I can absolutely confirm this.

It consistently takes just over 30 seconds to get through this loop, which is going to be a bit problematic with runs through hundreds of list items.

I tried to confirm in the tracker, but it won't let me log in, won't let me register, won't recognize my info to send me a new password. I don't know what that's about.

 

 

Thank you, Thank you, and Thank you again.
