UBot Underground

Need advice on scraping only texts of a specific length from a file.



Hello ubotters,

 

I need advice on how to create a bot that will scrape only strings of the length I want from a table (for example, texts of 3 to 15 letters). Regex examples would help. I have tried different regex patterns such as ^.{1,15}$, which seems to work on a test website page but didn't work from the bot. Any advice would be appreciated, thanks.


Here, use this to get any number of characters.

 

set(#string, "This text is a test of the substring function.", "Global")
set(#first ten, $substring(#string, 0, 10), "Global")
load html(#first ten)
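
For comparison, here is a minimal sketch of the same idea in Python rather than UBot script. It assumes the arguments to $substring above are a start index and a length, as the variable name "first ten" suggests. A slice like this keeps the first ten characters of a string regardless of how long the string is.

```python
# The same sample string used in the UBot snippet above.
text = "This text is a test of the substring function."

# Characters 0 through 9, i.e. the first ten characters.
first_ten = text[0:10]

print(repr(first_ten))
```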


Show us some of your code and then we can help more, hopefully.

 

You can loop through your list 15 times and add each item to another list, I guess?

 

set(#list, "1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20", "Global")
clear list(%list a)
add list to list(%list a, $list from text(#list, $new line), "Delete", "Global")
loop(15) {
    add item to list(%list B, $next list item(%list a), "Delete", "Global")
}


I appreciate the help. I seem to have a problem with asking the right question :) Simply put, I want to scrape only the words of 15 letters or fewer from a file. I tried it with '$Find Regular Expression' and a regex pattern.

set(#var000, $list from file("C:\\Users\\Home\\Desktop\\something.txt"), "Global")
set(#var000, $find regular expression("", "(^.\{1,15\}$)"), "Global")

 

I hope this helps more.
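
One possible reason a pattern like ^.{1,15}$ works in an online tester but not in the bot is how the anchors are interpreted. In most regex engines, ^ and $ match only at the start and end of the whole input unless multiline mode is on, so a file read as one string with many lines gives no match. Whether that is what happens inside UBot here is an assumption. The call above also appears to pass an empty string as the text to search, which would leave nothing to match. The sketch below shows the anchor behaviour in Python, not UBot script, with made-up file contents.

```python
import re

# Hypothetical file contents: one entry per line.
text = "cat\nelephant\nthis line is far too long to keep\nbot"

# Default mode: ^ and $ refer to the whole string, so nothing matches.
whole_string = re.findall(r"^.{1,15}$", text)

# Multiline mode: ^ and $ refer to each line, so short lines match.
per_line = re.findall(r"^.{1,15}$", text, flags=re.MULTILINE)

print(whole_string)
print(per_line)
```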


You shouldn't say scraping then; that is usually understood as getting data from a web page.

 

However, I think this is what you are looking for:

add list to list(%MATCHES, $find regular expression($read file("C:\\Users\\Home\\Desktop\\something.txt"), "(?<=(\\n|^)).\{1,13\}(?=(\\n|$))"), "Delete", "Global")
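
The pattern above marks line boundaries with lookarounds instead of relying on multiline anchors. Here is a rough sketch of the same technique in Python, not UBot script. Python's re module only accepts fixed-width lookbehind, so the sketch swaps in negative lookarounds that mean "not preceded or followed by a line character". The function name, the sample text and the 3 to 15 bounds (taken from the original question) are illustrative assumptions.

```python
import re

def lines_within(text, min_len=1, max_len=15):
    """Return every line of text whose length is between min_len and max_len."""
    # (?<![^\r\n])  not preceded by a non-line-break character (line start)
    # [^\r\n]{a,b}  between a and b characters that are not line breaks
    # (?![^\r\n])   not followed by a non-line-break character (line end)
    pattern = re.compile(
        r"(?<![^\r\n])[^\r\n]{%d,%d}(?![^\r\n])" % (min_len, max_len)
    )
    return pattern.findall(text)

# Hypothetical file contents with Windows line endings.
sample = "cat\r\nelephant\r\nthis line is far too long to keep\r\nbot"
short_lines = lines_within(sample, 3, 15)

print(short_lines)
```

Excluding \r as well as \n keeps a Windows carriage return from being counted as part of a line's length.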
