chris weber · Posted August 19, 2010

Hi everyone, I'm having a bit of a problem with my bot again while trying to use the $replace command. I have a list of URLs that I scraped, and for some reason a few of the lines contain "javascript:void(0)" instead of a website URL. What I want to do is load this file of URLs, use the $replace command to replace every line that says "javascript:void(0)" with $nothing, and then save the scrubbed URL list back out to a file. If anyone knows how I can accomplish this, please let me know. Thanks again, Chris
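For readers not using uBot, here is a rough Python sketch of the clean-up Chris describes: load the scraped list, drop every line that is just "javascript:void(0)", and keep the scrubbed URLs. The sample data is invented for illustration.

```python
# Invented sample of a scraped URL list with one bad line in it.
sample = """http://www.example.com/dentist1
javascript:void(0)
http://www.example.com/dentist2"""

# Keep only the lines that are real URLs, dropping the javascript junk.
clean_lines = [line for line in sample.splitlines()
               if line.strip() != "javascript:void(0)"]
cleaned = "\n".join(clean_lines)
```

In a real script you would read `sample` from the scraped file and write `cleaned` back out.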
MiriamMB · Posted August 19, 2010

Hey Chris, I can take a look at your script privately for you. Just send me a message.
chris weber · Posted August 19, 2010 (Author)

Hey Lily, here is my Google Maps scraper as it stands right now. You may need to change the file paths in it for it to work. The problem I'm having is in the "scraping links" script: I have a sub named "delete bad links" that I'm trying to get to load the file, search through it, delete the bad lines, and then save the scrubbed list of URLs back out to the file. When you run it, just enter a city like "los angeles" and a business type like "dentist"; it will search Google Maps and collect the links, but when it saves them to a file you will see at least one line that says "javascript:void(0)" rather than a URL. Those are the lines I want to remove from the list. Thanks again Lily, your help has been amazing :)

Attachment: google maps scraper.ubot
MiriamMB · Posted August 19, 2010

I never got any of the javascript junk when I tried it. It might be because I was logged into an account; perhaps you would get cleaner links if you had an account and were logged in.

About the "add to list" part with the $replace constant: there are no string placeholders (you know, the numbers in the squiggly brackets: {1}, {2}, {3}, etc.) referring to the appropriate constant. Why is that? Did you delete them?

In the "scraping links" script, I made some modifications to make things look less cluttered and to make your script work a bit better. At the end of the script, I modified the "add to list" and added two set commands above it. I set a variable called "file text" to the $read file constant, pointed at the file containing your scraped list of URLs. I then created another set command where I set a variable called "modified text" to the $replace constant: the original text is the variable #file text (remember, it holds the contents of the file with your scraped URLs), under Search I put the thing you are trying to replace (javascript:void(0)), and under Replace I put the $new line constant like you wanted. In your "add to list", you would then put the name of the list you are trying to create after cleaning, and the content will be the $list from text constant, with the variable #modified text under Text in the $list from text parameter window. The delimiter would be the $new line constant.

I'm attaching the modified bot; let me know how it goes.

Attachment: Modified.ubot
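Miriam's uBot pipeline ($read file, then $replace, then $list from text with a $new line delimiter) can be sketched in Python; this is only an illustration with made-up data, not the actual uBot commands.

```python
# $read file: the raw contents of the scraped URL file (invented sample).
file_text = "http://a.example\njavascript:void(0)\nhttp://b.example"

# $replace: substitute the bad token away (here with an empty string).
modified_text = file_text.replace("javascript:void(0)", "")

# $list from text with $new line as the delimiter.
url_list = modified_text.split("\n")
```

Note that removing only the token leaves an empty entry in the list, because the newlines around it survive.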
chris weber · Posted August 19, 2010 (Author)

Hi Lily, thanks for that great info. I have tried logging in to my Google account through the script and then scraping the links, but I am still having the javascript problems. I attached the txt file that has the lines of javascript code in it. If you know how I could search the text file and remove those lines, that would be great. Thanks again :)

Attachment: business links.txt
MiriamMB · Posted August 19, 2010

Ok, I understand better now; I thought you meant it was attached to the URLs. I am taking a look now.
MiriamMB · Posted August 19, 2010

Okay, this solution works. I have isolated it for you so you can run it separately and then implement it into your script later. Instead of replacing the void with a $new line, I replaced it with the $nothing constant. You can change that as you wish. Let me know how it goes.

Attachment: Cleaning the javascript.ubot
chris weber · Posted August 20, 2010 (Author)

THANK YOU!!! That works great for what I need to do. Now my question: after I save those URLs, I have another script that loads them and navigates to each one. However, when the javascript gets removed from the list it leaves a blank line in its place, so the script stops and errors when it reaches the blank line, because it doesn't know how to navigate to a blank site. Is there any way to remove that line entirely and just keep a continuous flow of URLs? I also attached my scrubbed list of URLs. Thanks again for your amazing help :)

Attachment: business links.txt
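The blank lines appear because replacing "javascript:void(0)" with $nothing removes the token but not the newlines around it. A minimal Python sketch (invented data) of dropping the empty lines entirely so the URL list stays contiguous:

```python
# A scrubbed list where the removed token left blank lines behind.
scrubbed = "http://a.example\n\nhttp://b.example\n"

# Keep only non-empty lines; strip() also discards whitespace-only lines.
urls = [line for line in scrubbed.splitlines() if line.strip()]
```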
chris weber · Posted August 20, 2010 (Author)

Never mind, I got it to work. I just have it replacing the javascript error lines with a dummy URL that goes to google.com, so that it doesn't error out on those lines. Thanks again for all the help Lily :)
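Chris's workaround, sketched in Python with invented data: instead of deleting the bad lines, substitute a harmless placeholder URL so the navigation script never encounters a blank line.

```python
# Raw scraped list with one javascript line in it (invented sample).
text = "http://a.example\njavascript:void(0)\nhttp://b.example"

# Swap the bad token for a dummy URL instead of removing it.
patched = text.replace("javascript:void(0)", "http://www.google.com")
```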
MiriamMB · Posted August 20, 2010

Great job! I don't know why that didn't come to mind! lol *sigh*
Wisdom4U · Posted September 28, 2010

Hi Chris, wondering if you completed this project. Is it working? I am trying to do something similar and would appreciate your input. Here is one of my questions: why did you decide to use maps.google.com instead of simply using google.com? And do you feel it was the right choice? I am a new user and new to the forum, so I'm not sure if there are restrictions on PM or ... Looking forward to your response. Steve
tbc · Posted October 6, 2011

Is there a Google Maps scraper for UBot 4? These files only work for UBot 3. Or is there a conversion program for old ubot files?