UBot Underground

Insert Line Break After Random Nth Position Of A List Or Text



Hi,

I have attached a test data file that has no line breaks. I want to insert a line break or paragraph break randomly after every 3-5 lines. Is this possible? If so, please help me.

Thanks a lot for your help. I still believe in UBot.

Edit: Rather than randomly after 3-5 lines, it would be great to break randomly after every 6-9 full stops.

Contrary to popular belief, Lorem Ipsum is not simply random text. It has roots in a piece of classical Latin literature from 45 BC, making it over 2000 years old. Richard McClintock, a Latin professor at Hampden-Sydney College in Virginia, looked up one of the more obscure Latin words, consectetur, from a Lorem Ipsum passage, and going through the cites of the word in classical literature, discovered the undoubtable source. Lorem Ipsum comes from sections 1.10.32 and 1.10.33 of "de Finibus Bonorum et Malorum" (The Extremes of Good and Evil) by Cicero, written in 45 BC. This book is a treatise on the theory of ethics, very popular during the Renaissance. The first line of Lorem Ipsum, "Lorem ipsum dolor sit amet..", comes from a line in section 1.10.32.Contrary to popular belief, Lorem Ipsum is not simply random text. It has roots in a piece of classical Latin literature from 45 BC, making it over 2000 years old. Richard McClintock, a Latin professor at Hampden-Sydney College in Virginia, looked up one of the more obscure Latin words, consectetur, from a Lorem Ipsum passage, and going through the cites of the word in classical literature, discovered the undoubtable source. Lorem Ipsum comes from sections 1.10.32 and 1.10.33 of "de Finibus Bonorum et Malorum" (The Extremes of Good and Evil) by Cicero, written in 45 BC. This book is a treatise on the theory of ethics, very popular during the Renaissance. The first line of Lorem Ipsum, "Lorem ipsum dolor sit amet..", comes from a line in section 1.10.32.Contrary to popular belief, Lorem Ipsum is not simply random text. It has roots in a piece of classical Latin literature from 45 BC, making it over 2000 years old. Richard McClintock, a Latin professor at Hampden-Sydney College in Virginia, looked up one of the more obscure Latin words, consectetur, from a Lorem Ipsum passage, and going through the cites of the word in classical literature, discovered the undoubtable source. 
Lorem Ipsum comes from sections 1.10.32 and 1.10.33 of "de Finibus Bonorum et Malorum" (The Extremes of Good and Evil) by Cicero, written in 45 BC. This book is a treatise on the theory of ethics, very popular during the Renaissance. The first line of Lorem Ipsum, "Lorem ipsum dolor sit amet..", comes from a line in section 1.10.32.Contrary to popular belief, Lorem Ipsum is not simply random text. It has roots in a piece of classical Latin literature from 45 BC, making it over 2000 years old. Richard McClintock, a Latin professor at Hampden-Sydney College in Virginia, looked up one of the more obscure Latin words, consectetur, from a Lorem Ipsum passage, and going through the cites of the word in classical literature, discovered the undoubtable source. Lorem Ipsum comes from sections 1.10.32 and 1.10.33 of "de Finibus Bonorum et Malorum" (The Extremes of Good and Evil) by Cicero, written in 45 BC. This book is a treatise on the theory of ethics, very popular during the Renaissance. The first line of Lorem Ipsum, "Lorem ipsum dolor sit amet..", comes from a line in section 1.10.32.Contrary to popular belief, Lorem Ipsum is not simply random text. It has roots in a piece of classical Latin literature from 45 BC, making it over 2000 years old. Richard McClintock, a Latin professor at Hampden-Sydney College in Virginia, looked up one of the more obscure Latin words, consectetur, from a Lorem Ipsum passage, and going through the cites of the word in classical literature, discovered the undoubtable source. Lorem Ipsum comes from sections 1.10.32 and 1.10.33 of "de Finibus Bonorum et Malorum" (The Extremes of Good and Evil) by Cicero, written in 45 BC. This book is a treatise on the theory of ethics, very popular during the Renaissance. The first line of Lorem Ipsum, "Lorem ipsum dolor sit amet..", comes from a line in section 1.10.32.Contrary to popular belief, Lorem Ipsum is not simply random text. 
It has roots in a piece of classical Latin literature from 45 BC, making it over 2000 years old. Richard McClintock, a Latin professor at Hampden-Sydney College in Virginia, looked up one of the more obscure Latin words, consectetur, from a Lorem Ipsum passage, and going through the cites of the word in classical literature, discovered the undoubtable source. Lorem Ipsum comes from sections 1.10.32 and 1.10.33 of "de Finibus Bonorum et Malorum" (The Extremes of Good and Evil) by Cicero, written in 45 BC. This book is a treatise on the theory of ethics, very popular during the Renaissance. The first line of Lorem Ipsum, "Lorem ipsum dolor sit amet..", comes from a line in section 1.10.32.Contrary to popular belief, Lorem Ipsum is not simply random text. It has roots in a piece of classical Latin literature from 45 BC, making it over 2000 years old. Richard McClintock, a Latin professor at Hampden-Sydney College in Virginia, looked up one of the more obscure Latin words, consectetur, from a Lorem Ipsum passage, and going through the cites of the word in classical literature, discovered the undoubtable source. Lorem Ipsum comes from sections 1.10.32 and 1.10.33 of "de Finibus Bonorum et Malorum" (The Extremes of Good and Evil) by Cicero, written in 45 BC. This book is a treatise on the theory of ethics, very popular during the Renaissance. The first line of Lorem Ipsum, "Lorem ipsum dolor sit amet..", comes from a line in section 1.10.32.Contrary to popular belief, Lorem Ipsum is not simply random text. It has roots in a piece of classical Latin literature from 45 BC, making it over 2000 years old. Richard McClintock, a Latin professor at Hampden-Sydney College in Virginia, looked up one of the more obscure Latin words, consectetur, from a Lorem Ipsum passage, and going through the cites of the word in classical literature, discovered the undoubtable source. 
Lorem Ipsum comes from sections 1.10.32 and 1.10.33 of "de Finibus Bonorum et Malorum" (The Extremes of Good and Evil) by Cicero, written in 45 BC. This book is a treatise on the theory of ethics, very popular during the Renaissance. The first line of Lorem Ipsum, "Lorem ipsum dolor sit amet..", comes from a line in section 1.10.32.

Edited by spa3212

TL;DR: Experiment with the code snippets I've provided and try learning more about Regular Expressions.

Hi! What an awesome question, and a genuinely thought-provoking challenge. And by the way, I still believe in UBot Studio, too! I'll have more to share about that belief in another post.

As with so many things related to programmatic problem solving, there are numerous ways to solve the same problem. This is my approach, for better or worse.

Let's import the test data into a variable, as follows:

set(#TextBlob,"Contrary to popular belief [etc] section 1.10.32.","Global")
 
We are naming the variable #TextBlob, but you can name it anything. We are then placing the contents of your test data inside that variable. In the interest of keeping this reply legible, I'm not inserting all of the test data in the above example. (But when I tested it, I used all the test data.)
 
Next, we'll do the following:
 
set(#TextBlob,$replace regular expression(#TextBlob,"([A-Za-z^\\\\d])\\.[^\\.]","$1.{$new line}"),"Global")

We are re-setting the variable #TextBlob, and this time we are processing its contents (your test data) with the "$replace regular expression" function. I recently started learning more, in earnest, about the convoluted pile of gibberish referred to as "Regular Expressions." That's the bizarre-looking, nonsensical stuff you'll notice embedded in the "$replace regular expression" function above.

[SIDENOTE: I want to give an unsolicited recommendation for HelloInsomnia's awesome Regex Ninja tool, which initially helped me understand Regular Expressions a little bit better. Oddly, I started getting an "Error: No activation found for this installation" message, which I've yet to ask Nick to help me solve. That forced me to actually study and learn Regular Expressions without the help of the tool. Still, in my limited experience Regex Ninja is a helpful tool that I sincerely recommend, and I look forward to using it again once I get the license issue sorted out.]

Here's what's going on inside the "$replace regular expression" function above. The initial goal is to insert a "$new line" after each sentence, because from my perspective that makes it easier to clump the data into the format you ultimately want. Given your test data, we're using periods to detect the end of a sentence. If there had been question marks (?) or exclamation marks (!) in addition to periods, we could have crafted the "$replace regular expression" function to take those into consideration as well. But for the sake of this example, and given the test data you actually provided, we used periods as the break points. However, there's a wrinkle...
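As a rough illustration of that extension, here is a minimal Python sketch. It is not UBot script: Python's re module stands in for "$replace regular expression", and the name break_on_punctuation is hypothetical. It treats ., ? and ! as sentence endings. It keeps the same idea as the UBot pattern: the mark must follow a letter and must not be followed by another mark.

```python
import re

# Hypothetical helper: a letter, then one of . ? !, then any character that is
# not another end mark. Group 1 keeps the letter, group 2 keeps the mark; the
# trailing character (usually a space) is consumed and replaced by a newline.
END_MARK = re.compile(r"([A-Za-z])([.?!])[^.?!]")


def break_on_punctuation(text: str) -> str:
    return END_MARK.sub(r"\1\2\n", text)


print(break_on_punctuation("Really? Yes! Done. Ok"))
```

Each of "Really?", "Yes!" and "Done." ends up on its own line. "Ok" has no end mark, so it stays as the last line.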

 

In numerous places, there are periods that sit between digits (as in "1.10.32"), and those periods aren't actually the end of a sentence. And in a few spots, there are two periods in a row. So when we created the rule for the "$replace regular expression" function, this is what we had to keep in mind based on the test data you provided:

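1) A sentence should only end after a letter, be it uppercase or lowercase. (That's the A-Za-z stuff you see.)

2) A sentence should NOT end after a digit. (That's what the ^\d is meant to express, since \d means digit. Strictly speaking, though, ^ only means "not" when it's the very first character inside the square brackets, as in [^\.]. Anywhere else it's a literal caret, so [A-Za-z^\d] actually matches a letter, a caret, or a digit. Because A-Za-z already excludes digits, ([A-Za-z]) enforces this rule on its own, and the ^\d can be dropped.)

3) A sentence should NOT end at two sequential periods. (That's the [^\.] you see: the period must be followed by something other than another period. The second period of the pair is skipped too, because it's preceded by a period rather than a letter.)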
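To see the three rules, and the caret subtlety, in action, here is a small check. Again, this is a Python sketch with the re module rather than UBot. AS_POSTED mirrors the character class as the post describes it (assuming the pattern reaches the regex engine as [A-Za-z^\d]). LETTERS_ONLY is the simplified version. The sample strings are taken from your test data.

```python
import re

# The class as described in the post: ^ is only "not" when it comes first
# inside [...], so this class matches a letter, a caret, or a digit.
AS_POSTED = re.compile(r"([A-Za-z^\d])\.[^.]")
# Letters only: digits are already excluded, which is all rule 2 needs.
LETTERS_ONLY = re.compile(r"([A-Za-z])\.[^.]")

samples = [
    "over 2000 years old. Richard",  # rule 1: letter + period -> sentence end
    "from sections 1.10.32 and",     # rule 2: digit + period -> not an end
    'dolor sit amet..", comes',      # rule 3: double period -> not an end
]

for s in samples:
    print(repr(s), "letters-only:", bool(LETTERS_ONLY.search(s)),
          "as posted:", bool(AS_POSTED.search(s)))
```

The middle sample is the one to watch. With the caret-and-digit class, the period in "1.10" counts as a sentence end, which is exactly what rule 2 is meant to prevent. The letters-only class gets all three samples right.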

 

When stuff is surrounded by parentheses, it's referred to as a "capture group," and you can then refer back to that stuff with identifiers of the form $1, $2, etc. (Remember, I'm not a classically trained computer programmer, thus my use of UBot Studio, and I barely know the basics of Regular Expressions, so some of my terminology could be a bit off.) We needed a capture group because otherwise, when we filled in the "replace" part of the "$replace regular expression" function, we'd lose whatever came right before the period that ended the sentence.
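Here's a tiny Python illustration of why the capture group matters. This is a sketch, not UBot code. In Python, re.sub writes the group as \1 in the replacement, where the UBot function uses $1.

```python
import re

text = "over 2000 years old. Richard McClintock"

# Without a group, the letter before the period is part of the match, so it is
# replaced (and lost) along with the period and the space after it.
without_group = re.sub(r"[A-Za-z]\.[^.]", ".\n", text)

# With a group, \1 (Python's spelling of UBot's $1) writes the letter back.
with_group = re.sub(r"([A-Za-z])\.[^.]", r"\1.\n", text)

print(repr(without_group))  # 'over 2000 years ol.\nRichard McClintock'
print(repr(with_group))     # 'over 2000 years old.\nRichard McClintock'
```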

 

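So the second (replacement) part of the "$replace regular expression" function starts with $1, so that whatever was captured is put back. Then comes a period (.), because that's the end of a sentence, and then a "$new line" to trigger the line break. The character matched by [^\.] right after the period (usually the space before the next sentence) isn't captured, so it gets dropped. Conveniently, that means the new line doesn't start with a space.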
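Putting the pattern and the replacement together, here's a Python sketch of the whole step, run on a short excerpt of your test data. break_sentences is a hypothetical name, and Python's re.sub stands in for UBot's "$replace regular expression". The pattern uses the letters-only character class ([A-Za-z]) discussed in rule 2.

```python
import re

# Letter + period + one non-period character; the letter is captured so it
# can be written back in front of the period and the newline.
SENTENCE_END = re.compile(r"([A-Za-z])\.[^.]")


def break_sentences(text: str) -> str:
    """Put each sentence on its own line (Python analogue of the UBot step)."""
    return SENTENCE_END.sub(r"\1.\n", text)


sample = ('Cicero, written in 45 BC. This book is a treatise on the theory of '
          'ethics, very popular during the Renaissance. The first line of Lorem '
          'Ipsum, "Lorem ipsum dolor sit amet..", comes from a line in section '
          '1.10.32.Contrary to popular belief, Lorem Ipsum is not simply random text.')

for line in break_sentences(sample).split("\n"):
    print(line)
```

This prints three lines. "45 BC." and "Renaissance." each end a line. The third line still contains "1.10.32.Contrary", because that period follows a digit and has no space after it. That is the first remaining puzzle piece described next.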

 

At this point, there are two pieces of the puzzle that remain to be solved: 

 

The first is figuring out how to separate sentences that have no space between the end of one sentence and the beginning of the next. When you look at the test data you provided, the word "Contrary" would appear to be the natural start of a sentence, yet it's pressed right up against the period that ended the preceding sentence (as in "1.10.32.Contrary"). Why don't you go ahead and wrestle with this a bit more, using the code snippets and hints I've provided as clues, and see if you can successfully break that clump of text apart.
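If you'd like to compare notes after trying it yourself, here's one possible approach, sketched in Python rather than UBot script. It treats a period immediately followed by a capital letter as a sentence boundary. It relies on a lookahead, (?=[A-Z]), which checks for the capital without consuming it. I'm assuming UBot's regex engine supports lookaheads, as most modern engines do. The rule would also wrongly split abbreviations such as "U.S.A", which don't occur in your test data.

```python
import re

# A period directly followed by a capital letter, with no space in between,
# is treated as a sentence boundary. The lookahead (?=[A-Z]) only peeks at the
# capital, so "Contrary" keeps its "C". split_glued is a hypothetical name.
GLUED = re.compile(r"\.(?=[A-Z])")


def split_glued(text: str) -> str:
    return GLUED.sub(".\n", text)


print(split_glued("a line in section 1.10.32.Contrary to popular belief"))
```

The periods inside "1.10.32" are followed by digits, so only the last one gets a line break. Running this after the sentence-break step should leave one sentence per line for your test data.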

 

The second piece of the puzzle is figuring out how to insert a line break / new line after every Nth sentence, where N is chosen randomly (6-9, per your edit). Here again, why not go ahead and wrestle with it a bit yourself and see if you can figure something out.
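For the second piece, here's one way to do the grouping, once more as a Python sketch rather than UBot script. Start from the sentences, one per line after the steps described above. Pick a random N between 6 and 9 (inclusive), join that many sentences into a paragraph, and repeat until the sentences run out. group_sentences and its parameters are hypothetical names. The last paragraph may end up shorter than 6 sentences if the count doesn't divide evenly.

```python
import random


def group_sentences(sentences, low=6, high=9, rng=None):
    """Join sentences into paragraphs of a random size from low to high."""
    if rng is None:
        rng = random.Random()
    paragraphs = []
    i = 0
    while i < len(sentences):
        n = rng.randint(low, high)  # randint includes both low and high
        paragraphs.append(" ".join(sentences[i:i + n]))
        i += n
    # A blank line between paragraphs gives the "paragraph break".
    return "\n\n".join(paragraphs)


# Usage: 20 dummy sentences; a fixed seed makes the result repeatable.
sentences = [f"Sentence {k}." for k in range(1, 21)]
print(group_sentences(sentences, rng=random.Random(42)))
```

Splitting the one-sentence-per-line text on the newline character gives the list this function expects.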

 

Something to keep in mind: the test data you provided is highly repetitive. So one potential solution would have been to simply pick the words that mark the start/end of a sentence and insert the line breaks / new lines every Nth position. But my assumption is that you'll eventually be working with totally random data whose sentence structure you can't always predict ahead of time. Hence my attempt to craft a solution with Regular Expressions that is as flexible as possible.

I just briefly re-read my reply, and I have to admit, I'm not sure it even makes total sense.
