Regex to remove multiple return characters

jameshale
VIP Livecode Opensource Backer
Posts: 474
Joined: Thu Sep 04, 2008 6:23 am
Location: Melbourne Australia

Regex to remove multiple return characters

Post by jameshale » Thu Apr 06, 2023 3:30 pm

Hi,
I have fields with multiple empty lines (return characters) preceding the actual text.
I would like to use the replaceText function to remove them, as in:

Code: Select all

put "^\r" into rt
put replaceText(fld "f1", rt, "") into fld "f1"
I have tried a few different expressions in rt but the most I can accomplish affects only the first occurrence.

I know I could accomplish this with a repeat loop that removes empty lines, but surely the replaceText function should be able to do it too.
Any thoughts?
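
The "only the first occurrence" symptom is what an anchored pattern without a quantifier would be expected to do: `^` matches once, at the start of the text, so one return is removed and the rest stay. Below is a minimal sketch in Python rather than LiveCode, so the regex flavour is not identical; the sample text and variable names are made up for illustration, and the character class `[\r\n]` is an assumption that sidesteps the question of which return character the field actually holds.

```python
import re

# Hypothetical sample: three empty lines before the real text.
sample = "\n\n\nactual text\nmore text"

# Anchored pattern with no quantifier: only the first return is removed.
one_removed = re.sub(r"^[\r\n]", "", sample)

# Same anchor with "+": the whole leading run goes in one pass.
all_removed = re.sub(r"^[\r\n]+", "", sample)

print(repr(one_removed))  # '\n\nactual text\nmore text'
print(repr(all_removed))  # 'actual text\nmore text'
```

The LiveCode counterpart would presumably be replaceText(fld "f1", "^[\r\n]+", empty), which has not been verified here.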

Klaus
Posts: 13829
Joined: Sat Apr 08, 2006 8:41 am
Location: Germany

Re: Regex to remove multiple return characters

Post by Klaus » Thu Apr 06, 2023 3:41 pm

Hi James,

sorry I have no idea of REGEX, but doesn't:

Code: Select all

filter fld "xyz" without empty
do the trick?

Best

Klaus

dunbarx
VIP Livecode Opensource Backer
Posts: 9670
Joined: Wed May 06, 2009 2:28 pm
Location: New York, NY

Re: Regex to remove multiple return characters

Post by dunbarx » Thu Apr 06, 2023 4:30 pm

My instant reaction was "What Klaus said".

I then stopped for just one second thinking that maybe one could have "empty" somehow embedded inside a line, but then thought that I have no idea if that even makes sense. Spaces ain't empty.

The only "place" that empty can "live" (apart from being the entirety of the contents of a container as a whole) is the entirety of a line. I cannot see any other chunk, apart from a line, having empty as a member.

I think this is so. (?)

Craig

richmond62
Livecode Opensource Backer
Posts: 9389
Joined: Fri Feb 19, 2010 10:17 am
Location: Bulgaria

Re: Regex to remove multiple return characters

Post by richmond62 » Thu Apr 06, 2023 4:40 pm

Code: Select all

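on mouseUp
   put fld "xyz" into XYZ
   put empty into fld "xyz"
   -- copy the text back one character at a time, skipping every return
   -- (note: this drops ALL returns, so the remaining lines are joined)
   repeat until XYZ is empty
      if char 1 of XYZ is not cr then
         put char 1 of XYZ after fld "xyz"
      end if
      delete char 1 of XYZ
   end repeat
end mouseUp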

Cairoo
Posts: 107
Joined: Wed Dec 05, 2012 5:54 pm

Re: Regex to remove multiple return characters

Post by Cairoo » Thu Apr 06, 2023 4:47 pm

Klaus wrote:
Thu Apr 06, 2023 3:41 pm
Hi James,

sorry I have no idea of REGEX, but doesn't:

Code: Select all

filter fld "xyz" without empty
do the trick?

Best

Klaus
What Klaus said, except you may have to use the form "filter lines of".

Gerrie

Klaus
Posts: 13829
Joined: Sat Apr 08, 2006 8:41 am
Location: Germany

Re: Regex to remove multiple return characters

Post by Klaus » Thu Apr 06, 2023 5:06 pm

Will also work without "... lines of ..."!

dunbarx
VIP Livecode Opensource Backer
Posts: 9670
Joined: Wed May 06, 2009 2:28 pm
Location: New York, NY

Re: Regex to remove multiple return characters

Post by dunbarx » Thu Apr 06, 2023 5:54 pm

Correct.

If you read my rant, that "empty" exists only as the explicit contents of a container OR as a "character" in a container that contains lines, then eliminating empty is identical to eliminating lines that contain empty.

One cannot have a line (or any other chunk) with both empty and anything else at all. That is what made me hesitate for just a second before posting the usual "What Klaus said".

Craig
Last edited by dunbarx on Thu Apr 06, 2023 5:57 pm, edited 1 time in total.

stam
Posts: 2686
Joined: Sun Jun 04, 2006 9:39 pm
Location: London, UK

Re: Regex to remove multiple return characters

Post by stam » Thu Apr 06, 2023 5:55 pm

I regularly use regex (excuse the pun!) but wouldn't use regex for this personally - filter is an excellent fit.

Regarding syntax, this is:

Code: Select all

filter lines of <container> without empty [into <container2>]
The [ ] denotes optional (ie if you want to store the filtered result in a different container)

HTH
S.



PS: If you're set on using regex, the correct expression would probably be something like (\R\s*\R) where \R is CR, LF or CRLF, \s is any whitespace (including tab etc) and * denotes zero or more of the previous char - so you'd search for return + optional whitespaces + return and replace it with a single return, ie:

Code: Select all

put replaceText(<container>, "\R\s*\R", return) into <container>
but I'm not really sure this is a better solution than filter above.


I always recommend https://regex101.com when designing/testing regex
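
To see what the return + optional whitespace + return pattern does on a few inputs, here is a minimal Python sketch. It is an approximation, not the LiveCode call: Python's re module has no \R token, so the line endings are spelled out as an alternation covering only CRLF, LF and CR (an assumption that leaves out the rarer characters \R also covers). The names NEWLINE, BLANK_RUN and collapse_blank_lines are hypothetical.

```python
import re

# Python's re has no \R, so the common line endings are listed explicitly.
NEWLINE = r"(?:\r\n|\n|\r)"
BLANK_RUN = re.compile(NEWLINE + r"\s*" + NEWLINE)


def collapse_blank_lines(text: str) -> str:
    """Replace each run of empty or whitespace-only lines with one newline."""
    return BLANK_RUN.sub("\n", text)


print(repr(collapse_blank_lines("a\n\n\nb")))     # 'a\nb'
print(repr(collapse_blank_lines("a\n \t \nb")))   # 'a\nb'
print(repr(collapse_blank_lines("\n\n\ntext")))   # '\ntext'
```

Note the last case: because the replacement is a single return, a run of empty lines at the very start of the text is reduced to one leading return rather than removed entirely.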
Last edited by stam on Thu Apr 06, 2023 6:14 pm, edited 3 times in total.

dunbarx
VIP Livecode Opensource Backer
Posts: 9670
Joined: Wed May 06, 2009 2:28 pm
Location: New York, NY

Re: Regex to remove multiple return characters

Post by dunbarx » Thu Apr 06, 2023 6:08 pm

@Stam.

Cannot hurt.

@ All
If one has three lines in a container (field 2 in this test):
a

b
and one does this:

Code: Select all

on mouseUp
   repeat with y = 1 to 10
      put charToNum (char y of fld 2) & return after temp
   end repeat
end mouseUp
One gets:
97
10
10
98
Empty has no ASCII value, even though it does have a reality. When LC deletes the lines of text that are empty, it "knows" that even though those lines do indeed contain a valid character (ASCII 10) it deletes them anyway. LC just knows that this is how humans see it, and accommodates.

But otherwise I am not sure how to find empty except by other means, such as finding the length of each line. In the above example, there are three lines, but the length of line 2 is 0.

This is why "return" (CR) is considered a "control" character, whereas "a" is a real character.

Craig

stam
Posts: 2686
Joined: Sun Jun 04, 2006 9:39 pm
Location: London, UK

Re: Regex to remove multiple return characters

Post by stam » Thu Apr 06, 2023 6:20 pm

Craig, you're assuming empty is indeed empty. But one should guard against there being a space or a tab in there. Or maybe even some silly ASCII code that doesn't have a visible char, like ASCII 7 or 27.

Whitespace chars (like the above) are considered 'empty' in the filter statement, but also by the \s regex token (as in my amended post above).
Personally I avoid looping like the plague ;)

Cairoo
Posts: 107
Joined: Wed Dec 05, 2012 5:54 pm

Re: Regex to remove multiple return characters

Post by Cairoo » Thu Apr 06, 2023 6:22 pm

Klaus wrote:
Thu Apr 06, 2023 5:06 pm
Will also work without "... lines of ..."!
Yes, Klaus, I stand corrected.

Something I've noticed while looking for an actual regex solution is that LiveCode wrongly interprets the regex "\r" as "\n".
So the following regex-based code should work, but doesn't:

Code: Select all

put replaceText(replaceText(replaceText(fld "f1","^\r",""),"\r\r",""),"\r$","") into fld "f1"
and the following shouldn't work, but does:

Code: Select all

put replaceText(replaceText(replaceText(fld "f1","^\n",""),"\n\n",""),"\n$","") into fld "f1"
Perhaps it warrants a bug report?

Gerrie
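
One possible reading, going by the charToNum output earlier in the thread (97, 10, 10, 98), is that the field text stores its line breaks as char 10 (LF) rather than char 13 (CR). If that is so, "\r" simply has nothing to match and "\n" does, with no misinterpretation needed. A minimal Python sketch of that situation, where field_text is a made-up stand-in for the field contents and the LF storage is the stated assumption:

```python
import re

# Assumption: each line break in the field is stored as LF (char 10).
field_text = "\n\nactual text"

# Character codes of the first three chars, like the charToNum test above.
codes = [ord(ch) for ch in field_text[:3]]
print(codes)  # [10, 10, 97]

cr_result = re.sub(r"^\r", "", field_text)  # no CR present, nothing changes
lf_result = re.sub(r"^\n", "", field_text)  # one leading LF is removed

print(cr_result == field_text)  # True
print(repr(lf_result))          # '\nactual text'
```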

stam
Posts: 2686
Joined: Sun Jun 04, 2006 9:39 pm
Location: London, UK

Re: Regex to remove multiple return characters

Post by stam » Thu Apr 06, 2023 6:25 pm

Cairoo wrote:
Thu Apr 06, 2023 6:22 pm
Perhaps it warrants a bug report?
I posted an explanation above: the correct regex - and I say correct because it caters for any 'whitespace' characters (like space or tab) in the 'empty' line, so you can capture both return & return as well as return & space & return:

Code: Select all

put replaceText(<container>, "\R\s*\R", return) into <container>
If you want to capture all forms of a newline char, use \R, not \r or \n. More specifically, \R captures: \r\n|\n|\x0b|\f|\r|\x85
Also, unless you want to concatenate lines you still have to replace the double returns with a single return...

HTH
S.

Cairoo
Posts: 107
Joined: Wed Dec 05, 2012 5:54 pm

Re: Regex to remove multiple return characters

Post by Cairoo » Thu Apr 06, 2023 6:37 pm

stam wrote:
Thu Apr 06, 2023 6:25 pm
...
if you want to capture all forms of a newLine char, use \R, not \r or \n....
Indeed the correct regex. It still bugs me that LiveCode wrongly interprets "\r" as "\n", though.

stam
Posts: 2686
Joined: Sun Jun 04, 2006 9:39 pm
Location: London, UK

Re: Regex to remove multiple return characters

Post by stam » Thu Apr 06, 2023 6:50 pm

Cairoo wrote:
Thu Apr 06, 2023 6:37 pm

Indeed the correct regex. It still bugs me that LiveCode wrongly interprets "\r" as "\n", though.
Could this have something to do with the fact that the line ending could be \r, \n, or \r\n?
Are you sure your \r isn't picking up \r\n?

If you are convinced your regex is correct, test it out on https://regex101.com, and you'll be able to see exactly what is being captured... it also has a handy set of features like a quick reference, and hovering over tokens in your expression tells you what they do, etc.

For what it's worth I practically never need nested replaceText or findText messages - regex is an incredibly flexible language that can manage all of that, but not easy to use; this website lets me tinker with regex until I get it to do what I want with a single statement...

S.

PS: in your example you nest three replaceText calls but that's unnecessary - all you need to do is replace 2 consecutive line endings with one. There is a scenario where the 'empty' line may have, for example, tabs in it - if you have records in TSV format but the fields of one record are all empty. The text will just show line->empty line->line but in reality it's line->(tab tab tab)->line, so it's important to guard for that. Filter lines does this for you but the regex I posted above will as well.

dunbarx
VIP Livecode Opensource Backer
Posts: 9670
Joined: Wed May 06, 2009 2:28 pm
Location: New York, NY

Re: Regex to remove multiple return characters

Post by dunbarx » Thu Apr 06, 2023 7:12 pm

Stam.

I see the ghostlike issue here, but characters like "tab" are "real".

If you have a field with what looks like an empty line, but it in fact contains a tab char, then:

Code: Select all

filter lines of fld 2 without empty
leaves those lines alone. They are not, er, empty.

I am not sure we are in disagreement overall. The issue really is that there are "real" chars and "control" chars. One has to jump through loops to really delete lines that read empty, as opposed to lines that actually are empty. :wink:

Craig
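
The distinction between lines that actually are empty and lines that only read empty can be sketched as two different tests. This is a Python illustration with a made-up list of lines, not a statement about how LiveCode's filter is implemented: the first test keeps anything with a length above zero (so tab-only lines survive), the second also drops lines made up solely of whitespace.

```python
# Hypothetical lines: real text, a truly empty line, a tab-only line,
# a spaces-only line, and more real text.
lines = ["a", "", "\t\t\t", "   ", "b"]

# Drop only lines whose length is zero.
truly_empty_removed = [ln for ln in lines if ln != ""]

# Drop lines that are empty OR contain nothing but whitespace.
blank_removed = [ln for ln in lines if ln.strip() != ""]

print(truly_empty_removed)  # ['a', '\t\t\t', '   ', 'b']
print(blank_removed)        # ['a', 'b']
```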
