Finding text without variants

Post by **Bruce Brown** » Fri Apr 14, 2023 6:17 pm

I am parsing a CGATS text file (a color measurement file). In the header there is "BEGIN_DATA_FORMAT" and, further down, "BEGIN_DATA". I want to skip the first one (..._FORMAT) and go directly to the second one. I have tried the various Find options, but it always selects the first one. If I search again I get the second one, as expected, but why can't I just find "BEGIN_DATA"? I even tried to find "BEGIN_DATA " (with a trailing space), but it still selects the BEGIN_DATA in "BEGIN_DATA_FORMAT" first. Thanks in advance for your help.

Here is a snippet of the header and the beginning of the data:
LOGO_ECI2002
ORIGINATOR "COLOR TUNER"
DESCRIPTOR "Output Characterisation"
CREATED 08/12/2021 12:41:13
ILLUMINATION_NAME D50
OBSERVER_ANGLE "2"
DEVICE "Eye-One Pro 3 iO SN Eye-One Pro 3 iO 50364 Eye-One Pro 3 2000669"
CONDITION "M0"
LGOROWLENGTH 44
NumberOfStrips 2
KEYWORD "SAMPLE_NAME"
NUMBER_OF_FIELDS 45
BEGIN_DATA_FORMAT
SampleID SAMPLE_NAME CMYK_C CMYK_M CMYK_Y CMYK_K LAB_L LAB_A LAB_B nm380 nm390 nm400 nm410 nm420 nm430 nm440 nm450 nm460 nm470 nm480 nm490 nm500 nm510 nm520 nm530 nm540 nm550 nm560 nm570 nm580 nm590 nm600 nm610 nm620 nm630 nm640 nm650 nm660 nm670 nm680 nm690 nm700 nm710 nm720 nm730
END_DATA_FORMAT
NUMBER_OF_SETS 1628
BEGIN_DATA
1 A1 85.00 55.00 85.00 0.00 24.86 -20.46 1.36 0.0171 0.0218 0.0297 0.0309 0.0349 0.0351 0.0352 0.0352 0.0355 0.0378 0.0455 0.0659 0.0928 0.0952 0.0746 0.0593 0.0532 0.0435 0.0342 0.0316 0.0309 0.0284 0.0244 0.0213 0.0204 0.0207 0.0224 0.0262 0.0313 0.0339 0.0328 0.03 0.0263 0.0253 0.0317 0.0485
2 A2 100.00 40.00 100.00 0.00 23.14 -41.30 4.11 0.0119 0.0149 0.0196 0.0216 0.0224 0.0227 0.0227 0.023 0.0236 0.0261 0.0356 0.0631 0.1068 0.1238 0.102 0.0721 0.0518 0.0355 0.0214 0.0138 0.0109 0.0097 0.0088 0.0082 0.008 0.0081 0.0085 0.0098 0.0116 0.0126 0.0121 0.0111 0.0099 0.0102 0.0125 0.0191

Post by **Klaus** » Fri Apr 14, 2023 6:35 pm

Hi Bruce,

don't use FIND!

Use "lineOffset", and set "the wholeMatches" to true first:

Code:

...
set the wholematches to TRUE
put lineoffset("BEGIN_DATA",your_data_here) into the_line_to_begin_with

## Then maybe:
put line (the_line_to_begin_with + 1) to -1 of your_data_here into the_data_containing_the_info_you_are_looking_for
## Will put all lines after BEGIN_DATA (the data rows in your example) into the variable with the meaningful name :-)
## You get the picture
...

WHOLEMATCHES will make sure that the following LINEOFFSET will find the EXACT phrase "BEGIN_DATA" and no variants of it!
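
The effect of WHOLEMATCHES can be seen on a shortened header. This is a minimal sketch: the function name demoLineOffset and the sample text are made up for illustration. Since the wholeMatches is a local property (it only applies to the handler that sets it), it is set inside the same function that calls lineOffset.

```livecode
-- Hypothetical demo: where lineOffset finds "BEGIN_DATA" with wholeMatches off and on.
function demoLineOffset pWhole
   local tSample
   -- shortened, made-up CGATS header; line 1 contains "BEGIN_DATA" as a substring
   put "BEGIN_DATA_FORMAT" & cr & "SampleID SAMPLE_NAME" & cr & \
         "END_DATA_FORMAT" & cr & "BEGIN_DATA" & cr & "1 A1" into tSample
   -- wholeMatches is local to this handler, so set it right before lineOffset
   set the wholeMatches to pWhole
   return lineOffset("BEGIN_DATA", tSample)
end demoLineOffset

on mouseUp
   -- false: first line that CONTAINS the text (1); true: line that IS the text (4)
   put demoLineOffset(false) && demoLineOffset(true) -- message box: 1 4
end mouseUp
```

Note that lineOffset returns 0, not empty, when no line matches, so a "not found" check should test for 0.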
Hope that helps! Come back if you have more (detailed) questions.

Best

Klaus

Post by **Bruce Brown** » Fri Apr 14, 2023 7:13 pm

Thank you Klaus. Although I have been a supporter of LiveCode since 2013, I really have not done much with it. My age (63) and limited time to "play" have held back my programming skills and knowledge. Anyway, I am playing with what you sent me and I will figure it out. My ultimate goal is to find the measurement whose CMYK values are all zero and then read its Lab value. In the file I took the snippet from, it is this line (not shown initially):
73 B29 0.00 0.00 0.00 0.00 95.69 0.45 -2.36 0.1221 0.1744 0.3934 0.5799 0.8883 0.9495 0.9643 0.9469 0.9299 0.9255 0.9175 0.9073 0.9031 0.9002 0.8992 0.896 0.8929 0.8896 0.8857 0.8871 0.8858 0.8864 0.8874 0.8891 0.8913 0.8932 0.8966 0.8988 0.9009 0.9023 0.9024 0.902 0.9037 0.9038 0.9012 0.8953
It is shown with text wrapping, so you also get the spectral values, but it is measurement line 73, patch ID B29, and its Lab value is L=95.69, a=0.45, b=-2.36.
This Lab value is the white point, basically the Lab value of the paper color. This project will grow from here, but first, baby steps.

BTW: CMYK = the printing inks cyan, magenta, yellow and black.

Post by **Klaus** » Fri Apr 14, 2023 7:46 pm

Hi Bruce,

age is definitely no excuse, I'm 66!

I know what CMYK, Lab etc. are, but I'm not sure whether your post contains another question.
Are there any patterns in the data that could help solve your problem?
If yes, just let me know and we will find an easy solution for you.

Best

Klaus

Post by bn » Fri Apr 14, 2023 9:01 pm

Hi Bruce,

If I understand you correctly, you want to extract from the data the lines where CMYK is 0.00 0.00 0.00 0.00 and then get the Lab values, in your example
95.69 0.45 -2.36

If that is the case, you could make a stack with 2 fields and a button. Set the "dontWrap" of field 1 to true and turn on its "hScrollbar" as well.

Paste the whole data into field 1.

The script of the button could be something like this:

Code:

on mouseUp
   local tData, tWorkData, tCollect, tLineNum, tSearchZero
   put field 1 into tData ## all data
   set the wholeMatches to true
   put lineOffset("BEGIN_DATA", tData) into tLineNum
   if tLineNum = 0 then ## lineOffset returns 0, not empty, when nothing is found
      answer "BEGIN_DATA not found"
      exit mouseUp
   end if
   set the wholeMatches to false
   put line tLineNum + 1 to -1 of tData into tWorkData
   put "0.00 0.00 0.00 0.00" into tSearchZero
   
   repeat for each line aDataPoint in tWorkData
      if word 3 to 6 of aDataPoint is tSearchZero then
         put "line:" && word 1 of aDataPoint & comma && "Patch ID:" && word 2 of aDataPoint & comma && "L:" && word 7 of aDataPoint \
               && "a:" && word 8 of aDataPoint && "b:" && word 9 of aDataPoint & cr after tCollect
      end if
   end repeat
   put tCollect into field 2
end mouseUp

The line number, patch ID and Lab values will be put into field 2.

I tested by using the first part of your data and appending the second part (the line with CMYK 0000) to it and hit the button.

If your actual data is more complicated, this might not work, but it gives you an idea of how to parse your data (assuming that I am not misunderstanding what you are after).
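
If the column order can differ from file to file, one option is to look the column numbers up by name in the BEGIN_DATA_FORMAT section instead of hard-coding word 3 to 6 (CMYK) and 7 to 9 (Lab). A minimal sketch, assuming the field names sit between the BEGIN_DATA_FORMAT and END_DATA_FORMAT lines as in the header posted above; fieldColumn is a hypothetical helper, not a built-in.

```livecode
-- Hypothetical helper: return the column number of a CGATS field name,
-- or 0 if the name (or the DATA_FORMAT section) is not found.
function fieldColumn pData, pFieldName
   local tStart, tEnd
   -- exact-line and exact-word matches, so BEGIN_DATA_FORMAT is not confused with BEGIN_DATA
   set the wholeMatches to true
   put lineOffset("BEGIN_DATA_FORMAT", pData) into tStart
   put lineOffset("END_DATA_FORMAT", pData) into tEnd
   if tStart = 0 or tEnd <= tStart then return 0
   -- the field names may span several lines; return counts as a word delimiter
   return wordOffset(pFieldName, line (tStart + 1) to (tEnd - 1) of pData)
end fieldColumn

on mouseUp
   local tData
   put field 1 into tData
   -- for the header posted above: CMYK_C is column 3, LAB_L is column 7
   put fieldColumn(tData, "CMYK_C") && fieldColumn(tData, "LAB_L")
end mouseUp
```

The returned numbers can then replace the fixed positions, e.g. word fieldColumn(tData, "LAB_L") of a data row.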

Kind regards
Bernd

Post by **Bruce Brown** » Sat Apr 15, 2023 3:27 am

Hi Bernd and Klaus,
You both helped me so much! Thanks. I almost have it. I will adjust some things this weekend and see if I can get it to work completely. I am able to parse the text and remove the space and tab characters between data points, but the "word 3 to 6" part is not finding the CMYK values.
Here is my code so far for the "find white point" button:

Code:

on mouseUp
   local tData, tWorkData, tCollect, tLineNum, tSearchZero
   put field "fileIn" into tData ## all data
   set the wholeMatches to true
   put lineOffset("BEGIN_DATA", tData) into tLineNum
   if tLineNum = 0 then
      answer "BEGIN_DATA not found"
      exit mouseUp
   end if
   set the wholeMatches to false
   put line tLineNum + 1 to -1 of tData into tWorkData
   put replaceText(tWorkData, " +", "") into tWorkData
   put replaceText(tWorkData, tab, " ") into field "csvData"
   -- originally the line above replaced the white space (tab) with a comma, but that did not work (hence the field name csvData)
   -- however, it still does not put the Lab value into field whtPtLab. I set it back to a space between values, but that is not working either
   put "0.00 0.00 0.00 0.00" into tSearchZero

   repeat for each line aDataPoint in tWorkData
      if word 3 to 6 of aDataPoint is tSearchZero then
         answer aDataPoint
         put "line:" && word 1 of aDataPoint & comma && "Patch ID:" && word 2 of aDataPoint & comma && "L:" && word 7 of aDataPoint \
               && "a:" && word 8 of aDataPoint && "b:" && word 9 of aDataPoint & cr after tCollect
      end if
   end repeat
   put tCollect into field "whtPtLab"
end mouseUp

Post by bn » Sat Apr 15, 2023 8:27 am

Hi Bruce,

Your actual data is a bit different from the first snippets that you posted.

As you noticed in your code:
Tab is the column delimiter; the CMYK and Lab values are padded with leading spaces to 7 characters, and the wavelength values are padded with trailing spaces to 6 characters.

I changed the code to remove all spaces, leaving tabs as the column delimiter. Then I set the itemDelimiter in LC to tab and changed the search string, replacing spaces with tabs.

This finds 1 occurrence where all 4 CMYK values are 0.00, at line 49.

Please note that I now operate on "items", with the itemDelimiter set to tab, and that I adjusted tSearchZero accordingly.

Code:

on mouseUp
   local tData, tWorkData, tCollect, tLineNum, tSearchZero
   put field 1 into tData ## all data
   set the wholeMatches to true
   put lineOffset("BEGIN_DATA", tData) into tLineNum
   if tLineNum = 0 then ## lineOffset returns 0, not empty, when nothing is found
      answer "BEGIN_DATA not found"
      exit mouseUp
   end if
   set the wholeMatches to false
   put line tLineNum + 1 to -1 of tData into tWorkData
   replace space with empty in tWorkData
   put "0.00 0.00 0.00 0.00" into tSearchZero
   replace space with tab in tSearchZero
   set the itemDelimiter to tab ## operate on tab delimited items
   repeat for each line aDataPoint in tWorkData
      if item 3 to 6 of aDataPoint is tSearchZero then
         put "line:" && item 1 of aDataPoint & comma && "Patch ID:" && item 2 of aDataPoint & comma && "L:" && item 7 of aDataPoint \
               && "a:" && item 8 of aDataPoint && "b:" && item 9 of aDataPoint & cr after tCollect
      end if
   end repeat
   put tCollect into field 2
end mouseUp
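
To see the difference between the word and item versions on its own, here is a small sketch with one made-up row in the tab-delimited, space-padded layout described above (compareChunks is a hypothetical name). A word chunk range keeps whatever separators sat between the words in the original text, so it never equals a string joined with single spaces when the columns are separated by tabs.

```livecode
-- Hypothetical demo: compare "word 3 to 6" and "item 3 to 6" on one data row.
function compareChunks pRow
   local tResult
   -- word chunks keep the original tabs and padding between words 3 and 6,
   -- so this comparison with single spaces is false for tab-delimited data
   put (word 3 to 6 of pRow is "0.00 0.00 0.00 0.00") into tResult
   -- Bernd's approach: strip the padding, then compare tab-delimited items
   replace space with empty in pRow
   set the itemDelimiter to tab
   put comma & (item 3 to 6 of pRow is \
         ("0.00" & tab & "0.00" & tab & "0.00" & tab & "0.00")) after tResult
   return tResult
end compareChunks

on mouseUp
   local tRow
   -- made-up row: tab-delimited, values padded with leading spaces
   put "73" & tab & "B29" & tab & "   0.00" & tab & "   0.00" & tab & \
         "   0.00" & tab & "   0.00" & tab & "  95.69" into tRow
   put compareChunks(tRow) -- message box: false,true
end mouseUp
```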

Kind regards
Bernd

Post by **Thierry** » Sat Apr 15, 2023 2:02 pm

bn wrote (Sat Apr 15, 2023 8:27 am):
Hi Bruce,

Your actual data is a bit different from the first snippets that you posted.

As you noticed in your code:
Tab is the column delimiter; the CMYK and Lab values are padded with leading spaces to 7 characters, and the wavelength values are padded with trailing spaces to 6 characters.

I changed the code to remove all spaces, leaving tabs as the column delimiter. Then I set the itemDelimiter in LC to tab and changed the search string, replacing spaces with tabs.

This finds 1 occurrence where all 4 CMYK values are 0.00, at line 49.

Please note that I now operate on "items", with the itemDelimiter set to tab, and that I adjusted tSearchZero accordingly.

Bernd

Hi, regex aficionados,

Out of nostalgia, I came here by chance, and following Bernd's comments,
here is another way to solve this problem:

Code:

on searchNullCYMK
   local T, theCMYKnullRex, nLine, patchID, _L, _a, _b
   set the wholeMatches to true
   put  field 1 into T
   put line lineOffset("BEGIN_DATA", T) + 1 to -1 of T into T
   // catch nLine and patchID
   put  "(?m)^(\d+)\s+(.\d+)" into theCMYKnullRex
   // null CMYK pattern: four values of 0.00
   put "(?:\s+0\.00){4}"  after theCMYKnullRex
   // catch L, a , b
   put "\s+(-?\d+\.\d+)\s+(-?\d+\.\d+)\s+(-?\d+\.\d+)" after theCMYKnullRex
   
   if matchText( T, theCMYKnullRex, nLine, patchID, _L, _a, _b) then
      put "Found: " && nLine, patchID, _L, _a, _b
   else
      put "No match :("
   end if
   
end searchNullCYMK

Hoping a couple of you will appreciate this input...
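
Note that matchText only reports the first match. If a file contains several patches with CMYK 0/0/0/0, one way to list them all is to apply the same pattern line by line, in a repeat loop like Bernd's. A minimal sketch; listNullCMYK is a hypothetical name, and it assumes each data row starts with the sample number, as in the snippets above.

```livecode
-- Hypothetical handler: list every row whose four CMYK values are 0.00,
-- using the same pattern on one line at a time.
function listNullCMYK pData
   local tRex, tLine, tID, tPatch, tL, tA, tB, tResult
   -- sample number, patch ID, four zero CMYK values, then L, a, b
   put "^(\d+)\s+(.\d+)(?:\s+0\.00){4}" & \
         "\s+(-?\d+\.\d+)\s+(-?\d+\.\d+)\s+(-?\d+\.\d+)" into tRex
   repeat for each line tLine in pData
      -- matchText fills tID, tPatch, tL, tA, tB from the capture groups
      if matchText(tLine, tRex, tID, tPatch, tL, tA, tB) then
         put tID && tPatch && tL && tA && tB & cr after tResult
      end if
   end repeat
   -- drop the trailing return, if anything was found
   if tResult is not empty then delete the last char of tResult
   return tResult
end listNullCMYK

on mouseUp
   put listNullCMYK(field 1) into field 2
end mouseUp
```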

Kind regards,
Thierry

Post by bn » Sat Apr 15, 2023 2:59 pm

Hi Thierry,

What a nice surprise to see you here again. And what a wonder of regex you brought along. It works like a charm. (It might even be a charm.)

Thank you very much.

Kind regards
Bernd

Post by **Thierry** » Sat Apr 15, 2023 3:14 pm

bn wrote (Sat Apr 15, 2023 2:59 pm):
Hi Thierry,

What a nice surprise to see you here again.
And what a wonder of Regex you brought along.
It works like a charm. (It might even be a charm.)

Many thanks, Bernd

You're the winner of the day, with a little variant,
er, a bit simpler if I may say so, as it catches the Lab values in one go:

Kind regards,
Thierry

Code:

on searchNullCYMK
   local T, theCMYKnullRex, nLine, patchID, lab
   put  field 1 into T
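   // catch nLine and patchID
   put  "(?m)^(\d+)\s+(.\d+)" into theCMYKnullRex
   // null CMYK pattern: four values of 0.00
   put "(?:\s+0\.00){4}"  after theCMYKnullRex
   // catch Lab in one go (one group holding all three values, separators included):
   put "((?:\s+(-?\d+\.\d+)){3})" after theCMYKnullRex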
   if matchText( T, theCMYKnullRex, nLine, patchID, Lab) then
      put "Found: " && nLine, patchID, lab
   else
      put "No match :("
   end if
end searchNullCYMK

Post by **SparkOut** » Sat Apr 15, 2023 3:51 pm

Hi Thierry! Nice to see you here again! Please come back more often.

Post by **Bruce Brown** » Sun Apr 16, 2023 12:53 am

Well, I think you are all great, and I really appreciate the help. I like the regex solution. Very nice, Thierry.

LiveCode Forums
