Finding text without variants
-
- VIP Livecode Opensource Backer
- Posts: 13
- Joined: Mon Aug 26, 2013 6:49 pm
Finding text without variants
I am parsing a CGATS text file (color measurement file). If you look at the header we have "BEGIN_DATA_FORMAT" and further down we have "BEGIN_DATA". I want to skip the first one (..._FORMAT) and go directly to the second one. I have tried the various Find options but it always selects the first one. If I search again I get the second one as expected but why can't I just find "BEGIN_DATA"? I even tried to find "BEGIN_DATA " (with a space) but still, it selects the BEGIN_DATA in "BEGIN_DATA_FORMAT" first. Thanks in advance for your help.
Here is a snippet of the header and the beginning of the data:
LOGO_ECI2002
ORIGINATOR "COLOR TUNER"
DESCRIPTOR "Output Characterisation"
CREATED 08/12/2021 12:41:13
ILLUMINATION_NAME D50
OBSERVER_ANGLE "2"
DEVICE "Eye-One Pro 3 iO SN Eye-One Pro 3 iO 50364 Eye-One Pro 3 2000669"
CONDITION "M0"
LGOROWLENGTH 44
NumberOfStrips 2
KEYWORD "SAMPLE_NAME"
NUMBER_OF_FIELDS 45
BEGIN_DATA_FORMAT
SampleID SAMPLE_NAME CMYK_C CMYK_M CMYK_Y CMYK_K LAB_L LAB_A LAB_B nm380 nm390 nm400 nm410 nm420 nm430 nm440 nm450 nm460 nm470 nm480 nm490 nm500 nm510 nm520 nm530 nm540 nm550 nm560 nm570 nm580 nm590 nm600 nm610 nm620 nm630 nm640 nm650 nm660 nm670 nm680 nm690 nm700 nm710 nm720 nm730
END_DATA_FORMAT
NUMBER_OF_SETS 1628
BEGIN_DATA
1 A1 85.00 55.00 85.00 0.00 24.86 -20.46 1.36 0.0171 0.0218 0.0297 0.0309 0.0349 0.0351 0.0352 0.0352 0.0355 0.0378 0.0455 0.0659 0.0928 0.0952 0.0746 0.0593 0.0532 0.0435 0.0342 0.0316 0.0309 0.0284 0.0244 0.0213 0.0204 0.0207 0.0224 0.0262 0.0313 0.0339 0.0328 0.03 0.0263 0.0253 0.0317 0.0485
2 A2 100.00 40.00 100.00 0.00 23.14 -41.30 4.11 0.0119 0.0149 0.0196 0.0216 0.0224 0.0227 0.0227 0.023 0.0236 0.0261 0.0356 0.0631 0.1068 0.1238 0.102 0.0721 0.0518 0.0355 0.0214 0.0138 0.0109 0.0097 0.0088 0.0082 0.008 0.0081 0.0085 0.0098 0.0116 0.0126 0.0121 0.0111 0.0099 0.0102 0.0125 0.0191
**************
Bruce
Re: Finding text without variants
Hi Bruce,
don't use FIND!
Use "lineoffset" and "set the wholematches" to true first:
Code: Select all
...
set the wholematches to TRUE
put lineoffset("BEGIN_DATA",your_data_here) into the_line_to_begin_with
## Then maybe:
put line (the_line_to_begin_with + 1) to -1 of your_data_here into the_data_containing_the_info_you_are_looking_for
## Will put all lines after BEGIN_DATA (the two data lines in your example) into the variable with the meaningful name :-)
## You get the picture
...
WHOLEMATCHES will make sure that the following LINEOFFSET will find the EXACT phrase "BEGIN_DATA" and no variants of it!
Hope that helps! Come back if you have more (detailed) questions.
Best
Klaus
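For readers who want to reuse Klaus's suggestion, here is a minimal sketch that wraps it in a function. The function name dataLinesAfterMarker and its parameters pText and pMarker are hypothetical and do not come from the thread. It relies on two documented behaviours: lineOffset returns 0 when nothing matches, and with wholeMatches set to true it only reports a line that equals the search string as a whole, so a marker line with trailing spaces or tabs would probably not be found.

```livecode
function dataLinesAfterMarker pText, pMarker
   local tLineNum
   -- wholeMatches is a local property: it applies only inside this handler
   set the wholeMatches to true
   put lineOffset(pMarker, pText) into tLineNum
   -- lineOffset returns 0 (not empty) when the marker line is missing
   if tLineNum is 0 then return empty
   -- everything after the marker line, up to the end of the text
   return line (tLineNum + 1) to -1 of pText
end dataLinesAfterMarker
```

A call such as put dataLinesAfterMarker(field 1, "BEGIN_DATA") into tWorkData would then skip the "BEGIN_DATA_FORMAT" line and hand back only the measurement rows, assuming the file is in field 1.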
-
- VIP Livecode Opensource Backer
- Posts: 13
- Joined: Mon Aug 26, 2013 6:49 pm
Re: Finding text without variants
Thank you, Klaus. Despite being a supporter of LiveCode since 2013, I really have not done much with it. My age (63) and limited time to "play" have limited my programming skills and knowledge. Anyway, I am playing with what you have sent me and I will figure it out. My ultimate goal is to find the line in the measurement data whose CMYK values are all equal to zero and then read off its Lab value. In the file I took the snippet from, it is this line (not shown initially):
73 B29 0.00 0.00 0.00 0.00 95.69 0.45 -2.36 0.1221 0.1744 0.3934 0.5799 0.8883 0.9495 0.9643 0.9469 0.9299 0.9255 0.9175 0.9073 0.9031 0.9002 0.8992 0.896 0.8929 0.8896 0.8857 0.8871 0.8858 0.8864 0.8874 0.8891 0.8913 0.8932 0.8966 0.8988 0.9009 0.9023 0.9024 0.902 0.9037 0.9038 0.9012 0.8953
It is shown with text wrapping, so you also get the spectral values, but it is line 73 of the measurements with patch ID B29, and the Lab value is L 95.69, a 0.45, b -2.36.
The lab value is the white point - basically the lab value of paper color. This project will grow from here but first baby steps.
BTW: cmyk = printing inks Cyan Magenta Yellow and Black
**************
Bruce
Re: Finding text without variants
Hi Bruce,
age is definitely no excuse, I'm 66!
I know what CMYK and LAB etc. are, but I'm not sure whether your post contains another question.
Are there any principles in the data that could help to solve your problem?
If yes, just let me know and we will find an easy solution for you.
Best
Klaus
-
- VIP Livecode Opensource Backer
- Posts: 4003
- Joined: Sun Jan 07, 2007 9:12 pm
- Location: Bochum, Germany
Re: Finding text without variants
Hi Bruce,
If I understand you correctly, you want to extract from the data the lines where CMYK is 0.00 0.00 0.00 0.00 and then get the Lab values, in your example
95.69 0.45 -2.36
If that is the case, you could make a stack with 2 fields and a button. Set the "dontWrap" of field 1 to true and also turn on its "hScrollbar".
Paste the whole data into field 1.
The script of the button could be something like this. The line number, patch ID and Lab values will be put into field 2:
Code: Select all
on mouseUp
   local tData, tWorkData, tCollect, tLineNum, tSearchZero
   put field 1 into tData ## all data
   set the wholeMatches to true ## match the whole line, so "BEGIN_DATA_FORMAT" is skipped
   put lineOffset("BEGIN_DATA", tData) into tLineNum
   if tLineNum is 0 then ## lineOffset returns 0 when nothing is found
      answer "BEGIN_DATA not found"
      exit mouseUp
   end if
   set the wholeMatches to false
   put line tLineNum + 1 to -1 of tData into tWorkData ## data rows only
   put "0.00 0.00 0.00 0.00" into tSearchZero
   repeat for each line aDataPoint in tWorkData
      ## words 3 to 6 are C, M, Y, K; words 7 to 9 are L, a, b
      if word 3 to 6 of aDataPoint is tSearchZero then
         put "line:" && word 1 of aDataPoint & comma && "Patch ID:" && word 2 of aDataPoint & comma && "L:" && word 7 of aDataPoint \
               && "a:" && word 8 of aDataPoint && "b:" && word 9 of aDataPoint & cr after tCollect
      end if
   end repeat
   put tCollect into field 2
end mouseUp
I tested by using the first part of your data and appending the second part (the line with CMYK 0000) to it and hit the button.
If, however, your actual data is more complicated, this might not work, but it gives you an idea of how to parse your data (assuming that I am not misunderstanding what you are after).
Kind regards
Bernd
-
- VIP Livecode Opensource Backer
- Posts: 13
- Joined: Mon Aug 26, 2013 6:49 pm
Re: Finding text without variants
Hi Bernd and Klaus,
You both helped me so much! Thanks. I almost have it. I will adjust some things this weekend and see if I get it to work completely. I am able to parse the text and remove the space and tab characters between data points but the 3 to 6 word part is not finding the cmyk values.
Here is my code so far for the find white point button.
on mouseUp
   local tData, tWorkData, tCollect, tLineNum, tSearchZero
   put field "fileIn" into tData ## all data
   set the wholeMatches to true
   put lineOffset("BEGIN_DATA", tData) into tLineNum
   if tLineNum is empty then
      answer "BEGIN_DATA not found"
      exit mouseUp
   end if
   set the wholeMatches to false
   put line tLineNum + 1 to -1 of tData into tWorkData
   put replacetext( tWorkData, " *","") into tWorkData
   put replaceText(tWorkData,tab," ") into field csvData
   -- originally I had above line replace the white space (tab) with a comma but that did not work (hence variable csvData)
   -- however it still does not put lab value in field whtPtLab. I set it back to space between values but also not working
   put "0.00 0.00 0.00 0.00" into tSearchZero
   repeat for each line aDataPoint in tWorkData
      if word 3 to 6 of aDataPoint is tSearchZero then
         answer sDataPoint
         put "line:" && word 1 of aDataPoint & comma && "Patch ID:" && word 2 of aDataPoint & comma && "L:" && word 7 of aDataPoint \
               && "a:" && word 8 of aDataPoint && "b:" && word 9 of aDataPoint & cr after tCollect
      end if
   end repeat
   put tCollect into field "whtPtLab"
end mouseUp
- Attachments
-
- cmf_03.zip
- cgats file sample
- (170.7 KiB) Downloaded 55 times
**************
Bruce
-
- VIP Livecode Opensource Backer
- Posts: 4003
- Joined: Sun Jan 07, 2007 9:12 pm
- Location: Bochum, Germany
Re: Finding text without variants
Hi Bruce,
Your actual data is a bit different from the first snippets that you posted.
As you noticed in your code:
Tab is the column delimiter, the CMYK and Lab values have leading zeroes for 7 characters, and the wavelengths have trailing spaces for 6 characters.
I changed the code by removing all spaces, leaving tabs as the column delimiter. Then I set the itemDelimiter in LC to tab and changed the search string, replacing space with tab.
This finds 1 occurrence where all 4 values of CMYK are 0.00, at line 49.
Please note that I now operate on "items", which are delimited by tab, and that I adjusted tSearchZero accordingly.
Code: Select all
on mouseUp
   local tData, tWorkData, tCollect, tLineNum, tSearchZero
   put field 1 into tData ## all data
   set the wholeMatches to true ## match the whole line, so "BEGIN_DATA_FORMAT" is skipped
   put lineOffset("BEGIN_DATA", tData) into tLineNum
   if tLineNum is 0 then ## lineOffset returns 0 when nothing is found
      answer "BEGIN_DATA not found"
      exit mouseUp
   end if
   set the wholeMatches to false
   put line tLineNum + 1 to -1 of tData into tWorkData
   replace space with empty in tWorkData ## drop padding, keep the tabs
   put "0.00 0.00 0.00 0.00" into tSearchZero
   replace space with tab in tSearchZero
   set the itemDelimiter to tab ## operate on tab delimited items
   repeat for each line aDataPoint in tWorkData
      if item 3 to 6 of aDataPoint is tSearchZero then
         put "line:" && item 1 of aDataPoint & comma && "Patch ID:" && item 2 of aDataPoint & comma && "L:" && item 7 of aDataPoint \
               && "a:" && item 8 of aDataPoint && "b:" && item 9 of aDataPoint & cr after tCollect
      end if
   end repeat
   put tCollect into field 2
end mouseUp
Kind regards
Bernd
Re: Finding text without variants
Hi regex aficionados,
In reply to bn's post of Sat Apr 15, 2023 8:27 am: out of nostalgia, I came here by chance, and following Bernd's comments, here is another way to solve this problem:
Code: Select all
on searchNullCYMK
   local T, theCMYKnullRex, nLine, patchID, _L, _a, _b
   set the wholeMatches to true
   put field 1 into T
   put line lineOffset("BEGIN_DATA", T) + 1 to -1 of T into T
   // catch nLine and patchID
   put "(?m)^(\d+)\s+(.\d+)" into theCMYKnullRex
   // null CYMK pattern
   put "(?:\s+0\.00){4}" after theCMYKnullRex
   // catch L, a , b
   put "\s+(-?\d+\.\d+)\s+(-?\d+\.\d+)\s+(-?\d+\.\d+)" after theCMYKnullRex
   if matchText( T, theCMYKnullRex, nLine, patchID, _L, _a, _b) then
      put "Found: " && nLine, patchID, _L, _a, _b
   else
      put "No match :("
   end if
end searchNullCYMK
Hoping a couple of you could appreciate this input....
Kind regards,
Thierry
Last edited by Thierry on Mon Jun 26, 2023 6:59 am, edited 1 time in total.
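To try the pattern without loading a whole file, a small sketch like the following could run it against the single white-point line Bruce quoted earlier in the thread (shortened here to its first ten columns). The handler name testWhitePointRegex and the t-prefixed variable names are hypothetical; the regular expression is Thierry's, joined into one string.

```livecode
on testWhitePointRegex
   local tLine, tRegex, tNum, tPatch, tL, tA, tB
   -- sample row from the thread: sample number, patch ID, C M Y K, L a b, first spectral value
   put "73 B29 0.00 0.00 0.00 0.00 95.69 0.45 -2.36 0.1221" into tLine
   -- (\d+) sample number, (.\d+) patch ID, four 0.00 columns, then three signed decimals
   put "(?m)^(\d+)\s+(.\d+)(?:\s+0\.00){4}" into tRegex
   put "\s+(-?\d+\.\d+)\s+(-?\d+\.\d+)\s+(-?\d+\.\d+)" after tRegex
   if matchText(tLine, tRegex, tNum, tPatch, tL, tA, tB) then
      -- show the five captured values in the message box
      put tNum & comma & tPatch & comma & tL & comma & tA & comma & tB
   else
      put "No match"
   end if
end testWhitePointRegex
```

Because \s+ matches tabs as well as spaces, the same pattern should cope with both the space-separated snippet and the tab-delimited file, which is why this approach needs no delimiter handling.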
!
SUNNY-TDZ.COM doesn't belong to me since 2021.
To contact me, use the Private messages. Merci.
!
-
- VIP Livecode Opensource Backer
- Posts: 4003
- Joined: Sun Jan 07, 2007 9:12 pm
- Location: Bochum, Germany
Re: Finding text without variants
Hi Thierry,
What a nice surprise to see you here again. And what a wonder of Regex you brought along. It works like a charm. (It even might be a charm).
Thank you very much.
Kind regards
Bernd
Re: Finding text without variants
Many thanks, Bernd.
You're the winner of the day, with a little variant that is, er, a bit simpler if I may say so, as I catch the Lab values in one go:
Code: Select all
on searchNullCYMK
   local T, theCMYKnullRex, nLine, patchID, lab
   put field 1 into T
   put "(?m)^(\d+)\s+(.\d+)" into theCMYKnullRex
   put "(?:\s+0\.00){4}" after theCMYKnullRex
   // catch Lab in one go:
   put "((?:\s+(-?\d+\.\d+)){3})" after theCMYKnullRex
   if matchText( T, theCMYKnullRex, nLine, patchID, lab) then
      put "Found: " && nLine, patchID, lab
   else
      put "No match :("
   end if
end searchNullCYMK
Kind regards,
Thierry
Re: Finding text without variants
Hi Thierry! Nice to see you here again! Please come back more often
-
- VIP Livecode Opensource Backer
- Posts: 13
- Joined: Mon Aug 26, 2013 6:49 pm
Re: Finding text without variants
Well, I think you are all great and I really appreciate the help. I like the regex solution --very nice Thierry.
**************
Bruce