[SOLVED] Getting Index of Characters

Xero
Posts: 152
Joined: Sat Jun 23, 2018 2:22 pm

Post by Xero » Fri Dec 02, 2022 7:45 am

Hi all,
I am writing some script to search through a set of text divided into paragraphs.
It searches for each character of the alphabet and returns a value that states the character, the paragraph number and the character number (i.e. a=1,2 is stating that a is the second character of the first paragraph). I have scripted it to run through a selected number of times and list them by character, i.e.:
a=1,2 a=2,5 a=3,7
b=2,8 b=2,50 b=3,1
etc.
All this is working well. Box ticked.
However, for the less common characters (q, z, etc.) I am getting repeated entries, such as q=1,200 q=1,200 q=1,200.
What I want is that, where a character appears only once, only one entry is output.
In my method below, I find the character, record its index, then replace it with a "?" and repeat. On the next pass the character is no longer there to be found, but the placeholder keeps the character count correct for the other letters. In theory, if the script fails to find a character, it should just exit the loop rather than repeat the output. It currently outputs nothing for z in my sample text, because z doesn't occur in it. The output only repeats for letters that do appear somewhere in the text.
Hope that makes sense...
What is happening? Where did I go wrong?
Thanks in advance team.

Code:

on getNumbers
   --clear out the output field--
   put empty into fld "Table"
   --put the sample text into a container--
   Put Fld "InputText" into tText
   --delete all common punctuation--
   replace space with empty in tText
   replace "." with empty in tText
   replace "," with empty in tText
   replace ";" with empty in tText
   replace "?" with empty in tText
   replace "-" with empty in tText
   replace "!" with empty in tText
   replace quote with empty in tText
   replace "'" with empty in tText
   --establish alphabet to use... is there a better way of doing this?--
   put "abcdefghijklmnopqrstuvwxyz" into tAlphabet
  --work out how many entries to output--
   Put Field "NoOfTimes" into z
   --go through alphabet one letter at a time--
   repeat for each char y in tAlphabet
   --do the search for amount of times of the entries required--
      repeat z times
      --work out what the character number is--
         get Offset(y,tText)
         --if there isn't an instance of the character, exit--
         if it = 0 then exit repeat
         --hold the character index number for use in output--
         put it into tCharOffset
         --repeat the above process to find the paragraph index--
         get paragraphOffset(y,tText)
         if it = 0 then exit repeat
         put it into tParaOffset
         --remove the character so it doesn't get searched again in the next repeat and replace with a placeholder so it gets counted--
         replace y with "?" in character tCharOffset of paragraph tParaOffset in tText
         --output the character, paragraph number and character number--
         put space & y & "=" & tParaOffset & comma & tCharOffset after tOutput
      end repeat
      --put a return after the series of output for each character--
      put cr after tOutput
   end repeat
   --delete rogue character at start--
   delete character 1 of tOutput
   --show all of the output in user-readable form--
   put tOutput into fld "Table"
end getNumbers
Last edited by Xero on Fri Dec 02, 2022 1:08 pm, edited 1 time in total.

stam
Posts: 2679
Joined: Sun Jun 04, 2006 9:39 pm
Location: London, UK

Re: Getting Index of Characters

Post by stam » Fri Dec 02, 2022 11:00 am

At work, so don't have time to tinker but briefly:

Why are you excluding punctuation marks? This will make the text positions incorrect, and I'm not sure it helps...

Why not split the text into paragraphs and search each one separately (i.e. a repeat loop iterating through the paragraphs)? Rather than putting the whole text into a container, I'd just iterate through each paragraph, put that into a container and search like you do, using offset.

I think the offset function has a 3rd parameter - 'chars to skip' or some such. So instead of replacing a char just set that to your current offset in a loop?
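
A minimal sketch of the skip idea, assuming the aim is the first few occurrences of a single letter. The handler name firstOffsets and its parameters are made up for illustration; offset() and its optional third parameter (the number of characters to skip) are standard LiveCode. Because offset() reports the position relative to the skipped characters, the running total gives the absolute position:

```livecode
function firstOffsets pChar, pText, pMax
   local tSkip, tFound, tResult
   put 0 into tSkip
   repeat pMax times
      -- search only the text after the first tSkip characters
      put offset(pChar, pText, tSkip) into tFound
      if tFound = 0 then exit repeat -- no further occurrence
      add tFound to tSkip -- tSkip is now the absolute position of this match
      put tSkip & comma after tResult
   end repeat
   delete the last char of tResult -- drop the trailing comma
   return tResult
end firstOffsets
```

For example, firstOffsets("a", "banana", 2) should give 2,4, with no need to overwrite each match with a placeholder.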

Will tinker when I have time, unless someone else does it first ;)
S.

Xero

Re: Getting Index of Characters

Post by Xero » Fri Dec 02, 2022 11:19 am

Thanks Stam. There is a purpose behind not counting the punctuation marks. It's for a certain type of cipher that uses only the letters. I guess I could exclude them in the offset function (I did consider that).
As for searching by split paragraphs, I don't want to search the paragraphs per se. I want to work out where the first 3 instances of each letter are. So "e" will very likely be in the first paragraph 3 times, so there is no need to search the next paragraph. It will just stop searching whenever it gets to 3.
So far, that side of things is working well, it's just the repeating indexes of low-frequency letters that is bugging (pun intended) me.
X

SparkOut
Posts: 2852
Joined: Sun Sep 23, 2007 4:58 pm

Re: Getting Index of Characters

Post by SparkOut » Fri Dec 02, 2022 12:27 pm

I haven't had a proper look at your code to understand what you are doing, but a one line check "if tNeedle is not in tHaystack then..." will maybe give you a way to skip unnecessary searches.
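
A short sketch of that check, using a hypothetical handler name (lettersPresent). It only assumes the standard "is not in" operator and "next repeat":

```livecode
function lettersPresent pText
   local tLetter, tFound
   repeat for each char tLetter in "abcdefghijklmnopqrstuvwxyz"
      -- skip letters that never occur, so no offset search is spent on them
      if tLetter is not in pText then next repeat
      put tLetter after tFound
   end repeat
   return tFound
end lettersPresent
```

Note that this only skips letters that are absent altogether. A letter that occurs once still passes the check every time, so on its own it would not stop a repeated entry.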

Thierry
VIP Livecode Opensource Backer
Posts: 875
Joined: Wed Nov 22, 2006 3:42 pm

Re: Getting Index of Characters

Post by Thierry » Fri Dec 02, 2022 12:54 pm

Hi Xero,

Replace this line:

replace y with "?" in character tCharOffset of paragraph tParaOffset in tText

with:

replace y with "?" in character tCharOffset in tText

and I guess it should work better...
Your original line will work if and only if tParaOffset is 1.

and for fun all these lines:

replace space with empty in tText
replace "." with empty in tText
replace "," with empty in tText
replace ";" with empty in tText
replace "?" with empty in tText
replace "-" with empty in tText
replace "!" with empty in tText
replace quote with empty in tText
replace "'" with empty in tText


could be replaced by:

put replaceText(tText,"[\x22 .,;'?!-]", empty ) into tText
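
Since the cipher mentioned earlier in the thread uses only letters, a stricter variant would be to remove everything that is not a letter or a line break. This is an untested sketch: the handler name lettersOnly is hypothetical, and it assumes that paragraphs are separated by return characters (matched by \n in the pattern) and that only unaccented A-Z letters matter:

```livecode
function lettersOnly pText
   -- [^A-Za-z\n] matches any character that is neither a letter nor a line feed,
   -- so paragraph breaks survive and paragraph numbers stay meaningful
   return replaceText(pText, "[^A-Za-z\n]", empty)
end lettersOnly
```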



Disclaimer: Not tested, and I haven't written a line of LiveCode in more than a year...

Regards,

Thierry
!
SUNNY-TDZ.COM doesn't belong to me since 2021.
To contact me, use the Private messages. Merci.
!

Xero

Re: Getting Index of Characters

Post by Xero » Fri Dec 02, 2022 1:07 pm

Thanks Thierry.
I get why the charOffset works, but not quite sure why the paragraph number will only work when it is 1, but the code now works. I have a good list of only one instance. Guess I was getting too fancy!
X

Xero

Re: [SOLVED] Getting Index of Characters

Post by Xero » Fri Dec 02, 2022 1:12 pm

Refer to Stam's post below for working code.
Last edited by Xero on Sun Dec 04, 2022 10:32 pm, edited 2 times in total.

stam

Re: Getting Index of Characters

Post by stam » Fri Dec 02, 2022 7:42 pm

Xero wrote:
Fri Dec 02, 2022 11:19 am
Thanks Stam. There is a purpose behind not counting the punctuation marks. It's for a certain type of cipher that uses only the letters. I guess I could exclude them in the offset function (I did consider that).
As for searching by split paragraphs, I don't want to search the paragraphs per se. I want to work out where the first 3 instances of each letter are. So "e" will very likely be in the first paragraph 3 times, so there is no need to search the next paragraph. It will just stop searching whenever it gets to 3.
So far, that side of things is working well, it's just the repeating indexes of low-frequency letters that is bugging (pun intended) me.
X
Hey Xero,
just to clarify - do you actually need the paragraph data? or just the offset data?
If you don't really need the paragraph, then the simple function below returns an array with all the occurrences for each letter:

Code:

function getOffsetsOfLetters pText
    local tChar, tOffset, tSkip, tArray, x, i
    repeat with i = 1 to 26
        // 65-90 are the code points of A-Z; offset() ignores case unless caseSensitive is true
        put numToCodepoint(64 + i) into tChar
        put 0 into tSkip // number of characters already searched
        put 0 into x // occurrence counter for this letter
        repeat 
            // offset() returns a position relative to the skipped characters, or 0 if not found
            put offset(tChar, pText, tSkip) into tOffset
            if tOffset = 0 then exit repeat 
            add 1 to x
            put tOffset + tSkip into tArray[tChar][x] // absolute position in pText
            put tOffset + tSkip into tSkip
        end repeat
    end repeat
    return tArray
end getOffsetsOfLetters
An example stack that shows each letter, each of its occurrences and its offset in the text (as the function returns an array, I just used a tree view widget to display it):
letterOccurences.jpg
Not sure if that's any help to you - but perhaps the code can be modified to your needs?
S.
Attachments
OccurenesOfLetter.livecode.zip

stam

Re: Getting Index of Characters

Post by stam » Fri Dec 02, 2022 7:47 pm

Thierry wrote:
Fri Dec 02, 2022 12:54 pm
put replaceText(tText,"[\x22 .,;'?!-]", empty ) into tText
Ahhhh welcome back Thierry! Very long time no see -- we missed your contributions!

bn
VIP Livecode Opensource Backer
Posts: 3999
Joined: Sun Jan 07, 2007 9:12 pm
Location: Bochum, Germany

Re: Getting Index of Characters

Post by bn » Fri Dec 02, 2022 11:56 pm

Thierry wrote:
Fri Dec 02, 2022 12:54 pm
Disclaimer: Not tested, and I haven't written a line of LiveCode in more than a year...
Regards,
Thierry
Salut Thierry,

nice to see you posting.

Kind regards
Bernd

SparkOut

Re: [SOLVED] Getting Index of Characters

Post by SparkOut » Sat Dec 03, 2022 12:36 am

Hi Thierry!
Nice to see you again :D

Xero

Re: [SOLVED] Getting Index of Characters

Post by Xero » Sat Dec 03, 2022 2:06 am

Thanks Stam, but yes, I do need the paragraph data as well, so the first letter of the second paragraph would be returned as 2,1, not just 400.
After reviewing the stack with a different sample, I have realised that I don't even get that: I am getting the paragraph number, but the character number is counted as if the text were all one block. Will need to tinker a little more, possibly doing it paragraph by paragraph.
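
One way to turn the whole-text character number into a position within its paragraph is to subtract the length of the preceding paragraphs, plus the return that ends them. A minimal sketch, assuming paragraphs are separated by single return characters; the handler name charOffsetInParagraph is made up for illustration:

```livecode
function charOffsetInParagraph pCharOffset, pParaOffset, pText
   -- in the first paragraph the whole-text offset is already correct
   if pParaOffset <= 1 then return pCharOffset
   -- subtract everything before this paragraph: the earlier paragraphs
   -- (including the returns between them) and the return that closes them
   return pCharOffset - length(paragraph 1 to (pParaOffset - 1) of pText) - 1
end charOffsetInParagraph
```

With the text "abc" & cr & "xqz", the letter q is at whole-text offset 6 in paragraph 2, so charOffsetInParagraph(6, 2, tText) should return 2.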
X

stam

Re: [SOLVED] Getting Index of Characters

Post by stam » Sat Dec 03, 2022 2:39 am

Xero wrote:
Sat Dec 03, 2022 2:06 am
Thanks Stam, but yes, I do need the paragraph data as well, so the first letter of the second paragraph would be returned as 2,1, not just 400.
After reviewing the stack with a different sample, I have realised that I don't even get that: I am getting the paragraph number, but the character number is counted as if the text were all one block. Will need to tinker a little more, possibly doing it paragraph by paragraph.
X
Well, it is a simple matter of adding an outer loop to pass a paragraph at a time, assuming you want the offset relative to the paragraph.

Maybe this helps or maybe not - just a slightly different approach (and also a way to avoid putting the alphabet into a variable ;) )
You probably don't have use for this as you have solved your issue - but to replicate this with my method:

Code:

function paragraphAndText pText
    local tChar, tOffset, tSkip, tArray, tText, x, i
    repeat with x = 1 to the number of paragraphs of pText
        put paragraph x of pText into tText
        repeat with i = 1 to 26
            // 65-90 are the code points of A-Z; offset() ignores case unless caseSensitive is true
            put numToCodepoint(64 + i) into tChar
            put 0 into tSkip // characters of this paragraph already searched
            repeat 
                put offset(tChar, tText, tSkip) into tOffset
                if tOffset = 0 then exit repeat 
                // store "paragraph,position" under the next free index for this letter
                put x & comma & (tOffset + tSkip) into tArray[tChar][(the number of lines of the keys of tArray[tChar]) + 1]
                put tOffset + tSkip into tSkip
            end repeat
        end repeat
    end repeat
    return tArray
end paragraphAndText
updated stack:
paragraph and offset.jpg
Attachments
OccurencesOfLetterInParagraph.livecode.zip

Thierry

Re: [SOLVED] Getting Index of Characters

Post by Thierry » Sat Dec 03, 2022 12:30 pm

Hi Thierry!
Nice to see you again :D
Thanks Stam, Bernd and SparkOut for the welcome :)

Actually, I posted because I edited my signature to let LiveCoders know
that someone stole my identity on sunny-tdz.com.

I sent a message to the ISP's abuse service...
let's see what happens next.

If anyone wants to contact me,
please use the private messages in these forums.

Wish you all a nice week-end,

Thierry

Thierry

Re: Getting Index of Characters

Post by Thierry » Sat Dec 03, 2022 4:20 pm

Xero wrote:
Fri Dec 02, 2022 1:07 pm
Thanks Thierry.
I get why the charOffset works, but not quite sure why the paragraph number will only work when it is 1.
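It works in paragraph 1 because the offsets you get from offset()
are counted from the first character of your whole text,
which is also the first character of paragraph 1.
For any later paragraph, that whole-text offset no longer matches the character's position within the paragraph, so the replace usually has nothing to change and the same character is found again on the next pass.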
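
The mismatch can be seen with a two-paragraph sample. This is a small sketch with a made-up handler name and sample text; only offset(), paragraphOffset() and ordinary chunk expressions are assumed:

```livecode
on showOffsetMismatch
   local tText, tCharOffset, tParaOffset, tInParagraph, tInText
   put "abc" & cr & "xqz" into tText
   put offset("q", tText) into tCharOffset -- 6, counted from the start of the whole text
   put paragraphOffset("q", tText) into tParaOffset -- 2, q sits in the second paragraph
   -- paragraph 2 is only 3 characters long, so character 6 of it is empty
   -- and a replace limited to that chunk has nothing to change
   put "[" & character tCharOffset of paragraph tParaOffset of tText & "]" into tInParagraph
   -- the same number applied to the whole text does point at the q
   put character tCharOffset of tText into tInText
   answer tInParagraph && tInText
end showOffsetMismatch
```

The dialog should show [] q: the chunk addressed inside paragraph 2 is empty, while the same offset in the whole text lands on the letter.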

If it's not clear enough, here are 2 tests with your code.
The first works fine, the second not.

all in paragraph 1.jpg

all in paragraph 2.jpg

Make the replacement I suggested in my previous post, and both will work.

HTH, Thierry
