How to sort out this

Got a LiveCode personal license? Are you a beginner, hobbyist or educator that's new to LiveCode? This forum is the place to go for help getting started. Welcome!

Moderators: FourthWorld, heatherlaine, Klaus, kevinmiller

alemrantareq
Posts: 203
Joined: Wed Jul 23, 2008 8:46 am

Re: How to sort out this

Post by alemrantareq » Thu Jul 19, 2018 5:54 pm

Infinite thanks and gratitude to all who spent time trying to solve my problem. LiveCode has such a great support family, which has made me proud to work with LiveCode so far.

My apologies for the long delay in reporting my results. Illness due to low blood pressure put me on bed rest for a couple of months, out of town with my parents. By the grace of the Almighty I have recovered and am back to my work :)
bwmilby wrote:
Tue Mar 13, 2018 11:10 pm
You need a repeat loop before the filter. Loop through each line and record the text before the @. Check to see if the current line text is in the list. If so, exit the loop. Use the found text to filter the list.

I tested the below on the sample that you provided and it seemed to give the expected result.

########CODE to copy and paste with your mouse#######
on mouseUp
   local tList, tKey, tKeys
   set the itemDel to "@"
   put the text of field "Source" into tList
   repeat for each line tLine in tList
      put (item 1 of tLine) & "@" into tKey
      if lineOffset(tKey, tKeys) > 0 then exit repeat
      put cr & tKey after tKeys
   end repeat
   set the text of field "Key" to tKey
   filter tList without (tKey & "*")
   set the text of field "Destination" to tList
end mouseUp
#####END OF CODE generated by this livecode app: http://tinyurl.com/j8xf3xq with livecode 9.0.0-dp-11#####
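For readers outside LiveCode, the single-duplicate approach above can be sketched in Python (an illustration only, not the author's code; unlike the handler above, this sketch leaves the list unchanged when no key repeats):

```python
# Python sketch of the approach above: scan until the first repeated key
# (the text before "@") is found, then filter out every line carrying that
# key. Only one duplicated key is handled per run, which is the limitation
# discussed in the replies below.

def filter_first_duplicate(lines):
    seen = set()
    for line in lines:
        key = line.split("@", 1)[0]
        if key in seen:
            # first duplicate found: drop every line with this key
            return [l for l in lines if l.split("@", 1)[0] != key]
        seen.add(key)
    return list(lines)  # no duplicate found: list unchanged

sample = ["a@s1:1", "b@s2:2", "a@s3:3", "b@s4:4"]
print(filter_first_duplicate(sample))
# -> ['b@s2:2', 'b@s4:4']  (only the first duplicated key, "a", is removed)
```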

Edit 1: Add code
Edit 2: Add image
Edit 3: The code does not ensure that there was a duplicate line, so if none were found it would remove the last line of the file. That would be an easy thing to add though.


(Attachment: HTSOT.JPG)
hello bwmilby, your code gives me almost the expected output; the only thing it lacks is that it removes only the lines matching the first duplicated tKey, and doesn't repeat for the rest of tList.

here's the sample list that I want to filter:
matsas007@server1:info3143
phamminhman1@server5:info3605
devilcry95@server4:info8051
boysatthu98@server1:info316515
devilcry95@server4:info36054
thedevilcry95@server8:info50289
phamminhman1@server1:info4441
devilcry537@server7:info3656
xuanphungbd@server3:info51343
devilcry95@server2:info78514
44boysatthu98@server5:info96315
tuananhhau@server1:info53476
boysatthu98@server6:info316515
using your code, I get the following result:
matsas007@server1:info3143
phamminhman1@server5:info3605
boysatthu98@server1:info316515
thedevilcry95@server8:info50289
phamminhman1@server1:info4441
devilcry537@server7:info3656
xuanphungbd@server3:info51343
44boysatthu98@server5:info96315
tuananhhau@server1:info53476
boysatthu98@server6:info316515
but the actual result I want is:
matsas007@server1:info3143
thedevilcry95@server8:info50289
devilcry537@server7:info3656
xuanphungbd@server3:info51343
44boysatthu98@server5:info96315
tuananhhau@server1:info53476
I've tried many ways to repeat the function, but failed to do it properly. So if you or anyone else can complete the quest, it would be much appreciated. Loving LiveCode (lifecode for me, as it gives life to coding) :D

bwmilby
Posts: 438
Joined: Wed Jun 07, 2017 5:37 am
Location: Henrico, VA
Contact:

Re: How to sort out this

Post by bwmilby » Thu Jul 19, 2018 11:51 pm

Okay... I didn't realize there would be more than one duplicated key. Here is the revised code. You have to make two passes: the first finds the duplicates, the second removes them.

########CODE to copy and paste with your mouse#######
on mouseUp
   local tList, tKey, tKeys, tDupes
   set the itemDel to "@"
   put the text of field "Source" into tList
   -- look at each line in the original data
   repeat for each line tLine in tList
      put (item 1 of tLine) & "@" into tKey
      -- check to see if the key has been seen
      if lineOffset(tKey, tKeys) > 0 then
         -- check to see if the dupe key has already been seen
         if lineOffset(tKey, tDupes) = 0 then
            -- record the dupe key
            put tKey & cr after tDupes
         end if
      else
         -- record the unseen key
         put tKey & cr after tKeys
      end if
   end repeat
   -- remove trailing cr
   delete char -1 of tDupes
   -- go through list of dupes and remove each
   repeat for each line tKey in tDupes
      filter tList without (tKey & "*")
   end repeat
   set the text of field "Destination" to tList
end mouseUp
########END OF CODE generated by this livecode app: http://tinyurl.com/j8xf3xq ########
########Code tested with livecode 9.0.1-rc-1########
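For readers who don't run LiveCode, the same two-pass idea can be sketched in Python (an illustration only, not the author's code):

```python
# Two-pass duplicate removal, mirroring the LiveCode handler above.
# Pass 1 collects each key (the text before "@") and notes which keys repeat;
# pass 2 keeps only the lines whose key was never duplicated.

def remove_duplicated_keys(lines):
    seen = set()
    dupes = set()
    for line in lines:
        key = line.split("@", 1)[0]
        if key in seen:
            dupes.add(key)
        else:
            seen.add(key)
    return [line for line in lines if line.split("@", 1)[0] not in dupes]

sample = [
    "matsas007@server1:info3143",
    "devilcry95@server4:info8051",
    "devilcry95@server2:info78514",
    "tuananhhau@server1:info53476",
]
print(remove_duplicated_keys(sample))
# -> ['matsas007@server1:info3143', 'tuananhhau@server1:info53476']
```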
Brian Milby

Script Tracker https://github.com/bwmilby/scriptTracker

rkriesel
VIP Livecode Opensource Backer
Posts: 118
Joined: Thu Apr 13, 2006 6:25 pm

Re: How to sort out this

Post by rkriesel » Fri Jul 20, 2018 6:07 am

alemrantareq wrote:
Thu Jul 19, 2018 5:54 pm
...
but the actual result I want is:
matsas007@server1:info3143
thedevilcry95@server8:info50289
devilcry537@server7:info3656
xuanphungbd@server3:info51343
44boysatthu98@server5:info96315
tuananhhau@server1:info53476
...
Hi, alemrantareq. Your earlier description suggests you'd also want these:
boysatthu98@server6:info316515
devilcry95@server2:info78514
phamminhman1@server1:info4441
Why don't you want these too?
If you do want these too, then the code I offered earlier works.
-- Dick

alemrantareq
Posts: 203
Joined: Wed Jul 23, 2008 8:46 am

Re: How to sort out this

Post by alemrantareq » Fri Jul 20, 2018 7:19 am

bwmilby wrote:
Thu Jul 19, 2018 11:51 pm
Okay... I didn't realize there would be more than one key duplicated. Here is revised code. You have to do 2 passes. The first looks for the duplicates, the second removes them.

########CODE to copy and paste with your mouse#######
on mouseUp
   local tList, tKey, tKeys, tDupes
   set the itemDel to "@"
   put the text of field "Source" into tList
   -- look at each line in the original data
   repeat for each line tLine in tList
      put (item 1 of tLine) & "@" into tKey
      -- check to see if the key has been seen
      if lineOffset(tKey, tKeys) > 0 then
         -- check to see if the dupe key has already been seen
         if lineOffset(tKey, tDupes) = 0 then
            -- record the dupe key
            put tKey & cr after tDupes
         end if
      else
         -- record the unseen key
         put tKey & cr after tKeys
      end if
   end repeat
   -- remove trailing cr
   delete char -1 of tDupes
   -- go through list of dupes and remove each
   repeat for each line tKey in tDupes
      filter tList without (tKey & "*")
   end repeat
   set the text of field "Destination" to tList
end mouseUp
########END OF CODE generated by this livecode app: http://tinyurl.com/j8xf3xq ########
########Code tested with livecode 9.0.1-rc-1########
hi bwmilby, you did exactly what I wanted. I was trying a foolish repeat loop after "if lineOffset(tKey, tKeys) > 0 then", which gave me unexpected results.

Thank you again for providing the script in detail; without it I couldn't have understood how it works.

Hats off Dear :D

rkriesel wrote:
Fri Jul 20, 2018 6:07 am
alemrantareq wrote:
Thu Jul 19, 2018 5:54 pm
...
but the actual result I want is:
matsas007@server1:info3143
thedevilcry95@server8:info50289
devilcry537@server7:info3656
xuanphungbd@server3:info51343
44boysatthu98@server5:info96315
tuananhhau@server1:info53476
...
Hi, alemrantareq. Your earlier description suggests you'd also want these:
boysatthu98@server6:info316515
devilcry95@server2:info78514
phamminhman1@server1:info4441
Why don't you want these too?
If you do want these too, then the code I offered earlier works.
-- Dick
hi rkriesel, I've tried your script too. here's the code I applied:

Code: Select all

on mouseUp
   answer file "Select the File:" with type "Text File|txt"
   put url ("file:" & it) into Temp
   set the text of fld "f1" to digestRecords(Temp)
end mouseUp

function digestRecords t
   split t using return and "@"
   delete variable t["rawdata"]
   combine t using return and "@"
   return t
end digestRecords
and got this result:
44boysatthu98@server5:info96315
boysatthu98@server6:info316515
devilcry537@server7:info3656
devilcry95@server2:info78514
matsas007@server1:info3143
phamminhman1@server1:info4441
thedevilcry95@server8:info50289
tuananhhau@server1:info53476
xuanphungbd@server3:info51343
your script is also a good one if I want to keep the strings before ":" unique.

I believe I'm going to need this for a later project. Thank you too, especially for making the script tiny and simple (although my brain isn't going to catch up with how this "split and return" thing works :p)
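For anyone else puzzled by the split/combine trick, here is roughly what it does, sketched in Python (an illustration only; it does not reproduce every detail of LiveCode's semantics, and the keys are sorted here to match the output shown above):

```python
# Rough Python analogue of LiveCode's
#   split t using return and "@"   /   combine t using return and "@"
# Splitting by return and "@" builds a dict: the part of each line before
# the first "@" becomes the key, the remainder becomes the value.
# Because dict keys are unique, duplicate keys collapse to a single entry
# (the last occurrence wins) -- which is why the output keeps exactly one
# line per key instead of removing repeated keys entirely.

def split_text(text):
    result = {}
    for line in text.split("\n"):
        key, _, value = line.partition("@")
        result[key] = value  # later duplicates overwrite earlier ones
    return result

def combine_dict(d):
    return "\n".join(f"{k}@{v}" for k, v in sorted(d.items()))

text = "a@server1:x\nb@server2:y\na@server3:z"
print(combine_dict(split_text(text)))
# prints:
# a@server3:z
# b@server2:y
```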

jacque
VIP Livecode Opensource Backer
Posts: 7228
Joined: Sat Apr 08, 2006 8:31 pm
Location: Minneapolis MN
Contact:

Re: How to sort out this

Post by jacque » Fri Jul 20, 2018 5:07 pm

I was thinking of using arrays too, but an array alone doesn't identify the duplicated strings. So I'd combine the repeat loop and the array method: the loop identifies the repeated strings and the array removes all instances.

I'd have to check, but I think using an array will also solve the problem where no repeats are found, as it should ignore an empty variable key.
Jacqueline Landman Gay | jacque at hyperactivesw dot com
HyperActive Software | http://www.hyperactivesw.com

alemrantareq
Posts: 203
Joined: Wed Jul 23, 2008 8:46 am

Re: How to sort out this

Post by alemrantareq » Sat Jul 21, 2018 6:11 am

jacque wrote:
Fri Jul 20, 2018 5:07 pm
I was thinking of using arrays too, but an array alone doesn't identify the duplicated strings. So I'd combine the repeat loop and the array method: the loop identifies the repeated strings and the array removes all instances.

I'd have to check, but I think using an array will also solve the problem where no repeats are found, as it should ignore an empty variable key.
hi jacque, it also came to my mind that it might be possible to do this with the array method, because I believe an array would make the procedure faster than a repeat loop. Any script from you will be more than welcome :D

jacque
VIP Livecode Opensource Backer
Posts: 7228
Joined: Sat Apr 08, 2006 8:31 pm
Location: Minneapolis MN
Contact:

Re: How to sort out this

Post by jacque » Sat Jul 21, 2018 8:40 pm

This is a variation of Dick's method using only arrays.

Code: Select all

on trimList
  put fld 1 into tList
  set the itemDel to "@"
  repeat for each line l in tList
    add 1 to tNames[item 1 of l]
  end repeat
  repeat for each key k in tNames
    if tNames[k] > 1 then
      delete variable tNames[k]
    end if
  end repeat
  put keys(tNames) into fld 2
end trimList
If you trace this in the debugger you can see what is happening.
Jacqueline Landman Gay | jacque at hyperactivesw dot com
HyperActive Software | http://www.hyperactivesw.com

bwmilby
Posts: 438
Joined: Wed Jun 07, 2017 5:37 am
Location: Henrico, VA
Contact:

Re: How to sort out this

Post by bwmilby » Sat Jul 21, 2018 10:07 pm

I'm not sure that produces the desired result. This does though:

Code: Select all

on mouseUp
   local tList, tKey, tKeys, tStart
   set the itemDel to "@"
   put the millisecs into tStart
   repeat with i=1 to 10000
      put the text of field "Source" into tList
      -- look at each line in the original data
      repeat for each line tLine in tList
         put item 1 of tLine into tKey
         put tLine into tKeys[tKey]["Data"]
         add 1 to tKeys[tKey]["Count"]
      end repeat
      -- go through array and remove dupes
      put empty into tList
      repeat for each key tKey in tKeys
         if tKeys[tKey]["Count"] = 1 then
            put tKeys[tKey]["Data"] & cr after tList
         end if
      end repeat
      delete variable tKeys
   end repeat
   delete char -1 of tList
   set the text of field "Destination" to tList
   put cr & the millisecs - tStart after field "Destination"
end mouseUp
At 10000 iterations, my solution seems to be 20-30 ms faster, which probably isn't enough to make a difference. The issue is needing to retain the entire original line and not just the key portion.

Made me think though... if you used 2 arrays, then you could leverage combine again:

Code: Select all

on mouseUp
   local tList, tKey, tLines, tCounts, tStart
   set the itemDel to "@"
   put the millisecs into tStart
   repeat with i=1 to 10000
      put empty into tLines
      put the text of field "Source" into tList
      -- look at each line in the original data
      repeat for each line tLine in tList
         put item 1 of tLine into tKey
         put item 2 of tLine into tLines[tKey]
         add 1 to tCounts[tKey]
      end repeat
      -- go through array and remove dupes
      repeat for each key tKey in tLines
         if tCounts[tKey] > 1 then
            delete variable tLines[tKey]
         end if
      end repeat
      delete variable tCounts
      combine tLines using return and "@"
   end repeat
   set the text of field "Destination" to tLines
   put cr & the millisecs - tStart after field "Destination"
end mouseUp
This actually turns out to be slightly faster in a couple of test runs.
Brian Milby

Script Tracker https://github.com/bwmilby/scriptTracker

alemrantareq
Posts: 203
Joined: Wed Jul 23, 2008 8:46 am

Re: How to sort out this

Post by alemrantareq » Sun Jul 22, 2018 12:30 pm

jacque wrote:
Sat Jul 21, 2018 8:40 pm
This is a variation of Dick's method using only arrays.

Code: Select all

on trimList
  put fld 1 into tList
  set the itemDel to "@"
  repeat for each line l in tList
    add 1 to tNames[item 1 of l]
  end repeat
  repeat for each key k in tNames
    if tNames[k] > 1 then
      delete variable tNames[k]
    end if
  end repeat
  put keys(tNames) into fld 2
end trimList
If you trace this in the debugger you can see what is happening.
hi jacque, thanks for your script. I've tried it out and here's the result:
1. item 2 is missing from the output
2. it skips many lines that should be listed in the output

------------------------------------------------------------------------------------------------------
bwmilby wrote:
Sat Jul 21, 2018 10:07 pm
I'm not sure that produces the desired result. This does though:

Code: Select all

on mouseUp
   local tList, tKey, tKeys, tStart
   set the itemDel to "@"
   put the millisecs into tStart
   repeat with i=1 to 10000
      put the text of field "Source" into tList
      -- look at each line in the original data
      repeat for each line tLine in tList
         put item 1 of tLine into tKey
         put tLine into tKeys[tKey]["Data"]
         add 1 to tKeys[tKey]["Count"]
      end repeat
      -- go through array and remove dupes
      put empty into tList
      repeat for each key tKey in tKeys
         if tKeys[tKey]["Count"] = 1 then
            put tKeys[tKey]["Data"] & cr after tList
         end if
      end repeat
      delete variable tKeys
   end repeat
   delete char -1 of tList
   set the text of field "Destination" to tList
   put cr & the millisecs - tStart after field "Destination"
end mouseUp
At 10000 iterations, my solution seems to be 20-30 ms faster, which probably isn't enough to make a difference. The issue is needing to retain the entire original line and not just the key portion.

Made me think though... if you used 2 arrays, then you could leverage combine again:

Code: Select all

on mouseUp
   local tList, tKey, tLines, tCounts, tStart
   set the itemDel to "@"
   put the millisecs into tStart
   repeat with i=1 to 10000
      put empty into tLines
      put the text of field "Source" into tList
      -- look at each line in the original data
      repeat for each line tLine in tList
         put item 1 of tLine into tKey
         put item 2 of tLine into tLines[tKey]
         add 1 to tCounts[tKey]
      end repeat
      -- go through array and remove dupes
      repeat for each key tKey in tLines
         if tCounts[tKey] > 1 then
            delete variable tLines[tKey]
         end if
      end repeat
      delete variable tCounts
      combine tLines using return and "@"
   end repeat
   set the text of field "Destination" to tLines
   put cr & the millisecs - tStart after field "Destination"
end mouseUp
This actually turns out to be slightly faster in a couple of test runs.
hi bwmilby, I've tried out both of your scripts too. Unfortunately, the app crashes when I try to filter my original text file, because it's about 30 MB in size.

Here I'm sharing my text file to all of you so you can try your script with it:

Code: Select all

https://anonfile.com/Q1Paxefabe/sample.7z
This is just a sample file of 30 MB which I used to try out both of your scripts. I have more similar files of 50-60 MB.

Note: while you are testing my sample file, please use ":" as itemDel.

Thanks to all of you who are still giving me hope after getting up from bed rest :D

jacque
VIP Livecode Opensource Backer
Posts: 7228
Joined: Sat Apr 08, 2006 8:31 pm
Location: Minneapolis MN
Contact:

Re: How to sort out this

Post by jacque » Sun Jul 22, 2018 4:44 pm

The issue is needing to retain the entire original line and not just the key portion.
I think I misunderstood the goal. I thought any duplicated first item should be entirely removed. If these are really email addresses then that's a bad assumption.

I was thinking about the size of the source files too and wondered if that would be a problem. It might help to read the file directly from disk one line at a time.

Edit: I have another idea how to handle this but I need to clarify the goal. Is it just to remove all duplicate addresses, leaving only one copy of each? The original request was to remove all instances of "rawdata" regardless of the trailing content.
Jacqueline Landman Gay | jacque at hyperactivesw dot com
HyperActive Software | http://www.hyperactivesw.com

alemrantareq
Posts: 203
Joined: Wed Jul 23, 2008 8:46 am

Re: How to sort out this

Post by alemrantareq » Sun Jul 22, 2018 5:20 pm

jacque wrote:
Sun Jul 22, 2018 4:44 pm
The issue is needing to retain the entire original line and not just the key portion.
I think I misunderstood the goal. I thought any duplicated first item should be entirely removed. If these are really email addresses then that's a bad assumption.

I was thinking about the size of the source files too and wondered if that would be a problem. It might help to read the file directly from disk one line at a time.

Edit: I have another idea how to handle this but I need to clarify the goal. Is it just to remove all duplicate addresses, leaving only one copy of each? The original request was to remove all instances of "rawdata" regardless of the trailing content.
please refer to the file I shared in my last reply to understand what I want. If you look into the file, you will see a lot of duplicate entries before ":". So I want all lines removed that hold exactly duplicated strings before the itemDelimiter ":".

thanks :)

bwmilby
Posts: 438
Joined: Wed Jun 07, 2017 5:37 am
Location: Henrico, VA
Contact:

Re: How to sort out this

Post by bwmilby » Sun Jul 22, 2018 7:32 pm

The goal is to completely remove any key that is repeated (not just the duplicates).

Having real data makes a difference. The offset method is terribly slow (still running). The 2-array method took 18497 ms. I'm not sure if you caught the 10000 loops and removed them (I added those to compare the speed of the different methods).

Modern computers should be fine with a 50-60 MB file. The method that I'm using will approach 2x that in memory (the exact percentage will depend greatly on the number of duplicates).
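One way to keep memory usage proportional to the set of keys rather than the whole 30-60 MB file is a streaming two-pass over the file on disk, along the lines jacque suggested above. A Python sketch (illustration only; the file paths and helper name are made up, and ":" is used as the delimiter per the request above):

```python
# Streaming two-pass dedupe: read the file from disk twice so that only the
# per-key counts (not the whole 30-60 MB file) need to be held in memory.
# Pass 1: count how often each key (the text before ":") appears.
# Pass 2: write out only the lines whose key appeared exactly once.
from collections import Counter

def filter_unique_keys(src_path, dst_path, delim=":"):
    counts = Counter()
    with open(src_path, encoding="utf-8") as src:
        for line in src:
            counts[line.split(delim, 1)[0]] += 1
    with open(src_path, encoding="utf-8") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            if counts[line.split(delim, 1)[0]] == 1:
                dst.write(line)
```

The trade-off is reading the file twice, but each pass is sequential I/O and nothing larger than the key table ever lives in memory.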
Brian Milby

Script Tracker https://github.com/bwmilby/scriptTracker

jacque
VIP Livecode Opensource Backer
Posts: 7228
Joined: Sat Apr 08, 2006 8:31 pm
Location: Minneapolis MN
Contact:

Re: How to sort out this

Post by jacque » Sun Jul 22, 2018 8:34 pm

The goal is to completely remove any key that is repeated (not just the duplicates).
Then why track the entire string? I'll look at the actual file soon, but it sounds like the removal should omit the port only, and match the rest. That would make more sense.
Jacqueline Landman Gay | jacque at hyperactivesw dot com
HyperActive Software | http://www.hyperactivesw.com

alemrantareq
Posts: 203
Joined: Wed Jul 23, 2008 8:46 am

Re: How to sort out this

Post by alemrantareq » Sun Jul 22, 2018 8:38 pm

bwmilby wrote:
Sun Jul 22, 2018 7:32 pm
The goal is to completely remove any key that is repeated (not just the duplicates).
you caught me right :D
bwmilby wrote:
Sun Jul 22, 2018 7:32 pm
Having real data makes a difference. The offset method is terribly slow (still working). The 2 array method took 18497ms. I'm not sure if you caught the 10000 loops and removed them (I did that to compare speed of the different methods).
I tried it with the 10000 loops too, and my app crashed after waiting too long :(

if any of you has a faster solution to kick out the repeated keys, please...

jacque
VIP Livecode Opensource Backer
Posts: 7228
Joined: Sat Apr 08, 2006 8:31 pm
Location: Minneapolis MN
Contact:

Re: How to sort out this

Post by jacque » Sun Jul 22, 2018 8:39 pm

Well, I can't download the file. The web site is far too intrusive, and the .7z extension isn't recognized.
Jacqueline Landman Gay | jacque at hyperactivesw dot com
HyperActive Software | http://www.hyperactivesw.com
