OK, since I got such an awesome answer on my last question, I figure maybe it wouldn't hurt to ask this one. And I promise, I will contribute here once my level of knowledge has gotten up higher!
I work for a school district with 70,000 students. I do district server-level work, and I've already got old Rev scripts working behind the scenes to do lots of different things. But here's something I'm struggling with.
We're trying to generate unique user names for our students from a combination of their names, parts of their student ID number, and parts of their birthdate. We're not allowed to use the whole student ID number or the whole birthdate, so I'm trying different combinations.
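For illustration only, here is a sketch of one possible scheme. The tab-delimited field order (first name, last name, student ID, birthdate), the YYYY-MM-DD birthdate format, the handler name buildUserName, and the particular fragments used are all hypothetical, not the district's actual rules.

```livecode
-- Hypothetical input format: one student per line, tab-delimited:
-- first name, last name, student ID, birthdate as YYYY-MM-DD
function buildUserName pStudentLine
   local tFirst, tLast, tID, tBirth, tName
   set the itemDelimiter to tab -- handler-local, resets when the function ends
   put item 1 of pStudentLine into tFirst
   put item 2 of pStudentLine into tLast
   put item 3 of pStudentLine into tID
   put item 4 of pStudentLine into tBirth
   -- first initial plus (up to) the first five letters of the last name
   put char 1 of tFirst & char 1 to 5 of tLast into tName
   -- only a fragment of the ID: its last three digits
   put char -3 to -1 of tID after tName
   -- only a fragment of the birthdate: the day of the month
   set the itemDelimiter to "-"
   put item 3 of tBirth after tName
   return toLower(tName)
end buildUserName
```

With this made-up scheme, a line "Maria", "Gonzalez", "1234567", "2008-03-14" (tab-separated) yields "mgonza56714". Names with spaces or apostrophes would pass through unchanged, so a real scheme would probably need to clean those up first.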
I wrote a script that puts all 70k names, birthdates, and ID numbers into a variable (one student per line), then uses a repeat loop to go line by line through the variable and count the duplicates. To oversimplify, it's this concept:
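repeat with i = 1 to 70000
   -- (userName is built from line i of the student list here)
   -- compare whole lines only: a plain "contains" would also match
   -- a name that is merely part of a longer one (e.g. "jsmi" in "jsmith")
   if (listOfUsers & return) contains (return & userName & return) then
      put return & userName after theDuplicates
   else
      put return & userName after listOfUsers
   end if
end repeat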
This works totally fine, but takes a while to run. Is there a faster way to do this?
Generating User Names
Re: Generating User Names
Hi richard,
I propose using an array to filter out the duplicates. This may seem counterintuitive at first, but have a look.
Build an array from the list of all users (including duplicates): use each line of user information as a key, and add 1 to the element stored under that key. Afterwards every element holds a count: 1 if the user is unique, 2 or more if the user is a duplicate.
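A minimal illustration of just the counting step (countLines is a hypothetical helper name, not a built-in):

```livecode
-- Count how often each line occurs, using the line itself as the array key
function countLines pList
   local tCounts, tLine
   repeat for each line tLine in pList
      if tLine is empty then next repeat -- ignore blank lines
      add 1 to tCounts[tLine] -- an element that doesn't exist yet starts from empty, i.e. 0
   end repeat
   return tCounts
end countLines

on mouseUp
   local tCounts
   put countLines("jsmith" & return & "mlee" & return & "jsmith") into tCounts
   put tCounts["jsmith"] & comma & tCounts["mlee"] -- shows 2,1 in the message box
end mouseUp
```

Each line is visited exactly once, and the keys of the resulting array are the distinct user names.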
There are several ways to do this, but I think this is the fastest, and once you get the idea it is a very nifty way to find duplicates.
I also use 'repeat for each line aLine in tData'.
"Repeat for each" simply walks through the list, putting one line at a time into the variable aLine so you can work with it; LiveCode never has to look the line up. With "repeat with i = 1 to x", LiveCode has to count through the lines from the top every time to reach line i.
On top of that, your routine searches the whole accumulated list, 70,000 times, to check for a duplicate. All of this takes a lot of time.
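A rough back-of-the-envelope count of that cost, assuming (worst case, and only an estimate) that the check for student i has to scan all i - 1 names stored before it:

$$
\sum_{i=1}^{n} (i-1) = \frac{n(n-1)}{2}, \qquad n = 70\,000 \;\Rightarrow\; \frac{70\,000 \times 69\,999}{2} = 2\,449\,965\,000
$$

That is about 2.4 billion name comparisons, each of which is itself a character-by-character match. The array version does about 70,000 key lookups instead; as far as I know LiveCode arrays are hash-based, so each lookup takes roughly constant time on average.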
The field "fAllUsers" is supposed to contain all users, duplicates included.
Try this and see if it works for you. I don't know the exact format of your user IDs, but I expect they will work as array keys.
Code:
on mouseUp
   local tData, tArray, tAllUsers, tListOfUniqueUsers, tListOfDupUsers, aLine, aUser
   put field "fAllUsers" into tData
   -- one array element per distinct line; the element counts how often that line occurs
   repeat for each line aLine in tData
      if aLine is empty then next repeat -- skip blank lines
      add 1 to tArray[aLine]
   end repeat
   put the keys of tArray into tAllUsers
   -- optional sort
   sort tAllUsers
   repeat for each line aUser in tAllUsers
      if tArray[aUser] < 2 then
         put aUser & return after tListOfUniqueUsers
      else
         put aUser & return after tListOfDupUsers
      end if
   end repeat
   delete last char of tListOfUniqueUsers -- a return
   delete last char of tListOfDupUsers -- a return
   put tListOfUniqueUsers into field "listOfUniqueUsers"
   put tListOfDupUsers into field "ListOfDupUsers"
end mouseUp
regards
Bernd
Edit: I attached a bare-bones stack containing the script above.
Attachment: ListOfDuplicateUsers.livecode.zip (1.3 KiB)
Re: Generating User Names
...and even without the array, the "repeat for each" form is *much* faster than the "repeat with" form.
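A rough way to measure this yourself, as a sketch: compareLoopTimes and makeTestList are hypothetical helper names, the generated lines ("user1", "user2" and so on) are made-up test data, and the timings will depend on your machine and LiveCode version.

```livecode
-- Build made-up test data: pCount lines "user1", "user2", ...
function makeTestList pCount
   local tList, i
   repeat with i = 1 to pCount
      put "user" & i & return after tList
   end repeat
   delete last char of tList -- trailing return
   return tList
end makeTestList

-- Time the same trivial per-line work with both loop forms
function compareLoopTimes pList
   local tStart, tCount, tLine, tTimeWith, tTimeForEach, i
   -- "repeat with": line i has to be located from the start of pList each time
   put the milliseconds into tStart
   repeat with i = 1 to the number of lines of pList
      put line i of pList into tLine
      add 1 to tCount -- stand-in for real per-line work
   end repeat
   put the milliseconds - tStart into tTimeWith
   -- "repeat for each": the engine walks through pList once
   put 0 into tCount
   put the milliseconds into tStart
   repeat for each line tLine in pList
      add 1 to tCount -- same stand-in work
   end repeat
   put the milliseconds - tStart into tTimeForEach
   return "repeat with:" && tTimeWith && "ms" & return & \
         "repeat for each:" && tTimeForEach && "ms"
end compareLoopTimes
```

For example, typing `put compareLoopTimes(makeTestList(70000))` into the message box prints both timings; the gap should widen as the list grows, because "repeat with" rescans from the top for every line.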