extracting date information from string
Re: extracting date information from string
RFC = Rugby Football Club?
Beam me up, Scotty.
Re: extracting date information from string
Richmond.
Those strings contain no dates; my handler returns nothing if you run either. A slash buried in a handful of integers does not pass muster.
The most peculiar thing about LC is the oddity that LC thinks that ANY integer of 11 digits or less is a date. This goes back to HC, which had NO SUCH LIMIT to the number of digits. But that sort of thing is easily filtered, as my little offering shows.
The issue here, and everyone has noted and commented on it, is that there are a lot of ways that humans write dates. But if those are all sturdy and consistent enough then they can be parsed.
Craig
EDIT. If you put enough effort into it...
Last edited by dunbarx on Wed Aug 31, 2022 2:44 pm, edited 1 time in total.
Re: extracting date information from string
I was wrong about integers. LC (and HC) thinks that any floating point number is a date.
Craig
Re: extracting date information from string
A relevant thread which references RFCs, with links, and includes some good ideas about possible future time and date options for LC from Mark Waddingham:
https://forums.livecode.com/viewtopic.php?f=66&t=23547
Richard Gaskin
LiveCode development, training, and consulting services: Fourth World Systems
LiveCode Group on Facebook
LiveCode Group on LinkedIn
Re: extracting date information from string
"But if those are all sturdy and consistent enough then they can be parsed."
And the 'dirty' word is 'consistent' . . .
As we all know (or we should) computers ARE consistent, humans are NOT.
AND, let's face it: the problem is neither the computers nor the humans: the problem
is when anything tries to cross the 'membrane' between computers and humans.
1. IF every date contained a WORD representing the month, life would be significantly easier.
Today could be:
31/8/2022 (Gregorian)
8/31/2022 (Gregorian)
August 31, 2022 (Gregorian)
---
22/8/2022 (Julian)
8/22/2022 (Julian)
August 22, 2022 (Julian)
---
25/12/1738 (Coptic)
Misrah 25, 1738 (Coptic)
---
Elul 4, 5782 (Jewish)
---
Safar 4, 1444 (Islamic)
---
Badra 5, 2079 (one of about 12 Hindu calendars)
----
and so it goes, and so it goes.
Re: extracting date information from string
Because they could be seconds, starting from the origin of time assigned by the OS (1904 Mac, 1970 'nix, etc.)
Jacqueline Landman Gay | jacque at hyperactivesw dot com
HyperActive Software | http://www.hyperactivesw.com
Re: extracting date information from string
Jacque.
Of course. The scary part is I actually knew that back in the Pleistocene.
Craig
Last edited by dunbarx on Wed Aug 31, 2022 6:38 pm, edited 1 time in total.
Re: extracting date information from string
All.
Know that the beginning of time was not at the beginning of 1970; that was only the beginning of positive time. See:
https://forums.livecode.com/viewtopic.p ... ng#p138530
You can go back, "directly", to just before the Norman Conquest. With a bit of coding, to any time at all.
Craig
Re: extracting date information from string
"Because they could be seconds"
Well, they could, but I wonder if one cannot be a bit more 'refined' and look for this sort of thing:
12/12/1786
Mind you the mid-Atlantic date problem could still cause problems:
My birthday was on February 7, 1962 . . .
and, newly arrived in America I wrote 07/02/1962 on a website on the minty-new internet (1993)
and then started getting birthday greetings every 2nd of July.
Re: extracting date information from string
seems all very straightforward [even if I do jalouse at the American style date].
-
And the Dictionary seems fairly straightforward as well.
So, obviously, if LiveCode "gets its knickers in a twist" with 11 digit numbers that might be seconds
one will just have to write a routine to exclude secs.
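A routine along those lines might look like the sketch below. It takes the thread's observation at face value (that a bare number passes as a date because it can be read as a count of seconds), and the handler name looksLikeDate is made up for illustration.

```livecode
function looksLikeDate pString
   -- A plain number could be read as seconds, so rule it out first
   if pString is a number then return false
   -- Otherwise defer to LC's own date test
   return pString is a date
end function
```

This only removes the plain-number case; what "is a date" accepts for everything else still depends on the date formats LC recognises.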
Re: extracting date information from string
Richmond, it isn't LC that has knickers issues, it is the lack of a single "universal" date format, and even within that, there are variations. For example, in the US, sometimes one has to fill out a form with "mm/dd/yy" and sometimes with "mm/dd/yyyy". It will fail if not done just right, and that is considered a single format.
LC has the wherewithal to work through all that, it just takes careful effort.
Craig
Re: extracting date information from string
"that is considered a single format"
That is awkward.
Maybe, just maybe, the 'secret' such as it is, lies in the '/' slashes.
So, were one to set '/' as the itemDelimiter one ought to be able to plop item 3 into a year variable.
Of course, my story about my birthday should illustrate where another can/tin [wow, there we go again] lies.
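As a rough sketch of the itemDelimiter idea: the function name yearOfSlashDate is hypothetical, and it assumes the string is already known to be a three-part date with the year last.

```livecode
function yearOfSlashDate pDate
   -- The itemDelimiter is local to this handler, so callers are unaffected
   set the itemDelimiter to "/"
   -- Anything that is not exactly three slash-separated parts is rejected
   if the number of items in pDate <> 3 then return empty
   return item 3 of pDate
end function
```

Called with "31/8/2022" or "8/31/2022" this hands back the year either way; it is items 1 and 2 that carry the day/month ambiguity from the birthday story.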
Re: extracting date information from string
"were one to set '/' as the itemDelimiter"
Yep, that is a great way into and out of the parsing jungle. Of course, some formats use "-".
Anyway, the point is that all this should be readily doable. It just takes work, and anticipating what might come up.
Craig
Re: extracting date information from string
"some formats use "-""
Today is 31.8.2022 as far as I'm concerned.
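One way to cope with the three separators mentioned so far ("/", "-" and ".") is to map them all onto one before splitting. This is only a sketch under that assumption; the name splitDateParts is invented here, and it makes no attempt to decide which part is the day and which the month.

```livecode
function splitDateParts pDate
   -- pDate is a local copy, so the caller's value is left alone
   replace "-" with "/" in pDate
   replace "." with "/" in pDate
   set the itemDelimiter to "/"
   -- Reject anything that does not break into exactly three parts
   if the number of items in pDate <> 3 then return empty
   -- Hand the parts back as an ordinary comma-delimited list
   replace "/" with comma in pDate
   return pDate
end function
```

Given "31.8.2022", "31-8-2022" or "31/8/2022" this returns the same three items in the order they were written.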
Re: extracting date information from string
Using macOS's data detectors is probably the best way to go about this if this is for a Mac-only app, given the complexity.
Alternatively, if you're willing to dance the regex dance, the following will find the first date in a text in UK, US or SQL format, with or without leading 0's and with *any* delimiter.
Minimal logic is applied, so you will have to sanity-check the result in LC. The regex should accept no day > 31 and no month > 12, but it doesn't guard against 30/2/2022 (d/m/y), for example.
The only real difficulty is that there is no way to know whether the date is m/d/y or d/m/y if the day is <= 12: the usual confusion between US and UK date formats. It can probably be extended to search for textual dates (eg 3 September 2022 or Sep 3 2022) if you really need that.
The regex
\s((?:0?[1-9]|[12]\d|30|31)[^\w\d\r\n:](?:0?[1-9]|1[0-2])[^\w\d\r\n:](?:\d{4}|\d{2})|(?:0?[1-9]|1[0-2])[^\w\d\r\n:](?:0?[1-9]|[12]\d|30|31)[^\w\d\r\n:](?:\d{4}|\d{2})|(?:\d{4})[^\w\d\r\n:](?:0?[1-9]|1[0-2])[^\w\d\r\n:](?:[0-2][0-9]|3[0-1]))
First of all it starts by assuming there will be white space before the date (ie the date is separated from other words with a space): \s
Then the rest of the pattern is a capturing group ( ); this is what matchText in LC captures.
The capturing group inside the outer ( and ) is built of 3 sub-searches, separated by a pipe | ('or').
The first sub-search is for d-m-y, the second for m-d-y and the third for y-m-d; if any of these matches, you have a hit.
The actual components in these searches are:
delimiters: [^\w\d\r\n:] -- ie any single character that is not a word character (letter, digit or underscore), CR, LF or colon. If you want to limit this to just /, . or - then use [\/\.\-] instead
year: (?:\d{4}|\d{2}) - a non-capturing group (?: ) is used so that it doesn't appear as a result in its own right; it matches a run of 4 or 2 digits
month: (?:0?[1-9]|1[0-2]) - an optional leading 0 followed by a digit from 1 to 9, or else 10, 11 or 12
day: (?:0?[1-9]|[12]\d|30|31) - same idea for the day, up to the number 31
The only exception is the day in an SQL date, where the number must be two digits: (?:[0-2][0-9]|3[0-1]) (note that this also lets 00 through)
The LC code to use this is
put "\s((?:0?[1-9]|[12]\d|30|31)[^\w\d\r\n:](?:0?[1-9]|1[0-2])[^\w\d\r\n:](?:\d{4}|\d{2})|(?:0?[1-9]|1[0-2])[^\w\d\r\n:](?:0?[1-9]|[12]\d|30|31)[^\w\d\r\n:](?:\d{4}|\d{2})|(?:\d{4})[^\w\d\r\n:](?:0?[1-9]|1[0-2])[^\w\d\r\n:](?:[0-2][0-9]|3[0-1]))" into tRegex
get matchText(textToSearch, tRegex, R) -- it is true on a match; R then holds the captured date
HTH
Stam
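For the sanity check mentioned above, a minimal sketch might look like this. Both handler names are hypothetical. firstDateIn takes the regex as a parameter rather than repeating it, and prepends a space so that a date at the very start of the text can still satisfy the leading \s. isRealDate assumes the day, month and year have already been separated, that the year has four digits, and that the Gregorian calendar applies.

```livecode
function firstDateIn pText, pRegex
   local tFound
   -- The leading space lets a date in first position match the \s in the pattern
   if matchText(space & pText, pRegex, tFound) then return tFound
   return empty
end function

function isRealDate pDay, pMonth, pYear
   local tLimit
   if (pDay is not an integer) or (pMonth is not an integer) or (pYear is not an integer) then return false
   if pMonth < 1 or pMonth > 12 or pDay < 1 then return false
   -- Days per month in a non-leap year
   put item (pMonth + 0) of "31,28,31,30,31,30,31,31,30,31,30,31" into tLimit
   -- Gregorian leap-year rule for February
   if (pMonth = 2) and (pYear mod 4 = 0) and ((pYear mod 100 <> 0) or (pYear mod 400 = 0)) then
      put 29 into tLimit
   end if
   return pDay <= tLimit
end function
```

With these, isRealDate(30, 2, 2022) comes back false, which is the 30/2/2022 case the regex lets through. Deciding which captured part is the day and which the month is still up to the caller.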