extracting date information from string

richmond62
Livecode Opensource Backer
Posts: 9454
Joined: Fri Feb 19, 2010 10:17 am
Location: Bulgaria

Re: extracting date information from string

Post by richmond62 » Wed Aug 31, 2022 12:29 pm

RFC = Rugby Football Club?

Beam me up, Scotty. 8)

dunbarx
VIP Livecode Opensource Backer
Posts: 9752
Joined: Wed May 06, 2009 2:28 pm
Location: New York, NY

Post by dunbarx » Wed Aug 31, 2022 1:59 pm

Richmond.

Those strings contain no dates; my handler returns nothing if you run either. A slash buried in a handful of integers does not pass muster.

The most peculiar thing about LC is the oddity that LC thinks that ANY integer of 11 digits or less is a date. This goes back to HC, which had NO SUCH LIMIT to the number of digits. But that sort of thing is easily filtered, as my little offering shows.

The issue here, and everyone has noted and commented on it, is that there are a lot of ways that humans write dates. But if those are all sturdy and consistent enough then they can be parsed.

Craig

EDIT. If you put enough effort into it...
Last edited by dunbarx on Wed Aug 31, 2022 2:44 pm, edited 1 time in total.

Post by dunbarx » Wed Aug 31, 2022 2:29 pm

I was wrong about integers. LC (and HC) thinks that any floating point number is a date.

Craig

FourthWorld
VIP Livecode Opensource Backer
Posts: 9857
Joined: Sat Apr 08, 2006 7:05 am
Location: Los Angeles

Post by FourthWorld » Wed Aug 31, 2022 3:56 pm

richmond62 wrote:
Wed Aug 31, 2022 12:29 pm
RFC = Rugby Football Club?

A relevant thread which references RFCs, with links, and includes some good ideas about possible future time and date options for LC from Mark Waddingham:

https://forums.livecode.com/viewtopic.php?f=66&t=23547

Richard Gaskin
LiveCode development, training, and consulting services: Fourth World Systems
LiveCode Group on Facebook
LiveCode Group on LinkedIn

Post by richmond62 » Wed Aug 31, 2022 4:21 pm

"But if those are all sturdy and consistent enough then they can be parsed."

And the 'dirty' word is 'consistent' . . .

As we all know (or we should) computers ARE consistent, humans are NOT.

AND, let's face it: the problem is neither the computers nor the humans: the problem
is when anything tries to cross the 'membrane' between computers and humans.

1. IF every date contained a WORD representing the month, life would be significantly easier.

Today could be:

31/8/2022 (Gregorian)
8/31/2022 (Gregorian)
August 31, 2022 (Gregorian)
---
22/8/2022 (Julian)
8/22/2022 (Julian)
August 22, 2022 (Julian)
---
25/12/1738 (Coptic)
Misrah 25, 1738 (Coptic)
---
Elul 4, 5782 (Jewish)
---
Safar 4, 1444 (Islamic)
---
Badra 5, 2079 (one of about 12 Hindu calendars)
----
and so it goes, and so it goes.

jacque
VIP Livecode Opensource Backer
Posts: 7258
Joined: Sat Apr 08, 2006 8:31 pm
Location: Minneapolis MN

Post by jacque » Wed Aug 31, 2022 5:17 pm

dunbarx wrote:
Wed Aug 31, 2022 2:29 pm
I was wrong about integers. LC (and HC) thinks that any floating point number is a date.

Because they could be seconds, starting from the origin of time assigned by the OS (1904 Mac, 1970 'nix, etc.)

Jacqueline Landman Gay | jacque at hyperactivesw dot com
HyperActive Software | http://www.hyperactivesw.com

Post by dunbarx » Wed Aug 31, 2022 5:57 pm

Jacque.

Of course. The scary part is I actually knew that back in the Pleistocene. :oops:

Craig
Last edited by dunbarx on Wed Aug 31, 2022 6:38 pm, edited 1 time in total.

Post by dunbarx » Wed Aug 31, 2022 6:01 pm

All.

Know that the beginning of time was not at the beginning of 1970; that was only the beginning of positive time. See:

https://forums.livecode.com/viewtopic.p ... ng#p138530

You can go back, "directly", to just before the Norman Conquest. With a bit of coding, to any time at all.

Craig

Post by richmond62 » Wed Aug 31, 2022 6:32 pm

"Because they could be seconds"

Well, they could, but I wonder if one cannot be a bit more 'refined' and look for this sort of thing:

12/12/1786

Mind you the mid-Atlantic date problem could still cause problems:

My birthday was on February 7, 1962 . . .

and, newly arrived in America I wrote 07/02/1962 on a website on the minty-new internet (1993)
and then started getting birthday greetings every 2nd of July. 8)

Post by richmond62 » Wed Aug 31, 2022 6:59 pm

[Attachment: SShot 2022-08-31 at 20.36.38.png]

Seems all very straightforward [even if I do jalouse at the American style date].

And the Dictionary seems fairly straightforward as well.

So, obviously, if LiveCode "gets its knickers in a twist" with 11 digit numbers that might be seconds, one will just have to write a routine to exclude secs.

[Attachment: noSecs.jpg]
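
A routine of that kind might look roughly like the sketch below. It is not a transcription of the attached image. It assumes the only false positives worth excluding are bare numbers, which LiveCode can read as seconds; the handler name looksLikeDate is made up for illustration, while "is a number" and "is a date" are the built-in operators.

```livecode
-- Hypothetical helper: true only for text that LiveCode accepts as a date
-- and that is not simply a number.
function looksLikeDate pText
   -- A bare number passes "is a date" because it can be read as seconds,
   -- so reject anything numeric before asking LiveCode.
   if pText is a number then return false
   return pText is a date
end looksLikeDate
```

What "is a date" accepts beyond that still depends on the date formats LiveCode recognises on the machine in question, so this only removes the seconds problem, not the d/m/y versus m/d/y one.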

Post by dunbarx » Wed Aug 31, 2022 8:11 pm

Richmond, it isn't LC that has knickers issues, it is the lack of a single "universal" date format, and even within that, there are variations. For example, in the US, sometimes one has to fill out a form with "mm/dd/yy" and sometimes with "mm/dd/yyyy". It will fail if not done just right, and that is considered a single format.

LC has the wherewithal to work through all that, it just takes careful effort.

Craig

Post by richmond62 » Wed Aug 31, 2022 8:25 pm

"that is considered a single format"

That is awkward.

Maybe, just maybe, the 'secret' such as it is, lies in the '/' slashes.

So, were one to set '/' as the itemDelimiter one ought to be able to plop item 3 into a year variable.

Of course, my story about my birthday should illustrate where another can/tin [wow, there we go again] lies.

Post by dunbarx » Wed Aug 31, 2022 8:47 pm

"were one to set '/' as the itemDelimiter"

Yep, that is a great way into and out of the parsing jungle. Of course, some formats use "-".

Anyway, the point is that all this should be readily doable. It just takes work, and anticipating what might come up.
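
As a minimal sketch of the itemDelimiter idea, assuming the input is already a purely numeric date written with "/", "-" or "." as its separator: the handler name splitNumericDate is hypothetical, and the handler makes no attempt to decide which of the first two fields is the day.

```livecode
-- Hypothetical helper: returns the three fields of a numeric date as a
-- comma-separated list, in the order they were written, or empty if the
-- text does not split into exactly three parts.
function splitNumericDate pDate
   -- Treat "-" and "." as "/" so one itemDelimiter handles all three styles.
   replace "-" with "/" in pDate
   replace "." with "/" in pDate
   set the itemDelimiter to "/"
   if the number of items of pDate is not 3 then return empty
   -- Hand the fields back comma-separated; which of the first two is the
   -- day and which the month is left to the caller.
   replace "/" with comma in pDate
   return pDate
end splitNumericDate
```

With this, something like "put item 3 of splitNumericDate("31.8.2022") into tYear" should leave 2022 in tYear, provided the caller's own itemDelimiter is still the default comma.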

Craig

Post by richmond62 » Wed Aug 31, 2022 9:04 pm

"some formats use "-""

Today is 31.8.2022 as far as I'm concerned. 8)

stam
Posts: 2758
Joined: Sun Jun 04, 2006 9:39 pm
Location: London, UK

Post by stam » Wed Aug 31, 2022 10:04 pm

rodneyt wrote:
Wed Aug 31, 2022 9:33 am
In my case I'm building a MacOS (only) app, so the Applescript solution is interesting, given how well it seems to work in testing

Using MacOS' data detectors is probably the best way to go about this if this is for a Mac-only app, given the complexity.

Alternatively, if you're willing to dance the regex dance, the following will find the first date in a text in UK, US or SQL format, with or without leading 0's and with *any* delimiter.

Only minimal logic is applied, so you still have to sanity-check the result in LC. The regex should accept no day > 31 and no month > 12, but it doesn't guard against 30/2/2022 (d/m/y), for example.

The only real difficulty is that there is no way to know whether the date is m/d/y or d/m/y if the day is <= 12 - the usual confusion between US and UK date formats. It can probably be extended to search for textual dates (e.g. 3 September 2022 or Sep 3 2022) if you really need that.
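
The sanity check can be sketched along these lines. Both handler names are hypothetical, the check assumes the Gregorian calendar and a four-digit year, and it is only one of several ways to do it (plain arithmetic rather than LiveCode's own date conversion).

```livecode
-- Hypothetical helper: true if day, month and year form a real Gregorian date.
function isRealDate pDay, pMonth, pYear
   local tDaysInMonth
   if (pDay is not an integer) or (pMonth is not an integer) or (pYear is not an integer) then return false
   if (pMonth < 1) or (pMonth > 12) or (pDay < 1) then return false
   put item pMonth of "31,28,31,30,31,30,31,31,30,31,30,31" into tDaysInMonth
   -- February has 29 days in Gregorian leap years.
   if (pMonth = 2) and (pYear mod 4 = 0) and ((pYear mod 100 <> 0) or (pYear mod 400 = 0)) then
      put 29 into tDaysInMonth
   end if
   return pDay <= tDaysInMonth
end isRealDate

-- Hypothetical helper: true if the first two fields could each be a month,
-- so d/m/y and m/d/y cannot be told apart from the numbers alone.
function isAmbiguousOrder pFirst, pSecond
   return (pFirst <= 12) and (pSecond <= 12) and (pFirst <> pSecond)
end isAmbiguousOrder
```

With these, isRealDate(30, 2, 2022) comes back false, which catches the 30/2/2022 case, while isAmbiguousOrder(7, 2) comes back true: 07/02/1962 really can be read either way, and only outside knowledge (locale, user preference, neighbouring dates) can settle it.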

The regex :)

Code: Select all

\s((?:0?[1-9]|[12]\d|30|31)[^\w\d\r\n:](?:0?[1-9]|1[0-2])[^\w\d\r\n:](?:\d{4}|\d{2})|(?:0?[1-9]|1[0-2])[^\w\d\r\n:](?:0?[1-9]|[12]\d|30|31)[^\w\d\r\n:](?:\d{4}|\d{2})|(?:\d{4})[^\w\d\r\n:](?:0?[1-9]|1[0-2])[^\w\d\r\n:](?:[0-2][0-9]|3[0-1]))
This looks crazy, I know, but there is method to the madness. The long regex above just means (UK date or US date or SQL date).
First of all, it assumes there will be a white space before the date (i.e. the date is separated from other words by a space): \s. As a consequence, a date at the very start of the text is not matched.

Then the rest of the pattern is one capturing group ( ) - this is what matchText in LC captures.
The capturing group inside the outer ( and ) is built of 3 sub-searches, separated by a pipe | ('or').

The first sub-search is for d-m-y, the second for m-d-y and the third for y-m-d - if any of these matches, you have a hit.

The actual components in these searches are:
delimiters: [^\w\d\r\n:] -- i.e. any single character that is not a word character (letter, digit or underscore), CR, LF or colon. Note that this also accepts a space. If you want to limit this to just / . or - then use [\/\.\-] instead

year: (?:\d{4}|\d{2}) - a non-capturing group (?: ) is used so that it doesn't appear as a result in its own right; it finds groups of 4 or 2 digits

month: (?:0?[1-9]|1[0-2]) - an optional leading 0 followed by a digit from 1 to 9, or else 10, 11 or 12

day: (?:0?[1-9]|[12]\d|30|31) - same idea for the day, up to the number 31.
The only exception is the SQL date, where the day must be two digits: (?:[0-2][0-9]|3[0-1]) (note that this also lets 00 through)

The LC code to use this is

Code: Select all

put "\s((?:0?[1-9]|[12]\d|30|31)[^\w\d\r\n:](?:0?[1-9]|1[0-2])[^\w\d\r\n:](?:\d{4}|\d{2})|(?:0?[1-9]|1[0-2])[^\w\d\r\n:](?:0?[1-9]|[12]\d|30|31)[^\w\d\r\n:](?:\d{4}|\d{2})|(?:\d{4})[^\w\d\r\n:](?:0?[1-9]|1[0-2])[^\w\d\r\n:](?:[0-2][0-9]|3[0-1]))" into tRegex
get matchText(textToSearch, tRegex, R)
If "it" is true, then R contains the first date in the text (in UK, US or SQL format).
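
For readability, the same pattern can be assembled from its named parts and wrapped in a function. The sketch below is a variant, not the regex exactly as posted: it assumes you want delimiters limited to / . and -, and it uses (?:^|\s) in place of the leading \s so that a date at the very start of the text is also found. The handler and variable names are made up for illustration; matchText is the built-in function used above.

```livecode
-- Hypothetical wrapper: returns the first numeric date found in pText
-- (d-m-y, m-d-y or y-m-d order), or empty if there is none.
function firstDateIn pText
   local tDay, tMonth, tYear, tSep, tRegex, tFound
   put "(?:0?[1-9]|[12]\d|3[01])" into tDay
   put "(?:0?[1-9]|1[0-2])" into tMonth
   put "(?:\d{4}|\d{2})" into tYear
   put "[/.-]" into tSep
   -- One capturing group holding the three alternatives: UK | US | SQL.
   put "(?:^|\s)(" & tDay & tSep & tMonth & tSep & tYear & "|" \
         & tMonth & tSep & tDay & tSep & tYear & "|" \
         & "\d{4}" & tSep & tMonth & tSep & "(?:[0-2]\d|3[01])" & ")" into tRegex
   if matchText(pText, tRegex, tFound) then return tFound
   return empty
end firstDateIn
```

Called as firstDateIn("Meeting on 31/8/2022 at noon"), this should hand back 31/8/2022. The result still needs the sanity checks discussed above, since the pattern alone cannot tell a real date from 30/2/2022 or settle the d/m/y versus m/d/y question.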

HTH
Stam
