OCR?


stam
Posts: 2686
Joined: Sun Jun 04, 2006 9:39 pm
Location: London, UK

OCR?

Post by stam » Tue Jun 20, 2023 12:53 pm

Hi all,

In this late day and age, healthcare in the UK still thinks it's a good idea to create paper forms from a huge number of disparate IT systems and funnel them to our department - ridiculous, I know.

Because the vendors of the various IT systems that feed into us will not change in the foreseeable future and we are forced to deal with paper referrals, I'm creating a system that will store scanned PDFs of these paper documents and allow us to keep track of them and action them electronically.

To try and smooth the process, it would be good to extract some text (eg patient demographics) from these documents but they are images, not text-based, so the only option would be either having an error-prone monkey entering in data manually or, preferably, using OCR.

We can't upload these documents to any kind of online service for OCR as they contain patient-sensitive data.

Is anyone aware of a way of doing this?

Many thanks
Stam

richmond62
Livecode Opensource Backer
Posts: 9388
Joined: Fri Feb 19, 2010 10:17 am
Location: Bulgaria

Re: OCR?

Post by richmond62 » Tue Jun 20, 2023 1:31 pm

For my sins I recently OCRed (was that a verb? Well, it is now.) a 95 page book in Bulgarian published in 1993: page by screaming page

[it was me that was screaming after about the tenth]

This: https://solutions.weblite.ca/pdfocrx/

Took scanned PNG images and got the text 100%.

Of course, using this, you'll have a 'right' dance:

1. Scan to image.
2. OCR.
3. Import the text into something that does pattern recog. to suck out the text you need.
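
Step 3 above could be sketched roughly as follows in Python. This is only a minimal illustration, not anything used in this thread: the field labels ("Surname", "Date of Birth", "NHS Number") and the patterns are assumptions, and real referral forms will need their own patterns.

```python
import re

# Hypothetical field labels and patterns; adjust to the wording on the real forms.
FIELD_PATTERNS = {
    "nhs_number": re.compile(
        r"NHS\s*(?:No\.?|Number)\s*[:\-]?\s*(\d{3}\s?\d{3}\s?\d{4})", re.IGNORECASE
    ),
    "date_of_birth": re.compile(
        r"(?:DOB|Date\s+of\s+Birth)\s*[:\-]?\s*(\d{1,2}[/.\-]\d{1,2}[/.\-]\d{2,4})",
        re.IGNORECASE,
    ),
    "surname": re.compile(r"Surname\s*[:\-]?\s*([A-Za-z'\-]+)", re.IGNORECASE),
}


def extract_fields(ocr_text):
    """Return a dict of field name -> matched value, or None when a field is not found."""
    found = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(ocr_text)
        found[name] = match.group(1).strip() if match else None
    return found
```

Fields that come back as None can be flagged for a human to check rather than guessed at.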

richmond62
Livecode Opensource Backer
Posts: 9388
Joined: Fri Feb 19, 2010 10:17 am
Location: Bulgaria

Re: OCR?

Post by richmond62 » Tue Jun 20, 2023 1:35 pm

healthcare in the UK
Come over to Bulgaria, where everything is computerised, BUT the computers are not linked up, so, when you visit a specialist, you take a CD-ROM with you from your GP!

I am working on a theory that in 30-40 years, when I am busy manuring some flowers, both the British NHS and the Bulgarian equivalent will have . . . what? . . . found more interesting ways to piss people off. 8)
Last edited by richmond62 on Tue Jun 20, 2023 8:06 pm, edited 1 time in total.

jacque
VIP Livecode Opensource Backer
Posts: 7238
Joined: Sat Apr 08, 2006 8:31 pm
Location: Minneapolis MN

Re: OCR?

Post by jacque » Tue Jun 20, 2023 5:09 pm

Google Lens will do it and it's free, but for now it's a manual process. And like Richmond said, it would be page by page.

It's very cool though and can also translate scanned text to your specified language. My husband used it to read instructions for a hobby kit made in China.
Jacqueline Landman Gay | jacque at hyperactivesw dot com
HyperActive Software | http://www.hyperactivesw.com

markosborne
Posts: 15
Joined: Sat Mar 20, 2010 6:03 pm

Re: OCR?

Post by markosborne » Tue Jun 20, 2023 5:29 pm

Hello All,

If you look more closely at Richmond's suggestion (https://solutions.weblite.ca/pdfocrx/download) you'll see that the paid-for version supports automation:

"Enterprise Edition : All of the same features as the Community Edition plus support for multi-page PDF files, batch conversion, and no-prompt mode (which allows you to automate conversion using tools like Automator and Applescript)."

stam
Posts: 2686
Joined: Sun Jun 04, 2006 9:39 pm
Location: London, UK

Re: OCR?

Post by stam » Tue Jun 20, 2023 7:50 pm

markosborne wrote:
Tue Jun 20, 2023 5:29 pm
Hello All,

If you look more closely at Richmond's suggestion (https://solutions.weblite.ca/pdfocrx/download) you'll see that the paid-for version supports automation:

"Enterprise Edition : All of the same features as the Community Edition plus support for multi-page PDF files, batch conversion, and no-prompt mode (which allows you to automate conversion using tools like Automator and Applescript)."
Thanks Richmond and Mark - certainly seems interesting, but I was looking for something to integrate with LC. Sadly, as the work environment is Windows, Automator/AppleScript is irrelevant. Not sure if the Windows version of this can be controlled from the command line - if so, that may be something to look at...

As for workflow, if this did work I'd simply copy the PDF and output a text file to a folder named with a patient identifier (these need to be stored on a server anyway). I have a number of algorithms already in use for extracting patient/clinical data, so that would be straightforward. But the key thing is automation / minimising human interaction in this process, as it is so error-prone (examples could literally fill several tomes)...
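
A minimal Python sketch of that filing step, purely as an illustration: the function name `file_referral`, the folder layout, and the rule that a patient identifier is alphanumeric are all assumptions rather than anything specified here.

```python
import shutil
from pathlib import Path


def file_referral(pdf_path, ocr_text, patient_id, root_dir):
    """Copy the scanned PDF into root_dir/<patient_id>/ and write the OCR text beside it."""
    pdf_path = Path(pdf_path)
    # Assumption: identifiers are plain alphanumerics, which also rules out
    # path separators that could escape the root folder.
    if not patient_id or not patient_id.isalnum():
        raise ValueError("patient_id must be a non-empty alphanumeric string")
    target_dir = Path(root_dir) / patient_id
    target_dir.mkdir(parents=True, exist_ok=True)
    pdf_copy = target_dir / pdf_path.name
    shutil.copy2(pdf_path, pdf_copy)  # keeps the original file's timestamps
    text_file = pdf_copy.with_suffix(".txt")
    text_file.write_text(ocr_text, encoding="utf-8")
    return pdf_copy, text_file
```

Keeping the text file next to the PDF under the same base name means the existing extraction routines only need the folder path to find both.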

In an ideal world a widget would work well, but I'm not sure if such a widget/framework exists. An internet API is a no-go for security reasons. I'm loath to go down the monkey route (this app is meant to eliminate/minimise issues that arise from exactly this approach, don't get me started...)

stam
Posts: 2686
Joined: Sun Jun 04, 2006 9:39 pm
Location: London, UK

Re: OCR?

Post by stam » Tue Jun 20, 2023 8:27 pm

Looking into this, most of these apps - the one linked above included - are based on a framework called Tesseract, initially developed by HP, later maintained by Google and now on GitHub: https://github.com/tesseract-ocr/tesseract

It's beyond my time constraints and abilities to create an LC widget wrapper for this, but maybe someone has done it ;)
However, many have used this to create open source apps - now looking at Rescribe https://rescribe.xyz/rescribe/, as that has a shell interface.

Turns out there are a few choices of open source OCR out there. But not many can be packaged with a standalone app for random distribution, as many require installation, setting of environment paths etc, all of which are blocked by our IT dept...
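
For what it's worth, the Tesseract command-line program can be called by its full path, which avoids relying on an environment path. Below is a rough Python sketch of such a shell call. It assumes a Tesseract executable is available somewhere (the example path further down is hypothetical) and that each PDF page has already been exported as an image, since the Tesseract CLI reads images rather than PDFs. The function names are made up for illustration.

```python
import subprocess


def build_tesseract_command(image_path, exe_path="tesseract", language="eng"):
    """Build the argument list for: tesseract <image> stdout -l <language>."""
    # "stdout" as the output base tells Tesseract to print the text
    # instead of writing a file.
    return [str(exe_path), str(image_path), "stdout", "-l", language]


def ocr_image(image_path, exe_path="tesseract", language="eng", run=subprocess.run):
    """Run Tesseract on one image and return the recognised text."""
    command = build_tesseract_command(image_path, exe_path, language)
    # check=True raises CalledProcessError if Tesseract exits with an error.
    completed = run(command, capture_output=True, text=True, check=True)
    return completed.stdout
```

Usage would be something like `ocr_image("page1.png", exe_path=r"C:\ocr\tesseract.exe")`, with the path pointing at wherever the executable actually lives. The `run` parameter exists only so the call can be stubbed out in tests.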

--------------------
EDIT: I tested Rescribe on a simple note from my bank which I had as PDF.
Apparently my bank is the
Notionol Westminster Bonk Pic Registereò in Englond
Disappointing... I mean I'm all for a notional bonk & all, but that's quite poor :-/

paul@researchware.com
VIP Livecode Opensource Backer
Posts: 136
Joined: Wed Aug 26, 2009 7:42 pm
Location: Randolph, MA USA

Re: OCR?

Post by paul@researchware.com » Wed Jun 21, 2023 5:03 pm

I have a vague recollection that someone affiliated with LiveCode Ltd. (not the company itself) was working on an LC-based knowledge base tool that used Tesseract? I think Kevin Miller did a webinar on it years ago?

That implied an interface to Tesseract, whether a widget/library wrapper or a shell interface...

Of course, my memory could be completely bogus.
Paul Dupuis
Researchware, Inc.

jacque
VIP Livecode Opensource Backer
Posts: 7238
Joined: Sat Apr 08, 2006 8:31 pm
Location: Minneapolis MN

Re: OCR?

Post by jacque » Wed Jun 21, 2023 6:07 pm

Disappointing... I mean I'm all for a notional bonk & all, but...
Indeed. It's well known that real bonks are so much more effective.
Jacqueline Landman Gay | jacque at hyperactivesw dot com
HyperActive Software | http://www.hyperactivesw.com

richmond62
Livecode Opensource Backer
Posts: 9388
Joined: Fri Feb 19, 2010 10:17 am
Location: Bulgaria

Re: OCR?

Post by richmond62 » Fri Mar 15, 2024 2:27 pm

I have a funny feeling about that last post . . .

Klaus
Posts: 13829
Joined: Sat Apr 08, 2006 8:41 am
Location: Germany

Re: OCR?

Post by Klaus » Fri Mar 15, 2024 2:38 pm

Don't worry, everything here is on my "watchlist"! :-)

richmond62
Livecode Opensource Backer
Posts: 9388
Joined: Fri Feb 19, 2010 10:17 am
Location: Bulgaria

Re: OCR?

Post by richmond62 » Fri Mar 15, 2024 2:41 pm

Right: and I am the one you watch the most. :mrgreen:

Klaus
Posts: 13829
Joined: Sat Apr 08, 2006 8:41 am
Location: Germany

Re: OCR?

Post by Klaus » Fri Mar 15, 2024 3:29 pm

That's a FACT! :D :D :D :D

SparkOut
Posts: 2852
Joined: Sun Sep 23, 2007 4:58 pm

Re: OCR?

Post by SparkOut » Fri Mar 15, 2024 8:15 pm


Klaus
Posts: 13829
Joined: Sat Apr 08, 2006 8:41 am
Location: Germany

Re: OCR?

Post by Klaus » Sat Mar 16, 2024 1:08 pm

Today Monsieur or Madame gave him-/herself the "coup de grace":
https://forums.livecode.com/viewtopic.p ... 77#p228777
8)
