I have an idea that I want to add some sailing hazards to my Garmin GPS as "Points of Interest". These can be handcrafted, but being lazy I've been looking for some data online. I found the OpenStreetMap site and discovered that it has a subset named OpenSeaMap. The map data is freely available in several formats from various web sites, and I downloaded the data for Scotland because I enjoy getting wet and bitten by midges.
If you are interested, the data may be downloaded from this web page http://download.geofabrik.de/europe/gre ... tland.html by following the link scotland-latest.osm.bz2. The bz2 file is a bzip2-compressed archive (not a zip file) of about 320 MB, which expands to about 3.2 GB of XML.
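For anyone who would rather unpack the archive from LiveCode than from the Finder, something like the sketch below should do it. It assumes a Mac or Linux machine with the bunzip2 command-line tool on the shell path, and the function name unpackBz2 is just one I made up for illustration.

```livecode
-- Hypothetical helper: unpack a .bz2 archive with the system bunzip2 tool.
-- Assumes macOS or Linux with bunzip2 available to the shell.
function unpackBz2 pArchivePath
   -- -k keeps the original archive alongside the expanded file
   get shell("bunzip2 -k " & quote & pArchivePath & quote)
   -- shell() returns whatever the command printed; bunzip2 is silent on success
   return it
end unpackBz2
```

Calling unpackBz2 with the full path to scotland-latest.osm.bz2 should leave scotland-latest.osm in the same folder; anything it returns is an error message from bunzip2.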
My stack parses the data and extracts the XML records tagged as nodes to a new file, which is saved in the same folder as the original XML file. On my Mac the stack takes 27.5 minutes to run and outputs 149 records that describe rocks. I have no idea if this is correct, but it seems a little on the low side.
I have avoided the LiveCode XML tools, as Mark Waddingham has said that they are not very fast on large files of several megabytes. My code is very much a first draft and I'm sure it can be improved upon.
The main parsing routine is here for those who don't want to download the stack file:
Code:
on ParseFile pFile
   -- constant kStartChar = "<"
   -- constant kEndChar = ">"
   -- constant kNodeEnd = "</node>"
   put the milliseconds into tStartTime
   put "Started at : " & tStartTime & "ms" & cr into field "debug"

   ## Define two vars that are used to filter the records output to file
   put ("<tag k=" & quote & "seamark:") into tTag
   put quote & "rock" & quote into tTag2

   ## Create name for output file (same folder as the input file)
   set the itemDelimiter to slash
   put pFile into tFileSpec
   delete the last item of tFileSpec
   put slash & "OSM-Scotland-Extract-Seamarks-Rocks.xml" after tFileSpec

   ## Open the output file for output. Append mode adds to any existing
   ## file, so delete an earlier extract before re-running
   open file tFileSpec for append

   ## Open the input file for reading
   open file pFile for text read

   ## Position file pointer at the beginning of the data, i.e. after the header cruft
   read from file pFile until "<node" -- first node in file
   put the length of it into tCharPosn
   put tCharPosn - 5 into tCharPosn -- step back over "<node" to the beginning of the node data
   read from file pFile at 1 for tCharPosn
   put it into tData -- for debug

   ## Set up some variables
   put "Running" into tStatus
   put 0 into tCounter
   --put true into tEndNotFound
   --put true into tSeekStartChar

   ## Loop through the file
   repeat until tStatus = "eof" --OR tCounter = 11
      ## Skip ahead to the next node
      read from file pFile until "<node"
      put the result into tStatus
      put it into tData -- for debug purposes
      if tStatus is "eof" then exit repeat -- no further nodes in the file

      ## Read the rest of the node's opening tag
      read from file pFile until ">"
      put the result into tStatus
      put it into tData
      --put word 1 of tData into tTagName
      --if tTagName is not "node" then next repeat

      ## A self-closing node (ending "/>") has no tags, so skip it
      if char -2 of tData is "/" then
         next repeat
      end if

      ## It is a node and it is not an isolated node with no tags
      put tData into tRecord -- store first portion of record

      ## Read rest of record until the node close
      read from file pFile until "</node>"
      put the result into tStatus
      put it into tData
      put it after tRecord

      ## Output the record to the output file if it is a rock
      put "<node " before tRecord
      if tRecord contains tTag and tRecord contains tTag2 then
         add 1 to tCounter
         write tRecord & cr to file tFileSpec
      end if
   end repeat

   close file pFile
   close file tFileSpec

   ## Report the run time in minutes and the number of records written
   put the milliseconds into tEndTime
   put "Ended at : " & tEndTime & "ms" & cr after field "debug"
   put tEndTime - tStartTime into tElapsedTime
   put tElapsedTime / 60000 into tElapsedTime
   put "Run Time : " & tElapsedTime & " mins" & cr after field "debug"
   put tCounter & " records created in output file" & cr after field "debug"
end ParseFile
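Since I'm not sure the record count is right, the filter test inside the loop can be pulled out into its own function and tried on a hand-made record without running the whole file. This is only a sketch: the function name isRockRecord is my own invention, and it applies exactly the same two "contains" checks as the handler above, so it matches any node that has a seamark tag and the quoted value "rock" anywhere in the record.

```livecode
-- Hypothetical helper: the same filter as ParseFile, isolated for testing.
function isRockRecord pRecord
   -- Key prefix of any seamark tag, e.g. <tag k="seamark:type" ...
   put "<tag k=" & quote & "seamark:" into tTag
   -- The quoted value "rock"; this matches wherever it appears in the record
   put quote & "rock" & quote into tTag2
   return (pRecord contains tTag) and (pRecord contains tTag2)
end isRockRecord

-- Try the filter on one made-up node record and show the answer
on testRockFilter
   put "<node id=" & quote & "1" & quote & ">" into tSample
   put "<tag k=" & quote & "seamark:type" & quote after tSample
   put " v=" & quote & "rock" & quote & "/></node>" after tSample
   answer isRockRecord(tSample)
end testRockFilter
```

Running testRockFilter should report true for this sample. Pasting a few real records from the download into tSample is a quick way to see which ones the filter accepts and which it misses.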
Simon