This how-to is written assuming the hospital system is on Windows(R);
however, some of these steps may be even easier if it is Linux or Unix.
All software is open source unless identified otherwise.
Step 1. Have the hospital set up a PDF printer
Dead easy on a Linux system.
For Windows it is not hard: run PDFCreator as a service
http://sourceforge.net/projects/pdfcreator/
We use PDFCreator, currently 0.9.7,
set to automatically time stamp the file and deliver it into C:\HL7
Step 2. Print hospital reports to pdf as well as the regular paper
jobbie (for now)
Step 3. Strip the text out of the PDF
For Windows I use
http://text-mining-tool.com/
It is free but closed source.
Step 4. Run a cron job to run your script
The script will stuff the text into an HL7 "wrapper", but first you need
to remove the line breaks: HL7 uses the carriage return to end a segment,
so they are not tolerated inside the text.
In Windows
the Task Scheduler runs C:\HL7\pdf2txt2hl7.bat hourly, which in turn
a) copies all PDFs to C:\HL7\temp for processing
b) uses Text Miner (currently 1.1.42) to extract the ASCII from the PDF
http://text-mining-tool.com/
c) uses tr to tokenize carriage returns and clean up some characters;
it is a port of the Unix tool
http://gnuwin32.sourceforge.net/
d) prepends and appends start.bnk and end.bnk, a nominal HL7 wrapper
for the text
e) moves the PDFs to processed and the HL7 to C:\HL7
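As a rough illustration of step (c), the character clean-up done by the tr passes in the batch file can be sketched in JavaScript. This is only a sketch, not the script actually in use: the function name flattenReportText is made up for the example, and it assumes the text has already been decoded, so the typographic apostrophe arrives as one character (U+2019) instead of the three bytes that tr sees.

```javascript
// Sketch of step (c) on an in-memory string (hypothetical helper name).
// Assumption: input is already decoded text, so the typographic
// apostrophe is the single character U+2019.
function flattenReportText(text) {
  return text
    .replace(/\u2019/g, "'")   // typographic apostrophe -> ASCII apostrophe
    .replace(/[\r\n]/g, "%");  // every CR and every LF becomes one "%" token
}

// A CRLF pair becomes "%%", a lone LF becomes "%", as with the tr call.
console.log(flattenReportText("Dr\u2019s note\r\nLine 2\n"));
// prints: Dr's note%%Line 2%
```

Using a visible token such as "%" instead of deleting the line breaks keeps the original line structure recoverable later in the chain.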
pdf2txt2hl7.bat
REM @ECHO OFF
ECHO. PDF2TXT2HL7.BAT 2009 (C) Peter H-C
IF NOT (%1)==(/?) GOTO :START
ECHO.
ECHO.::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
ECHO.:: This batch file extracts text from hospital pdf ::
ECHO.:: and preprocesses it removing illegal characters ::
ECHO.:: and carriage return line feeds <CR><LF> ::
ECHO.:: And then wraps it in a HL7 ish wrapper for Mirth ::
ECHO.::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
ECHO.
ECHO. usage
ECHO. PDF2TXT2HL7
ECHO. /? gives this text
GOTO :END
:START
REM temporarily add path to GNUwin32 toolset and Text_Mining
PATH=%PATH%;"C:\Program Files (x86)\GnuWin32\bin";C:\Text_Mining;
COPY *.pdf C:\HL7\Testing\
MOVE *.pdf C:\HL7\temp\
CD C:\HL7\temp
for %%I in (*.pdf) do minetext "%%I" "%%~nI.tx1"
REM replace ’ (UTF-8 bytes octal 342 200 231) with an apostrophe in two tr passes
REM first pass: strip the two leading bytes 342 and 200
for %%I in (*.tx1) do tr -d '\342\200' < "%%I" > "%%~nI.tx2"
REM second pass: convert the remaining byte 231 to an apostrophe (047)
for %%I in (*.tx2) do tr '\231' '\047' < "%%I" > "%%~nI.tx3"
REM replace each CR and LF in the text with a percent token using tr
for %%I in (*.tx3) do tr '[\r\n]' '[%%%%]' < "%%I" > "%%~nI.tx4"
REM copy pdf to processed
MOVE *.pdf C:\HL7\processed\
REM append the prototype files to form a validish HL7 with placeholders
for %%I in (*.tx4) do copy start.bnk /B + "%%I" /B + end.bnk /B "%%~nI.HL7"
MOVE *.HL7 C:\HL7\
REM Test for exact ERRORLEVEL 0
IF ERRORLEVEL 0 IF NOT ERRORLEVEL 1 GOTO :CLEAN
ECHO. *********************************
ECHO. ***Failure***Failure***Failure***
ECHO. *********************************
REM insert delay of 1 hour
ECHO. Control C to quit
ping -n 3500 127.0.0.1 > NUL
GOTO :END
:CLEAN
REM And if good then can cleanup files
del *.tx?
del *.pdf
:END
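The COPY /B line in the batch file simply glues three files together byte for byte. A minimal JavaScript sketch of that wrapping step follows. The contents of start.bnk and end.bnk are not shown in this post, so the header and trailer strings here are purely hypothetical placeholders, as is the helper name wrapReport.

```javascript
// Sketch of the wrapping step: binary concatenation, like COPY /B a + b + c.
function wrapReport(startBytes, bodyBytes, endBytes) {
  return Buffer.concat([startBytes, bodyBytes, endBytes]);
}

// HYPOTHETICAL stand-ins for start.bnk and end.bnk (real contents unknown).
const start = Buffer.from(
  "MSH|^~\\&|PDF2HL7|HOSPITAL|MIRTH|CLINIC|||ORU^R01|PLACEHOLDER|P|2.3\rOBX|1|TX|||",
  "latin1"
);
const end = Buffer.from("\r", "latin1");

// Flattened report text, with "%" tokens where the line breaks were.
const body = Buffer.from("Chest X-ray%%No acute findings%%", "latin1");

const message = wrapReport(start, body, end).toString("latin1");
console.log(message.split("\r").filter(Boolean).length + " segments");
```

Because the line breaks were tokenized beforehand, the report text stays inside a single segment and the only carriage returns in the message are the ones supplied by the wrapper.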
Step 5. Install Mirth (open source, multiplatform) at the hospital end
Once they realise what it is, this becomes very interesting for the
hospital IT people, as they will find all sorts of uses for it as they
struggle with HL7 transport for their own internal purposes.
I use Mirth, currently 1.8.0.4126, with Sun Java 1.6.0_11
http://www.mirthproject.org/
It transforms the HL7 by using JavaScript RegEx to strip out data from
the text to populate the HL7 fields and routes the fully formed HL7
messages as appropriate to
a) me by email for blank PDFs and unassigned reports (DR.DOCTOR)
c) the [local] clinic by unencrypted lower layer protocol (LLP) via SSH
tunnel to another Mirth instance
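To give a feel for the kind of regex work described above, here is a plain JavaScript sketch. It is not the actual Mirth channel script and does not use Mirth's own objects. The report layout ("Patient: ..." and "Doctor: ..." labels), the helper names extractFields and chooseRoute, and the route names are all assumptions made up for the example; only the DR.DOCTOR placeholder and the email/clinic split come from the description above.

```javascript
// HYPOTHETICAL layout: "Patient: <name>%%Doctor: <name>%%...", where "%"
// marks the former line breaks. Real hospital reports will differ.
function extractFields(flatText) {
  const patient = /Patient:\s*([^%]+)/.exec(flatText);
  const doctor = /Doctor:\s*([^%]+)/.exec(flatText);
  return {
    patient: patient ? patient[1].trim() : "",
    doctor: doctor ? doctor[1].trim() : "",
  };
}

function chooseRoute(flatText) {
  // Text made only of whitespace and "%" tokens is treated as a blank PDF.
  if (flatText.replace(/[%\s]/g, "") === "") return "email";
  const { doctor } = extractFields(flatText);
  // Assumption: a missing doctor is handled like the DR.DOCTOR placeholder.
  if (doctor === "" || doctor === "DR.DOCTOR") return "email";
  return "clinic";
}

const sample = "Patient: SMITH, JOHN%%Doctor: DR.JONES%%Findings: clear%";
console.log(extractFields(sample)); // { patient: 'SMITH, JOHN', doctor: 'DR.JONES' }
console.log(chooseRoute(sample));   // clinic
```

Keeping the extraction and the routing decision in separate functions makes it easier to adjust the regexes when the hospital changes a report layout.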
Step 6. Install Mirth on the clinic end
We use Mirth to channel the output to an unused Mule HL7 input on the
[EMR] end, in our case as "CML" input.
The JavaScript invoked by Mirth on both ends (hospital and clinic) is
saved as XML configurations, a bit obtuse to list as plain text, but I
attach them for your pleasure. The LLP to Peter sits on the hospital Mirth
instance, and the LLP from Hospital to file sits on our clinic server,
which drops files into a directory that is configured to import
CML type HL7.