August 7th, 2013, 05:51 PM
Postgresl loading tree structured XML
Is there a method to load tree structured XML into postgresql as type XML without having to first remove the CR/LF.
To illustrate the issue here is a very basic XML file that is structured with indents and has CR/LF characters.
<Title>Database How To</Title>
If I try to copy this into the postgresql database with the following sql.
COPY BOOKS from 'books.txt';
I get an error. "Entity: line 1: parser error : Premature end of data in tag Book."
I can successfully load the file if I structure the file differently so that all the xml is on one line as per below.
<Book><Title>Database How To</Title></Book>
Is there any method to load without having to first parse the file via "sed" to remove all CR/LF characters?
August 8th, 2013, 09:28 PM
COPY adheres to the same defaults as most text-processing tools (AWK comes to mind...): a line is equivalent to a record. So any newline or (if your stuff was generated by Windows) CR/LF begins a new line, which indicates a new record. COPY expects that an escaped period "/." or EOF ends the list of tuples and hence that invocation of the command.
So COPY requires a special format (with some influence possible through the use of options). You can either massage your data into that format (escape the offending characters or compact the XML with xmllint or sed) or use a different command set -- like writing a Python script that performs an INSERT over the data.
Its been my experience that XML creates more problems than it solves. Unless you have other tools that demand input in XML format (and even then an XML generation export function makes more sense imo), I would write a script that interprets the XML and performs inserts of actual values into a proper schema instead of trying to preserve the original.