[Tickets #7237] Outlook-generated CSV/TSV files parse errors
bugs at horde.org
Mon Aug 25 23:16:59 UTC 2008
DO NOT REPLY TO THIS MESSAGE. THIS EMAIL ADDRESS IS NOT MONITORED.
Ticket URL: http://bugs.horde.org/ticket/7237
------------------------------------------------------------------------------
Ticket | 7237
Created By | Ben Klang <ben at alkaloid.net>
Summary | Outlook-generated CSV/TSV files parse errors
Queue | Horde Framework Packages
Version | FRAMEWORK_3
Type | Bug
State | Unconfirmed
Priority | 1. Low
Milestone |
Patch |
Owners |
------------------------------------------------------------------------------
Ben Klang <ben at alkaloid.net> (2008-08-25 19:16) wrote:
While trying to import a TSV file created by Outlook I found that
Horde was calculating an incorrect number of rows. Closer inspection
of the exported data and Horde's Data/tsv.php showed that the parser
simply splits the files on line endings. The data I exported from
Outlook contained numerous fields that contained newlines embedded in
quotation marks. To make matters worse, Outlook did not consistently
quote each field. It appears to have quoted only the fields that
contained quotes, the delimiter, or a newline.
Example of a single field:
<tab>"""John's Barbeque""
Good food here."<tab>
Outlook intended for this to be the string
"John's Barbeque"\nGood food here.
but the Horde Framework parser sees the newline and assumes it's the
next record.
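To make the miscount concrete, here is a minimal sketch in Python (for
illustration only; it is not Horde's PHP code). It assumes the field above
sits between two hypothetical neighbouring fields, `Name` and `Phone`, in
a single record. Splitting on line endings reports two records, while a
quote-aware reader reports one.

```python
import csv
import io

# Hypothetical one-record export: three tab-separated fields, the middle
# one quoted because it contains quotes and an embedded newline.
data = 'Name\t"""John\'s Barbeque""\nGood food here."\tPhone\n'

# Splitting on line endings, as the ticket describes for Data/tsv.php.
naive_rows = data.splitlines()

# Quote-aware parsing; newline="" leaves embedded newlines untouched.
reader = csv.reader(io.StringIO(data, newline=""), delimiter="\t", quotechar='"')
rows = list(reader)

print(len(naive_rows))   # 2 (the record is cut in half)
print(len(rows))         # 1
print(repr(rows[0][1]))  # the middle field, with the newline preserved
```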
I've experimented with different ways of writing the parser to look
for newlines but each time I find new corner cases. Rather than spin
our wheels on this it might make sense to look at the PEAR library
(which seems only to operate on files rather than strings) or find a
reference implementation. I haven't yet read the RFC to determine
whether Outlook violates the standard or not by not consistently
quoting, but regardless, that's how the file was generated.
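As a starting point for discussion, the following is a rough sketch of a
quote-aware splitter that works on a string rather than a file. It is
written in Python for brevity, not as a patch for Data/tsv.php, and the
function name `parse_delimited` is hypothetical. It assumes that a quote
opens a quoted field only at the start of the field, that a doubled quote
inside a quoted field is a literal quote, and that unquoted fields are
taken as-is. Other corner cases may well remain.

```python
def parse_delimited(text, delimiter="\t", quote='"'):
    """Split text into records, honoring quoted fields (hypothetical helper)."""
    rows, row, field = [], [], []
    in_quotes = False
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if in_quotes:
            if ch == quote:
                if i + 1 < n and text[i + 1] == quote:
                    field.append(quote)  # doubled quote is a literal quote
                    i += 1
                else:
                    in_quotes = False    # closing quote
            else:
                field.append(ch)         # newlines and delimiters are data here
        elif ch == quote and not field:
            in_quotes = True             # opening quote at the start of a field
        elif ch == delimiter:
            row.append("".join(field))
            field = []
        elif ch in "\r\n":
            if ch == "\r" and i + 1 < n and text[i + 1] == "\n":
                i += 1                   # treat CRLF as a single line ending
            row.append("".join(field))
            rows.append(row)
            row, field = [], []
        else:
            field.append(ch)
        i += 1
    if field or row:                     # last record without a trailing newline
        row.append("".join(field))
        rows.append(row)
    return rows


sample = 'Name\t"""John\'s Barbeque""\nGood food here."\tPhone\r\nplain\tunquoted\tfields\n'
records = parse_delimited(sample)
print(len(records))  # 2
```

Walking the input one character at a time keeps the quoting state explicit,
which is what a plain split on line endings cannot do.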