[Tickets #7237] Outlook-generated CSV/TSV files parse errors

bugs at horde.org bugs at horde.org
Mon Aug 25 23:16:59 UTC 2008


DO NOT REPLY TO THIS MESSAGE. THIS EMAIL ADDRESS IS NOT MONITORED.

Ticket URL: http://bugs.horde.org/ticket/7237
------------------------------------------------------------------------------
  Ticket             | 7237
  Created By         | Ben Klang <ben at alkaloid.net>
  Summary            | Outlook-generated CSV/TSV files parse errors
  Queue              | Horde Framework Packages
  Version            | FRAMEWORK_3
  Type               | Bug
  State              | Unconfirmed
  Priority           | 1. Low
  Milestone          |
  Patch              |
  Owners             |
------------------------------------------------------------------------------


Ben Klang <ben at alkaloid.net> (2008-08-25 19:16) wrote:

While trying to import a TSV file created by Outlook, I found that
Horde was calculating an incorrect number of rows.  Closer inspection
of the exported data and Horde's Data/tsv.php showed that the parser
simply splits the file on line endings.  The data I exported from
Outlook contained numerous fields with newlines embedded inside
quotation marks.  To make matters worse, Outlook did not consistently
quote each field.  It appears to have quoted only those fields that
contained quotes, the delimiter, or a newline.

Example of a single field:
<tab>"""John's Barbeque""
Good food here."<tab>

Outlook intended for this to be the string
"John's Barbeque"\nGood food here.

but the Horde Framework parser sees the newline and assumes it's the  
next record.
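
The difference between the two behaviours can be reproduced outside Horde.
The following is only an illustrative sketch in Python (not the code in
Data/tsv.php); the `raw` string is the example field above, assumed to sit
between two empty fields in a one-record file:

```python
import csv
import io

# One record from the ticket: empty field, quoted multi-line field, empty field.
raw = '\t"""John\'s Barbeque""\nGood food here."\t\n'

# Naive approach: splitting on line endings cuts the quoted field in half.
naive_rows = raw.splitlines()

# Quote-aware approach: csv.reader keeps track of whether it is inside quotes,
# so the embedded newline stays part of the field. newline='' stops the stream
# from translating line endings before the parser sees them.
reader = csv.reader(io.StringIO(raw, newline=''), delimiter='\t', quotechar='"')
parsed_rows = list(reader)

print(len(naive_rows))   # number of "records" seen by the naive split
print(parsed_rows)       # records seen by the quote-aware parser
```

The naive split reports two rows, while the quote-aware reader returns a
single record whose middle field still contains the newline.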

I've experimented with different ways of writing the parser to look
for newlines, but each time I find new corner cases.  Rather than spin
our wheels on this, it might make sense to look at the PEAR library
(which seems to operate only on files rather than strings) or find a
reference implementation.  I haven't yet read the RFC to determine
whether Outlook violates the standard by not consistently
quoting, but regardless, that's how the file was generated.
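
For discussion, here is a rough sketch of a string-based parser that tracks
quoting state character by character instead of splitting on line endings.
It is written in Python purely for illustration; the function name
`parse_delimited` and its parameters are hypothetical, it is not taken from
Horde or PEAR, and it certainly does not cover every corner case:

```python
def parse_delimited(text, delimiter='\t', quote='"'):
    """Split text into rows of fields, honouring optionally quoted fields."""
    rows, row, field = [], [], []
    in_quotes = False
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if in_quotes:
            if ch == quote:
                if i + 1 < n and text[i + 1] == quote:
                    field.append(quote)   # doubled quote is a literal quote
                    i += 1
                else:
                    in_quotes = False     # closing quote
            else:
                field.append(ch)          # delimiters and newlines are data here
        elif ch == quote and not field:
            in_quotes = True              # opening quote at the start of a field
        elif ch == delimiter:
            row.append(''.join(field))
            field = []
        elif ch in '\r\n':
            if ch == '\r' and i + 1 < n and text[i + 1] == '\n':
                i += 1                    # treat CRLF as a single line ending
            row.append(''.join(field))
            rows.append(row)
            row, field = [], []
        else:
            field.append(ch)              # ordinary unquoted character
        i += 1
    if field or row:                      # final record without a trailing newline
        row.append(''.join(field))
        rows.append(row)
    return rows


sample = '\t"""John\'s Barbeque""\nGood food here."\tnext\nplain\tvalue\n'
result = parse_delimited(sample)
```

Because a quote only opens a quoted field when it appears at the start of a
field, unquoted fields pass through unchanged, which matches the mixed
quoting seen in the Outlook export.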






