[dev] regex gurus listen

Brent J. Nordquist bjn@horde.org
Thu, 12 Jul 2001 12:10:50 -0500 (CDT)


On Thu, 12 Jul 2001, Jan Schneider <janmailing@gmx.de> wrote:

> Take this line (remove linebreaks):
> ATTENDEE;PARTSTAT=DELEGATED;DELEGATED-
> TO="Mailto:E@example.com":Mailto:C@example.com
>
> It has to be split at the first occurring colon _not_ in a quoted
> string.  In this case it has to be split at the colon before the 2nd
> 'Mailto'.

I'm not sure you can do that in a completely general way with only one
regex.  Are there any other constraints that would narrow the problem?
Will the input always look like the above (a single double-quoted
string, which a first pass could remove)?  Do you need all the fields
that you're splitting, or only the one you mentioned?

- You could change every colon inside a double-quoted string to some
other character, then split the line, then change them back.

- You could use a pattern that grabs from the right, if you know there
won't be any double-quoted strings to the right of the part you need
broken out.
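
Both ideas can be sketched roughly as follows.  This is only an
illustration in Python, not code from any existing parser: the function
names, the NUL placeholder character and the sample line are all
assumptions, and it assumes quoted strings never contain an escaped
double quote.

```python
import re

# Sample line from the question, with the line break removed.
LINE = ('ATTENDEE;PARTSTAT=DELEGATED;DELEGATED-TO='
        '"Mailto:E@example.com":Mailto:C@example.com')

# Hypothetical placeholder; assumed never to occur in real input.
PLACEHOLDER = "\x00"


def split_by_masking(line):
    """Idea 1: hide quoted colons, split, then restore them."""
    # Replace every colon inside a "..." run with the placeholder.
    masked = re.sub(r'"[^"]*"',
                    lambda m: m.group(0).replace(":", PLACEHOLDER),
                    line)
    # Split at the first colon that is still visible.
    name, _, value = masked.partition(":")
    # Put the hidden colons back.
    return (name.replace(PLACEHOLDER, ":"),
            value.replace(PLACEHOLDER, ":"))


def split_from_right(line):
    """Idea 2: anchor on the last double quote in the line.

    Only valid if nothing to the right of the split point is quoted.
    Returns None when the line does not fit that assumption.
    """
    # Greedy '.*"' runs up to the last quote; the split colon is the
    # first colon after it, and the rest must contain no quotes.
    m = re.match(r'^((?:.*")?[^":]*):([^"]*)$', line)
    return m.groups() if m else None


print(split_by_masking(LINE))
print(split_from_right(LINE))
```

For the sample line both functions should give the same two pieces,
split at the colon before the 2nd 'Mailto'.  The second one gives up
(returns None) as soon as a quoted string appears in the value part,
which is exactly the constraint mentioned above.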

Let me know which of these sounds promising and I can write the regexes.
I just have to know how general you need it to be.

> One general question: As you can imagine after this question I use
> regex to parse the files. All other ical/vcard-parsers I found use
> character parsing.  What's your opinion about that?

That may be telling you something about the difficulty of parsing this
format with regexes; at first glance it doesn't sound very regular.
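
For comparison, the character-parsing approach to the same split is
short.  The following Python sketch is my own illustration, not taken
from any of the parsers you found; the function name is made up, and it
assumes a double quote always toggles quoting (no escaped quotes).

```python
def split_at_unquoted_colon(line):
    """Split at the first colon that is outside double quotes.

    Returns (name, value); value is None if no such colon exists.
    """
    in_quotes = False
    for i, ch in enumerate(line):
        if ch == '"':
            # Entering or leaving a quoted string.
            in_quotes = not in_quotes
        elif ch == ":" and not in_quotes:
            return line[:i], line[i + 1:]
    return line, None


line = ('ATTENDEE;PARTSTAT=DELEGATED;DELEGATED-TO='
        '"Mailto:E@example.com":Mailto:C@example.com')
print(split_at_unquoted_colon(line))
```

The scanner only has to remember whether it is currently inside quotes,
which is the one piece of state that is awkward to express in a single
pattern.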

-- 
Brent J. Nordquist <bjn@horde.org> N0BJN
Yahoo!: Brent_Nordquist / AIM: BrentJNordquist / ICQ: 76158942