Entry
I need to parse a string and it's getting hairy!
Jul 24th, 2002 11:33
jsWalter, Jean-Bernard Valentaten,
I am 99% done with a date format parser.
The last item is how to extract a substring bracketed by single quotes.
For example, I have this variable:
var format = "EEEE (D - F - w - W), MMMM dd, yyyy ''G'' 'at' hh [KK|HH|kk]:mm:ss a z" ;
I can walk down this string and pull out "tokens" (runs of one or more
of the same character, e.g. EEEE, hh, z):
while ( i < intFormatLen )
{
    var token = "";
    curChar = strFormat.charAt(i);
    while (( strFormat.charAt(i) == curChar ) && ( i < intFormatLen ))
        token += strFormat.charAt(i++);
    result += formatOptions ( token );
}
What I can't figure out is how to look for and pull out this:
'walter'
Now, a doubled single quote needs to be grabbed and passed on as a
single token, but not three of them. It needs to grab them 2 at a time.
If it finds one single quote, it needs to grab all characters until it
sees another single quote. Yes, it needs to be able to handle an
embedded pair of single quotes.
So, if I have this...
EEEE 'is walter''s birthday'
This should give me 3 tokens
1) EEEE
2) a SPACE
3) is walter's birthday
I have been banging my head for 3 days on this one, I just can't see it.
Help?
Walter
---
OK, I see what you want to achieve. The reason why you don't get #3 is
that the nested while loop will only return tokens that are composed of
identical characters. The line while ((strFormat.charAt(i) ==
curChar)... means "while you keep finding the same character and the
end of the string has not been reached, append to token", and since
'walter ...' is not a run of identical characters, it is not found.
Your approach is rather naive and does not use js's most powerful tool:
regular expressions. Using those you'll find it easy to parse your
string. The methods to use are search(RegExp) and match(RegExp). The
search method returns the position of the first pattern match (or -1 if
there is none), and match returns an array whose first element is the
string that matches the pattern (or null if nothing matches). RegExp
stands for a regular expression.
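As a rough illustration of those two methods, here is a small sketch.
The pattern and the variable names are assumptions made up for this
example (a single-quoted literal in which '' stands for one quote);
they are not part of the parser discussed above.

```javascript
// Hypothetical pattern: an opening quote, then any mix of non-quote
// characters or doubled quotes, then a closing quote.
var quoted = /'(?:[^']|'')*'/;
var sample = "EEEE 'is walter''s birthday'";

// search() gives the index of the first match, or -1 if there is none.
var pos = sample.search(quoted);

// match() gives an array (element 0 is the matched text), or null.
var hit = sample.match(quoted);

console.log(pos);     // 5
console.log(hit[0]);  // 'is walter''s birthday'
```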
HTH,
Jean
---
I understand your comment about RegExp, but I am trying *not* to use
RegExp, since regular expressions are not supported by older browsers.
I am looking for another way to solve this.
Walter
---
Ok, so here's what I would do then:
while ( i < intFormatLen )
{
    var token = "";
    curChar = strFormat.charAt(i);
    if (curChar == "'")
    {
        var lastQuoteFound = false;
        while ((strFormat.charAt(++i) != "'") && (!lastQuoteFound))
        {
            if ((strFormat.charAt(i) == "'") &&
                xor((strFormat.charAt(i-1) != "'"),
                    (strFormat.charAt(i+1) != "'")))
            {
                lastQuoteFound = true;
            }
            else
            {
                token += strFormat.charAt(i);
            }
        }
    }
    else if (curChar == "[")
    {
        while (strFormat.charAt(++i) != "]")
        {
            token += strFormat.charAt(i);
        }
    }
    else
    {
        while ((strFormat.charAt(i) == curChar) && (i < intFormatLen))
            token += strFormat.charAt(i++);
    }
    result += formatOptions ( token );
}
function xor(bool1, bool2)
{
    return (bool1 != bool2);
}
As you might notice, I had to use the xor logical operator, which
doesn't exist in js (who the heck knows why *g*). This is because
either the predecessor or the successor may be a quote.
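To show how a helper function can stand in for the missing operator,
here is a short sketch that repeats the xor function from the listing
and prints its truth table. It assumes both arguments are real booleans
(such as the results of comparisons); other values are not considered.

```javascript
// Logical xor for booleans: true when exactly one argument is true.
function xor(bool1, bool2) {
    return (bool1 != bool2);
}

console.log(xor(true, false));   // true
console.log(xor(false, true));   // true
console.log(xor(true, true));    // false
console.log(xor(false, false));  // false
```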
Basically I would say that using the same character as delimiter and
escaper is not a good idea. I'd use a backslash for escaping purposes.
Your string would then look like this:
EEEE 'is walter\'s birthday'
I guess you got the idea from VisualBasic, VBScript or ASP which
interpret a triple quote as one escaped quote (i.e. """ == \"), but you
should keep in mind that VB doesn't use a string parser but a
grammar-recognition automaton (basically a stack automaton) that can be
taught rules for such purposes. Programming an automaton in js is
something that I wouldn't try, since js lacks the possibility to create
complex data structures, thus making it very hard (not impossible
though) to program such a thing.
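To make the backslash variant concrete, here is a minimal sketch of
reading one quoted literal without regular expressions. The function
name readBackslashLiteral and its return shape are hypothetical, chosen
only for this illustration, and the sketch assumes the literal starts at
an opening single quote.

```javascript
// Hypothetical helper: reads a quoted literal whose opening quote is at
// str.charAt(start). A backslash escapes the character that follows it.
function readBackslashLiteral(str, start) {
    var text = "";
    var i = start + 1;                     // skip the opening quote
    while (i < str.length && str.charAt(i) != "'") {
        if (str.charAt(i) == "\\" && i + 1 < str.length) {
            i++;                           // drop the backslash, keep the next char
        }
        text += str.charAt(i++);
    }
    // next is the index just past the closing quote
    return { text: text, next: i + 1 };
}

var lit = readBackslashLiteral("EEEE 'is walter\\'s birthday'", 5);
console.log(lit.text);  // is walter's birthday
```

Because the escape character differs from the delimiter, the loop never
has to look at neighbouring characters to decide whether a quote closes
the literal.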
Apart from this, I don't think that this question is interesting for
all the folks out there reading this knowledge base, since it is too
specific, so if you have further questions about this subject I'd be
happy to answer your emails :)
HTH,
Jean
---
Jean, thanks for the effort on this...
> As you might notice, I had to use the xor logical operator,
> which doesn't exist in js (who the heck knows why *g*).
> This is because either the predecessor or the successor
> may be a quote.
Since JS does not support 'xor', then how is this to work?
> I would say that using the same character as delimiter and
> escaper is not a good idea. I'd use a backslash for escaping
> purposes.
>
> I guess you got the idea from VisualBasic,... <snip>
No, not anything Microsoft...
http://java.sun.com/products/jdk/1.1/docs/api/java.text.SimpleDateFormat.html
This is the "C" and Java standard.
> so if you have further questions about this subject <snip>
No further questions. But thanks for taking the time and effort for
this.
You have made me think of some things that helped me solve my problem.
Walter
BTW: Here is how I solved it...
// Loop through the format string
while ( i < intFormatLen )
{
    // clear token var
    token = "";
    // Retrieve individual character
    curChar = strFormat.charAt(i);
    // Build the format tokens
    while (( strFormat.charAt(i) == curChar ) && ( i < intFormatLen ))
    {
        // Add current character to token string
        token += strFormat.charAt(i);
        // Increment to retrieve next char
        i++;
        // See if we have a single quote without a pair next to it
        if ( ( token == "'" ) && ( strFormat.charAt(i) != "'" ) )
        {
            // clear token var
            token = "";
            // Loop through format string until we see the closing single quote
            while ( i < intFormatLen )
            {
                // Pull out character
                cChr = strFormat.charAt(i);
                if ( cChr == "'" )
                {
                    // A lone single quote closes the quoted text
                    if ( strFormat.charAt(i + 1) != "'" )
                        break;
                    // An embedded pair stands for one single quote,
                    // so skip the first half of the pair
                    i++;
                }
                // Add the character to the token string
                token += cChr;
                // Increment to retrieve next char
                i++;
            }
            // Step past the closing single quote
            i++;
        }
    }
    // Look at the individual token, one or more characters in length
    // pull corresponding value from format collection
    // otherwise just pass the token through
    result += formatOptions ( token );
}
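For anyone who wants to try the quoting rules in isolation, here is a
minimal sketch of a stand-alone tokenizer that does not use regular
expressions. The name tokenizeFormat and the { text, literal } token
shape are hypothetical, invented for this illustration; the sketch
collects tokens in an array instead of calling formatOptions, and it
assumes that '' outside a quoted section stands for one literal quote.

```javascript
// Hypothetical helper, not part of the parser above: splits a format
// string into tokens and marks quoted text as literal.
function tokenizeFormat(strFormat) {
    var tokens = [];
    var len = strFormat.length;
    var i = 0;
    while (i < len) {
        var curChar = strFormat.charAt(i);
        var token = "";
        if (curChar != "'") {
            // Ordinary token: a run of the same character (EEEE, hh, a space)
            while (i < len && strFormat.charAt(i) == curChar) {
                token += strFormat.charAt(i++);
            }
            tokens.push({ text: token, literal: false });
        } else if (strFormat.charAt(i + 1) == "'") {
            // '' outside a quoted section: one literal quote (assumption)
            tokens.push({ text: "'", literal: true });
            i += 2;
        } else {
            i++;                          // skip the opening quote
            while (i < len) {
                if (strFormat.charAt(i) != "'") {
                    token += strFormat.charAt(i++);
                } else if (strFormat.charAt(i + 1) == "'") {
                    token += "'";         // embedded pair becomes one quote
                    i += 2;
                } else {
                    break;                // lone quote closes the section
                }
            }
            i++;                          // step past the closing quote
            tokens.push({ text: token, literal: true });
        }
    }
    return tokens;
}

var parts = tokenizeFormat("EEEE 'is walter''s birthday'");
for (var n = 0; n < parts.length; n++) {
    console.log((n + 1) + ") [" + parts[n].text + "]");
}
// 1) [EEEE]
// 2) [ ]
// 3) [is walter's birthday]
```

Keeping a literal flag on each token lets the caller pass quoted text
straight through, so a quoted word made of format letters is not
mistaken for a format code.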