Here's how I'm stripping out the "non-content" HTML from the Outlook posts:

    using System.Text.RegularExpressions;

    static string CleanHtml(string html)
    {
        // strip out everything up to and including the open body tag
        // (the (.*\n)* part consumes whole lines, so <body> must begin a line)
        Regex start = new Regex("^(.*\n)*<body[^>]*>",
            RegexOptions.IgnoreCase | RegexOptions.Multiline);
        // strip out the closing </body> and </html> tags at the end
        Regex end = new Regex("</body>\\s*</html>",
            RegexOptions.IgnoreCase | RegexOptions.Multiline);
        html = start.Replace(html, "");
        html = end.Replace(html, "");
        return html;
    }

I think what I really want is to strip out font (family/face, size) formatting so that the posts can simply use the styles specified by the consuming page. However, this means that the RSS descriptions rendered in aggregators such as NewsGator (which I really like) will have no font information. The right thing would seem to be to fork the RSS feed and the web site data file, which would also make things like dates easier and categories possible, since the data file wouldn't be constrained by the RSS schema.
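
One way to strip just the font face and size, sketched in C# under stated assumptions: it removes `<font>` tags outright (keeping the text they wrap) and drops `font`, `font-family` and `font-size` declarations from inline `style` attributes, leaving other styling such as colors alone. `FontStripper` and `StripFontFormatting` are hypothetical names, not part of the code above, and like `CleanHtml` it relies on regular expressions, so oddly formed markup (or CSS values containing `;`) will slip through.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class FontStripper
{
    // <font ...> and </font> tags; the text between them is kept.
    static readonly Regex FontTag =
        new Regex(@"</?font\b[^>]*>", RegexOptions.IgnoreCase);

    // A style="..." or style='...' attribute; group "q" is the quote character used.
    static readonly Regex StyleAttr =
        new Regex(@"\s+style\s*=\s*(?<q>[""'])(?<css>.*?)\k<q>",
                  RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Assumption: only face and size should go, so just these properties are dropped.
    static readonly string[] FontProps = { "font", "font-family", "font-size" };

    public static string StripFontFormatting(string html)
    {
        html = FontTag.Replace(html, "");
        return StyleAttr.Replace(html, m =>
        {
            // Keep every declaration whose property is not a font face/size property.
            // Naive split on ';': values that themselves contain ';' are not handled.
            var kept = m.Groups["css"].Value
                .Split(';')
                .Where(d => d.IndexOf(':') > 0)
                .Where(d => !FontProps.Contains(
                    d.Substring(0, d.IndexOf(':')).Trim().ToLowerInvariant()));
            string css = string.Join(";", kept);
            // Drop the whole attribute if no declarations are left.
            if (css.Trim().Length == 0) return "";
            string q = m.Groups["q"].Value;
            return " style=" + q + css + q;
        });
    }
}
```

For example, `<p style='font-size:10.0pt;font-family:Arial;color:navy'><font size=2>Hi</font></p>` becomes `<p style='color:navy'>Hi</p>`, while a declaration such as `font-weight:bold` is left untouched.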
A whole mess of RegEx. Scott Hanselman has nicely collected a whole mess of RegEx-related links and resources. Check them out. My favorite is the... [Incessant Ramblings]
This is impressive. Take a few Tylenol before reading further...
I finally got Hotmail Popper to work with Radio, which let me blog from Taipei by e-mail last week. The problem was the IP setting: instead of 127.0.0.1, I should have used the machine's real IP address (behind the firewall).
5:28:18 PM - See Also: Radio