['parse'] is in these articles:
is a Python XML reader/parser/writer that's been implemented in both pure
Python and C. Using non-standard
encodings in cElementTree. Here is a talk on using
ElementTree to process XML. A recommendation
a simple module for HTML parsing, supposed to be easier to use than the
HTMLParser module for some things.
Soup, is a Python HTML/XML parser for projects like screen-scraping. It is available here too. An example of doing some markup massage to clean up problematic HTML prior to running Beautiful Soup on it.
the Pyparsing module
to implement recursive descent parsers in Python, for when
string.split() is not enough. Includes an example of doing HTML
scraping. There is a support Wiki for this here. O'Reilly is publishing a small book on using this: Getting Started with Pyparsing.
is a command line argument parser that offers some more capabilities
common file format parsers
is a small parsing toolkit for Python
jpeg.py, a module to
parse, read and write JPEG EXIF and COM metadata
- PyMOTW looks at the optparse module 
expressions using the compiler.parse
Using the Python code parser to parse
(in and out) XML
scraping using the parser component of Internet Explorer
The big earthquake
of Dec 26, 2004. India seems to have been hit but news is sparse.
Scotland on Sunday.
The European Space Agency has some satellite
pictures of this. And there are some very good before/after
shows that most of the Indian east coast is 2000km from the epicentre,
while Sri Lanka is about 1600-1800km away, so the potential for damage
along the Indian east coast is still very high. Here are some before/after
shots of a few points on the Indian coast. Locating
the earthquake by listening to sound waves in the ocean. In Aug'06
satellite data was used to see gravity
changes caused by this quake.
- Prolink's new PixelView PlayTV media box will be able to play and record video and other forms of media. It has a built-in flash card reader and no built-in drive (you add whatever you want through a USB interface, which is a great idea). It might (the docs are a bit sparse) only have composite video output, which would be a real dumb move.
on software downloads, I would think there are at least a million
examples of prior art out there (consider the BBS world and FidoNet which surely
predated this, also even older is UUCP)
The SANS Institute runs incidents.org
which tracks the progress of some worms and things (including the Code
Red Worm). Caida.org has some dynamic
graphs of the Code Red worm's progress. And in the end even Microsoft (Hotmail) got hit by the worm. A script that can be used to notify
the victim of Code Red that their system is infested. Another
script, this one will shut down
the infested system to prevent it from further abuse. Another script,
this one is in Python and it just
parses your web server's access log to find the sites to notify (the way it
notifies them is to start a browser on them pointed to a web
page about Code Red). Some thoughts on what the next
generation (Warhol Worms) of worms might be like.
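The log-parsing idea mentioned above can be sketched roughly as follows. This is not the linked script, only a minimal illustration that assumes the access log is in Common Log Format and that Code Red probes show up as requests for `/default.ida`. The function name `infected_hosts`, the regular expression, and the sample log lines are all made up for this example, and the notification step (starting a browser) is left out.

```python
import re

# Assumed Common Log Format: the client host is the first field and the
# request line is quoted. Only the host and the requested path are captured.
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "(?:GET|POST|HEAD) (\S+)')


def infected_hosts(log_lines):
    """Return the sorted, de-duplicated hosts that requested /default.ida."""
    hosts = set()
    for line in log_lines:
        match = LOG_LINE.match(line)
        if match and match.group(2).startswith("/default.ida"):
            hosts.add(match.group(1))
    return sorted(hosts)


# Hypothetical sample data, not taken from any real log.
sample_log = [
    '10.0.0.5 - - [19/Jul/2001:10:00:00 +0000] "GET /default.ida?NNNNNNNN HTTP/1.0" 404 276',
    '10.0.0.9 - - [19/Jul/2001:10:00:05 +0000] "GET /index.html HTTP/1.0" 200 1043',
    '10.0.0.5 - - [19/Jul/2001:10:01:00 +0000] "GET /default.ida?NNNNNNNN HTTP/1.0" 404 276',
]
hosts = infected_hosts(sample_log)
```

Collecting the hosts into a set means a machine that probes repeatedly would only be notified once.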
- In this ShowMeDo video Jeff Rush walks through the code of a simple web server built on the Twisted platform. This demonstrates Nevow, STAN (for the template language) and an RSS feedparser.
- Using the ANTLR parser generator to create Python code for parsing discussed here and here. 
- The PyMOTW looks at the urlparse module.  
- PyXML, XML Parsers and API for Python, the project home page is here.  
- Looking at the performance of various HTML parsers for Python (lxml, BeautifulSoup, html5lib, ElementTree, cElementTree, HTMLParser, htmlfill, Genshi, xml.dom.minidom).  
- For an RSS feed file to be valid you need to escape any "<" and ">" bracket characters that are part of the data. This is because the RSS file is XML, so these will be taken as XML tokens by the parser in the feed reader. This is an issue because it is quite natural to want to put HTML fragments into the item/description elements. One way to do this is a simple substitution of "& l t ;" and "& g t ;" (ignore the embedded spaces) for the two angle brackets. Another thing to note is that because some URLs contain & characters you can run into an issue with parsers thinking those & are the start of a special character sequence, so you also need to replace & with "& a m p ;". This sort of thing would really be much simpler if XML had just included a proper opaque data blob tag from the beginning (or perhaps a special attribute that could be used with any tag), something to indicate that the contained data is a base64 encoded ASCII string and all the parser is to do is to read it and decode it back to the original form (which may include anything, even non-printable binary) but then do no further parsing on this content. The CDATA section is somewhat intended to do this but it's not a very clean solution.
- If you need really fast parsing of XML you might want to take a look at AsmXml, which claims to be able to parse XML at about 200MB/s on an Athlon XP 1800+ type chip. Despite this being an assembly language implementation there are versions for a number of operating systems (presumably all running on x86 chips).
- LEPL is a recursive descent parser with full backtracking for Python.
- Creoleparser is a Python library for converting Creole wiki markup to HTML. 
- Mini-XML aims to be a small, portable, XML parser written in ANSI C. 
- The PyMOTW takes a look at the robotparser module that is used to parse the robots.txt file.
- West Texas may be getting a 600MW wind farm populated with 240 x 2.5MW turbines from China. The cost of this is about $1.5G, which makes it under $3/W. This will occupy 36000 acres of land and should supply the needs of about 150,000 homes, which makes for an interesting statistic of about 4.2 homes per acre (0.24 acres per home), or 150 acres per turbine (which is only about 4 turbines per square mile, which seems rather sparse).
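For the RSS escaping entry above, here is a minimal sketch of the substitution using the standard library rather than hand-written replacements. `xml.sax.saxutils.escape` replaces `&` first and then `<` and `>`, which is the order that avoids double-escaping. The helper name `rss_description` and the sample fragment are made up for illustration, and a real feed would have more structure around the element.

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET


def rss_description(html_fragment):
    """Wrap an HTML fragment in a <description> element, escaping &, < and >."""
    return "<description>" + escape(html_fragment) + "</description>"


# Hypothetical fragment containing both angle brackets and a URL with "&".
fragment = '<a href="http://example.com/?a=1&b=2">link</a>'
element_text = rss_description(fragment)

# An XML parser turns the entities back into the original text, so the
# feed reader sees the HTML fragment exactly as it was written.
round_tripped = ET.fromstring(element_text).text
```

Letting the library do the escaping also covers the case where the data already contains an ampersand, which a naive replacement of only the angle brackets would miss.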