An emerging trend in feed syndication is the inclusion of microformats. Beyond the semantics defined by individual feed formats, publishers can add further semantics using the rel and class attributes in embedded HTML content.
Note
To parse microformats, Universal Feed Parser relies on a third-party library called Beautiful Soup, which is distributed separately. If Beautiful Soup is not installed, Universal Feed Parser will silently skip microformat parsing.
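Because the skip is silent, an application may want to check for the dependency itself. The following is a minimal sketch for Python 3, not part of Universal Feed Parser; the helper name `microformats_dependency_available` is hypothetical, and the module name `BeautifulSoup` is an assumption about how the library is installed.

```python
import importlib.util


def microformats_dependency_available(module_name="BeautifulSoup"):
    """Return True if a top-level module with this name can be imported."""
    # Assumption: Beautiful Soup is importable under the name "BeautifulSoup".
    # Pass a different module_name if your installation differs.
    return importlib.util.find_spec(module_name) is not None


if not microformats_dependency_available():
    print("Beautiful Soup not found; microformats will not be parsed.")
```

Running a check like this at startup turns a silent omission into a visible warning.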
The following elements are parsed for microformats:
The rel=enclosure microformat provides a way for embedded HTML content to specify that a certain link should be treated as an enclosure. Universal Feed Parser looks for links within embedded markup that meet any of the following conditions:
When Universal Feed Parser finds a link that satisfies any of these conditions, it adds it to entries[i].enclosures.
Parsing embedded enclosures
>>> import feedparser
>>> d = feedparser.parse('http://feedparser.org/docs/examples/rel-enclosure.xml')
>>> d.entries[0].enclosures
[{u'href': u'http://example.com/movie.mp4', 'title': u'awesome movie'}]
The rel=tag microformat allows you to define tags within embedded HTML content. Universal Feed Parser looks for links marked with rel="tag" in embedded markup and maps them to entries[i].tags.
Parsing embedded tags
>>> import feedparser
>>> d = feedparser.parse('http://feedparser.org/docs/examples/rel-tag.xml')
>>> d.entries[0].tags
[{'term': u'tech', 'scheme': u'http://del.icio.us/tag/', 'label': u'Technology'}]
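To make the mapping concrete, here is a rough standard-library sketch of how a rel="tag" link could be turned into a term/scheme/label dictionary. It is not Universal Feed Parser's implementation: the class `RelTagParser`, the function `find_rel_tags`, and the sample markup are all hypothetical, and the sketch assumes the term is the last path segment of the link's href, with the remainder of the URL as the scheme.

```python
from html.parser import HTMLParser


class RelTagParser(HTMLParser):
    """Collect rel="tag" links as term/scheme/label dictionaries."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self._current = None  # dictionary for the <a> element being read

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        rels = (attrs.get("rel") or "").lower().split()
        href = attrs.get("href") or ""
        if "tag" in rels and href:
            # Assumption: the term is the last path segment of the href,
            # and the scheme is everything before it.
            scheme, _, term = href.rstrip("/").rpartition("/")
            self._current = {"term": term, "scheme": scheme + "/", "label": ""}

    def handle_data(self, data):
        # The link text becomes the human-readable label.
        if self._current is not None:
            self._current["label"] += data

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            self._current["label"] = self._current["label"].strip()
            self.tags.append(self._current)
            self._current = None


def find_rel_tags(html):
    parser = RelTagParser()
    parser.feed(html)
    parser.close()
    return parser.tags


# Hypothetical embedded markup, chosen to resemble the entry shown above.
sample = '<p>Filed under <a rel="tag" href="http://del.icio.us/tag/tech">Technology</a></p>'
print(find_rel_tags(sample))
# [{'term': 'tech', 'scheme': 'http://del.icio.us/tag/', 'label': 'Technology'}]
```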
The XFN microformat allows you to define human relationships between URIs. For example, you could link from your weblog to your spouse’s weblog with the rel="spouse" relation. It is intended primarily for “blogrolls” or other static lists of links, but the relations can occur anywhere in HTML content. If found, Universal Feed Parser will return the XFN information in entries[i].xfn.
Universal Feed Parser supports all of the relationships listed in the XFN 1.1 profile, as well as the following variations:
Parsing XFN relationships
>>> import feedparser
>>> d = feedparser.parse('http://feedparser.org/docs/examples/xfn.xml')
>>> person = d.entries[0].xfn[0]
>>> person.name
u'John Doe'
>>> person.href
u'http://example.com/johndoe'
>>> person.relationships
[u'coworker', u'friend']
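The shape of this result can be illustrated with a small standard-library sketch that scans links for XFN relationship values. This is only an approximation under stated assumptions, not the parser's own code: `XFNParser`, `find_xfn`, and the sample markup are hypothetical, only the relationship values of the XFN 1.1 profile are recognized, and none of the spelling variations that Universal Feed Parser accepts are handled.

```python
from html.parser import HTMLParser

# Relationship values defined by the XFN 1.1 profile.
XFN_VALUES = {
    "contact", "acquaintance", "friend", "met", "co-worker", "colleague",
    "co-resident", "neighbor", "child", "parent", "sibling", "spouse", "kin",
    "muse", "crush", "date", "sweetheart", "me",
}


class XFNParser(HTMLParser):
    """Collect links whose rel attribute contains XFN relationship values."""

    def __init__(self):
        super().__init__()
        self.people = []
        self._current = None  # record for the <a> element being read

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        rels = (attrs.get("rel") or "").lower().split()
        # Keep only values that belong to XFN; other rel values are ignored.
        relationships = sorted(r for r in rels if r in XFN_VALUES)
        if relationships:
            self._current = {
                "name": "",
                "href": attrs.get("href") or "",
                "relationships": relationships,
            }

    def handle_data(self, data):
        # The link text is used as the person's name.
        if self._current is not None:
            self._current["name"] += data

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            self._current["name"] = self._current["name"].strip()
            self.people.append(self._current)
            self._current = None


def find_xfn(html):
    parser = XFNParser()
    parser.feed(html)
    parser.close()
    return parser.people


# Hypothetical blogroll entry; "nofollow" is not an XFN value and is dropped.
sample = '<a href="http://example.com/johndoe" rel="friend met nofollow">John Doe</a>'
print(find_xfn(sample))
# [{'name': 'John Doe', 'href': 'http://example.com/johndoe', 'relationships': ['friend', 'met']}]
```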
The hCard microformat allows you to embed address book information within HTML content. If Universal Feed Parser finds an hCard within supported elements, it converts it into an RFC 2426-compliant vCard and returns it in entries[i].vcard.
Converting embedded hCard markup into a vCard
>>> import feedparser
>>> d = feedparser.parse('http://feedparser.org/docs/examples/hcard.xml')
>>> print d.entries[0].vcard
BEGIN:vCard
VERSION:3.0
FN:Frank Dawson
N:Dawson;Frank
ADR;TYPE=work,postal,parcel:;;6544 Battleford Drive;Raleigh;NC;27613-3502;U
 .S.A.
TEL;TYPE=WORK,VOICE,MSG:+1-919-676-9515
TEL;TYPE=WORK,FAX:+1-919-676-9564
EMAIL;TYPE=internet,pref:Frank_Dawson at Lotus.com
EMAIL;TYPE=internet:fdawson at earthlink.net
ORG:Lotus Development Corporation
URL:http://home.earthlink.net/~fdawson
END:vCard
BEGIN:vCard
VERSION:3.0
FN:Tim Howes
N:Howes;Tim
ADR;TYPE=work:;;501 E. Middlefield Rd.;Mountain View;CA;94043;U.S.A.
TEL;TYPE=WORK,VOICE,MSG:+1-415-937-3419
TEL;TYPE=WORK,FAX:+1-415-528-4164
EMAIL;TYPE=internet:howes at netscape.com
ORG:Netscape Communications Corp.
END:vCard
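As the output shows, a single vcard string can hold more than one vCard. An application that needs the individual cards could split the string along the BEGIN/END markers. The sketch below is one way to do that with the standard library; `unfold` and `split_vcards` are hypothetical helpers, not part of Universal Feed Parser, and the sketch assumes standard vCard line folding, where a continuation line begins with a space or tab.

```python
def unfold(text):
    """Join folded lines: a line starting with a space or tab continues the previous one."""
    lines = []
    for line in text.splitlines():
        if line[:1] in (" ", "\t") and lines:
            lines[-1] += line[1:]
        else:
            lines.append(line)
    return lines


def split_vcards(text):
    """Return one list of (property, value) pairs per vCard found in text."""
    cards, current = [], None
    for line in unfold(text):
        marker = line.strip().upper()
        if marker == "BEGIN:VCARD":
            current = []
        elif marker == "END:VCARD":
            if current is not None:
                cards.append(current)
            current = None
        elif current is not None and line.strip():
            # Split at the first colon; parameters such as TYPE stay with the name.
            name, _, value = line.partition(":")
            current.append((name, value))
    return cards


# Shortened sample in the same layout as the output above.
sample = (
    "BEGIN:vCard\n"
    "VERSION:3.0\n"
    "FN:Frank Dawson\n"
    "ADR;TYPE=work:;;6544 Battleford Drive;Raleigh;NC;27613-3502;U\n"
    " .S.A.\n"
    "END:vCard\n"
    "BEGIN:vCard\n"
    "VERSION:3.0\n"
    "FN:Tim Howes\n"
    "END:vCard\n"
)

for card in split_vcards(sample):
    print(dict(card)["FN"])
# Frank Dawson
# Tim Howes
```

Property names keep their parameters (for example `ADR;TYPE=work`), so a fuller parser would split those further.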
Note
The number of microformats is growing, and Universal Feed Parser does not parse all of them. However, both the rel and class attributes survive HTML sanitizing, so applications built on Universal Feed Parser that wish to parse additional microformat content are free to do so.
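As a rough illustration of such application-level parsing, the sketch below collects the text of every element that carries a given class name in already-sanitized HTML. It uses only the Python 3 standard library; `ClassTextParser` and `find_class_text` are hypothetical names, and the hCalendar-style class names in the sample are used purely as input data, with no claim about which microformats Universal Feed Parser handles itself.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not affect nesting depth.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "source", "track", "wbr"}


class ClassTextParser(HTMLParser):
    """Collect the text content of elements that carry a given class name."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.matches = []
        self._depth = 0    # nesting depth inside the matched element
        self._buffer = []  # text fragments of the matched element

    def handle_starttag(self, tag, attrs):
        if tag in VOID_ELEMENTS:
            return
        if self._depth:
            # Already inside a match: just track nesting.
            self._depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.class_name in classes:
            self._depth = 1
            self._buffer = []

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS or not self._depth:
            return
        self._depth -= 1
        if self._depth == 0:
            self.matches.append("".join(self._buffer).strip())

    def handle_data(self, data):
        if self._depth:
            self._buffer.append(data)


def find_class_text(html, class_name):
    parser = ClassTextParser(class_name)
    parser.feed(html)
    parser.close()
    return parser.matches


# Hypothetical sanitized content, e.g. the value of an entry's content element.
sample = ('<div class="vevent"><span class="summary">Release <b>party</b></span>'
          ' at <span class="location">HQ</span></div>')
print(find_class_text(sample, "summary"))   # ['Release party']
print(find_class_text(sample, "location"))  # ['HQ']
```

The sketch assumes reasonably well-formed markup; unbalanced tags would throw off the depth counter.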