Converting HTML Entities Into Character Code Equivalents

7 Sep 2007

I'm working on a job for a client where legacy database data are being used to generate an XML document for processing with an XSLT stylesheet.

The data are stored in the database with special characters encoded as HTML entities. So when I created my DOMDocument, I got the following warning:

Warning: DOMDocument::loadXML() [function.DOMDocument-loadXML]: Entity 'middot' not defined in Entity, line: 963 in /usr/local/www/data-dist/sheds/includes/SDEHSFunctions.php on line 414

Instead of passing '&middot;' in the XML string loaded into the DOMDocument object, I needed either to declare all the entities in the XML doctype (bothersome) or to convert the named entities into numeric ones (e.g. '&middot;' becomes '&#183;').
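To make the difference concrete, here is a minimal sketch that loads the same fragment twice, once with the named entity and once with the numeric reference. The sample markup and variable names are made up for illustration, and it assumes the DOM and libxml extensions are available. libxml errors are collected rather than emitted as PHP warnings so they can be printed.

```php
<?php
// Hypothetical sample markup, not the client's data.
$named   = '<item>Sheds &middot; Garages</item>';
$numeric = '<item>Sheds &#183; Garages</item>';

// Collect libxml errors instead of emitting PHP warnings.
libxml_use_internal_errors(true);

$doc = new DOMDocument();

// No DTD declares &middot;, so libxml should report it as undefined.
$namedLoaded = $doc->loadXML($named);
foreach (libxml_get_errors() as $error) {
    echo trim($error->message), "\n";
}
libxml_clear_errors();

// A numeric character reference needs no declaration.
$numericLoaded = $doc->loadXML($numeric);

var_dump($namedLoaded, $numericLoaded);
```

The numeric form works because numeric character references are part of XML itself, whereas named entities other than the five predefined ones (amp, lt, gt, quot, apos) have to be declared.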

I took a look around and found this handy function:

http://php.net/get_html_translation_table

I did a print_r on the translation table returned and found that it is an array where each key is the actual character represented and each value is the named HTML entity. So here's a quick function to get the numeric character code equivalent:

html_entity_convert.txt
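For readers without the attached file, here is a minimal sketch of what such a function might look like. It is not the contents of html_entity_convert.txt. The function name html_entities_to_numeric is my own, and the sketch assumes PHP 7.2 or later with the mbstring extension (for mb_ord) and UTF-8 data. It asks for the HTML 4.01 table so that every key is a single character.

```php
<?php
/**
 * Replace named HTML entities (e.g. &middot;) with numeric character
 * references (e.g. &#183;) so that an XML parser accepts them.
 *
 * Hypothetical helper: the name and the UTF-8 assumption are mine.
 */
function html_entities_to_numeric($string)
{
    // Keys are the literal characters, values are the named entities.
    // ENT_NOQUOTES leaves &quot; alone, which XML already understands.
    $table = get_html_translation_table(
        HTML_ENTITIES,
        ENT_NOQUOTES | ENT_HTML401,
        'UTF-8'
    );

    // Build a map from named entity to numeric character reference.
    $map = array();
    foreach ($table as $char => $entity) {
        $map[$entity] = '&#' . mb_ord($char, 'UTF-8') . ';';
    }

    // strtr() replaces every named entity found in the string in one pass.
    return strtr($string, $map);
}

echo html_entities_to_numeric('Sheds &middot; Garages'), "\n";
// prints "Sheds &#183; Garages"
```

On the PHP versions current when this was written, the table was ISO-8859-1 by default, so a plain ord() on each key would have done the same job for the Latin-1 range.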

This entry was posted on Friday, September 7th, 2007 at 7:01 pm. Tags: author iain dooley, php, recipe, recipes, xml, html, xslt

