We can add a <!DOCTYPE> declaration to our XML output in Cocoon very easily. The default XMLSerializer can be configured to include a <!DOCTYPE> declaration. We then only have to use this newly configured serializer, and the <!DOCTYPE> is added to the output.
Consider a scenario where we have to convert an XML file to another XML format. The original XML file contains several HTML entities (like &deg;) in CDATA sections. We want to transform those entities to their Unicode equivalents in our output XML format. Here is the sample input XML:
<?xml version="1.0" encoding="UTF-8"?>
<input>
  <sample><![CDATA[The current temperature is 17 &deg;C.]]></sample>
</input>
We can transform this XML with a simple XSL transformation:
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <output>
      <xsl:apply-templates/>
    </output>
  </xsl:template>
  <xsl:template match="sample">
    <xsl:copy>
      <xsl:value-of select="." disable-output-escaping="yes"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
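The disable-output-escaping attribute is what lets the entity reference through untouched. For comparison, here is a sketch of what the result would look like without that attribute (whitespace aside): the ampersand from the CDATA section is ordinary text, so it gets escaped and the entity reference is lost.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<output>
  <!-- Without disable-output-escaping="yes", "&" is written as "&amp;" -->
  <sample>The current temperature is 17 &amp;deg;C.</sample>
</output>
```

A parser reading this version would see the literal text "&deg;" rather than a degree sign, which is why the stylesheet switches escaping off for this element.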
Now we create a simple Cocoon pipeline to do this transformation:
<map:match pattern="article">
  <map:generate src="resources/input.xml"/>
  <map:transform src="xslt/article.xslt"/>
  <map:serialize type="xml"/>
</map:match>
This produces the following output XML:
<?xml version="1.0" encoding="UTF-8"?>
<output>
  <sample>The current temperature is 17 &deg;C.</sample>
</output>
This looks fine, but when the output XML is parsed we get an error saying that &deg; is an unknown entity. And that is correct: an XML parser cannot resolve this entity on its own. Therefore we need to add a <!DOCTYPE> declaration to the output XML. This <!DOCTYPE> must reference a DTD that declares the entity &deg; and provides a replacement value. The good thing is that we can use an XHTML DTD, which comes with a list of HTML entities, and one of them is &deg;. In Cocoon we create a new serializer:
<map:serializer logger="sitemap.serializer.xml"
                mime-type="text/xml"
                name="xml-entity"
                src="org.apache.cocoon.serialization.XMLSerializer">
  <doctype-public>-//W3C//DTD XHTML 1.1//EN</doctype-public>
  <doctype-system>xhtml11-flat.dtd</doctype-system>
</map:serializer>
Notice that we use the standard XMLSerializer and only add two configuration elements: doctype-public and doctype-system. Their values will be added to the output XML. We change our pipeline to use this new serializer:
<map:match pattern="article">
  <map:generate src="resources/input.xml"/>
  <map:transform src="xslt/article.xslt"/>
  <map:serialize type="xml-entity"/>
</map:match>
If we run the pipeline again, we get the following XML output:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE output PUBLIC "-//W3C//DTD XHTML 1.1//EN" "xhtml11-flat.dtd">
<output>
  <sample>The current temperature is 17 &deg;C.</sample>
</output>
Now the entity can be resolved (provided the parser can load the referenced DTD) and we see a nice little degree sign. This is just a simple example; we can extend it, for example by using our own DTD.
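To give an idea of what our own DTD would have to contain, here is a minimal sketch. The declarations are written as an internal subset so the example is self-contained; in the Cocoon setup they would presumably live in an external file referenced by doctype-system. The content model for output and sample is an assumption made for this illustration, not something the pipeline requires.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE output [
  <!-- Hypothetical content model for the sample output format -->
  <!ELEMENT output (sample*)>
  <!ELEMENT sample (#PCDATA)>
  <!-- Maps the entity name to U+00B0, the degree sign -->
  <!ENTITY deg "&#176;">
]>
<output>
  <sample>The current temperature is 17 &deg;C.</sample>
</output>
```

Only the ENTITY line is needed for the parser to resolve &deg;. The ELEMENT declarations matter only if the output should also be validated against the DTD.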