Archive

Posts Tagged ‘perl’

perl-XML::LibXML + global external entity loader

May 15th, 2011

I just created a quick patch against the Perl XML::LibXML module that adds global external entity loader support. Until now it was only possible to have a per-instance entity loader, but that is not enough if you want, e.g., XML::LibXSLT to also use your entity loader for imports and input callbacks don't suit all your needs.

Usage is simple:

XML::LibXML::externalEntityLoader(\&_entity_handler);

where _entity_handler is a subroutine like the one described for the ext_ent_handler option in (http://search.cpan.org/dist/XML-LibXML/lib/XML/LibXML/Parser.pod#PARSER_OPTIONS).

Note: when you define a global entity loader, the per-instance entity loader is simply ignored.
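As a rough illustration of what such a handler might look like, here is a minimal sketch. It assumes the handler is called with the system ID and public ID of the entity and returns the entity content as a string, as described for ext_ent_handler. The %LOCAL_ENTITIES table, its example.com URL and the choice to return an empty string for unknown entities are all made up for this example. The registration is guarded so the snippet also runs when the installed XML::LibXML does not provide externalEntityLoader.

```perl
use strict;
use warnings;

# Hypothetical in-memory table of entity contents, keyed by system ID.
# The URL and the content are placeholders for illustration only.
my %LOCAL_ENTITIES = (
    'http://example.com/entities/footer.xml' => '<footer>served from memory</footer>',
);

# Assumed handler signature: (system ID, public ID) in, entity content out.
sub _entity_handler {
    my ($system_id, $public_id) = @_;

    return $LOCAL_ENTITIES{$system_id}
        if defined $system_id && exists $LOCAL_ENTITIES{$system_id};

    # Assumption: unknown entities resolve to empty content instead of
    # being fetched from disk or the network.
    return '';
}

# Register the handler globally, but only when the loaded XML::LibXML
# actually offers externalEntityLoader (e.g. a patched build).
if (eval { require XML::LibXML; 1 } && XML::LibXML->can('externalEntityLoader')) {
    XML::LibXML::externalEntityLoader(\&_entity_handler);
}
```

Once registered this way, every parser in the process should go through the same handler, which is the point of making the loader global.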

You can download the patch from http://devel.dob.sk/patches/perl-XML::LibXML+global_entity_loader-0.1.diff. Just download XML::LibXML from CPAN, patch it and install. I’ll try to push it to CPAN if possible ;-)

Categories: devel, perl, xml

perl utf8 and using Digest functions

April 27th, 2011 No comments

I’ve implemented a neat new feature (storing unique content only once in the cache) in my Perl-based ETL tool, and suddenly it started to sometimes print a ‘Wide character in subroutine entry’ Perl warning in the sha1_hex call. As if this was not enough, the processed content, after being stored in the cache, started to be UTF-8 corrupted in comparison to the one stored in the cache.

It took a few funny hours of playing with Perl until I found that the sha1_hex function somehow destroyed the content of its parameter, but in such a beautiful way that it was almost impossible to detect. Best of all, the cached content was the output of the LibXML toString() function, yet the XML tree itself (or whatever is behind it) had also been corrupted. Well, it must be one of those ‘Perl secrets’.

Afterwards Google gave me some explanation of this behaviour: I found a similar problem reported for the MD5 digest functions from the Digest::MD5 package.

So finally, to fix the problem, one must call encode_utf8() on the sha1_hex parameter, so that sha1_hex works on the encoded copy of the content instead of the original.
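The fix can be sketched roughly as follows. This assumes sha1_hex comes from Digest::SHA and encode_utf8 from Encode (the post does not say which digest module the tool uses), and content_key is a hypothetical helper name invented for this example.

```perl
use strict;
use warnings;
use Digest::SHA qw(sha1_hex);
use Encode qw(encode_utf8);

# Hypothetical helper: compute a cache key for a character string.
# encode_utf8() returns a new byte string, so the digest function only
# ever sees that copy and the caller's string is left untouched.
sub content_key {
    my ($content) = @_;
    return sha1_hex(encode_utf8($content));
}

# A character string containing a wide character (U+263A), similar in
# spirit to what toString() may hand back for a non-ASCII document.
my $xml = "<doc>caf\x{e9} \x{263a}</doc>";

# Passing $xml straight to sha1_hex is the kind of call that produced
# the 'Wide character' complaint; hashing the encoded copy avoids it.
my $key = content_key($xml);
```

The key is computed over the UTF-8 bytes, so the same content always yields the same key regardless of how Perl stores the string internally.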


Categories: devel