I’ve implemented new neat feature (to store unique content only once in cache) to my perl based etl tool and suddenly it started to print sometimes ‘Wide character in subroutine entry‘ perl warning in sha1_hex call. As if this was not enough processed content after being stored in cache started to be utf-8 corrupted in comparition to the one stored in the cache.
It took few funny hours with of playing with perl, till I’ve found that sha1_hex function somehow destroyed parameter content but in such a beautiful way, that it was almost impossible to detect it. What the best is the cached content was output of LibXML toString() function, but the XML tree itself (or what) has been also corrupted. Well it must one of that ‘perl secrets’.
Afterwards Google given me some explanation of this activity – I’ve found similar problem reported for md5_sum function from digest::md5 package.
So finally to fix that problem, one must call encode_utf8() on sha1_hex parameter to let sha1_hex work on this copied content.