PHP XML encoding error

PHP: Fix “Input is not proper UTF-8, indicate encoding” error when loading XML

When loading XML in PHP through simplexml_load_string or the DOMDocument class, an error such as “Input is not proper UTF-8, indicate encoding” may sometimes appear.

The error occurs when the XML contains bytes that are not valid UTF-8. The fix is quite simple: convert the entire XML string to UTF-8 first and then load it.

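The utf8_encode function converts a string from ISO-8859-1 to UTF-8, so the bytes that were invalid as UTF-8 become proper characters and the XML becomes parseable by SimpleXML or DOMDocument. Keep in mind that utf8_encode always assumes the input is ISO-8859-1, and that it has been deprecated since PHP 8.2.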
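As a rough sketch of the convert-then-load approach described above, the helper below swaps in mb_convert_encoding from the mbstring extension for utf8_encode, and only converts when the input is not already valid UTF-8. The function name loadXmlAsUtf8 and the ISO-8859-1 fallback are assumptions made for illustration. It also assumes the XML prolog does not declare some other encoding.

```php
<?php
// Hypothetical helper: the name and the ISO-8859-1 fallback are assumptions.
// Requires the mbstring and SimpleXML extensions.
function loadXmlAsUtf8(string $xml, string $fallbackEncoding = 'ISO-8859-1')
{
    // Convert only when the bytes are not already valid UTF-8,
    // otherwise correctly encoded input would be encoded twice.
    if (!mb_check_encoding($xml, 'UTF-8')) {
        $xml = mb_convert_encoding($xml, 'UTF-8', $fallbackEncoding);
    }

    // Collect libxml errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $doc = simplexml_load_string($xml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $doc; // SimpleXMLElement on success, false on failure
}

// "\xE9" is "é" in ISO-8859-1 and is not valid UTF-8 on its own.
$doc = loadXmlAsUtf8("<item><name>caf\xE9</name></item>");
```

The validity check matters because running an ISO-8859-1 to UTF-8 conversion on a string that is already UTF-8 would garble every non-ASCII character in it.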



The “straight answer” that worked for me was encoding the string passed to simplexml_load_string.

I was already encoding in the API call, which had fixed an earlier issue, but adding it here as well fixed this one.

Kindly publish the ‘straight answer to this error message’. Thanks.

It was very helpful for me too.

Thank you for this article. It was useful for me.

Can you tell me exactly what you did to fix this issue?

Here’s my line of code: simplexml_load_string(utf8_encode(strip_tags($_product->getDescription())));

In my case the product description contains some invalid characters and some HTML tags, so I have to use the strip_tags function to remove the tags, and then I can use the other two functions to properly encode and load the string.


Hope this helps you.

After much Googling, I finally found a straight answer to this error message. Thanks!



Character Encoding

PHP’s XML extension supports the Unicode character set through different character encodings. There are two types of character encodings, source encoding and target encoding. PHP’s internal representation of the document is always encoded with UTF-8.

Source encoding is done when an XML document is parsed. Upon creating an XML parser, a source encoding can be specified (this encoding cannot be changed later in the XML parser’s lifetime). The supported source encodings are ISO-8859-1, US-ASCII and UTF-8. The former two are single-byte encodings, which means that each character is represented by a single byte. UTF-8 can encode characters composed of a variable number of bits (up to 21) in one to four bytes. The default source encoding used by PHP is ISO-8859-1.

Target encoding is done when PHP passes data to XML handler functions. When an XML parser is created, the target encoding is set to the same as the source encoding, but this may be changed at any point. The target encoding will affect character data as well as tag names and processing instruction targets.

If the XML parser encounters characters outside the range that its source encoding is capable of representing, it will return an error.

If PHP encounters characters in the parsed XML document that cannot be represented in the chosen target encoding, the problem characters will be “demoted”. Currently, this means that such characters are replaced by a question mark.
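To make the source and target encoding distinction more concrete, here is a small sketch using the xml_parser_* functions. The sample document and the variable names are made up for illustration. The parser is created with UTF-8 as the source encoding, and the target encoding is then switched to US-ASCII, so a character such as “é” should reach the handler demoted to a question mark.

```php
<?php
// The source encoding is fixed when the parser is created.
$parser = xml_parser_create('UTF-8');

// The target encoding may be changed afterwards; handlers now receive US-ASCII.
xml_parser_set_option($parser, XML_OPTION_TARGET_ENCODING, 'US-ASCII');

// The handler may be called more than once per text node, so accumulate the data.
$text = '';
xml_set_character_data_handler($parser, function ($parser, string $data) use (&$text) {
    $text .= $data;
});

// "\u{E9}" is "é" encoded as UTF-8; it has no US-ASCII representation,
// so it is expected to arrive in the handler as "?".
$ok = xml_parse($parser, "<name>caf\u{E9}</name>", true);

var_dump($ok, $text);
```

Leaving the target encoding at UTF-8 instead would pass the character through to the handler unchanged.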
