Problem
To determine the character encoding, determining sequence is as follows:
* HTTP Header (Content-type: text/html; charset=utf-8)
* HTML meta tag (meta http-equiv="Content-Type" content="text/html; charset=utf-8")
iOS handring is correct, but Android ignores HTML meta tag encoding name.
Test Case
I make a sample code which accesses each charset sample webpage and display their results.
var win1 = Ti.UI.createWindow({
backgroundColor: 'white',
});
var text = Ti.UI.createLabel({
text: '',
});
win1.add(text);
win1.open();
charsetTest("http://kangaechu.com/utf8.html");
charsetTest("http://kangaechu.com/shiftjis.html");
charsetTest("http://kangaechu.com/eucjp.html");
charsetTest("http://kangaechu.com/iso2022jp.html");
function charsetTest(url){
var xhr = Ti.Network.createHTTPClient();
xhr.onload = function() {
if(this.status == 200) {
var html = this.responseText;
if(html == null){
text.text = text.text + url + ' is null.\n\n';
}
else if(html.match(/<body>([\s\S]*)<\/body>/)){
var body = RegExp.$1;
text.text = text.text + url + body;
}else{
text.text = text.text + url + ' is 200, but no body element.\n\n';
}
}else{
text.text = text.text + url + ' status code is ' + this.status + '\n\n';
}
};
xhr.onerror = function() {
text.text = text.text + url + ' status code is ' + this.status + '\n\n';
};
xhr.open("GET", url, false);
xhr.send();
}
Result and screenshot is below:
||OS||UTF-8||Shift_JIS||EUC-JP||ISO-2022-JP||
|Android|OK|NG|NG|NG|
Suggested Fix
To fix this bug, set charset in getResponseText()
See
https://github.com/appcelerator/titanium_mobile/pull/590
iOS can also detect [encoding](http://www.w3schools.com/xml/xml_encoding.asp) for text/xml content. We should also support this in Android for parity.
The link for github is not correct. The correct link for github is: https://github.com/appcelerator/titanium_mobile/pull/2143
Behavior for responseText property
This is the correct behavior for getting the response text from the raw data and is currently implemented this way in iOS. 1. Attempt to convert the data using the encoding specified in the Content-Type header. If no encoding is specified, default to UTF-8. If successful, return the generated string. 2. If the data looks like a XML document (maybe use content-type MIME or scan for root xml tag), search for the encoding attribute. If found use this encoding to convert the data and return. 3. If the data looks like a HTML document, try scanning for the charset attribute. If found use this encoding to convert the data and return. 4. Return null if we fail to parse the data. (Maybe log an error?)Attached a test case for detecting the encoding with XML document responses. To test just run the application and select various tests from the picker. You should see the correctly decoded text.
Created [PR #2490](https://github.com/appcelerator/titanium_mobile/pull/2490) to resolve issue.
This patch is great, but attribute value delimiter can use only double quotation. From W3C, attribute delimiter can use double quotation and single quotation. {quote} By default, SGML requires that all attribute values be delimited using either double quotation marks (ASCII decimal 34) or single quotation marks (ASCII decimal 39). {quote} http://www.w3.org/TR/REC-html40/intro/sgmltut.html#h-3.2.2 Created PR. https://github.com/appcelerator/titanium_mobile/pull/2495
New PR is below: https://github.com/appcelerator/titanium_mobile/pull/2644 Test case code:
screenshot before: !http://kangaechu.com/timob-8906/before.png! after: !http://kangaechu.com/timob-8906/after.png!
Closing as fixed. Tested and verified with: Titanium Studio, build: 3.0.1.201212181159 Titanium SDK, build: 3.0.0.GA Mountain Lion 10.8.2 Nexus4 Android version 4.2 Nexus7 Android version 4.1.2 Chrome 23.0