Titanium JIRA Archive
Titanium SDK/CLI (TIMOB)

[TIMOB-8906] Android: createHttpClient cannot determine the charset encoding by html meta tag

GitHub Issuen/a
TypeBug
PriorityMedium
StatusClosed
ResolutionFixed
Resolution Date2012-06-29T17:26:22.000+0000
Affected Version/sn/a
Fix Version/sSprint 2012-13 API, Release 3.0.0
ComponentsAndroid
Labelsapi, httpclient, parity, qe-port
ReporterSatoshi Tanaka
AssigneeJosh Roesslein
Created2012-04-26T00:06:53.000+0000
Updated2013-01-15T16:54:03.000+0000

Description

Problem

To determine the character encoding, determining sequence is as follows: * HTTP Header (Content-type: text/html; charset=utf-8) * HTML meta tag (meta http-equiv="Content-Type" content="text/html; charset=utf-8") iOS handring is correct, but Android ignores HTML meta tag encoding name.

Test Case

I make a sample code which accesses each charset sample webpage and display their results.
var win1 = Ti.UI.createWindow({
	backgroundColor: 'white',
});
var text = Ti.UI.createLabel({
	text: '',
});
win1.add(text);
win1.open();
charsetTest("http://kangaechu.com/utf8.html");
charsetTest("http://kangaechu.com/shiftjis.html");
charsetTest("http://kangaechu.com/eucjp.html");
charsetTest("http://kangaechu.com/iso2022jp.html");

function charsetTest(url){
	var xhr = Ti.Network.createHTTPClient();
	
	xhr.onload = function() {
		if(this.status == 200) {
			var html = this.responseText;
			if(html == null){
				text.text = text.text + url + ' is null.\n\n';
			}
			else if(html.match(/<body>([\s\S]*)<\/body>/)){
				var body = RegExp.$1;
				text.text = text.text + url + body;
			}else{
				text.text = text.text + url + ' is 200, but no body element.\n\n';
			}
		}else{
			text.text = text.text + url + ' status code is ' + this.status + '\n\n';
		}
	};
	xhr.onerror = function() {
		text.text = text.text + url + ' status code is ' + this.status + '\n\n';

	};
	xhr.open("GET", url, false);
	xhr.send();
	
}
Result and screenshot is below: ||OS||UTF-8||Shift_JIS||EUC-JP||ISO-2022-JP|| |Android|OK|NG|NG|NG|

Suggested Fix

To fix this bug, set charset in getResponseText() See https://github.com/appcelerator/titanium_mobile/pull/590

Attachments

FileDateSize
120426-0001.png2012-04-26T00:06:53.000+000063205
httpclient_xmlencoding_tests.js2012-06-28T12:42:13.000+00001647

Comments

  1. Josh Roesslein 2012-05-01

    iOS can also detect [encoding](http://www.w3schools.com/xml/xml_encoding.asp) for text/xml content. We should also support this in Android for parity.
  2. Satoshi Tanaka 2012-05-17

    The link for github is not correct. The correct link for github is: https://github.com/appcelerator/titanium_mobile/pull/2143
  3. Josh Roesslein 2012-06-07

    Behavior for responseText property

    This is the correct behavior for getting the response text from the raw data and is currently implemented this way in iOS. 1. Attempt to convert the data using the encoding specified in the Content-Type header. If no encoding is specified, default to UTF-8. If successful, return the generated string. 2. If the data looks like a XML document (maybe use content-type MIME or scan for root xml tag), search for the encoding attribute. If found use this encoding to convert the data and return. 3. If the data looks like a HTML document, try scanning for the charset attribute. If found use this encoding to convert the data and return. 4. Return null if we fail to parse the data. (Maybe log an error?)
  4. Josh Roesslein 2012-06-28

    Attached a test case for detecting the encoding with XML document responses. To test just run the application and select various tests from the picker. You should see the correctly decoded text.
  5. Josh Roesslein 2012-06-28

    Created [PR #2490](https://github.com/appcelerator/titanium_mobile/pull/2490) to resolve issue.
  6. Satoshi Tanaka 2012-07-01

    This patch is great, but attribute value delimiter can use only double quotation. From W3C, attribute delimiter can use double quotation and single quotation. {quote} By default, SGML requires that all attribute values be delimited using either double quotation marks (ASCII decimal 34) or single quotation marks (ASCII decimal 39). {quote} http://www.w3.org/TR/REC-html40/intro/sgmltut.html#h-3.2.2 Created PR. https://github.com/appcelerator/titanium_mobile/pull/2495
  7. Satoshi Tanaka 2012-07-25

    New PR is below: https://github.com/appcelerator/titanium_mobile/pull/2644 Test case code:
       var win1 = Ti.UI.createWindow({
           backgroundColor: 'white',
       });
       var text = Ti.UI.createLabel({
           text: '',
       });
       win1.add(text);
       win1.open();
       charsetTest("http://kangaechu.com/timob-8906/singleQuotation.html");
       charsetTest("http://kangaechu.com/timob-8906/singleQuotation.xml");
       charsetTest("http://kangaechu.com/timob-8906/doubleQuotation.html");
       charsetTest("http://kangaechu.com/timob-8906/doubleQuotation.xml");
        
       function charsetTest(url){
           var xhr = Ti.Network.createHTTPClient();
            
           xhr.onload = function() {
               if(this.status == 200) {
                   var html = this.responseText;
                   if(html == null){
                       text.text = text.text + url + ' is null.\n\n';
                   }
                   else if(html.match(/<body>([\s\S]*)<\/body>/)){
                       var body = RegExp.$1;
                       text.text = text.text + url + body;
                   }else{
                       text.text = text.text + url + ' is 200, but no body element.\n\n';
                   }
               }else{
                   text.text = text.text + url + ' status code is ' + this.status + '\n\n';
               }
           };
           xhr.onerror = function() {
               text.text = text.text + url + ' status code is ' + this.status + '\n\n';
        
           };
           xhr.open("GET", url, false);
           xhr.send();
            
       }
       
    screenshot before: !http://kangaechu.com/timob-8906/before.png! after: !http://kangaechu.com/timob-8906/after.png!
  8. Olga Romero 2013-01-15

    Closing as fixed. Tested and verified with: Titanium Studio, build: 3.0.1.201212181159 Titanium SDK, build: 3.0.0.GA Mountain Lion 10.8.2 Nexus4 Android version 4.2 Nexus7 Android version 4.1.2 Chrome 23.0

JSON Source