[TIMOB-8906] Android: createHttpClient cannot determine the charset encoding by html meta tag

GitHub Issue	n/a
Type	Bug
Priority	Medium
Status	Closed
Resolution	Fixed
Resolution Date	2012-06-29T17:26:22.000+0000
Affected Version/s	n/a
Fix Version/s	Sprint 2012-13 API, Release 3.0.0
Components	Android
Labels	api, httpclient, parity, qe-port
Reporter	Satoshi Tanaka
Assignee	Josh Roesslein
Created	2012-04-26T00:06:53.000+0000
Updated	2013-01-15T16:54:03.000+0000

Description

Problem

To determine the character encoding, determining sequence is as follows: * HTTP Header (Content-type: text/html; charset=utf-8) * HTML meta tag (meta http-equiv="Content-Type" content="text/html; charset=utf-8") iOS handring is correct, but Android ignores HTML meta tag encoding name.

Test Case

I make a sample code which accesses each charset sample webpage and display their results.

var win1 = Ti.UI.createWindow({
	backgroundColor: 'white',
});
var text = Ti.UI.createLabel({
	text: '',
});
win1.add(text);
win1.open();
charsetTest("http://kangaechu.com/utf8.html");
charsetTest("http://kangaechu.com/shiftjis.html");
charsetTest("http://kangaechu.com/eucjp.html");
charsetTest("http://kangaechu.com/iso2022jp.html");

function charsetTest(url){
	var xhr = Ti.Network.createHTTPClient();
	
	xhr.onload = function() {
		if(this.status == 200) {
			var html = this.responseText;
			if(html == null){
				text.text = text.text + url + ' is null.\n\n';
			}
			else if(html.match(/<body>([\s\S]*)<\/body>/)){
				var body = RegExp.$1;
				text.text = text.text + url + body;
			}else{
				text.text = text.text + url + ' is 200, but no body element.\n\n';
			}
		}else{
			text.text = text.text + url + ' status code is ' + this.status + '\n\n';
		}
	};
	xhr.onerror = function() {
		text.text = text.text + url + ' status code is ' + this.status + '\n\n';

	};
	xhr.open("GET", url, false);
	xhr.send();
	
}

Result and screenshot is below: ||OS||UTF-8||Shift_JIS||EUC-JP||ISO-2022-JP|| |Android|OK|NG|NG|NG|

Suggested Fix

To fix this bug, set charset in getResponseText() See https://github.com/appcelerator/titanium_mobile/pull/590

Attachments

File	Date	Size
120426-0001.png	2012-04-26T00:06:53.000+0000	63205
httpclient_xmlencoding_tests.js	2012-06-28T12:42:13.000+0000	1647

Comments

Josh Roesslein 2012-05-01

iOS can also detect [encoding](http://www.w3schools.com/xml/xml_encoding.asp) for text/xml content. We should also support this in Android for parity.
Satoshi Tanaka 2012-05-17

The link for github is not correct. The correct link for github is: https://github.com/appcelerator/titanium_mobile/pull/2143
Josh Roesslein 2012-06-07

Behavior for responseText property
This is the correct behavior for getting the response text from the raw data and is currently implemented this way in iOS. 1. Attempt to convert the data using the encoding specified in the Content-Type header. If no encoding is specified, default to UTF-8. If successful, return the generated string. 2. If the data looks like a XML document (maybe use content-type MIME or scan for root xml tag), search for the encoding attribute. If found use this encoding to convert the data and return. 3. If the data looks like a HTML document, try scanning for the charset attribute. If found use this encoding to convert the data and return. 4. Return null if we fail to parse the data. (Maybe log an error?)
Josh Roesslein 2012-06-28

Attached a test case for detecting the encoding with XML document responses. To test just run the application and select various tests from the picker. You should see the correctly decoded text.
Josh Roesslein 2012-06-28

Created [PR #2490](https://github.com/appcelerator/titanium_mobile/pull/2490) to resolve issue.
Satoshi Tanaka 2012-07-01

This patch is great, but attribute value delimiter can use only double quotation. From W3C, attribute delimiter can use double quotation and single quotation. {quote} By default, SGML requires that all attribute values be delimited using either double quotation marks (ASCII decimal 34) or single quotation marks (ASCII decimal 39). {quote} http://www.w3.org/TR/REC-html40/intro/sgmltut.html#h-3.2.2 Created PR. https://github.com/appcelerator/titanium_mobile/pull/2495

Satoshi Tanaka 2012-07-25

New PR is below: https://github.com/appcelerator/titanium_mobile/pull/2644 Test case code:

   var win1 = Ti.UI.createWindow({
       backgroundColor: 'white',
   });
   var text = Ti.UI.createLabel({
       text: '',
   });
   win1.add(text);
   win1.open();
   charsetTest("http://kangaechu.com/timob-8906/singleQuotation.html");
   charsetTest("http://kangaechu.com/timob-8906/singleQuotation.xml");
   charsetTest("http://kangaechu.com/timob-8906/doubleQuotation.html");
   charsetTest("http://kangaechu.com/timob-8906/doubleQuotation.xml");
    
   function charsetTest(url){
       var xhr = Ti.Network.createHTTPClient();
        
       xhr.onload = function() {
           if(this.status == 200) {
               var html = this.responseText;
               if(html == null){
                   text.text = text.text + url + ' is null.\n\n';
               }
               else if(html.match(/<body>([\s\S]*)<\/body>/)){
                   var body = RegExp.$1;
                   text.text = text.text + url + body;
               }else{
                   text.text = text.text + url + ' is 200, but no body element.\n\n';
               }
           }else{
               text.text = text.text + url + ' status code is ' + this.status + '\n\n';
           }
       };
       xhr.onerror = function() {
           text.text = text.text + url + ' status code is ' + this.status + '\n\n';
    
       };
       xhr.open("GET", url, false);
       xhr.send();
        
   }

screenshot before: !http://kangaechu.com/timob-8906/before.png! after: !http://kangaechu.com/timob-8906/after.png!

Olga Romero 2013-01-15

Closing as fixed. Tested and verified with: Titanium Studio, build: 3.0.1.201212181159 Titanium SDK, build: 3.0.0.GA Mountain Lion 10.8.2 Nexus4 Android version 4.2 Nexus7 Android version 4.1.2 Chrome 23.0

JSON Source