[TIMOB-7434] Android: V8 & Rhino - keyDispatchingTimedOut logged then crash

GitHub Issue	n/a
Type	Bug
Priority	Critical
Status	Closed
Resolution	Duplicate
Resolution Date	2012-03-12T14:38:12.000+0000
Affected Version/s	Release 1.8.1
Fix Version/s	n/a
Components	Android
Labels	qe-and012312, qe-and021312
Reporter	Eric Merriman
Assignee	Hieu Pham
Created	2012-01-26T15:13:16.000+0000
Updated	2017-03-22T22:03:00.000+0000

Description

Description: While testing Kitchensink for about 30-45 minutes, the Droid 1 with a V8 compile crashed. Almost simultaneously the Nexus S with Rhino compile started logging Garbage collection notifications, then crashed. Crash trace files are attached for each device. I attempted to reproduce the crashes by performing a subset of the preceding tests but was unable to. Console out for each device:

 
Droid V8

E/ActivityManager( 1096): ANR in com.appcelerator.titanium (com.appcelerator.titanium/.KitchensinkActivity)
E/ActivityManager( 1096): Reason: keyDispatchingTimedOut
E/ActivityManager( 1096): Load: 0.21 / 0.43 / 0.28
E/ActivityManager( 1096): CPU usage from 7351ms to 2220ms ago:
E/ActivityManager( 1096):   system_server: 13% = 10% user + 3% kernel / faults: 1515 minor
E/ActivityManager( 1096):   droid.gallery3d: 3% = 2% user + 0% kernel / faults: 796 minor 1 major
E/ActivityManager( 1096):   e.process.gapps: 1% = 1% user + 0% kernel / faults: 104 minor
E/ActivityManager( 1096):   battd: 0% = 0% user + 0% kernel / faults: 5 minor
E/ActivityManager( 1096):   putmethod.latin: 0% = 0% user + 0% kernel / faults: 51 minor
E/ActivityManager( 1096):   erator.titanium: 0% = 0% user + 0% kernel
E/ActivityManager( 1096):   events/0: 0% = 0% user + 0% kernel
E/ActivityManager( 1096):   omap2_mcspi: 0% = 0% user + 0% kernel
E/ActivityManager( 1096):   cpcap_irq/0: 0% = 0% user + 0% kernel
E/ActivityManager( 1096):   akmd2: 0% = 0% user + 0% kernel
E/ActivityManager( 1096):   oid.voicesearch: 0% = 0% user + 0% kernel
E/ActivityManager( 1096):   torola.calendar: 0% = 0% user + 0% kernel / faults: 14 minor
E/ActivityManager( 1096):   d.process.media: 0% = 0% user + 0% kernel
E/ActivityManager( 1096):   com.amazon.mp3: 0% = 0% user + 0% kernel
E/ActivityManager( 1096):   ogle.android.gm: 0% = 0% user + 0% kernel / faults: 2 minor
E/ActivityManager( 1096): TOTAL: 21% = 16% user + 5% kernel + 0% irq

Nexus S Rhino

E/ActivityManager(  110): ANR in com.appcelerator.titanium (com.appcelerator.titanium/org.appcelerator.titanium.TiActivity)
E/ActivityManager(  110): Reason: keyDispatchingTimedOut
E/ActivityManager(  110): Load: 4.85 / 3.01 / 2.27
E/ActivityManager(  110): CPU usage from 11668ms to 4ms ago with 99% awake:
E/ActivityManager(  110):   93% 2203/com.appcelerator.titanium: 87% user + 5.9% kernel / faults: 2356 minor
E/ActivityManager(  110):   5% 110/system_server: 1.3% user + 3.6% kernel / faults: 10 minor
E/ActivityManager(  110):   0% 14/kondemand/0: 0% user + 0% kernel
E/ActivityManager(  110):   0% 52/gp2a_wq: 0% user + 0% kernel
E/ActivityManager(  110):   0% 176/dhd_dpc: 0% user + 0% kernel
E/ActivityManager(  110):   0% 179/com.android.systemui: 0% user + 0% kernel / faults: 4 minor
E/ActivityManager(  110):   0% 249/wpa_supplicant: 0% user + 0% kernel
E/ActivityManager(  110): 98% TOTAL: 88% user + 9.9% kernel
E/ActivityManager(  110): CPU usage from 271ms to 785ms later:
E/ActivityManager(  110):   92% 2203/com.appcelerator.titanium: 90% user + 1.9% kernel / faults: 162 minor
E/ActivityManager(  110):     64% 2211/KrollRuntimeThr: 64% user + 0% kernel
E/ActivityManager(  110):     17% 2205/GC: 17% user + 0% kernel
E/ActivityManager(  110):     9.8% 2464/er$SensorThread: 7.8% user + 1.9% kernel
E/ActivityManager(  110):   5.8% 110/system_server: 0% user + 5.8% kernel
E/ActivityManager(  110):     1.9% 121/system_server: 0% user + 1.9% kernel
E/ActivityManager(  110):     1.9% 147/InputDispatcher: 0% user + 1.9% kernel
E/ActivityManager(  110): 98% TOTAL: 90% user + 7.8% kernel

Steps to reproduce: 1) Install Kitchensink compiled with 1.0.8.201201210622 2) Run through each test sequentially 3) Observe the console output Result: The devices either report GC activity and then crash, or simply crash Expected: No crash, normal GC activity

Attachments

File	Date	Size
kindle fire v8 1.8.2.X log.txt	2012-02-25T13:08:23.000+0000	42944
kindle fire v8 1.8.2.X traces.txt	2012-02-25T13:08:23.000+0000	75321
tracesDroidv8.txt	2012-01-26T15:13:16.000+0000	102853
tracesNexSRhino.txt	2012-01-26T15:13:16.000+0000	48010
V8console.rtf	2012-01-26T17:35:59.000+0000	66967

Comments

Eric Merriman 2012-01-26

Additional note, on the Nexus S Rhino compile, after the crash, I selected Kill. Even as the device sat on the home screen, the following continued to log over and over, almost 24,000 times before I pulled the cable. This may just be the result of the crash putting the device in a bad state, but in case there is value here:

   
   W/TiJSError( 2672): (main) [2,266411] Activity is null or already finishing, skipping dialog.
   E/RhinoObject( 2672): (KrollRuntimeThread) [4,266415] TypeError: Cannot find default value for object. (/tmp/events.js#94)
   E/RhinoObject( 2672): org.mozilla.javascript.EcmaError: TypeError: Cannot find default value for object. (/tmp/events.js#94)
   E/RhinoObject( 2672): 	at org.mozilla.javascript.ScriptRuntime.constructError(ScriptRuntime.java:3784)
   E/RhinoObject( 2672): 	at org.mozilla.javascript.ScriptRuntime.constructError(ScriptRuntime.java:3762)
   E/RhinoObject( 2672): 	at org.mozilla.javascript.ScriptRuntime.typeError(ScriptRuntime.java:3790)
   E/RhinoObject( 2672): 	at org.mozilla.javascript.ScriptRuntime.typeError1(ScriptRuntime.java:3802)
   E/RhinoObject( 2672): 	at org.mozilla.javascript.ScriptableObject.getDefaultValue(ScriptableObject.java:942)
   E/RhinoObject( 2672): 	at org.mozilla.javascript.ScriptableObject.getDefaultValue(ScriptableObject.java:861)
   E/RhinoObject( 2672): 	at org.mozilla.javascript.ScriptRuntime.toString(ScriptRuntime.java:794)
   E/RhinoObject( 2672): 	at org.mozilla.javascript.ScriptRuntime.notFunctionError(ScriptRuntime.java:3871)
   E/RhinoObject( 2672): 	at org.mozilla.javascript.ScriptRuntime.getPropFunctionAndThisHelper(ScriptRuntime.java:2359)
   E/RhinoObject( 2672): 	at org.mozilla.javascript.ScriptRuntime.getPropFunctionAndThis(ScriptRuntime.java:2326)
   E/RhinoObject( 2672): 	at org.appcelerator.kroll.runtime.rhino.js.events._c_anonymous_3(/tmp/events.js:94)
   E/RhinoObject( 2672): 	at org.appcelerator.kroll.runtime.rhino.js.events.call(/tmp/events.js)
   E/RhinoObject( 2672): 	at org.mozilla.javascript.ContextFactory.doTopCall(ContextFactory.java:426)
   E/RhinoObject( 2672): 	at org.mozilla.javascript.ScriptRuntime.doTopCall(ScriptRuntime.java:3177)
   E/RhinoObject( 2672): 	at org.appcelerator.kroll.runtime.rhino.js.events.call(/tmp/events.js)
   E/RhinoObject( 2672): 	at org.appcelerator.kroll.runtime.rhino.RhinoObject.fireEvent(RhinoObject.java:73)
   E/RhinoObject( 2672): 	at org.appcelerator.kroll.KrollProxy.doFireEvent(KrollProxy.java:441)
   E/RhinoObject( 2672): 	at org.appcelerator.kroll.KrollProxy.handleMessage(KrollProxy.java:602)
   E/RhinoObject( 2672): 	at android.os.Handler.dispatchMessage(Handler.java:95)
   E/RhinoObject( 2672): 	at android.os.Looper.loop(Looper.java:130)
   E/RhinoObject( 2672): 	at org.appcelerator.kroll.KrollRuntime$KrollRuntimeThread.run(KrollRuntime.java:102)

Bill Dawson 2012-01-26

I'm looking at the **RHINO** part. Someone is very welcome to look at the V8 part simultaneously. As there is only one ticket, I'll take it for now.
Bill Dawson 2012-01-26

Further info from eric, in Rhino here's what was happening at one of the lockups: D/Module ( 2203): Loading module: ../examples/accelerometer -> Resources/examples/accelerometer.js for Rhino
Eric Merriman 2012-01-26

FOR V8: Relevant information since these are potentially different issues: I encountered a runtime error in a picker test where it allows you to remove various items from the list and will error when it runs out. I killed the app and used Settings>Applications>Force quit and then erased application data. I also encountered a geolocation error since the device was left pointing to our staging network environment. I fixed the wifi setting and continued. Shortly after I got the crash. Just reviewed this with Hieu. Attaching file "V8console" for more detail. Hieu suggested I retry the error prior to getting the crash. Will update.
Eric Merriman 2012-01-26

FOR V8: Attempted to reproduce the error with geolocation that preceded the ANR. I was unsuccessful but ran through a few of the KS tests prior to the ANR. I could not reproduce the ANR either.
Bill Dawson 2012-01-26

(RHINO) I can't reproduce the problem, but the rhino trace from Eric shows that the KrollRuntimeThread gets suspended somehow when event.js line 42 calls handler.call, which leads to Rhino's ScriptRuntime calling getPropFunctionAndThis (see [Rhino code](https://github.com/appcelerator/rhino_titanium/blob/master/src/org/mozilla/javascript/ScriptRuntime.java#L2306)), which _I think_ means trying to find call in handler. If you look at getPropFunctionAndThisHelper in Rhino (just below getPropFunctionAndThis), you'll see there are plenty of places where exceptions can be thrown, such as if call isn't found or it's not callable. This could happen if handler is undefined or null, I imagine. Anyway, I fear this may be happening because of an attempt to call a null/undefined handler. But it's not clear to me why -- if there is an exception being thrown from Rhino-- we don't get a chance to deal with it (don't we have some kind of global exception handler in Rhino?) So I'm submitting a pull request that protects us from what I'm describing, but as I can't reproduce it, I'm not sure what we'll do for functional testing it. Maybe Eric can re-do a couple of passes on KS on his failing device.
Bill Dawson 2012-01-26

(V8) Just from looking at the trace (I haven't done anything else since I'm working on Rhino), it seems that this is a case where the application has been backed-out-of and there are still geolocation events happening, then the application is being re-entered. While it's re-entering and re-initing the runtime, another geolocation event fires. Some sort of deadlock then occurs. It seems like one opportunity for deadlock would be if V8Object.cpp's Java_org_appcelerator_kroll_runtime_v8_V8Object_nativeFireEvent hit line 104 where it tries V8Util::openJSErrorDialog(tryCatch); which presumably tries to do something on the main thread, which is waiting for the waitForInit latch.
Bill Dawson 2012-01-26

Pull request ready for RHINO portion https://github.com/appcelerator/titanium_mobile/pull/1286
Bill Dawson 2012-01-27

(RHINO) Now that I'm using a device (the only Android handheld I have, a crappy 2.2 HTC Desire that's almost two years old) instead of emulator, it's actually very easy to re-create the lockup in Rhino: 1) Enter KitchenSink. 2) Go to Phone. 3) Go to Geolocation. After a bit in there, go back. 4) Go to Accelerometer. After a bit in there, go back. 5) Back out of KS altogether. 6) Go back in to KS. Here it locks. With my patch (PR 1286), it doesn't lockup.
Thomas Huelbert 2012-02-09

will close on next deep pass
Dustin Hyde 2012-02-25

Reopening. SDK: 1.8.2.v20120223174636 Android: V8 Studio: 1.0.9.201202141208 OS: Snow Leopard Devices Tested: Kindle Fire 2.3.3 Kitchen Sink Nook performance gradually slows down, followed by an eventual crash. Log and stack trace attached.
Hieu Pham 2012-03-09

I was able to consistently reproduce this issue on V8 on 2.3.3 device. ANR is basically a time out for Android main thread. In other words, if the main thread is not responding for more than 5 seconds, ANR will occur. In this case, ANR is caused by leaking memory. When heap size approaches the limit, performance drastically decreases. I was able to reproduce this by leaking memory on purpose by repeatedly open/closing Base UI -> Views -> ImageViews -> Image View(layout 2) until my heap was 23/24mb. Here's one of the logs: http://pastie.org/private/cuy3dvuwxytvf9sqyrqnkq At about 20/24mb, the app takes roughly 3-4 seconds to respond to my clicks, so I'm not surprised that it will take 5+ seconds at 23/24mb. Based on these findings, and analysis of the most recent log/trace, this issue is mostly likely a side effect of memory leaks. I'll be linking this bug with other memory related bugs, so we can re-visit this after those memory issues are fixed.
Marshall Culpepper 2012-03-12

The general ANR error can be caused by a number of factors, including: 1) Running out of heap space 2) Indefinitely blocking the main thread The most recent re-open of this issue is primarily concerned with the ANRs found from TIMOB-7897 Since ANRs can be caused by many factors, it would be best if we opened separate ANR issues for each test case that we encounter them in, since there might be different underlying causes
Lee Morris 2017-03-22

Closing ticket as duplicate and links to the related ticket has been provided above.

JSON Source