[TIMOB-7434] Android: V8 & Rhino - keyDispatchingTimedOut logged then crash
GitHub Issue | n/a |
---|---|
Type | Bug |
Priority | Critical |
Status | Closed |
Resolution | Duplicate |
Resolution Date | 2012-03-12T14:38:12.000+0000 |
Affected Version/s | Release 1.8.1 |
Fix Version/s | n/a |
Components | Android |
Labels | qe-and012312, qe-and021312 |
Reporter | Eric Merriman |
Assignee | Hieu Pham |
Created | 2012-01-26T15:13:16.000+0000 |
Updated | 2017-03-22T22:03:00.000+0000 |
Description
Description:
While testing Kitchensink for about 30-45 minutes, the Droid 1 with a V8 compile crashed. Almost simultaneously the Nexus S with Rhino compile started logging Garbage collection notifications, then crashed. Crash trace files are attached for each device. I attempted to reproduce the crashes by performing a subset of the preceding tests but was unable to.
Console out for each device:
Droid V8
E/ActivityManager( 1096): ANR in com.appcelerator.titanium (com.appcelerator.titanium/.KitchensinkActivity)
E/ActivityManager( 1096): Reason: keyDispatchingTimedOut
E/ActivityManager( 1096): Load: 0.21 / 0.43 / 0.28
E/ActivityManager( 1096): CPU usage from 7351ms to 2220ms ago:
E/ActivityManager( 1096): system_server: 13% = 10% user + 3% kernel / faults: 1515 minor
E/ActivityManager( 1096): droid.gallery3d: 3% = 2% user + 0% kernel / faults: 796 minor 1 major
E/ActivityManager( 1096): e.process.gapps: 1% = 1% user + 0% kernel / faults: 104 minor
E/ActivityManager( 1096): battd: 0% = 0% user + 0% kernel / faults: 5 minor
E/ActivityManager( 1096): putmethod.latin: 0% = 0% user + 0% kernel / faults: 51 minor
E/ActivityManager( 1096): erator.titanium: 0% = 0% user + 0% kernel
E/ActivityManager( 1096): events/0: 0% = 0% user + 0% kernel
E/ActivityManager( 1096): omap2_mcspi: 0% = 0% user + 0% kernel
E/ActivityManager( 1096): cpcap_irq/0: 0% = 0% user + 0% kernel
E/ActivityManager( 1096): akmd2: 0% = 0% user + 0% kernel
E/ActivityManager( 1096): oid.voicesearch: 0% = 0% user + 0% kernel
E/ActivityManager( 1096): torola.calendar: 0% = 0% user + 0% kernel / faults: 14 minor
E/ActivityManager( 1096): d.process.media: 0% = 0% user + 0% kernel
E/ActivityManager( 1096): com.amazon.mp3: 0% = 0% user + 0% kernel
E/ActivityManager( 1096): ogle.android.gm: 0% = 0% user + 0% kernel / faults: 2 minor
E/ActivityManager( 1096): TOTAL: 21% = 16% user + 5% kernel + 0% irq
Nexus S Rhino
E/ActivityManager( 110): ANR in com.appcelerator.titanium (com.appcelerator.titanium/org.appcelerator.titanium.TiActivity)
E/ActivityManager( 110): Reason: keyDispatchingTimedOut
E/ActivityManager( 110): Load: 4.85 / 3.01 / 2.27
E/ActivityManager( 110): CPU usage from 11668ms to 4ms ago with 99% awake:
E/ActivityManager( 110): 93% 2203/com.appcelerator.titanium: 87% user + 5.9% kernel / faults: 2356 minor
E/ActivityManager( 110): 5% 110/system_server: 1.3% user + 3.6% kernel / faults: 10 minor
E/ActivityManager( 110): 0% 14/kondemand/0: 0% user + 0% kernel
E/ActivityManager( 110): 0% 52/gp2a_wq: 0% user + 0% kernel
E/ActivityManager( 110): 0% 176/dhd_dpc: 0% user + 0% kernel
E/ActivityManager( 110): 0% 179/com.android.systemui: 0% user + 0% kernel / faults: 4 minor
E/ActivityManager( 110): 0% 249/wpa_supplicant: 0% user + 0% kernel
E/ActivityManager( 110): 98% TOTAL: 88% user + 9.9% kernel
E/ActivityManager( 110): CPU usage from 271ms to 785ms later:
E/ActivityManager( 110): 92% 2203/com.appcelerator.titanium: 90% user + 1.9% kernel / faults: 162 minor
E/ActivityManager( 110): 64% 2211/KrollRuntimeThr: 64% user + 0% kernel
E/ActivityManager( 110): 17% 2205/GC: 17% user + 0% kernel
E/ActivityManager( 110): 9.8% 2464/er$SensorThread: 7.8% user + 1.9% kernel
E/ActivityManager( 110): 5.8% 110/system_server: 0% user + 5.8% kernel
E/ActivityManager( 110): 1.9% 121/system_server: 0% user + 1.9% kernel
E/ActivityManager( 110): 1.9% 147/InputDispatcher: 0% user + 1.9% kernel
E/ActivityManager( 110): 98% TOTAL: 90% user + 7.8% kernel
Steps to reproduce:
1) Install Kitchensink compiled with 1.0.8.201201210622
2) Run through each test sequentially
3) Observe the console output
Result:
The devices either report GC activity and then crash, or simply crash
Expected:
No crash, normal GC activity
Attachments
File | Date | Size |
---|---|---|
kindle fire v8 1.8.2.X log.txt | 2012-02-25T13:08:23.000+0000 | 42944 |
kindle fire v8 1.8.2.X traces.txt | 2012-02-25T13:08:23.000+0000 | 75321 |
tracesDroidv8.txt | 2012-01-26T15:13:16.000+0000 | 102853 |
tracesNexSRhino.txt | 2012-01-26T15:13:16.000+0000 | 48010 |
V8console.rtf | 2012-01-26T17:35:59.000+0000 | 66967 |
Additional note, on the Nexus S Rhino compile, after the crash, I selected Kill. Even as the device sat on the home screen, the following continued to log over and over, almost 24,000 times before I pulled the cable. This may just be the result of the crash putting the device in a bad state, but in case there is value here:
I'm looking at the **RHINO** part. Someone is very welcome to look at the V8 part simultaneously. As there is only one ticket, I'll take it for now.
Further info from eric, in Rhino here's what was happening at one of the lockups: D/Module ( 2203): Loading module: ../examples/accelerometer -> Resources/examples/accelerometer.js for Rhino
FOR V8: Relevant information since these are potentially different issues: I encountered a runtime error in a picker test where it allows you to remove various items from the list and will error when it runs out. I killed the app and used Settings>Applications>Force quit and then erased application data. I also encountered a geolocation error since the device was left pointing to our staging network environment. I fixed the wifi setting and continued. Shortly after I got the crash. Just reviewed this with Hieu. Attaching file "V8console" for more detail. Hieu suggested I retry the error prior to getting the crash. Will update.
FOR V8: Attempted to reproduce the error with geolocation that preceded the ANR. I was unsuccessful but ran through a few of the KS tests prior to the ANR. I could not reproduce the ANR either.
(RHINO) I can't reproduce the problem, but the rhino trace from Eric shows that the KrollRuntimeThread gets suspended somehow when event.js line 42 calls
handler.call
, which leads to Rhino's ScriptRuntime callinggetPropFunctionAndThis
(see [Rhino code](https://github.com/appcelerator/rhino_titanium/blob/master/src/org/mozilla/javascript/ScriptRuntime.java#L2306)), which _I think_ means trying to findcall
inhandler
. If you look atgetPropFunctionAndThisHelper
in Rhino (just belowgetPropFunctionAndThis
), you'll see there are plenty of places where exceptions can be thrown, such as ifcall
isn't found or it's not callable. This could happen ifhandler
isundefined
ornull
, I imagine. Anyway, I fear this may be happening because of an attempt to call a null/undefined handler. But it's not clear to me why -- if there is an exception being thrown from Rhino-- we don't get a chance to deal with it (don't we have some kind of global exception handler in Rhino?) So I'm submitting a pull request that protects us from what I'm describing, but as I can't reproduce it, I'm not sure what we'll do for functional testing it. Maybe Eric can re-do a couple of passes on KS on his failing device.(V8) Just from looking at the trace (I haven't done anything else since I'm working on Rhino), it seems that this is a case where the application has been backed-out-of and there are still geolocation events happening, then the application is being re-entered. While it's re-entering and re-initing the runtime, another geolocation event fires. Some sort of deadlock then occurs. It seems like one opportunity for deadlock would be if V8Object.cpp's
Java_org_appcelerator_kroll_runtime_v8_V8Object_nativeFireEvent
hit line 104 where it triesV8Util::openJSErrorDialog(tryCatch);
which presumably tries to do something on the main thread, which is waiting for thewaitForInit
latch.Pull request ready for RHINO portion https://github.com/appcelerator/titanium_mobile/pull/1286
(RHINO) Now that I'm using a device (the only Android handheld I have, a crappy 2.2 HTC Desire that's almost two years old) instead of emulator, it's actually very easy to re-create the lockup in Rhino: 1) Enter KitchenSink. 2) Go to Phone. 3) Go to Geolocation. After a bit in there, go back. 4) Go to Accelerometer. After a bit in there, go back. 5) Back out of KS altogether. 6) Go back in to KS. Here it locks. With my patch (PR 1286), it doesn't lockup.
will close on next deep pass
Reopening. SDK: 1.8.2.v20120223174636 Android: V8 Studio: 1.0.9.201202141208 OS: Snow Leopard Devices Tested: Kindle Fire 2.3.3 Kitchen Sink Nook performance gradually slows down, followed by an eventual crash. Log and stack trace attached.
I was able to consistently reproduce this issue on V8 on 2.3.3 device. ANR is basically a time out for Android main thread. In other words, if the main thread is not responding for more than 5 seconds, ANR will occur. In this case, ANR is caused by leaking memory. When heap size approaches the limit, performance drastically decreases. I was able to reproduce this by leaking memory on purpose by repeatedly open/closing Base UI -> Views -> ImageViews -> Image View(layout 2) until my heap was 23/24mb. Here's one of the logs: http://pastie.org/private/cuy3dvuwxytvf9sqyrqnkq At about 20/24mb, the app takes roughly 3-4 seconds to respond to my clicks, so I'm not surprised that it will take 5+ seconds at 23/24mb. Based on these findings, and analysis of the most recent log/trace, this issue is mostly likely a side effect of memory leaks. I'll be linking this bug with other memory related bugs, so we can re-visit this after those memory issues are fixed.
The general ANR error can be caused by a number of factors, including: 1) Running out of heap space 2) Indefinitely blocking the main thread The most recent re-open of this issue is primarily concerned with the ANRs found from TIMOB-7897 Since ANRs can be caused by many factors, it would be best if we opened separate ANR issues for each test case that we encounter them in, since there might be different underlying causes
Closing ticket as duplicate and links to the related ticket has been provided above.