[TIMOB-15336] Android TiViewProxy crashes with NullPointerException with blur()
GitHub Issue | n/a |
Type | Bug |
Priority | High |
Status | Closed |
Resolution | Fixed |
Resolution Date | 2013-10-10T22:24:00.000+0000 |
Affected Version/s | n/a |
Fix Version/s | 2013 Sprint 21, 2013 Sprint 21 API, Release 3.2.0 |
Components | Android |
Labels | triage |
Reporter | Steven Lee |
Assignee | Biju pm |
Created | 2013-09-19T22:29:33.000+0000 |
Updated | 2013-10-21T23:11:03.000+0000 |
Description
Included is a sample app which demonstrates this bug; it seems like running blur on a text view right before it get removed from its parent view causes this crash reliably (the buggy section noted with a XXX comment).
Digging into it more, it seems like a general race issue; underlying this is blur uses TiMessenger#postOnMain, which uses Message#sendToTarget underneath, whereas TiUIView#remove uses TiMessenger#sendBlockingMessage, which if another sendBlockingMessage was already running, will be ran before that sendBlockingMessage finishes, and more notably before the blur message is processed.
Attachments
For reference, the entirety of app.js below: ==============================================
Below are the two calls that occur in direct succession to each other, which causes a null pointer exception. It also shows the stack trace necessary to understand why this null pointer exception occurs; the underlying issue has nothing to do with textfield or blurring specifically, but rather an inconsistency with how Android's Handler, Message and Titanium's TiMessenger classes work with each other in message handling. * textfield.blur() *# Inherited from TiViewProxy, so calls TiViewProxy#blur. *# Is being called from runtime thread, so sends to main handler with msg MSG_FOCUS (async call). *# Handled by TiViewProxy#handleMessage, calls TiViewProxy#handleBlur, which calls TiUIView#blur. *# Within it calls TiMessenger#postOnMain (with a runnable referencing TiUIView#nativeView private field), which just tells the main handler to run the runnable later on (another async call). *# Once the runnable is run, nativeView is already null, thus causing a NullPointerException due to trying to dot reference it. * view.remove(textfield) *# Remove is mapped to TiViewProxy#remove, which calls TiMessenger#sendBlockingMainMessage with msg MSG_REMOVE_CHILD because it is being called from runtime thread (and child view textfild as arg). *# Within here it delegates to the current threadlocal's TiMessenger instance (so runtime thread instance) to the instance call TiMessenger#sendBlockingMessage with the main messenger passed in as an argument. *# Within this method call (targetMessenger#sendMessage) the targetMessegner is the main thread's ti messenger with a message whose obj is set to an aysnc result (which contains the textfield as arg to async result). *# In sendMessage, we note that the current thread's id and the main messenger's thread id is different (runtime vs main UI thread), so we either directly do message#sendToTarget if its not blocking (which is when no blocking message is currently running), or if it is blocking, just put in onto the queue to be processed later. *# The bug occurs when it is blocking and placed on the queue, because that means another blocking message is currently running. This logic is located within the anonymous subclass of AsyncResult defined in TiMessenger#sendMessage, specifically getResult. In there it calls dispatchMessage when the blocking message isn't done responding yet, which *synchronously* processes all queued messages, including MSG_REMOVE_CHILD. *# From there it is easy to see that MSG_REMOVE_CHILD leads to calling the textfield's releaseViews method, which nullifies the underlying nativeView field. Also, at this point we are still technically still completing a message previous to the text view's MSG_FOCUS above, as that is queued within the main handler's internal queue. From following the traces above, it goes without saying that this issue, while only concretely shown with text text blurring, can occur anywhere within Ti code when using TiMessenger#sendBlockingMainMessage after TiMessenger#postOnMain while expecting TiMessenger#postOnMain method to complete first.
Hi Steven Lee, I have tested your code you just create a view and add this a textfield and remove this through the button event listener and it occurs. So you say what is the expected behavior and actual behavior. And do you know actually why blur() method is used if you know this clearly, please write this. Thanks
Hi Motiur, The reason why we were originally blurring the text view was to remove the keyboard view in Android when we switched views (so our own original code we were removing the whole parent view rather than just the text field), but since then for that particular issue we have sidestepped past the issue with just using Ti.UI.Android.hideSoftKeyboard method call instead. That being said, the *real issue* I am reporting on is what is causing blur to fail to begin with, because it shouldn't ever in this circumstance. And what the root issue is I have documented clearly in my second comment; *there is an inherent race issue that happens when using TiMessenger#postOnMain versus TiMessenger#sendBlockingMainMessage*. Please focus on that rather than the blurring issue; this low level issue is prevalent since most async calls go through TiMessenger. The blur code is just an example of this larger issue at hand. BTW, I found an even shorter app.js that reproduces this; just look at the error logs to see the null pointer exception stack trace:
Hi Steven Lee, Test that code and know me is that the problem createing yet or not. Arrange that code like this and test the issue.
Here your first code something change here....
Thanks
Hi Steven Lee, Do you want to say when blur() method and remove() method are in the same button eventListner the NullPointerException is occurred? If you want to remove a textfield so it need not to blur simple. Thanks
Like I said before, while we can avoid using those two calls together, *it is still your guy's bug that these two calls cannot be used next to each other, and potentially more location given what the root cause is*. I strongly advise you to please escalate this to a developer's attention, as you are failing to see the bigger issue due to this bug's title.
Hi, I experienced this bug too. I used blur() and remove() when i wanted to switch views and hide keyboard. It worked in 3.0.0 and earlier versions. When I switched to 3.1.3 this issue appeared. Solved by using hideSoftKeyboard method instead. But anyway this is BUG and I think it need to be fixed. Thanks
Now that this bug has been reassigned, can the status of the bug be reopened? Thanks.
PR : https://github.com/appcelerator/titanium_mobile/pull/4767
I have been watching this ticket and must say the above is not a solution for the underlying problem. It is not even really a solution for the MANIFESTATION of the problem. Sure, it eliminates the NullPointerException in this particular repro case... but it also will fail to hide the software keyboard, which means instead of crashing the app, the code snippet just doesn't work instead. Steven Lee can correct me, if I'm wrong on that, but it certainly looks that way. That's not even the biggest issue with that solution either. As Steven explained, the particular blur() repro case is a MANIFESTATION of a deep underlying bug in the way TiMessenger is used in *many* places in the code. This was added after 3.0.0 and not only added the crash in question but also potentially a multitude of other issues, since this technique is used in many places all over the code. Steven went to the trouble of explaining this in his comments on 9/19, 9/22, and 9/23.
Verified the fix & the Android TiViewProxy does not throw any null pointer exception with blur. Thus closing. Environment: Appcel Studio : 3.2.0.201310181700 Ti SDK : 3.2.0.v20131021142445 Mac OSX : 10.8.5 Alloy : 1.2.2-beta CLI - 3.2.0 Device: Samsung Galaxy S4 running android 4.2.2
Here is an analogy for what happened in this ticket: Reporter: This car has a flaw where the electrical system is defective, and therefore starting the car makes it explode. Moreover the same defect probably affects multiple other parts of the car. Appcelerator: OK, fixed. Reporter: Actually even though you've eliminated the car exploding when one tries to start it, now the car just won't start. Also you in no way addressed the underlying electrical system problem, which will adversely affect the car's operation in multiple unknown ways. Appcelerator: The car no longer explodes when starting it. Thus closing.