A family of droids -- Android malware detection via behavioral modeling: static vs dynamic analysis
Cristofaro, Emiliano De
MetadataShow full item record
Citation (published version)Lucky Onwuzurike, Mario Almeida, Enrico Mariconti, Jeremy Blackburn, Gianluca Stringhini, Emiliano De Cristofaro. "A Family of Droids -- Android Malware Detection via Behavioral Modeling: Static vs Dynamic Analysis."
Following the increasing popularity of mobile ecosystems, cybercriminals have increasingly targeted them, designing and distributing malicious apps that steal information or cause harm to the device's owner. Aiming to counter them, detection techniques based on either static or dynamic analysis that model Android malware, have been proposed. While the pros and cons of these analysis techniques are known, they are usually compared in the context of their limitations e.g., static analysis is not able to capture runtime behaviors, full code coverage is usually not achieved during dynamic analysis, etc. Whereas, in this paper, we analyze the performance of static and dynamic analysis methods in the detection of Android malware and attempt to compare them in terms of their detection performance, using the same modeling approach. To this end, we build on MaMaDroid, a state-of-the-art detection system that relies on static analysis to create a behavioral model from the sequences of abstracted API calls. Then, aiming to apply the same technique in a dynamic analysis setting, we modify CHIMP, a platform recently proposed to crowdsource human inputs for app testing, in order to extract API calls' sequences from the traces produced while executing the app on a CHIMP virtual device. We call this system AuntieDroid and instantiate it by using both automated (Monkey) and user-generated inputs. We find that combining both static and dynamic analysis yields the best performance, with F-measure reaching 0.92. We also show that static analysis is at least as effective as dynamic analysis, depending on how apps are stimulated during execution, and, finally, investigate the reasons for inconsistent misclassifications across methods.