Challenges in software attribution: The case of Android apps
by Juan Tapiador (Universidad Carlos III de Madrid)
The ability to identify the author responsible for a piece of software is critical for many research studies and for enhancing software transparency and accountability. However, unlike other platforms such as iOS, attribution in the Android ecosystem is known to be hard: Android app authors can, either intentionally or by mistake, hide their true identity. A recent study based on the analysis of 2.5 million market entries from 5 Android markets explores the availability, volatility, and overall suitability of publicly available metadata for author attribution in Android app markets.
Software attribution is the process of matching a piece of software to its author. Attribution is critical for software analysis, platform measurements, threat analysis, transparency, and regulatory enforcement [1, 2]. Today, all major software platforms implement some form of attribution mechanism. Windows relies on a Public Key Infrastructure (PKI) that provides authenticity guarantees about the organization offering the software through X.509 certificates issued by trusted Certificate Authorities (CAs). Apple follows a similar approach: all iOS apps must be signed with a developer certificate issued by Apple. These certificates are part of Apple’s developer program, which involves verifying developers’ legal identity [3, 4].
Android implements a more permissive attribution scheme, regardless of the app market. During the development and publication of an app, developers can disclose attribution data both in the app signing process and in their market profile (e.g., developer name, email, and website) [5]. This information is self-declared by the developer and is neither endorsed nor validated by a trusted third party. Even the cryptographic signing certificates can be self-signed [6]. While other software platforms also distribute software under potentially unverified, self-declared attribution data, their PKI ensures some form of control by the platform operator, which is absent in the Android ecosystem. To complicate things further, the diversity of publication policies across Android app markets results in the lack of a robust, Android-wide attribution mechanism.
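To make the point about self-signed certificates concrete, the following Python sketch checks whether a signing certificate is self-issued by comparing its subject and issuer names. It assumes a hypothetical `SigningCert` record holding both names as simplified distinguished-name strings (e.g., `CN=..., O=..., C=...`) and a hypothetical helper, `parse_dn`, that ignores escaped commas and repeated attribute types; it is not a full RFC 4514 parser, and a real check would also verify that the certificate's signature validates against its own public key.

```python
from dataclasses import dataclass


def parse_dn(dn: str) -> dict[str, str]:
    """Parse a simplified distinguished name such as "CN=Dev, O=Corp, C=ES".

    Assumption: no escaped commas and no repeated attribute types, so this
    is deliberately NOT a full RFC 4514 parser.
    """
    fields: dict[str, str] = {}
    for part in dn.split(","):
        key, _, value = part.strip().partition("=")
        fields[key.strip().upper()] = value.strip()
    return fields


@dataclass(frozen=True)
class SigningCert:
    """Hypothetical minimal view of an app signing certificate."""

    subject: str
    issuer: str


def is_self_issued(cert: SigningCert) -> bool:
    """Return True when subject and issuer name the same entity.

    Attribute order is ignored because the parsed names are compared as
    dictionaries. Matching names only show the certificate is self-issued;
    proving it is self-signed would also require checking its signature
    against its own public key.
    """
    return parse_dn(cert.subject) == parse_dn(cert.issuer)


if __name__ == "__main__":
    cert = SigningCert(
        subject="CN=Example Dev, O=Example Corp, C=ES",
        issuer="CN=Example Dev, O=Example Corp, C=ES",
    )
    print(is_self_issued(cert))  # True
```

Because nothing outside the certificate vouches for these names, a developer (or an impersonator) can put any organization in the subject and the check will still pass.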
The lack of sound attribution mechanisms prevents external actors, such as researchers and regulators (and, possibly, the market operators themselves), from automatically studying developer practices, enhancing software accountability, or effectively detecting harmful, cloned, and deceptive apps. As a result, end users are exposed to impersonation attacks, such as repackaged malware or phishing, which can also harm the revenue streams and reputation of legitimate developers.
A recent study [7] by members of the TRUST aWARE consortium has empirically examined the availability, volatility, and overall consistency of different attribution signals in the Android ecosystem. Using a large dataset with 2.5M sets of signals and 1.4M apps from 5 different Android markets, the study identifies factors that hinder accurate author attribution at the app and market levels, both within and across Android markets. One key result is that using market metadata and app signing certificates as attribution signals is unsound due to the inaccuracy, volatility, and incompleteness of the data. Several reasons explain this finding. On the one hand, attribution signals often conflict with each other, both at the app level (the same author using different signals in different apps) and across markets (the same app using different signals in different markets). On the other hand, the assumption that a signing certificate corresponds to a single company does not hold. For example, the subject and issuer fields of a self-signed certificate are identical, and their content is filled in arbitrarily by whoever creates the certificate. In addition, for apps that delegate their signing process to a platform, all certificates generated by a given platform share the same subject field.
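A cross-market inconsistency check of the kind described above can be sketched in a few lines of Python. The `MarketEntry` record, its field names, the placeholder fingerprints, and the choice of signals (self-declared developer name and certificate fingerprint) are illustrative assumptions, not the study's data model or method; the sketch simply groups entries by package name and reports every signal that takes more than one value across markets.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class MarketEntry:
    """Hypothetical record of the attribution signals one market shows for an app."""

    market: str
    package: str         # Android package name, e.g. "com.example.app"
    developer_name: str  # self-declared in the market profile
    cert_sha256: str     # fingerprint of the app signing certificate


# Signals compared across markets (an illustrative choice).
SIGNALS = ("developer_name", "cert_sha256")


def find_conflicts(entries: list[MarketEntry]) -> dict[str, dict[str, set[str]]]:
    """Map each package to the signals that take more than one value across markets."""
    # package -> signal name -> set of distinct values observed
    seen = defaultdict(lambda: defaultdict(set))
    for entry in entries:
        for signal in SIGNALS:
            seen[entry.package][signal].add(getattr(entry, signal))

    conflicts: dict[str, dict[str, set[str]]] = {}
    for package, values_by_signal in seen.items():
        conflicting = {s: v for s, v in values_by_signal.items() if len(v) > 1}
        if conflicting:
            conflicts[package] = conflicting
    return conflicts


if __name__ == "__main__":
    # Placeholder fingerprints; real SHA-256 fingerprints are 64 hex digits.
    entries = [
        MarketEntry("market_a", "com.example.app", "Example Ltd", "aa11"),
        MarketEntry("market_b", "com.example.app", "Example Limited", "aa11"),
        MarketEntry("market_a", "org.other.tool", "Other Dev", "bb22"),
    ]
    print(find_conflicts(entries))
```

With this sample data, only the developer name of `com.example.app` is flagged; its certificate fingerprint agrees across both markets, and `org.other.tool` appears in a single market. Trivial variants such as "Ltd" versus "Limited" are reported as conflicts too; how aggressively to normalize names is a design choice this sketch leaves open.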
Imprecise app attribution has negative consequences for several research areas and applications. It affects measurements and analyses of the app ecosystem and market dynamics, as well as the automatic detection of deceptive actors and practices. The unavailability of signals forces the software attribution community to combine several signals to improve upon attribution based on any single one. The volatility of signals threatens the validity of studies in the long term, as conclusions drawn today may not hold months later. Imprecise attribution also damages transparency for users: without clear signals about which company is accountable for an app, they are less able to make an informed decision about whether to install it, or to exercise their GDPR rights. In the case of privacy abuses and security vulnerabilities, it makes disclosure to the responsible party harder. Furthermore, inconsistencies across markets can lead users to install an app by mistake, simply because it presents an ambiguous signal (e.g., the app name).
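One common way to combine several signals, sketched here in Python as an illustration rather than as the method used in the study, is to link any two apps that share a signal value and treat the resulting connected components (computed with a union-find structure) as candidate authors. The input format (an app identifier mapped to a dictionary of named signals) and the signal names are assumptions; empty values are skipped so that missing metadata does not join unrelated apps.

```python
from collections import defaultdict


class UnionFind:
    """Disjoint-set forest with path compression."""

    def __init__(self) -> None:
        self.parent: dict[str, str] = {}

    def find(self, x: str) -> str:
        self.parent.setdefault(x, x)  # unseen items start as their own root
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: re-point every node on the walk directly at the root.
        while self.parent[x] != root:
            next_x = self.parent[x]
            self.parent[x] = root
            x = next_x
        return root

    def union(self, a: str, b: str) -> None:
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_a] = root_b


def cluster_apps(signals: dict[str, dict[str, str]]) -> list[set[str]]:
    """Group apps that share at least one non-empty signal value.

    `signals` maps an app identifier to {signal_name: value}. Linking is
    transitive: if A shares a certificate with B and B shares an email with C,
    all three end up in the same cluster.
    """
    uf = UnionFind()
    first_app_with: dict[tuple[str, str], str] = {}
    for app, app_signals in signals.items():
        uf.find(app)  # register apps that share nothing with anyone
        for name, value in app_signals.items():
            if not value:  # missing metadata must not link unrelated apps
                continue
            key = (name, value)
            if key in first_app_with:
                uf.union(app, first_app_with[key])
            else:
                first_app_with[key] = app

    clusters: dict[str, set[str]] = defaultdict(set)
    for app in signals:
        clusters[uf.find(app)].add(app)
    return list(clusters.values())


if __name__ == "__main__":
    signals = {
        "com.example.a": {"cert_sha256": "aa11", "email": "dev@example.com"},
        "com.example.b": {"cert_sha256": "cc33", "email": "dev@example.com"},
        "org.other.c": {"cert_sha256": "bb22", "email": ""},
    }
    print(cluster_apps(signals))
```

In the sample data, `com.example.a` and `com.example.b` end up together through their shared email despite different certificates, while `org.other.c` stays alone. The same mechanism also shows why signal choice matters: if every app whose signing is delegated to a platform carries the same certificate subject, linking on that subject would collapse unrelated developers into a single cluster.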
Improving attribution in the Android ecosystem requires moving away from self-signed certificates and vetting the information disclosed in apps’ market profiles. Other prominent software ecosystems already rely on sounder signature mechanisms, such as Apple issuing the certificates used in its app store and Windows relying on a PKI. While far from perfect, these approaches limit the number of certificates with incomplete or invalid information while also raising the bar for malicious actors.
[1] S. Alrabaee, P. Shirani, M. Debbabi, and L. Wang, “On the feasibility of malware authorship attribution,” in International Symposium on Foundations and Practice of Security. Springer, 2016.
[2] H. Wang, Z. Liu, J. Liang, N. Vallina-Rodriguez, Y. Guo, L. Li, J. Tapiador, J. Cao, and G. Xu, “Beyond Google Play: A large-scale comparative study of Chinese Android app markets,” in Proceedings of the Internet Measurement Conference (IMC), 2018.
[3] “Apple Developer Program: What You Need To Enroll,” https://developer.apple.com/programs/enroll/, 2021.
[4] “Identity Verification,” https://developer.apple.com/support/identity-verification/, 2021.
[5] “Google Play Policy Center,” https://support.google.com/googleplay/android-developer/answer/9898842, 2021.
[6] “Sign your app,” https://developer.android.com/studio/publish/app-signing, 2021.
[7] K. Hageman, A. Feal, J. Gamba, A. Girish, J. Bleier, M. Lindorfer, J. Tapiador, and N. Vallina-Rodriguez, “Mixed Signals: Analyzing Software Attribution Challenges in the Android Ecosystem,” November 2022. Pre-print available at https://arxiv.org/abs/2211.13104.