Abstract
Browser-based defenses have recently been advocated as an effective mechanism to protect potentially insecure web applications against session hijacking, session fixation, and related attacks. In existing approaches, all such defenses ultimately rely on client-side heuristics that automatically detect cookies containing session information and then protect those cookies against theft or otherwise unintended use. Although these heuristics are crucial to the effectiveness of the resulting defense mechanisms, their adequacy has not yet been rigorously assessed. In this article, we conduct the first such formal assessment, based on a ground truth of 2,464 cookies collected from 215 popular websites in the Alexa ranking.
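The abstract does not spell out the heuristics it evaluates. As a rough illustration of the kind of client-side check it refers to, the sketch below flags a cookie as session-related if its name contains a common keyword or its value is long and random-looking, measured by empirical Shannon entropy. The keyword list, the 10-character minimum length and the 3-bit entropy threshold are assumptions chosen for illustration; this is not the heuristic of any specific defense discussed in the article.

```python
import math
from collections import Counter

# Hypothetical keyword list: substrings often seen in session-cookie names.
SESSION_KEYWORDS = ("sess", "sid", "auth", "token", "login", "user")


def shannon_entropy(value: str) -> float:
    """Empirical Shannon entropy of the value, in bits per character."""
    if not value:
        return 0.0
    n = len(value)
    return -sum((c / n) * math.log2(c / n) for c in Counter(value).values())


def looks_like_session_cookie(name: str, value: str,
                              min_length: int = 10,
                              min_entropy: float = 3.0) -> bool:
    """Illustrative heuristic: a keyword in the name, or a long, high-entropy value.

    The thresholds are assumptions, not values taken from the article.
    """
    if any(k in name.lower() for k in SESSION_KEYWORDS):
        return True
    return len(value) >= min_length and shannon_entropy(value) >= min_entropy
```

With these settings a hypothetical `user_lang` cookie holding a language code is also flagged, purely because of the `user` keyword, which shows how easily a name-based rule can misfire.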
To obtain the ground truth, we devise a semiautomatic procedure that draws on the novel notion of authentication token, which we introduce to capture multiple web authentication schemes. We test existing browser-based defenses from the literature against our ground truth, unveiling several pitfalls both in the heuristics they adopt and in the methods used to assess them. We then propose a new detection method based on supervised learning, in which our ground truth is used to train a set of binary classifiers, and report experimental evidence that our method outperforms existing proposals. Interestingly, the resulting classifiers, together with our hands-on experience in constructing the ground truth, provide new insight into how web authentication is actually implemented in practice.
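The supervised alternative can be sketched in the same terms: each cookie is mapped to a numeric feature vector, and a binary classifier is trained on labeled examples. The sketch below assumes three hypothetical features (a name-keyword flag, a capped value length and a normalized Shannon entropy) and uses plain logistic regression trained by batch gradient descent, chosen only because it fits in a few lines; the article's actual features and classifiers are not described in this abstract. The six labeled cookies are invented toy data, not samples from the article's ground truth.

```python
import math
from collections import Counter

# Hypothetical keyword list, used here only as one input feature.
SESSION_KEYWORDS = ("sess", "sid", "auth", "token", "login", "user")


def shannon_entropy(value: str) -> float:
    """Empirical Shannon entropy of the value, in bits per character."""
    if not value:
        return 0.0
    n = len(value)
    return -sum((c / n) * math.log2(c / n) for c in Counter(value).values())


def features(name: str, value: str) -> list[float]:
    """Hypothetical features: keyword flag, length capped at 32, entropy scaled by 6 bits."""
    has_keyword = 1.0 if any(k in name.lower() for k in SESSION_KEYWORDS) else 0.0
    return [has_keyword, min(len(value), 32) / 32, shannon_entropy(value) / 6]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def score(w: list[float], b: float, x: list[float]) -> float:
    """Predicted probability that feature vector x describes an authentication cookie."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def train_logistic_regression(samples, labels, lr=0.5, epochs=2000):
    """Batch gradient descent on the mean log-loss; returns (weights, bias)."""
    dim, n = len(samples[0]), len(samples)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(samples, labels):
            err = score(w, b, x) - y  # derivative of the log-loss w.r.t. the logit
            for i in range(dim):
                grad_w[i] += err * x[i]
            grad_b += err
        w = [wi - lr * gi / n for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(model, name: str, value: str) -> int:
    """Return 1 if the cookie is classified as an authentication cookie, else 0."""
    w, b = model
    return int(score(w, b, features(name, value)) >= 0.5)


# Invented toy data (1 = authentication cookie); NOT the article's ground truth.
TOY_COOKIES = [
    ("PHPSESSID", "k3j9Fq0ZpL2mX8aT", 1),
    ("auth_token", "Zx81Lq0pQwErTy67uI", 1),
    ("sid", "9fK2mPq7Xz", 1),
    ("lang", "en", 0),
    ("theme", "dark", 0),
    ("consent", "yes", 0),
]

MODEL = train_logistic_regression(
    [features(n, v) for n, v, _ in TOY_COOKIES],
    [y for _, _, y in TOY_COOKIES],
)
```

An unseen cookie is then classified with, for example, `predict(MODEL, "session_id", "Q8w2Er5Ty7Ui")`. With only six examples the learned weights carry no real meaning; the sketch only shows the shape of the pipeline: feature extraction, a labeled training set, and a binary decision.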
Index Terms
- A Supervised Learning Approach to Protect Client Authentication on the Web