A Supervised Learning Approach to Protect Client Authentication on the Web

Authors:
Stefano Calzavara, Università Ca’ Foscari Venezia, Venezia Mestre (Italy)
Gabriele Tolomei, Università Ca’ Foscari Venezia; Yahoo Labs, London UK
Andrea Casini, Università Ca’ Foscari Venezia, Venezia Mestre (Italy)
Michele Bugliesi, Università Ca’ Foscari Venezia, Venezia Mestre (Italy)
Salvatore Orlando, Università Ca’ Foscari Venezia, Venezia Mestre (Italy)

ACM Transactions on the Web, Volume 9, Issue 3, Article No. 15, pp. 1–30. https://doi.org/10.1145/2754933

Published: 12 June 2015

Abstract

Browser-based defenses have recently been advocated as an effective mechanism to protect potentially insecure web applications against the threats of session hijacking, fixation, and related attacks. In existing approaches, all such defenses ultimately rely on client-side heuristics to automatically detect cookies containing session information, to then protect them against theft or otherwise unintended use. While clearly crucial to the effectiveness of the resulting defense mechanisms, these heuristics have not, as yet, undergone any rigorous assessment of their adequacy. In this article, we conduct the first such formal assessment, based on a ground truth of 2,464 cookies we collect from 215 popular websites of the Alexa ranking.

To obtain the ground truth, we devise a semiautomatic procedure that draws on the novel notion of authentication token, which we introduce to capture multiple web authentication schemes. We test existing browser-based defenses in the literature against our ground truth, unveiling several pitfalls both in the heuristics adopted and in the methods used to assess them. We then propose a new detection method based on supervised learning, where our ground truth is used to train a set of binary classifiers, and report on experimental evidence that our method outperforms existing proposals. Interestingly, the resulting classifiers, together with our hands-on experience in the construction of the ground truth, provide new insight on how web authentication is actually implemented in practice.
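
The abstract does not specify the features or classifiers used, so the following is only a minimal sketch of the general idea: learn a binary classifier for authentication cookies from labeled examples instead of hard-coding a name-based heuristic. Everything in it is an assumption made for illustration: the `NAME_HINTS` substrings, the three features, the decision-stump learner, and the made-up toy cookies in `TRAIN` are not taken from the article or its ground truth.

```python
import math
from collections import Counter

# Hypothetical substrings that a naive name-based heuristic might look for.
NAME_HINTS = ("sess", "sid", "auth", "token")


def shannon_entropy(text):
    """Shannon entropy of a string, in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in Counter(text).values())


def features(name, value):
    """Map a cookie to a small numeric feature vector (illustrative choice)."""
    has_hint = 1.0 if any(h in name.lower() for h in NAME_HINTS) else 0.0
    length = min(len(value), 64) / 64.0  # value length, capped and scaled to [0, 1]
    return [has_hint, length, shannon_entropy(value)]


def train_stump(rows, labels):
    """Return the (feature index, threshold) pair with the fewest training errors.

    The stump predicts 1 (authentication cookie) when the chosen feature is
    greater than or equal to the threshold, and 0 otherwise.
    """
    best_errors, best_stump = len(rows) + 1, None
    for j in range(len(rows[0])):
        for t in sorted({row[j] for row in rows}):
            errors = sum((row[j] >= t) != bool(y) for row, y in zip(rows, labels))
            if errors < best_errors:
                best_errors, best_stump = errors, (j, t)
    return best_stump


def predict(stump, name, value):
    """Classify one cookie with a trained stump (1 = authentication cookie)."""
    j, t = stump
    return 1 if features(name, value)[j] >= t else 0


# Made-up toy ground truth: (cookie name, cookie value, is_authentication_cookie).
TRAIN = [
    ("PHPSESSID", "9f86d081884c7d659a2feaa0c55ad015", 1),
    ("auth_token", "3b1f7a9c0e5d4f82a6c7e1b09d2f4a68", 1),
    ("lang", "en", 0),
    ("theme", "dark", 0),
    ("session_lang", "en-GB", 0),  # name looks session-like, but it is a preference
]

stump = train_stump([features(n, v) for n, v, _ in TRAIN], [y for _, _, y in TRAIN])
```

On this toy data the name-hint feature misclassifies `session_lang`, so the learner settles on the value-length feature instead. A realistic setup would need far more labeled cookies, richer features, and a stronger learner than a single stump.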

Published in
ACM Transactions on the Web Volume 9, Issue 3
June 2015
187 pages
ISSN: 1559-1131
EISSN: 1559-114X
DOI: 10.1145/2788341
Editors: Brian D. Davison (Lehigh University, USA), Marianne Winslett (University of Illinois at Urbana-Champaign)
Copyright © 2015 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 12 June 2015
- Revised: 1 March 2015
- Accepted: 1 March 2015
- Received: 1 October 2014
Author Tags
Web security
authentication cookies
classification
Qualifiers
- research-article
- Research
- Refereed
Article Metrics
- Total Citations: 18
- Total Downloads: 530
- Downloads (Last 12 months): 18
- Downloads (Last 6 weeks): 0