
Sentence Embedding Models for Similarity Detection of Software Requirements

  • Original Research
  • Published in: SN Computer Science

Abstract

Semantic similarity detection mainly relies on laboriously curated ontologies, or on supervised and unsupervised neural embedding models. In this paper, we present two domain-specific sentence embedding models trained on a natural-language requirements dataset, yielding sentence embeddings tailored to the software requirements engineering domain. Both models use cosine similarity to score the semantic relatedness of sentence pairs. The results of the experimental evaluation confirm that the proposed models improve textual semantic similarity measures over existing state-of-the-art neural sentence embedding models: we reach an accuracy of 88.35%, an improvement of about 10% over existing benchmarks.
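The cosine-similarity scoring the abstract describes can be sketched as follows. This is a minimal illustration only: the vectors below are toy stand-ins for sentence embeddings, not outputs of the paper's actual models, and the `cosine_similarity` helper is an assumed name, not part of the authors' code.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy vectors standing in for the embeddings of two requirement sentences.
req_a = [0.2, 0.1, 0.7]
req_b = [0.3, 0.0, 0.6]
score = cosine_similarity(req_a, req_b)  # close to 1.0 for similar requirements
```

In practice, each requirement sentence would first be mapped to a dense vector by the trained embedding model, and pairs whose cosine score exceeds a chosen threshold would be flagged as semantically similar.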






Acknowledgements

This work has been partially supported by the Project IN17MO07 “Formal Specification for Secured Software System”, under the Indo-Italian Executive Programme of Scientific and Technological Cooperation.

Author information


Corresponding author

Correspondence to Souvick Das.

Ethics declarations

Conflict of Interest Statement

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article is part of the topical collection “Applications of Software Engineering and Tool Support” guest edited by Nabendu Chaki, Agostino Cortesi and Anirban Sarkar.


About this article


Cite this article

Das, S., Deb, N., Cortesi, A. et al. Sentence Embedding Models for Similarity Detection of Software Requirements. SN COMPUT. SCI. 2, 69 (2021). https://doi.org/10.1007/s42979-020-00427-1

