ABSTRACT
The paper explores the possibility of employing the source code of corporate websites as an information source for research in innovation studies. Research in this area generally relies on patent data or official data sources. Our paper links standard firm-level economic information with web-based data and joins the ongoing debate with a threefold contribution. First, whereas the majority of the literature has focused on the linguistic content of web pages, we mostly use HTML tags. Second, we propose a method to assess the quality of the linkage of web data to firm-level information. Third, we show that the data retrieved from corporate websites can help identify ‘innovative SMEs’.
Index Terms
- Unconventional data for policy: Using Big Data for detecting Italian innovative SMEs