Skills, division of labor and performance in collective inventions: Evidence from open source software
Introduction
Collective inventions among profit-seeking individuals and organizations have become popular in the economics literature since the seminal paper of Robert Allen (1983) on the iron district of Cleveland in the nineteenth century. More recently, collective inventions have come to the forefront of economists' attention because of the diffusion of open source software (OSS hereafter). OSS can be viewed as a ‘virtual’ community of practice made up of inventors who voluntarily contribute to multiple collective inventions. OSS offers expert developers the opportunity to participate in innovation networks which are, to some extent, reminiscent of the communities of users in the early age of computing (Steinmueller, 1996, Torrisi, 1998) or other user-centered innovation processes such as those analyzed by von Hippel (1988).
Most studies have attempted to explain why a growing number of independent developers (‘hackers’) voluntarily disclose their inventions. Several theoretical works seek to understand not only the motivations for disclosure of the source code, but also the social norms and the patterns of collaboration among distributed developers, and the implications for efficiency and social welfare (e.g. Raymond, 1999, von Hippel, 2001, Lerner and Tirole, 2002, Johnson, 2002, Harhoff et al., 2003, Dalle and David, 2005).
Empirical studies (e.g. Lakhani and von Hippel, 2003, Hertel et al., 2003, Lakhani and Wolf, 2005) also ask why hackers freely reveal information and what individual participants contribute to the productivity of specific OSS projects. However, little is known about the determinants of OSS projects' performance on a larger scale.
Our paper uses a large sample of OSS teams to study the association between a project's performance (measured by bugs and patches fixed, new feature requests completed, new file releases and changes made to the project's source code) and two important dimensions of team production — skill composition and the level of modularity of project activities.
Our analysis draws on two streams of the literature. The first is rooted in team production theory. Team production requires collaborative skills, i.e. communication ability (people skills), leadership, and the ability to carry out multiple tasks. These skills add to specialized technical skills, thereby expanding production possibilities. Collaborative skills also favor the “discovery of ways to assign, organize, and perhaps alter tasks to produce more efficiently” (Hamilton et al., 2003: p. 470). Moreover, and most importantly for this paper, heterogeneity among team members favors mutual learning and intra-team bargaining, creating opportunities for nonmonetary benefits such as a stimulating working environment, peer recognition and decisional authority (Hamilton et al., 2003).
In this setting we analyze the association between a project's performance and the skill heterogeneity of its members. We expect skill heterogeneity to be positively associated with project performance in teams of open source software developers (Galunic and Riordan, 1998, Sutton and Hargadon, 1997). We focus on two dimensions of skill heterogeneity. First, individual participants must be prepared to carry out multiple tasks whose fulfillment requires a variety of skills. Second, open source participants may have different levels of commitment to single projects. In particular, we can distinguish core developers, who are highly committed and presumably highly experienced people, from the varied community of contributors, who occasionally participate in problem solving by supplying patches, reporting bugs or asking for assistance. It is therefore likely that the level and composition of skills vary across different categories of participants.
The second stream concerns modularity, a key characteristic of modern organization design (Milgrom and Roberts, 1990, Milgrom and Roberts, 1995). Modularity in design and production has been defined as a strategy for “building a complex product or process from smaller subsystems that can be designed independently” (Baldwin and Clark, 1997, p. 84). In modular production, the value generated by each module can be separated from the total outcome. Moreover, modularity allows for experimentation, increases the efficiency of design activities, favors mutual learning between team members, and stimulates innovation (Baldwin and Clark, 1997, Langlois, 2002, Pil and Cohen, 2006). Thus, we posit that the level of design modularization, or division of tasks at the project level, is correlated with observable differences in performance across open source software projects.
This paper provides a novel empirical contribution to the literature on the economics of collective inventions. Our contribution is twofold. First, unlike many previous works that have focused on one or a few open source software projects, we provide an empirical investigation based on a large sample of OSS projects hosted by the SourceForge.net website, one of the largest repositories of OSS activity. To our knowledge, this is one of the few attempts to provide a systematic empirical analysis of multiple dimensions of OSS projects. Second, we focus on a crucial economic issue and examine open source projects with the aim of understanding the association between performance and key project characteristics — team members' skill composition and design modularity.
This paper is organized as follows. Section 2 discusses the theoretical background. Section 3 presents the data. Section 4 illustrates the methodology for estimating the relationship of skills and modularity with project performance. Section 5 analyzes the empirical results. Section 6 concludes.
Section snippets
Theoretical background
Our paper focuses on two dimensions of collective inventions: (i) the diversity of skills of team members and (ii) modularity.
Data and variables
Our empirical analysis draws on a unique dataset containing information on OSS projects and individuals (registered users). The dataset has been built from data provided by SF.net over the period November 3rd 1999 to January 10th 2003.
The number of projects registered at SF.net has increased quite rapidly since its foundation. In May 2008, the number of registered projects was about 178,000 and the number of registered contributors was more than 1,800,000. Other websites hosting open source
Econometric method
To study how skills and modularity are associated with project survival and performance we have estimated three sets of equations.
First, we estimated the probability of project survival with a logistic regression. The dependent variable is ACTIVE, which is equal to 1 when at least one of the following events has occurred during the period October 2002 to December 2002: a fixed bug, a fixed patch, a fulfilled feature request, a change to the project source code (a CVS commit sent to the CVS
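As a rough illustration of this first step, the sketch below builds an ACTIVE-style indicator and fits a logistic regression by plain gradient ascent. It is a minimal sketch under stated assumptions, not the authors' estimation code: the field names (`bugs_fixed`, `patches_fixed`, `features_done`, `cvs_commits`, `modular`), the toy data, and the single regressor are all hypothetical, and only the four events named in the text above are used.

```python
import math

# Hypothetical event counts per project for the observation window.
# Only the four events named in the text are included (assumption).
EVENT_FIELDS = ("bugs_fixed", "patches_fixed", "features_done", "cvs_commits")


def is_active(project):
    """Return 1 if at least one listed event occurred in the window, else 0."""
    return int(any(project.get(field, 0) > 0 for field in EVENT_FIELDS))


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logit(X, y, lr=0.5, steps=2000):
    """Fit P(y=1|x) = sigmoid(b0 + b.x) by gradient ascent on the mean
    log-likelihood. Returns [b0, b1, ..., bk]; b0 is the intercept."""
    n, k = len(X), len(X[0])
    beta = [0.0] * (k + 1)
    for _ in range(steps):
        grad = [0.0] * (k + 1)
        for xi, yi in zip(X, y):
            p = sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], xi)))
            err = yi - p
            grad[0] += err
            for j, v in enumerate(xi):
                grad[j + 1] += err * v
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


# Toy data (hypothetical): "modular" stands in for a project-level regressor.
projects = [
    {"modular": 0},
    {"modular": 0, "bugs_fixed": 0},
    {"modular": 0, "patches_fixed": 2},
    {"modular": 1},
    {"modular": 1, "cvs_commits": 5},
    {"modular": 1, "features_done": 1},
]

y = [is_active(p) for p in projects]     # dependent variable
X = [[p["modular"]] for p in projects]   # one regressor per project
beta = fit_logit(X, y)
print("intercept and slope:", beta)
```

In practice one would use a statistics package that also reports standard errors; the hand-written optimizer here only makes the mechanics of the indicator and the likelihood explicit.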
Descriptive statistics
Table 1 summarizes the variables used in the empirical analysis and provides some descriptive statistics for the sample of 9076 projects. For about 26% of these projects (2372) data on skills are missing or not reported. Given the importance of this variable in our analysis, we compared these two categories of projects and checked for statistically significant differences in the two distributions. About 29.13% of sample projects that report data on skills are active while the same percentage is
Discussion and conclusions
Our analysis provides novel empirical evidence on an important example of collective inventions. The analysis draws on two streams of the literature. First, the theory of teams has advanced different, contrasting hypotheses about the effect of team composition on team performance. In particular, this paper has addressed the association between skill heterogeneity and performance. Second, the economics of modern production has explored the importance of modular design and production for
Acknowledgements
We thank Gaia Rocchetti for her valuable collaboration in the early stages of our research project and the administrators of SourceForge.net for providing us with their data on open source software projects and individual contributors. We also thank a sample of SourceForge developers for helping us to interpret some key variables in the dataset. We thank Bruno Cassiman, Paul David, Neil Gandal, Walter Garcia Fontes, Joachim Henkel and Gregorio Robles for useful comments on earlier drafts of the
References (52)
Collective invention. Journal of Economic Behavior and Organization (1983).
The changing technology of technological change: general and abstract knowledge and the division of innovative labour. Research Policy (1994).
From planning to mature: on the success of open source projects. Research Policy (2007).
Profiting from voluntary information spillovers: how users benefit by freely revealing their innovations. Research Policy (2003).
Motivation of software developers in open source projects: an Internet-based survey of contributors to the Linux kernel. Research Policy (2003).
How open source software works: “free” developer-to-developer assistance. Research Policy (2003).
Modularity in technology and organization. Journal of Economic Behavior and Organization (2002).
Skills and innovation. International Journal of Industrial Organization (2005).
Division of labor, organizational coordination and market mechanisms in collective problem-solving. Journal of Economic Behavior and Organization (2005).
Complementarities and fit. Strategy, structure, and organizational change in modern manufacturing. Journal of Accounting and Economics (1995).