Applying Social Network Analysis to the Information in CVS Repositories
Essay by review • October 13, 2010 • Research Paper
Abstract
The huge quantities of data available in the CVS repositories
of large, long-lived libre (free, open source) software
projects, and the many interrelationships among those data
offer opportunities for extracting large amounts of valuable
information about their structure, evolution and internal
processes. Unfortunately, the sheer volume of that information
renders it almost unusable without applying methodologies
which highlight the relevant information for a given
aspect of the project. In this paper, we propose the use of
a well-known set of methodologies (social network analysis)
for characterizing libre software projects, their evolution
over time and their internal structure. In addition,
we show how we have applied such methodologies to real
cases, and extract some preliminary conclusions from that
experience.
Keywords: source code repositories, visualization techniques,
complex networks, libre software engineering
1 Introduction
The study and characterization of complex systems is an
active research area, with many interesting open problems.
Special attention has been paid recently to techniques based
on network analysis, thanks to their power to capture some
important characteristics and relationships. Network characterization
is widely used in many scientific and technological
disciplines, ranging from neurobiology [14] to computer
networks [1] [3] and linguistics [9] (to mention just
some examples). In this paper we apply this kind of analysis
to software projects, using as a base the data available in
their source code versioning repository (usually CVS). Fortunately,
most large (both in code size and number of developers)
libre (free, open source) software projects maintain
such repositories, and grant public access to them.
The information in the CVS repositories of libre software
projects has been gathered and analyzed using several
methodologies [12] [5], but many other approaches are still
possible. Among them, we explore here how to apply some
techniques already common in traditional (social) network
analysis. The proposed approach is based on considering
either modules (usually CVS directories) or developers
(committers to the CVS) as vertices, and the number of common
commits as the weight of the link between any two vertices
(see section 3 for a more detailed definition). This way,
we end up with a weighted graph which captures some relationships
between developers or modules, in which characteristics
such as information flow or communities can be studied.
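As an illustration only, the construction of such a weighted graph for committers can be sketched as follows. The precise definition of "common commits" is given in section 3; this sketch assumes one possible reading, in which the weight of the link between two committers is the total number of commits both of them made to the modules they share. The function name `committer_network`, the `(committer, module)` input format and the sample data are hypothetical and chosen just for the example.

```python
from collections import Counter, defaultdict
from itertools import combinations


def committer_network(commits):
    """Build a weighted, undirected committer graph.

    `commits` is an iterable of (committer, module) pairs, one per commit.
    Assumption (not taken from the paper): the weight of the link between
    two committers is the sum, over the modules both have committed to,
    of the commits each of them made to that module.
    """
    # per_module[module][committer] = number of commits to that module
    per_module = defaultdict(Counter)
    for committer, module in commits:
        per_module[module][committer] += 1

    weights = Counter()
    for counts in per_module.values():
        # Every pair of committers sharing this module gets a link;
        # sorting keeps each pair in a single canonical (a, b) order.
        for a, b in combinations(sorted(counts), 2):
            weights[(a, b)] += counts[a] + counts[b]
    return dict(weights)


# Hypothetical commit log: (committer, module)
sample = [
    ("alice", "core"), ("alice", "core"), ("bob", "core"),
    ("bob", "docs"), ("carol", "docs"), ("alice", "docs"),
]
network = committer_network(sample)
for (a, b), weight in sorted(network.items()):
    print(a, b, weight)
```

The module network could be obtained in the same way by swapping the roles of committers and modules in the input pairs, again under the same assumed weighting.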
There have been some other works analyzing social networks
in the libre software world. [7] hypothesizes that the
organization of libre software projects can be modeled as
self-organizing social networks and shows that this seems
to be true at least when studying SourceForge projects.
[6] also proposes a kind of network analysis for libre software
projects, but considering source dependencies between
modules. Our approach explores how to apply those
network analysis techniques in a more comprehensive and
complete way. To present it, we start by introducing
some basic concepts of social network analysis which are
used later (section 2), and the definition of the networks we
consider (section 3). In section 4 we introduce the characterization
we propose for those networks, and later, in section 5, we
show some examples of the application of that characterization
to Apache, GNOME and KDE. To finish, we offer
some conclusions and discuss some future work.
2 Basic concepts of Social Network Analysis
The Theory of Complex Networks is based on representing
complex systems as graphs. There are many examples
in the literature where this approach has been successfully
used in very different scientific and technological
disciplines, identifying vertices and links as relevant for
each specific domain. For example, in ecological networks
each vertex may represent a particular species, with a link
between two species if one of them "eats" the other. When
dealing with social networks, we may identify vertices with
persons or groups of people, considering a link when there
...
...