
Applying Social Network Analysis to the Information in CVS Repositories

Essay by   •  October 13, 2010  •  Research Paper  •  2,635 Words (11 Pages)  •  2,744 Views



Abstract

The huge quantities of data available in the CVS repositories of large, long-lived libre (free, open source) software projects, and the many interrelationships among those data, offer opportunities for extracting large amounts of valuable information about their structure, evolution and internal processes. Unfortunately, the sheer volume of that information renders it almost unusable without methodologies that highlight the information relevant to a given aspect of the project. In this paper, we propose the use of a well-known set of methodologies (social network analysis) for characterizing libre software projects, their evolution over time and their internal structure. In addition, we show how we have applied such methodologies to real cases, and draw some preliminary conclusions from that experience.

Keywords: source code repositories, visualization techniques, complex networks, libre software engineering

1 Introduction

The study and characterization of complex systems is an active research area, with many interesting open problems. Special attention has been paid recently to techniques based on network analysis, thanks to their power to capture some important characteristics and relationships. Network characterization is widely used in many scientific and technological disciplines, ranging from neurobiology [14] to computer networks [1] [3] and linguistics [9], to mention just some examples. In this paper we apply this kind of analysis to software projects, using as a base the data available in their source code versioning repository (usually CVS). Fortunately, most large (both in code size and number of developers) libre (free, open source) software projects maintain such repositories, and grant public access to them.

The information in the CVS repositories of libre software projects has been gathered and analyzed using several methodologies [12] [5], but many other approaches are still possible. Among them, we explore here how to apply some techniques already common in traditional (social) network analysis. The proposed approach is based on considering either modules (usually CVS directories) or developers (committers to the CVS) as vertices, and the number of common commits as the weight of the link between any two vertices (see section 3 for a more detailed definition). This way, we end up with a weighted graph which captures some relationships between developers or modules, and in which characteristics such as information flow or communities can be studied.
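As an illustration only, the following sketch builds both kinds of weighted graph from a small commit log. The commit data, the function name build_network and, above all, the weight rule are assumptions made for this example: the detailed definition belongs to section 3, so here the "number of common commits" is read as the sum, over every module (or committer) that two vertices share, of the smaller of their two commit counts.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical commit log: one (committer, module) pair per commit.
# In practice these pairs would be extracted from the CVS history.
commits = [
    ("alice", "core"),
    ("alice", "docs"),
    ("bob", "core"),
    ("bob", "core"),
    ("carol", "docs"),
    ("carol", "ui"),
    ("alice", "ui"),
]


def build_network(commits, by="committer"):
    """Return {(vertex_a, vertex_b): weight} for an undirected weighted graph.

    by="committer": vertices are committers, linked through shared modules.
    by="module":    vertices are modules, linked through shared committers.

    Assumed weight rule (not taken from the paper): for every shared
    context, add the smaller of the two commit counts.
    """
    # counts[vertex][context] = number of commits
    counts = defaultdict(lambda: defaultdict(int))
    for committer, module in commits:
        if by == "committer":
            vertex, context = committer, module
        else:
            vertex, context = module, committer
        counts[vertex][context] += 1

    weights = {}
    for a, b in combinations(sorted(counts), 2):
        shared = counts[a].keys() & counts[b].keys()
        weight = sum(min(counts[a][c], counts[b][c]) for c in shared)
        if weight > 0:  # no shared context means no link
            weights[(a, b)] = weight
    return weights


committer_net = build_network(commits, by="committer")
module_net = build_network(commits, by="module")
print(committer_net)
print(module_net)
```

With this toy log, alice and carol share two modules (docs and ui) and so get a heavier link than alice and bob, who share only core, while bob and carol are not linked at all. A different weight rule would change the numbers but not the overall construction.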

There have been some other works analyzing social networks in the libre software world. [7] hypothesizes that the organization of libre software projects can be modeled as self-organizing social networks, and shows that this seems to be true at least for SourceForge projects. [6] also proposes a sort of network analysis for libre software projects, but considers source dependencies between modules. Our approach explores how to apply those network analysis techniques in a more comprehensive and complete way. To present it, we start by introducing some basic concepts of social network analysis which are used later (section 2), and the definition of the networks we consider (section 3). In section 4 we introduce the characterization we propose for those networks, and later, in section 5, we show some examples of the application of that characterization to Apache, GNOME and KDE. To finish, we offer some conclusions and discuss some future work.

2 Basic concepts on Social Network Analysis

The Theory of Complex Networks is based on representing complex systems as graphs. There are many examples in the literature where this approach has been successfully used in very different scientific and technological disciplines, identifying vertices and links as relevant for each specific domain. For example, in ecological networks each vertex may represent a particular species, with a link between two species if one of them "eats" the other. When dealing with social networks, we may identify vertices with persons or groups of people, considering a link when there

...

...
