
eBook - ePub
The Art and Science of Analyzing Software Data
- 672 pages
- English
- ePUB (mobile friendly)
- Available on iOS & Android
About this book
The Art and Science of Analyzing Software Data provides valuable information on analysis techniques often used to derive insight from software data. This book shares best practices in the field generated by leading data scientists, collected from their experience training software engineering students and practitioners to master data science.
The book covers topics such as the analysis of security data, code reviews, app stores, log files, and user telemetry, among others. It covers a wide variety of techniques such as co-change analysis, text analysis, topic analysis, and concept analysis, as well as advanced topics such as release planning and generation of source code comments. It includes stories from the trenches from expert data scientists illustrating how to apply data analysis in industry and open source, present results to stakeholders, and drive decisions.
- Presents best practices, hints, and tips to analyze data and apply tools in data science projects
- Presents research methods and case studies that have emerged over the past few years to further understanding of software data
- Shares stories from the trenches of successful data science initiatives in industry
Chapter 1
Past, Present, and Future of Analyzing Software Data
Christian Bird*; Tim Menzies†; Thomas Zimmermann*
* Microsoft Research, Redmond, WA, USA
† Computer Science, North Carolina State University, Raleigh, NC, USA
Abstract
This chapter introduces the book and offers some context for the rest of the chapters. Specifically, we explore different definitions of “software analytics” as well as the historical evolution of data science for software engineering.
Keywords
data science
software analytics
software engineering
history
Acknowledgments
The work of this kind of book falls mostly on the authors and reviewers, and we’re very appreciative of all those who took the time to write and comment on these chapters. The work of the reviewers was particularly challenging because their feedback was required in a very condensed timetable. Accordingly, we offer them our heartfelt thanks.
We’re also grateful to the Morgan Kaufmann production team for their hard work in assembling this material.

So much data, so little time.
Once upon a time, reasoning about software projects was inhibited by a lack of data. Now, thanks to the Internet and open source, there’s so much data about software projects that it’s impossible to manually browse through it all. For example, at the time of writing (December 2014), our Web searches show that Mozilla Firefox has over 1.1 million bug reports, and platforms such as GitHub host over 14 million projects. Furthermore, the PROMISE repository of software engineering data (openscience.us/repo) contains data sets, ready for mining, on hundreds of software projects. PROMISE is just one of more than a dozen open source repositories that are readily available to industrial practitioners and researchers; see the following table.

Repositories of Software Engineering Data
| Repository | URL |
| --- | --- |
| Bug Prediction Dataset | http://bug.int.usi.ch |
| Eclipse Bug Data | http://www.st.cs.uni-saarland.de/softevo/bug-data/eclipse |
| FLOSSMetrics | http://flossmetrics.org |
| FLOSSMole | http://flossmole.org |
| International Software Benchmarking Standards Group (ISBSG) | http://www.isbsg.org |
| Ohloh | http://www.ohloh.net |
| PROMISE | http://promisedata.googlecode.com |
| Qualitas Corpus | http://qualitascorpus.com |
| Software Artifact Repository | http://sir.unl.edu |
| SourceForge Research Data | http://zeriot.cse.nd.edu |
| Sourcerer Project | http://sourcerer.ics.uci.edu |
| Tukutuku | http://www.metriq.biz/tukutuku |
| Ultimate Debian Database | http://udd.debian.org |
It is now routine for any project to generate gigabytes of artifacts (software code, developer emails, bug reports, etc.). How can we reason about it all? The answer is data science. This is a rapidly growing field with immense potential to change the day-to-day practices of any number of fields. Software companies (e.g., Google, Facebook, and Microsoft) are increasingly making decisions in a data-driven way and are in search of data scientists to help them.
1.1 Definitions
It is challenging to define software analytics for software engineering (SE) since, at different times, SE analytics has meant different things to different people. Table 1.1 lists some of the more recent definitions found in various papers since 2010. Later in this introduction, we offer a short history of work dating back many decades, any of which might be called “SE data analytics.”
Table 1.1
Five Definitions of “Software Analytics”
| Source | Definition |
| --- | --- |
| Hassan A, Xie T. Software intelligence: the future of mining software engineering data. FoSER 2010: 161-166. | [Software Intelligence] offers software practitioners (not just developers) up-to-date and pertinent information to support their daily decision-making processes. |
| Buse RPL, Zimmermann T. Analytics for software development. FoSER 2010:77-90. | The idea of analytics is to leverage potentially large amounts of data into real and actionable insights. |
| Zhang D, Dang Y, Lou J-G, Han S, Zhang H, Xie T. Software analytics as a learning case in practice: approaches and experiences. MALETS 2011. | Software analytics is to enable software practitioners to perform data exploration and analysis in order to obtain insightful and actionable information for data driven tasks around software and services (and software practitioners typically include software developers, testers, usability engineers, and managers, etc.). |
| Buse RPL, Zimmermann T. Information needs for software development analytics. ICSE 2012:987-996. | Software development analytics … empower(s) software development teams to independently gain and share insight from their data without relying on a separate entity. |
| Menzies T, Zimmermann T. Software analytics: so what? IEEE Softw 2013;30(4):31-7. | Software analytics is analytics on software data for managers and software engineers with the aim of empowering software development individuals and teams to gain and share insight from thei... |
Table of contents
- Cover image
- Title page
- Table of Contents
- Copyright
- List of Contributors
- Chapter 1: Past, Present, and Future of Analyzing Software Data
- Part 1: Tutorial-Techniques
- Part 2: Data/Problem Focussed
- Part 3: Stories from the Trenches
- Part 4: Advanced Topics
- Part 5: Data Analysis at Scale (Big Data)