technology.collections

About this document

This document is an R notebook, generated dynamically from the data extracted for the project. It lists all datasets published for the project, with basic numbers, figures and a quick summary, and serves as a test case to check that all the required data is present and roughly consistent with requirements. All plots and tables are computed from the actual data as provided in the downloads.

To re-execute the document, start an R session, load rmarkdown and render the page with the project ID as a parameter:

require('rmarkdown')
render("datasets_report.Rmarkdown", params = list(project_id = "technology.collections"), output_format="html_document")

This website uses the blogdown R package, which provides a different output_format for the hugo framework.

This report was generated on 2021-04-25.

Downloads

All data is retrieved from Alambic, an open-source framework for development data extraction and processing.

This project’s analysis page can be found on the Alambic instance for the Eclipse forge, at https://eclipse.alambic.io/projects/technology.collections.

Downloads are gzip’d CSV and JSON files. CSV files always have a header naming the fields, which makes them easy to import into analysis software like R:

data <- read.csv(file='myfile.csv', header=T)
names(data)
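Since `read.csv` decompresses gzip'd files transparently, the downloads can be read without unpacking them first. The sketch below is self-contained: it writes a small stand-in file and reads it back. The column names and values are made up for illustration and are not taken from the project's data.

```r
# Build a tiny stand-in for a downloaded file; columns and values are illustrative only.
sample <- data.frame(date = c("2021-01-01", "2021-01-02"),
                     commits = c(3L, 1L))
path <- tempfile(fileext = ".csv.gz")

# gzfile() opens a gzip-compressed connection for writing.
con <- gzfile(path, open = "wt")
write.csv(sample, con, row.names = FALSE)
close(con)

# read.csv() detects the gzip compression on its own, so no manual unzip is needed.
data <- read.csv(file = path, header = TRUE)
names(data)
```

With a real download, only the last two lines are needed, with `path` pointing at the `.csv.gz` file.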

List of datasets generated for the project:

  • Git
    • Git Commits (CSV) – Full list of commits with id, message, time, author, committer, and added, deleted and modified lines.
    • Git Commits Evol (CSV) – Evolution of number of commits and authors by day.
    • Git Log (TXT) – the raw export of git log.
  • Jenkins CI
  • Eclipse PMI
    • PMI Checks (CSV) – list of all checks applied to the Project Management Infrastructure entries for the project.
  • ScanCode

Git

Git commits

Download: git_commits_evol.csv.gz

data <- read.csv(file=file_git_commits_evol, header=T)

File is git_commits_evol.csv, and has 3 columns for 514 entries.

library(xts)       # time-series objects, index(), merge()
library(dygraphs)  # dygraph(), dyRangeSelector(), %>%

# Cumulative number of commits over time.
data$commits_sum <- cumsum(data$commits)
data.xts <- xts(x = data[, c('commits_sum', 'commits', 'authors')],
                order.by = as.POSIXct(as.character(data$date), format = "%Y-%m-%d"))

# Build an empty daily series spanning the whole history.
time.min <- index(data.xts[1, ])
time.max <- index(data.xts[nrow(data.xts)])
all.dates <- seq(time.min, time.max, by = "days")
empty <- xts(order.by = all.dates)

# Merge with the real data and set days without activity to zero.
merged.data <- merge(empty, data.xts, all = TRUE)
merged.data[is.na(merged.data)] <- 0

p <- dygraph(merged.data[, c('commits')],
        main = paste('Daily commits for ', project_id, sep=''),
        width = 800, height = 250 ) %>%
      dyRangeSelector()
p
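The merge step above pads days without any commit with zeros, so that the plot shows a continuous daily series. As a minimal base-R sketch of the same idea, assuming a data frame with `date` and `commits` columns as in the code above (the sample values are invented):

```r
# Invented sample: commits recorded on two non-consecutive days.
evol <- data.frame(date = as.Date(c("2021-04-01", "2021-04-04")),
                   commits = c(2, 5))

# One row per calendar day between the first and last recorded dates.
all.days <- data.frame(date = seq(min(evol$date), max(evol$date), by = "day"))

# Left join onto the full calendar, then turn the missing days into zeros.
filled <- merge(all.days, evol, by = "date", all.x = TRUE)
filled$commits[is.na(filled$commits)] <- 0

# The cumulative sum is computed after filling, so it stays flat on empty days.
filled$commits_sum <- cumsum(filled$commits)
print(filled)
```

Computing the cumulative sum after the fill keeps it flat on days without commits instead of dropping it to zero.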


Git log

Download: git_log.txt.gz

File is git_log.txt, and the full log has 12140 lines.


Jenkins

Builds

Download: jenkins_builds.csv.gz

data <- read.csv(file=file_jenkins_builds, header=T)

File is jenkins_builds.csv, and has 7 columns for 357 entries.

| ID | Name | Time | Result |
|---|---|---|---|
| 62 | coverage-nightly #62 | 1.518303e+12 | ABORTED |
| 2016-06-06_21-00-37 | coverage-nightly #61 | 1.465261e+12 | FAILURE |
| 2016-06-05_21-00-37 | coverage-nightly #60 | 1.465175e+12 | FAILURE |
| 2016-06-04_21-00-38 | coverage-nightly #59 | 1.465088e+12 | FAILURE |
| 2016-06-03_21-00-37 | coverage-nightly #58 | 1.465002e+12 | FAILURE |
| 2016-06-02_21-00-37 | coverage-nightly #57 | 1.464916e+12 | FAILURE |
| 2016-06-01_21-00-37 | coverage-nightly #56 | 1.464829e+12 | FAILURE |
| 134 | deploy #134 | 1.616087e+12 | SUCCESS |
| 133 | deploy #133 | 1.604335e+12 | SUCCESS |
| 132 | deploy #132 | 1.597958e+12 | SUCCESS |
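The Time column holds large numbers that look like Unix timestamps in milliseconds, which is how Jenkins usually reports build times. This is an assumption, as the file itself does not state the unit. Under that assumption, the values can be converted to readable dates as follows (values copied from the extract above):

```r
# Build times copied from the extract above; assumed to be milliseconds since 1970-01-01 UTC.
build.time <- c(1.518303e+12, 1.465261e+12, 1.616087e+12)

# as.POSIXct() expects seconds, so divide by 1000 first.
build.date <- as.POSIXct(build.time / 1000, origin = "1970-01-01", tz = "UTC")
format(build.date, "%Y-%m-%d")
```

The displayed values are rounded to seven significant digits, so the resulting times are only accurate to within a few minutes.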


Jobs

Download: jenkins_jobs.csv.gz

data <- read.csv(file=file_jenkins_jobs, header=T)

File is jenkins_jobs.csv, and has 15 columns for 12 entries.

| Name | Colour | Last build time | Health report |
|---|---|---|---|
| coverage-nightly | aborted | 1.518303e+12 | 20 |
| deploy | blue | 1.616087e+12 | 80 |
| deploy-p2-maven | red | 1.557007e+12 | 0 |
| gsc-ec-converter upload | blue | 1.464575e+12 | 100 |
| hipp-setting-analysis | blue | 1.467796e+12 | 50 |
| javadoc | red | 1.604342e+12 | 33 |
| master | blue | 1.618317e+12 | 100 |
| new-version | blue | 1.616089e+12 | 60 |
| publish-p2-repo | blue | 1.616170e+12 | 80 |
| release | blue | 1.616085e+12 | 100 |


PMI

PMI Checks

Download: eclipse_pmi_checks.csv.gz

data <- read.csv(file=file_pmi_checks, header=T)

File is eclipse_pmi_checks.csv, and has 3 columns for 17 entries.

library(xtable)

# Keep the first 10 checks for display.
checks.table <- head(data[, c('Description', 'Value', 'Results')], 10)

# sep='' avoids the stray spaces that sep=" " inserted around the project ID.
print(
    xtable(checks.table,
        caption = paste('Extract of the first 10 PMI checks for ',
                        project_id, '.', sep=''),
        digits=0, align="llll"), type="html",
    html.table.attributes='class="table table-striped"',
    caption.placement='bottom',
    include.rownames=FALSE,
    sanitize.text.function=function(x) { x }
)

Extract of the first 10 PMI checks for technology.collections.
| Description | Value | Results |
|---|---|---|
| Checks if the URL can be fetched using a simple get query. | | Failed: no URL defined for create_url. |
| Checks if the URL can be fetched using a simple get query. | | Failed: no URL defined for query_url. |
| Sends a get request to the given CI URL and looks at the headers in the response (200 404..). Also checks if the URL is really a Hudson instance (through a call to its API). | https://hudson.eclipse.org/collections/ | OK. Fetched CI URL. Failed: could not decode Hudson JSON. |
| Checks if the Dev ML URL can be fetched using a simple get query. | https://dev.eclipse.org/mailman/listinfo/collections-dev | OK: Dev ML URL could be successfully fetched. |
| Checks if the URL can be fetched using a simple get query. | https://www.eclipse.org/collections/#refGuide | OK: Documentation URL could be successfully fetched. |
| Checks if the URL can be fetched using a simple get query. | https://www.eclipse.org/collections/#start | OK: Download URL could be successfully fetched. |
| Checks if the Forums URL can be fetched using a simple get query. | http://eclipse.org/forums/eclipse.collections | OK. Forum [Eclipse Collections forum] correctly defined. OK: Forum [Eclipse Collections forum] URL could be successfully fetched. |
| Checks if the URL can be fetched using a simple get query. | https://www.eclipse.org/collections/#learn | OK: Documentation URL could be successfully fetched. |
| Checks if the Mailing lists URL can be fetched using a simple get query. | | Failed: no mailing list defined. |
| Checks if the URL can be fetched using a simple get query. | | Failed: no URL defined for plan. |
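A quick way to summarise these checks is to count how many report a failure. The sketch below uses a few result strings copied from the extract above; with the real file, the same test could be applied to the `Results` column. Treating any message containing "Failed" as a failed check is an assumption of this sketch, not a rule stated by the PMI checker.

```r
# Result strings copied from the extract above.
results <- c("Failed: no URL defined for create_url.",
             "OK: Dev ML URL could be successfully fetched.",
             "OK. Fetched CI URL. Failed: could not decode Hudson JSON.",
             "Failed: no mailing list defined.")

# Assumption: a check counts as failed if any of its messages contains "Failed".
failed <- grepl("Failed", results, fixed = TRUE)
table(ifelse(failed, "failed", "ok"))
```

Note that a single check can report both an OK and a Failed message, as the CI URL check does; this sketch counts it as failed.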

ScanCode

Authors

Download: scancode_authors.csv.gz

data <- read.csv(file=file_sc_authors, header=T)

File is scancode_authors.csv, and has 2 columns for 2 entries.

| Author | Count |
|---|---|
| unknown | 3173 |
| collect | 1 |

suppressPackageStartupMessages(library(googleVis))
options(gvis.plot.tag='chart')

data.sorted <- data[order(data$count, decreasing = T),]

p <- gvisPieChart(data.sorted,
              options = list(
                title=paste("Authors for project ", project_id, " ", sep=""),
                sliceVisibilityThreshold=0, height=280,
                pieHole= 0.4))

print(p, 'chart')


Copyrights

Download: scancode_copyrights.csv.gz

data <- read.csv(file=file_sc_copyrights, header=T)

File is scancode_copyrights.csv, and has 2 columns for 24 entries.

| Copyrights | Count |
|---|---|
| Copyright (c) Goldman Sachs | 2284 |
| unknown | 474 |
| Copyright (c) Goldman Sachs and others | 343 |
| Copyright (c) Bhavana Hindupur | 12 |
| Copyright (c) Shotaro Sano | 12 |
| Copyright (c) The Eclipse Foundation | 8 |
| Copyright (c) Ivan Sopov and others | 7 |
| Copyright (c) Shotaro Sano and others | 6 |
| Copyright (c) The Bank of New York Mellon | 6 |
| Copyright (c) Two Sigma | 6 |

suppressPackageStartupMessages(library(googleVis))
options(gvis.plot.tag='chart')

data.sorted <- data[order(data$count, decreasing = T),]

p <- gvisPieChart(data.sorted,
              options = list(
                title=paste("Copyrights for project ", project_id, " ", sep=""),
                sliceVisibilityThreshold=0, height=280,
                pieHole= 0.4))

print(p, 'chart')


Holders

Download: scancode_holders.csv.gz

data <- read.csv(file=file_sc_holders, header=T)

File is scancode_holders.csv, and has 2 columns for 25 entries.

| Holders | Count |
|---|---|
| Goldman Sachs. ~ | 2284 |
| unknown | 474 |
| Goldman Sachs and others. ~ | 343 |
| Bhavana Hindupur | 12 |
| Shotaro Sano | 12 |
| The Eclipse Foundation | 8 |
| Ivan Sopov and others. ~ | 7 |
| Shotaro Sano and others | 6 |
| The Bank of New York Mellon | 6 |
| Two Sigma | 6 |

suppressPackageStartupMessages(library(googleVis))
options(gvis.plot.tag='chart')

data.sorted <- data[order(data$count, decreasing = T),]

p <- gvisPieChart(data.sorted,
              options = list(
                title=paste("Holders for project ", project_id, " ", sep=""),
                sliceVisibilityThreshold=0, height=280,
                pieHole= 0.4))

print(p, 'chart')


Licences

Download: scancode_licences.csv.gz

data <- read.csv(file=file_sc_licences, header=T)

File is scancode_licences.csv, and has 2 columns for 10 entries.

| Licence | Count |
|---|---|
| epl-1.0 OR bsd-new | 2695 |
| unknown | 464 |
| bsd-new | 30 |
| epl-1.0 | 28 |
| cpl-1.0 AND other-permissive | 6 |
| bsd-new OR epl-2.0 | 5 |
| apache-2.0 | 1 |
| cpl-1.0 | 1 |
| eclipse-sua-2011 | 1 |
| mpl-1.1 | 1 |

suppressPackageStartupMessages(library(googleVis))
options(gvis.plot.tag='chart')

p <- gvisPieChart(data,
              options = list(
                title=paste("Licences for project ", project_id, " ", sep=""),
                sliceVisibilityThreshold=0, height=280,
                pieHole= 0.4))

print(p, 'chart')
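The pie chart conveys proportions visually; the same information can be expressed as percentages. The sketch below rebuilds the licence table from the counts listed above and adds a share column. The column names `licence` and `count` are assumptions made for this example and may differ from the headers of the actual CSV file.

```r
# Counts copied from the licence table above; column names are assumed.
licences <- data.frame(
  licence = c("epl-1.0 OR bsd-new", "unknown", "bsd-new", "epl-1.0",
              "cpl-1.0 AND other-permissive", "bsd-new OR epl-2.0",
              "apache-2.0", "cpl-1.0", "eclipse-sua-2011", "mpl-1.1"),
  count = c(2695, 464, 30, 28, 6, 5, 1, 1, 1, 1)
)

# Share of each licence expression among all detections, in percent.
licences$share <- round(100 * licences$count / sum(licences$count), 1)
head(licences, 3)
```

The counts add up to 3232 detections, so the dual-licence expression `epl-1.0 OR bsd-new` accounts for roughly 83% of them.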


Programming Languages

Download: scancode_programming_languages.csv.gz

data <- read.csv(file=file_sc_pl, header=T)

File is scancode_programming_languages.csv, and has 2 columns for 6 entries.

| Programming Language | Count |
|---|---|
| Java | 2591 |
| Python | 347 |
| unknown | 162 |
| Scala | 54 |
| HTML | 19 |
| ActionScript 3 | 1 |

suppressPackageStartupMessages(library(googleVis))
options(gvis.plot.tag='chart')

p <- gvisPieChart(data,
              options = list(
                title=paste("Programming languages for project ", project_id, " ", sep=""),
                sliceVisibilityThreshold=0, height=280,
                pieHole= 0.4))

print(p, 'chart')


Special files

Download: scancode_special_files.csv.gz

data <- read.csv(file=file_sc_sf, header=T)

File is scancode_special_files.csv, and has 2 columns for 39 entries.

| File | Type |
|---|---|
| LICENSE-EDL-1.0.txt | legal |
| LICENSE-EPL-1.0.txt | legal |
| pom.xml | manifest |
| README.md | readme |
| README_EXAMPLES.md | readme |
| acceptance-tests/pom.xml | manifest |
| eclipse-collections/pom.xml | manifest |
| eclipse-collections/src/main/resources/LICENSE-EDL-1.0.txt | legal |
| eclipse-collections/src/main/resources/LICENSE-EPL-1.0.txt | legal |
| eclipse-collections-api/pom.xml | manifest |