Python Programming Dataset contains source code snapshots of 15 students who implement gradient descent in Python. This data set is meant to evaluate time series prediction algorithms for tree data, i.e. the task is to predict the next source code snapshot in a student's development.
VBB Shortest Path Data 2018 contains the times needed to travel between all metro stations in the Berlin public transportation system according to the public schedule data between August and December 2017. Each station is labelled with the city district it belongs to. This data set is meant to compare classification and clustering approaches on challenging dissimilarity data, where symmetry does not always hold.
BinaryAdder UML contains twelve correct and six wrong solutions of a UML diagram describing the addition of two binary numbers. This dataset additionally contains the development of these solutions over time and human tutor hints for every step in the wrong solutions. As such, this data set can be used as a reference to compare hint provision strategies in educational data mining.
Sorting programs contains the pairwise distances between 128 Java sorting programs implementing one of six different sorting algorithms.
MiniPalindrome contains 48 Java programs which recognize whether all words in an input string are palindromic using one of eight different strategies. The dataset is balanced and also contains the pairwise distances between all programs.
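Several of the data sets above are provided as pairwise distance or dissimilarity matrices, and for the VBB data symmetry does not always hold. The following is a minimal sketch of how such a matrix might be checked for symmetry, symmetrized by averaging, and evaluated with a leave-one-out 1-nearest-neighbor classifier. The function names and the 4x4 toy matrix are hypothetical and chosen for illustration only; they are not taken from any of the data sets, and the file formats of the data sets are not assumed here.

```python
def is_symmetric(D, tol=1e-9):
    """Return True if D[i][j] equals D[j][i] (up to tol) for all pairs."""
    n = len(D)
    return all(abs(D[i][j] - D[j][i]) <= tol
               for i in range(n) for j in range(i + 1, n))


def symmetrize(D):
    """Average each entry with its transpose counterpart.

    This is one common choice among several (min or max are alternatives).
    """
    n = len(D)
    return [[0.5 * (D[i][j] + D[j][i]) for j in range(n)] for i in range(n)]


def loo_1nn_accuracy(D, labels):
    """Leave-one-out accuracy of a 1-nearest-neighbor classifier.

    Each item is assigned the label of its closest other item according to
    row i of D, so an asymmetric D is read as distance *from* i *to* j.
    """
    n = len(D)
    correct = 0
    for i in range(n):
        nearest = min((j for j in range(n) if j != i), key=lambda j: D[i][j])
        correct += labels[nearest] == labels[i]
    return correct / n


# Hypothetical toy data: an asymmetric dissimilarity matrix with two classes.
D = [
    [0.0, 2.0, 9.0, 8.0],
    [3.0, 0.0, 7.0, 9.0],
    [9.0, 8.0, 0.0, 1.0],
    [8.0, 9.0, 2.0, 0.0],
]
labels = ["A", "A", "B", "B"]

S = symmetrize(D)
print(is_symmetric(D), is_symmetric(S))
print(loo_1nn_accuracy(S, labels))
```

Averaging is only one way to handle asymmetry; whether it is appropriate depends on the method being compared, since some approaches can use the asymmetric dissimilarities directly.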