Software
Python3 Research Implementations
- Word embedding clustering is a small set of scripts to perform clustering on word embeddings, especially for psychological research.
- Decorrelated Sparse Survival Regression is a suite of methods to perform survival regression with a linear scoring model that is easy (or easier) to interpret for human experts.
- ast2vec is a pre-trained recursive tree grammar autoencoder (see below) that can auto-encode Python programs. It was trained on half a million beginner programs and is intended for educational data mining applications in computer science education. Reference Paper
- Recursive Tree Grammar Autoencoders are recursive neural networks that can auto-encode tree data if a grammar is known. The autoencoding accuracy and optimization performance of this model are generally higher than those of an autoencoder that encodes trees sequentially or does not use grammar knowledge. Reference Paper
- Graph Edit Networks are graph neural networks which can model changes in time by predicting graph edits at each node. Reference Paper
- Reservoir Stack Machines are an extension of Reservoir Memory Machines (see below) with a stack as memory. This raises the computational power to that of deterministic context-free grammars (above Chomsky-3 but below Chomsky-2). Reference Paper
- Reservoir Memory Machines are an extension of Echo State Networks with an explicit memory. This enables these networks to solve computational tasks such as losslessly copying data which are difficult or impossible to solve for standard recurrent neural networks (even deep ones). This memory extension also raises the computational power of ESNs from below Chomsky-3 to above Chomsky-3. Reference Paper
- Tree Echo State Autoencoders are a model to auto-encode tree data in domains where a tree grammar is known. Since the model follows the echo state framework (especially the tree echo state networks of Gallicchio and Micheli, 2013), it is very simple to train. Given a list of training trees, an autoencoder can be set up within seconds. Reference Paper
- Unordered Tree Edit Distance provides an A* algorithm to compute the NP-hard unordered tree edit distance with custom costs. Reference Paper
- Adversarial Edit Attacks provides an approach to attack classifiers for tree data using tree edits. Reference Paper
- Linear Supervised Transfer Learning provides a simple expectation maximization scheme to learn a mapping from a target space to a source space based on a labelled Gaussian mixture model in the source space and very few target space data points. Reference Paper
- Faster Confidence Intervals for Item Response Theory via an Approximate Likelihood provides a faster way to compute confidence intervals for parameters of an item response theory model. Reference Paper
- Sparse Factor Autoencoders for Item Response Theory provides a new factor analysis approach to identify latent factors that explain observed test results in a multi-dimensional item response theory setting. Reference Paper
Python3 Software Packages
- edist implements a variety of edit distances between sequences and trees, including backtracing and metric learning (Paaßen et al., 2018), in Cython. In particular, the library contains implementations for the Levenshtein distance, dynamic time warping, the affine edit distance, and the tree edit distance, as well as support for further edit distances via algebraic dynamic programming (Giegerich, Meyer, and Steffen, 2004). The library is available on PyPI (currently only for Linux) via
  pip3 install edist
- proto-dist-ml implements prototype-based machine learning for distance data, in particular relational neural gas, relational generalized learning vector quantization, and median generalized learning vector quantization. It is available on PyPI via
  pip3 install proto-dist-ml
- Falling Walls Diversity provides a simple algorithm to distribute a population of people into work groups that are as diverse as possible. This has been successfully applied at the Falling Walls Circle 2018 and at the Interdisciplinary College (IK) 2018 and 2019. If you wish to adapt this to your event, please contact me for further recommendations.
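To give a rough idea of the kind of computation behind the sequence edit distances in edist, here is a minimal plain-Python sketch of the Levenshtein distance via dynamic programming. It does not use the edist API; the function name `levenshtein` and the unit costs are assumptions made for illustration only.

```python
def levenshtein(x, y):
    """Return the Levenshtein distance between two sequences (unit costs).

    Illustrative sketch only; this is not the edist implementation.
    """
    m, n = len(x), len(y)
    # prev[j] is the distance between the previous prefix of x and y[:j];
    # initially the prefix of x is empty, so the distance is j insertions.
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        # curr[0] = i: deleting all i elements of x[:i] yields the empty prefix
        curr = [i] + [0] * n
        for j in range(1, n + 1):
            # replacement is free if the elements are equal
            replace = prev[j - 1] + (0 if x[i - 1] == y[j - 1] else 1)
            delete = prev[j] + 1
            insert = curr[j - 1] + 1
            curr[j] = min(replace, delete, insert)
        prev = curr
    return prev[n]


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))
```

The sketch keeps only two rows of the dynamic programming table, which is enough for the distance itself; backtracing an actual edit script would require the full table.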
Java
- TCS Alignment Toolbox computes edit distances and derivatives thereof. It also supports custom comparison functions for sequence elements, custom derivatives, and new sequence edit distances via algebraic dynamic programming. Reference Paper
- Relational Neural Gas provides an efficient clustering algorithm solely based on pairwise dissimilarities. Data points are clustered by assigning the cluster of the closest prototype, where each prototype is a convex combination of existing data points. Reference Paper
- Median Generalized Learning Vector Quantization provides an efficient classification algorithm solely based on pairwise dissimilarities. Data points are classified by assigning the label of the closest prototype, where each prototype is a point from the data set. Reference Paper
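The two prototype assignments described above can be sketched in a few lines of Python (the language used for the examples on this page, not the Java of the packages themselves). The function names and the list-of-lists matrix format are hypothetical. The second function assumes the standard identity for squared Euclidean distances to a convex combination, which holds exactly only if the dissimilarities are Euclidean; neither function covers how the prototypes are trained.

```python
def classify_by_median_prototypes(D, prototype_indices, prototype_labels):
    """Assign each data point the label of its closest prototype.

    D is a square matrix (list of lists) of pairwise dissimilarities.
    Each prototype is itself a data point, given by its index in D.
    """
    labels = []
    for i in range(len(D)):
        # position (in the prototype list) of the closest prototype
        k = min(range(len(prototype_indices)),
                key=lambda p: D[i][prototype_indices[p]])
        labels.append(prototype_labels[k])
    return labels


def relational_sq_distances(D2, alpha):
    """Squared distances of all points to the prototype sum_j alpha[j] * x_j.

    D2 holds squared pairwise dissimilarities and alpha must sum to 1.
    Assumed identity:
    d(x_i, w)^2 = sum_j alpha_j D2[i][j] - 0.5 * sum_jk alpha_j alpha_k D2[j][k]
    """
    n = len(D2)
    D2_alpha = [sum(D2[i][j] * alpha[j] for j in range(n)) for i in range(n)]
    offset = 0.5 * sum(alpha[j] * D2_alpha[j] for j in range(n))
    return [D2_alpha[i] - offset for i in range(n)]


if __name__ == "__main__":
    # four points on a line at positions 0, 1, 4, 5
    pos = [0, 1, 4, 5]
    D = [[abs(a - b) for b in pos] for a in pos]
    print(classify_by_median_prototypes(D, [0, 3], ["a", "b"]))
```

Both functions only need the dissimilarity matrix, never the data points themselves, which is what makes this family of methods applicable to structured data such as sequences or trees.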
MATLAB®
- Linear Supervised Transfer Learning Toolbox provides a simple expectation maximization scheme to learn a mapping from a target space to a source space based on a labelled Gaussian mixture model in the source space and very few target space data points. Reference Paper
- Time Series Prediction for Relational and Kernel Data provides a Gaussian Process/robust Bayesian Committee Machine mechanism to predict the next step in time series that are only defined in terms of pairwise distances between elements. Reference Paper
- Tree Edit Distance Learning via Adaptive Symbol Embeddings provides a reference implementation for tree edit distance learning as described in the ICML 2018 reference paper.