DMTK is now openly available

Researchers at Microsoft Research Asia have published the Microsoft Distributed Machine Learning Toolkit (DMTK) on GitHub. It is a toolkit for building distributed machine learning systems, in which several computers work in parallel on a range of problems traditionally attributed to artificial intelligence.

The current version of DMTK includes a framework that makes machine learning on big data efficient in two ways: a hybrid data structure for storing large models, and automatic pipelining.

A hybrid data structure is a storage model that keeps high-frequency and low-frequency parameters separately (frequency here meaning, for example, how often the user accesses the data or the system itself) to strike a balance between memory capacity and access speed.
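The idea of separating frequent and rare parameters can be illustrated with a minimal sketch. It does not use DMTK's own API: the class `HybridParameterStore`, its methods, and the `hot_size` threshold are hypothetical names chosen for illustration. It also assumes that parameter ids are already ordered by access frequency, so that the smallest ids are the most frequently used ones.

```python
class HybridParameterStore:
    """Illustrative only: dense storage for frequent ids, sparse for rare ones.

    Assumption: ids are sorted by access frequency, so ids below
    `hot_size` are the high-frequency parameters.
    """

    def __init__(self, hot_size):
        self.hot_size = hot_size
        # High-frequency parameters: preallocated list, direct indexing.
        self._hot = [0.0] * hot_size
        # Low-frequency parameters: dict that only stores touched entries,
        # which saves memory at the cost of a hash lookup.
        self._cold = {}

    def _is_hot(self, pid):
        return 0 <= pid < self.hot_size

    def get(self, pid):
        if self._is_hot(pid):
            return self._hot[pid]
        return self._cold.get(pid, 0.0)

    def add(self, pid, delta):
        if self._is_hot(pid):
            self._hot[pid] += delta
        else:
            self._cold[pid] = self._cold.get(pid, 0.0) + delta

    def cold_entries(self):
        # Number of rare parameters that actually occupy memory.
        return len(self._cold)


store = HybridParameterStore(hot_size=4)
store.add(1, 0.5)          # frequent parameter, goes to the dense list
store.add(1_000_000, 2.0)  # rare parameter, goes to the sparse dict
```

In this sketch the dense part trades memory for speed, while the sparse part does the opposite; where to draw the line between the two is a tuning decision that depends on the data.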

DMTK ships with two algorithms: LightLDA, a topic modeling system, and Word2vec, a distributed implementation of algorithms for representing words as vectors.

The toolkit also comes with an API intended to make the work of researchers and developers easier.

DMTK is meant to help with tasks such as natural language recognition, document classification, computer vision, speech recognition, and determining the meaning of text.
