Data-driven Learned Metric Index: an Unsupervised Approach
Authors | |
---|---|
Year of publication | 2021 |
Type | Article in Proceedings |
Conference | 14th International Conference on Similarity Search and Applications (SISAP 2021) |
MU Faculty or unit | |
Citation | |
DOI | http://dx.doi.org/10.1007/978-3-030-89657-7_7 |
Keywords | Index structures; Learned index; Unstructured data; Content-based search; Metric space; Machine learning |
Description | Metric indexes are traditionally used for organizing unstructured or complex data to speed up similarity queries. The most widely used indexes cluster data or divide space using hyperplanes. While searching, the mutual distances between objects and the metric properties allow for the pruning of branches with irrelevant data; this is usually implemented by utilizing selected anchor objects called pivots. Recently, we have introduced an alternative to this approach called Learned Metric Index. In this method, a series of machine learning models substitute decisions performed on pivots, so the query evaluation is determined by the predictions of these models. This technique relies upon a traditional metric index as a template for its own structure; this dependence on a pre-existing index and the related overhead is the main drawback of the approach. In this paper, we propose a data-driven variant of the Learned Metric Index, which organizes the data using their descriptors directly, thus eliminating the need for a template. The proposed learned index shows significant gains in performance over its earlier version, as well as the established indexing structure M-index. |
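
The data-driven idea in the description (partition the data by their descriptors directly, then let a model decide which partitions a query visits) can be illustrated with a minimal single-level sketch. This is not the authors' implementation: the choice of k-means, the use of centroid distance as a stand-in for a model's prediction, the toy 2-D data, and all names (`kmeans`, `build_index`, `knn_query`, `n_buckets`) are assumptions made for illustration only.

```python
import math
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means over descriptor tuples; returns the centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids drawn from the data
    for _ in range(iters):
        # Assignment step: nearest centroid for every descriptor.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return centroids


def build_index(points, k, seed=0):
    """Unsupervised build: no template index, only the descriptors."""
    centroids = kmeans(points, k, seed=seed)
    buckets = [[] for _ in range(k)]
    for i, p in enumerate(points):
        c = min(range(k), key=lambda j: math.dist(p, centroids[j]))
        buckets[c].append(i)
    return centroids, buckets


def knn_query(points, index, query, n_neighbors, n_buckets):
    """Approximate k-NN: scan only the n_buckets most promising buckets."""
    centroids, buckets = index
    # Rank buckets by the model's preference (assumed here: centroid distance).
    order = sorted(range(len(centroids)),
                   key=lambda c: math.dist(query, centroids[c]))
    candidates = [i for c in order[:n_buckets] for i in buckets[c]]
    candidates.sort(key=lambda i: math.dist(query, points[i]))
    return candidates[:n_neighbors]


# Toy descriptors: three 2-D Gaussian blobs (hypothetical data).
rng = random.Random(1)
centers = [(0.0, 0.0), (5.0, 5.0), (0.0, 5.0)]
points = [(cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5))
          for cx, cy in centers for _ in range(50)]

index = build_index(points, k=3)
query = (4.8, 5.1)
nearest = knn_query(points, index, query, n_neighbors=5, n_buckets=1)
```

Raising `n_buckets` trades speed for recall; when every bucket is scanned, the result coincides with an exhaustive scan.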