Sebastiano Vigna on “Recsplit: Minimal perfect hashing via recursive splitting”, Dec. 6th, 1:00 pm, Aula A

Another stellar speaker to close this season’s seminars session! We are very happy to announce the seminar of Prof. Sebastiano Vigna, from Università degli Studi di Milano, that will be held at the Dep. of Computer Science, Università degli Studi di Torino, on Dec. 6th, 2019, at 1:00 pm in Aula A (floor ‘R’).

Find below more information about the talk and the speaker.

It is a great Network Science month! We are very happy to announce the seminar of Prof. Giacomo Como, from Politecnico di Torino, that will be held at the Dep. of Computer Science, Università degli Studi di Torino, on Nov. 28th, 2019, at 10:00 am in Sala Riunioni (first floor).

Recsplit: Minimal perfect hashing via recursive splitting

Abstract
A minimal perfect hash function bijectively maps a key set S out of a universe U into the first |S| natural numbers. Minimal perfect hash functions are used, for example, to map irregularly-shaped keys, such as strings, in a compact space so that metadata can then be simply stored in an array. While it is known that just 1.44 bits per key are necessary to store a minimal perfect hash function, no published technique can go below 2 bits per key in practice. We propose a new technique for storing minimal perfect hash functions with expected linear construction time and expected constant lookup time that makes it possible to build for the first time, for example, structures which need 1.56 bits per key, that is, within 8.3% of the lower bound, in less than 2ms per key. We show that instances of our construction are able to simultaneously beat the construction time, space usage and lookup time of the state-of-the-art data structure reaching 2 bits per key. Moreover, we provide parameter choices giving structures which are competitive with alternative, larger-size data structures in terms of space and lookup time. The construction of our data structures can be easily parallelized or mapped on distributed computational units (e.g., within the MapReduce framework), and structures larger than the available RAM can be directly built in mass storage.

Bio
Sebastiano Vigna’s research focuses on the interaction between theory
and practice. He has worked on highly theoretical topics such as
computability on the reals, distributed computability,
self-stabilization, minimal perfect hashing, succinct data structures,
query recommendation, algorithms for large graphs, pseudorandom number
generation, theoretical/experimental analysis of spectral rankings such
as PageRank, and axiomatization of centrality measures, but he is also
(co)author of several widely used software tools ranging from
high-performance Java libraries to a model-driven software generator, a
search engine, a crawler, a text editor and a graph compression
framework. In 2011 he collaborated to the computation the distance
distribution of the whole Facebook graph, from which it was possible to
evince that on Facebook there are just 3.74 degrees of separation.
Recently, he participated to the analysis of the largest available
public web crawl (Common Crawl 2012), which led to the publication of
the first open ranking of web sites
(http://wwwranking.webdatacommons.org/). His work on Elias-Fano coding
and quasi-succinct indices is at the basis of the code of Facebook’s
“folly” library
(https://github.com/facebook/folly/blob/master/folly/experimental/EliasFanoCoding.h).
He also collaborated to the first open ranking of Wikipedia pages
(http://wikirank.di.unimi.it/), which is based on his body of work on
centrality in networks. His pseudorandom number generator xorshift128+
is currently used by the JavaScript engine V8 of Chrome, as well by
Safari and Firefox, and it is the stock generator of the Erlang language.
Sebastiano Vigna obtained his PhD in Computer Science from the
Università degli Studi di Milano, where he is currently a Full
Professor.