iCubed (Institute, Industry, Innovation) seminars invite DSI Industry Affiliates to give technical talks on current work in their domains. Join to learn about real-world uses of data science and career opportunities at Industry Affiliate companies.

Guest Speaker

Nikita Seleznev, Manager, Machine Learning Engineering, Capital One

Moderated By: Jessica Rodriguez, Industry Engagement and Outreach Officer, The Data Science Institute


Wednesday, May 11, 2022 (4:30 PM – 5:30 PM ET) – Virtual

This event was NOT recorded.

Talk Information

Double-Hashing Algorithm for Frequency Estimation in Data Streams

Frequency estimation of elements is an important task for summarizing data streams and for machine learning applications. The problem is often addressed with streaming algorithms that use sublinear-space data structures. Commonly used streaming algorithms, such as the count-min sketch, have many advantages but do not take the properties of a data stream into account to optimize performance. In this talk, we introduce a novel double-hashing algorithm that provides the flexibility to optimize streaming algorithms for the properties of a given stream and does not depend on specific features of the stream elements. We demonstrate on both synthetic data and an internet query log dataset that our approach can improve frequency estimation in data streams.
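
For readers unfamiliar with the baseline named in the abstract, the following is a minimal sketch of a standard count-min sketch in Python. It is not the double-hashing algorithm presented in the talk, whose details are not given here. The class name `CountMinSketch`, the `width` and `depth` defaults, and the use of salted BLAKE2b as the per-row hash are all illustrative assumptions.

```python
import hashlib


class CountMinSketch:
    """Illustrative count-min sketch: a depth x width table of counters."""

    def __init__(self, width=256, depth=4):
        # width and depth are assumed values chosen for the example only.
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, item, row):
        # One hash function per row, obtained by salting BLAKE2b with the row number.
        digest = hashlib.blake2b(
            str(item).encode("utf-8"),
            digest_size=8,
            salt=row.to_bytes(8, "little"),
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, item, count=1):
        # Increment one counter in every row.
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def estimate(self, item):
        # Collisions can only inflate a counter, so the minimum over rows
        # is the tightest available upper bound on the true frequency.
        return min(
            self.table[row][self._index(item, row)] for row in range(self.depth)
        )


# Hypothetical toy stream of queries.
stream = ["a"] * 5 + ["b"] * 3 + ["c"]
sketch = CountMinSketch()
for query in stream:
    sketch.add(query)

print(sketch.estimate("a"), sketch.estimate("b"), sketch.estimate("c"))
```

The memory used is fixed by `width * depth` regardless of how many distinct elements appear in the stream, and an estimate never falls below the true count. The hash functions are chosen without reference to the stream, which is the limitation the abstract points to.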

Bio: Nikita Seleznev is a Manager, Machine Learning Engineering, in Applied Research at Capital One. Since joining the company in 2020, he has focused on emerging applications of machine learning algorithms in finance. Nikita has prior experience in research and technology product development in the energy domain.