iCubed Seminar: Nikita Seleznev, Capital One
Wednesday, May 11, 2022
4:30 pm - 5:30 pm
Nikita Seleznev, Manager, Machine Learning Engineering, Capital One
Moderated By: Jessica Rodriguez, Industry Engagement and Outreach Officer, The Data Science Institute
Wednesday, May 11, 2022 (4:30 PM – 5:30 PM ET) – Virtual
This event was NOT recorded.
Double-Hashing Algorithm for Frequency Estimation in Data Streams
Frequency estimation of elements is an important task in summarizing data streams and in machine learning applications. The problem is often addressed with streaming algorithms built on sublinear-space data structures. Commonly used streaming algorithms, such as the count-min sketch, have many advantages but do not take the properties of a data stream into account when optimizing performance. In this talk, we introduce a novel double-hashing algorithm that provides the flexibility to optimize streaming algorithms for the properties of a given stream and does not depend on specific features of the stream elements. We demonstrate on both synthetic data and an internet query log dataset that our approach can improve frequency estimation in data streams.
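For readers unfamiliar with the baseline mentioned in the abstract, the following is a minimal sketch of a standard count-min sketch in Python. It is not the double-hashing algorithm presented in the talk, whose details are not given here. The class name, the `width` and `depth` defaults, and the use of salted BLAKE2b as the hash family are all illustrative assumptions.

```python
import hashlib


class CountMinSketch:
    """Baseline count-min sketch (illustrative; not the talk's algorithm)."""

    def __init__(self, width=256, depth=4):
        # width: counters per row; depth: number of hash rows (both assumed values)
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, item, row):
        # One hash function per row, obtained by salting BLAKE2b with the row number.
        digest = hashlib.blake2b(
            item.encode("utf-8"),
            digest_size=8,
            salt=row.to_bytes(8, "little"),
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, item, count=1):
        # Increment one counter in every row.
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def estimate(self, item):
        # Collisions can only inflate counters, so the minimum over rows
        # is an upper bound on the true frequency.
        return min(
            self.table[row][self._index(item, row)] for row in range(self.depth)
        )


if __name__ == "__main__":
    cms = CountMinSketch()
    for query in ["weather", "news", "weather", "maps", "weather"]:
        cms.add(query)
    # The estimate never falls below the true count (3 for "weather").
    print(cms.estimate("weather"))
```

The memory used is fixed at `width * depth` counters regardless of how many distinct elements appear, which is what makes the structure sublinear in the stream size. The hash functions are chosen independently of the data, which is the limitation the abstract refers to.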
Bio: Nikita Seleznev is a Manager, Machine Learning Engineering, in Applied Research at Capital One. Since joining the company in 2020, he has focused on emerging applications of machine learning algorithms in finance. Nikita has prior experience in research and technology product development in the energy domain.