Coping with the explosion of ML research

Taivo Pungas

06 Jan 2019 • 2 min read

As the popularity of machine learning grows, so does the number of academic papers, blog posts and software packages released. As an industry practitioner, I need to stay up to date with the useful ones without spending hours each week combing through new developments to find the few important updates.

The obvious solution to this is curation: having people or algorithms digest the mass of new ML work into a smaller set of recommendations. There are various flavours of this:

Arxiv-sanity with its top recent, top hype, and personalised paper recommendation categories.
Technical gatherings, from global events like Nvidia GTC to regional conferences like North Star AI in Europe to local meetup groups.
Email lists. I especially like Import AI and Data Science Weekly.
Podcasts like This Week in ML & AI.

All of the above still require special effort: opening a website, watching a video, or even travelling to an event. Together with Joonatan Samuel from the Veriff ML team, I came up with a way to reduce the friction to a minimum: showing the most important papers on a TV dashboard set up next to our workplace.

Arxiv dashboard setup

This is what arxiv-sanity looks like on our dashboard (the refresh interval is very short for demonstration purposes):

And Papers with Code:

In action at the office:

The technical setup is simple: a 50ish-inch TV connected to a mini PC that runs Chrome and automatically cycles between tabs (we already had this running for other dashboards). We show content from two websites, paperswithcode.com and arxiv-sanity.com, but it’s trivial to add any website we like.

The styling, shuffling, and refreshing happen purely on the front end, through custom CSS and JS that we wrote by hand for each website and inject into its pages. We’ve shared those snippets on GitHub at taivop/arxivdashboard. Currently the code is injected by third-party Chrome extensions for adding custom CSS and JavaScript; while writing our own Chrome extension would be cleaner, we went for an 80-20 solution since the goal was saving time in the first place.
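
To give a rough idea of what the shuffling step can look like, here is a minimal TypeScript sketch. It is not the code from taivop/arxivdashboard: the names shuffled, shuffleChildren and ListContainer, the "#papers" selector, and the ten-minute reload interval are all made up for illustration, and each real site needs its own selectors.

```typescript
// Hypothetical sketch, not the snippets from taivop/arxivdashboard.

// Minimal structural type so the sketch does not depend on DOM typings.
// A real DOM element (e.g. the node wrapping the list of papers) satisfies it.
interface ListContainer<T> {
  children: ArrayLike<T>;
  appendChild(child: T): unknown;
}

// Fisher-Yates shuffle: returns a new array and leaves the input untouched.
// The random source is a parameter so the behaviour can be made deterministic.
function shuffled<T>(items: readonly T[], random: () => number = Math.random): T[] {
  const out = items.slice();
  for (let i = out.length - 1; i > 0; i--) {
    const j = Math.floor(random() * (i + 1));
    [out[i], out[j]] = [out[j], out[i]];
  }
  return out;
}

// Re-append every child in random order. On a DOM node, appendChild moves an
// existing child to the end, so this reorders the list without cloning nodes.
function shuffleChildren<T>(
  container: ListContainer<T>,
  random: () => number = Math.random
): void {
  // Copy the children first: a live DOM collection changes while we re-append.
  const snapshot: T[] = [];
  for (let i = 0; i < container.children.length; i++) {
    snapshot.push(container.children[i]);
  }
  for (const child of shuffled(snapshot, random)) {
    container.appendChild(child);
  }
}

// In an injected script this might be used as follows (selector and interval
// are assumptions, not taken from either website):
//   const list = document.querySelector("#papers");
//   if (list) shuffleChildren(list);
//   setTimeout(() => location.reload(), 10 * 60 * 1000);
```

Keeping the shuffle separate from the DOM handling means the same function can be reused for each website, with only the selector changing per site.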

Results

Having run the dashboard for about a month now, I can wholeheartedly recommend it. Almost every time I get out of my chair I glance at the TV. The titles aren’t always interesting enough to make me go and read the abstract, but the 20 or so times I have, I’ve found really useful and interesting papers, from better locality-sensitive hashing to really fast speech recognition.

If you improve on our very basic setup (e.g. by writing a Chrome extension) or find the dashboard useful for yourself, I’d be glad to hear from you.