Fabians Bookmarks

fa81/ApproxyCount - Codeberg.org

A (Python) script to approximate the number of distinct values in a stream of elements using the (simple) Chakraborty/Vinodchandran/Meel algorithm (https://arxiv.org/pdf/2301.10191#section.2).
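
For orientation, here is a minimal sketch of the simple CVM estimator as it is commonly described. It is not the script's actual code: the function name `estimate_distinct` and the parameters `threshold` and `seed` are made up for illustration. One deviation from the paper is assumed here: where the paper's algorithm gives up if halving the buffer removes nothing, this sketch simply halves again.

```python
import random


def estimate_distinct(stream, threshold=1000, seed=None):
    """Estimate the number of distinct items in `stream` (hypothetical helper).

    Keeps at most `threshold` items in memory, no matter how long the stream is.
    """
    rng = random.Random(seed)
    p = 1.0          # current sampling probability
    buffer = set()   # sampled items, each kept with probability p

    for item in stream:
        # Forget any earlier decision about this item, then re-decide with
        # the current probability, so only the last occurrence matters.
        buffer.discard(item)
        if rng.random() < p:
            buffer.add(item)

        # Buffer full: drop each item with probability 1/2 and halve p.
        # (Assumption: loop until there is room instead of failing.)
        while len(buffer) >= threshold:
            buffer = {x for x in buffer if rng.random() < 0.5}
            p /= 2

    # Each distinct item survives with probability p, so scale back up.
    return len(buffer) / p


if __name__ == "__main__":
    # Fewer distinct items than the threshold: p stays 1, so the count is exact.
    print(estimate_distinct(["a", "b", "a", "c"], threshold=100))
    # Many duplicates: the result is only an approximation of 5000.
    print(estimate_distinct((i % 5000 for i in range(50000)), seed=1))
```

The buffer never grows beyond `threshold` entries, which is where the upper bound on memory mentioned in the comparison below comes from; a larger `threshold` trades memory for accuracy.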

tldr:

Compared to sort/uniq:

– sort/uniq always uses less memory (about 30-50%).
– sort/uniq is about 5 times slower.

Compared to 'the awk construct':

– awk uses about the same amount of time (0.5x-2x).
– awk uses much more memory for large files: its usage grows roughly linearly with the file size, while ApproxyCount has an upper bound. For typical multi-GiB files this can mean factors of 20x-150x, e.g. 5 GiB (awk) vs. 40 MiB (aprxc).

python cli linux tool algorithm math computerscience bigdata
May 23, 2024 at 18:53:25 GMT+2*
https://codeberg.org/fa81/ApproxyCount
By @fabian@floss.social · Powered by Shaarli