Fabians Bookmarks
17475 shaares · 14124 private links
7 results tagged bigdata

fa81/ApproxyCount: A (Python) script to approximate the number of distinct values in a stream of elements using the (simple) Chakraborty/Vinodchandran/Meel algorithm (https://arxiv.org/pdf/2301.10191#section.2). - Codeberg.org

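The core loop of the CVM algorithm is short. Below is a minimal Python sketch of it, not ApproxyCount's actual code. The function name `cvm_estimate` and the caller-chosen `buffer_size` are assumptions; the paper derives its threshold from the accuracy parameters ε and δ and the stream length. Each element is re-sampled with the current probability p. Whenever the buffer fills up, every stored element survives a fair coin flip and p is halved, so the buffer never holds more than `buffer_size` entries. The estimate is the final buffer size divided by p.

```python
import random
from typing import Hashable, Iterable, Optional


def cvm_estimate(stream: Iterable[Hashable], buffer_size: int,
                 seed: Optional[int] = None) -> float:
    """Estimate the number of distinct elements in `stream` (CVM sketch).

    Hypothetical helper: `buffer_size` plays the role of the paper's
    threshold, but is chosen by the caller instead of derived from
    epsilon, delta and the stream length.
    """
    rng = random.Random(seed)
    p = 1.0              # current sampling probability
    buffer: set = set()  # sampled elements, never more than buffer_size
    for item in stream:
        # Drop any earlier decision about this item, then keep it with probability p.
        buffer.discard(item)
        if rng.random() < p:
            buffer.add(item)
        if len(buffer) == buffer_size:
            # Buffer full: keep each element with probability 1/2 and halve p.
            buffer = {x for x in buffer if rng.random() < 0.5}
            p /= 2
            if len(buffer) == buffer_size:
                # The paper reports failure here; it happens with tiny probability.
                raise RuntimeError("CVM failed: buffer still full after halving")
    return len(buffer) / p
```

With a buffer larger than the number of distinct values, p stays at 1 and the result is exact; with a smaller buffer, e.g. `cvm_estimate(range(10_000), buffer_size=1_000, seed=1)`, the result should land near 10,000 while at most 1,000 elements are ever stored.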

tldr:

Compared to sort/uniq:

– sort/uniq always uses less memory (about 30-50%).
– sort/uniq is about 5 times slower.

Compared to 'the awk construct':

– awk uses about the same amount of time (0.5x-2x).
– awk uses much more memory for large files: its memory use grows roughly linearly with the file size, while ApproxyCount's has an upper bound. For typical multi-GiB files this can mean a factor of 20x-150x, e.g. 5 GiB (awk) vs. 40 MiB (aprxc).
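For contrast, here is a sketch of the two exact approaches in Python. The function names are hypothetical, and the set-based version is only assumed to behave like the awk construct (commonly an associative array keyed by line): its memory grows with the number of distinct lines. The sort-based version mirrors `sort | uniq | wc -l` by counting runs of equal lines after sorting; unlike this in-memory sketch, GNU sort can spill to temporary files for large inputs.

```python
from itertools import groupby
from typing import Iterable


def distinct_with_set(lines: Iterable[str]) -> int:
    # Hash-set approach (assumed analogue of an awk associative array):
    # one entry per distinct line, so memory is unbounded.
    seen = set()
    for line in lines:
        seen.add(line)
    return len(seen)


def distinct_with_sort(lines: Iterable[str]) -> int:
    # Analogue of `sort | uniq | wc -l`: sort, then count runs of equal lines.
    # This sketch sorts in memory; an external sort would use temporary files.
    return sum(1 for _ in groupby(sorted(lines)))
```

Both return exact counts; the bounded-buffer CVM approach trades that exactness for a fixed memory ceiling.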

python cli linux tool algorithm math computerscience bigdata
May 23, 2024 at 6:53:25 PM GMT+2
https://codeberg.org/fa81/ApproxyCount

Big Data Analytics - Products - Joyent

bigdata s3 ec3 mapreduce computing
February 4, 2014 at 12:23:26 PM GMT+1
https://www.joyent.com/products/manta

Presto | Distributed SQL Query Engine for Big Data

Facebook just launched Presto, our distributed SQL query engine for huge data stores. It's amazing.

facebook sql bigdata databases
November 6, 2013 at 7:41:46 PM GMT+1
http://prestodb.io/

elasticsearch, Big Data, Search & Analytics // Speaker Deck

elasticsearch bigdata search es presentation indexing shards
November 12, 2012 at 4:21:12 PM GMT+1
https://speakerdeck.com/kimchy/elasticsearch-big-data-search-analytics

Storm, distributed and fault-tolerant realtime computation

Storm now has a website!

bigdata distributed processing storm
June 12, 2012 at 1:07:27 PM GMT+2
http://storm-project.net/

HBase Advanced Schema Design - Berlin Buzzwords - June 2012

My slides from the second talk today: "HBase Advanced Schema Design" via @slideshare #bbuzz

hbase bigdata slides presentation
June 5, 2012 at 3:41:07 PM GMT+2
http://www.slideshare.net/larsgeorge/hbase-advanced-schema-design-berlin-buzzwords-june-2012

ElasticSearch Users - some ES stats (at yfrog.com)

elasticsearch bigdata zfs
February 21, 2012 at 11:04:43 PM GMT+1
http://elasticsearch-users.115913.n3.nabble.com/some-ES-stats-at-yfrog-com-td3759891.html
By @fabian@floss.social · Powered by Shaarli