Search: [bigdata] - Fabians Bookmarks

3388 shaares

7 results tagged bigdata

fa81/ApproxyCount - Codeberg.org

A Python script that approximates the number of distinct values in a stream of elements using the simple Chakraborty/Vinodchandran/Meel (CVM) algorithm (https://arxiv.org/pdf/2301.10191#section.2).

tldr:

Compared to sort/uniq:

– sort/uniq always uses less memory (about 30-50%).
– sort/uniq is about 5 times slower.

Compared to 'the awk construct':

– awk uses about the same amount of time (0.5x-2x).
– awk uses much more memory for large files. Its memory use grows roughly linearly with the file size, while ApproxyCount has an upper bound. For typical multi-GiB files this can mean a factor of 20x-150x, e.g. 5 GiB (awk) vs. 40 MiB (aprxc).
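
To illustrate why the memory use has an upper bound, here is a minimal sketch of the CVM idea in Python. It is not the code from the linked repository: the function name `approx_distinct` and the parameters `max_size` and `seed` are made up for this example, and where the paper's algorithm reports failure if a halving round removes nothing, this sketch simply repeats the round.

```python
import random


def approx_distinct(stream, max_size=1000, seed=None):
    """Estimate the number of distinct items in `stream`.

    Hypothetical helper for illustration only. The sample never holds
    more than `max_size` items, which is where the memory bound comes from.
    """
    rng = random.Random(seed)
    p = 1.0          # current sampling probability
    sample = set()   # bounded sample of items seen so far

    for item in stream:
        # Forget any earlier decision about this item, then re-decide
        # with the current probability p.
        sample.discard(item)
        if rng.random() < p:
            sample.add(item)

        # When the sample is full, keep each item with probability 1/2
        # and halve p. Assumption: repeat instead of failing if the
        # round did not free any space.
        while len(sample) >= max_size:
            sample = {x for x in sample if rng.random() < 0.5}
            p /= 2

    # Each surviving item stands for 1/p distinct items.
    return len(sample) / p


if __name__ == "__main__":
    # With fewer distinct values than max_size, p stays 1 and the
    # count is exact.
    print(approx_distinct(["a", "b", "a", "c"], max_size=100))
    # With more distinct values the result is an estimate.
    print(approx_distinct((i % 10000 for i in range(30000)), max_size=1000))
```

A larger `max_size` trades memory for accuracy. As long as the stream has fewer distinct values than `max_size`, the result is exact.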

python · cli · linux · tool · algorithm · math · computerscience · bigdata

May 23, 2024 at 6:53:25 PM GMT+2 * · permalink

https://codeberg.org/fa81/ApproxyCount

Big Data Analytics - Products - Joyent

bigdata · s3 · ec3 · mapreduce · computing

February 4, 2014 at 12:23:26 PM GMT+1 · permalink

https://www.joyent.com/products/manta

Presto | Distributed SQL Query Engine for Big Data

Facebook just launched Presto, our distributed SQL query engine for huge data stores. It's amazing.

facebook · sql · bigdata · databases

November 6, 2013 at 7:41:46 PM GMT+1 * · permalink

http://prestodb.io/

elasticsearch, Big Data, Search & Analytics // Speaker Deck

elasticsearch · bigdata · search · es · presentation · indexing · shards

November 12, 2012 at 4:21:12 PM GMT+1 · permalink

https://speakerdeck.com/kimchy/elasticsearch-big-data-search-analytics

Storm, distributed and fault-tolerant realtime computation

Storm now has a website!

bigdata · distributed · processing · storm

June 12, 2012 at 1:07:27 PM GMT+2 · permalink

http://storm-project.net/

HBase Advanced Schema Design - Berlin Buzzwords - June 2012

My slides from the second talk today: "HBase Advanced Schema Design" via @slideshare #bbuzz

hbase · bigdata · slides · presentation

June 5, 2012 at 3:41:07 PM GMT+2 · permalink

http://www.slideshare.net/larsgeorge/hbase-advanced-schema-design-berlin-buzzwords-june-2012

ElasticSearch Users - some ES stats (at yfrog.com)

elasticsearch · bigdata · zfs

February 21, 2012 at 11:04:43 PM GMT+1 · permalink

http://elasticsearch-users.115913.n3.nabble.com/some-ES-stats-at-yfrog-com-td3759891.html
