cattrs: Flexible Object Serialization and Validation
Because validation belongs to the edges.
cattrs is a Swiss Army knife for (un)structuring and validating data in Python. In practice, that means it converts unstructured dictionaries into proper classes and back, while validating their contents.
Example
cattrs works best with attrs classes and dataclasses, where simple (un)structuring works out of the box, even for nested data, without polluting your data model with serialization details.
Python terminal utility to plot the distribution of a (newline-separated) dataset as a boxplot.
APIFlask is a lightweight Python web API framework based on Flask and marshmallow-code projects. It's easy to use, highly customizable, ORM/ODM-agnostic, and 100% compatible with the Flask ecosystem.
htpy is a library that makes writing HTML in plain Python fun and efficient, without a template language.
cloudpickle makes it possible to serialize Python constructs not supported by the default pickle module from the Python standard library.
cloudpickle is especially useful for cluster computing where Python code is shipped over the network to execute on remote hosts, possibly close to the data.
Among other things, cloudpickle supports pickling for lambda functions along with functions and classes defined interactively in the main module (for instance in a script, a shell or a Jupyter notebook).
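A small sketch of the lambda case described above, assuming cloudpickle is installed. The `square` function is a made-up example; `cloudpickle.dumps` produces bytes that the standard `pickle.loads` can read back.

```python
import pickle

import cloudpickle

# A lambda has no importable name, so the standard pickle module rejects it.
square = lambda x: x * x

try:
    pickle.dumps(square)
    stdlib_ok = True
except (pickle.PicklingError, AttributeError):
    stdlib_ok = False

# cloudpickle serializes the function by value instead of by reference.
payload = cloudpickle.dumps(square)

# The payload is a regular pickle stream; loading it needs no special API.
restored = pickle.loads(payload)
result = restored(7)
```

In a cluster setting the `payload` bytes would be sent to the remote host, which would call `pickle.loads` there (with cloudpickle importable on that host).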
Typed and DST-safe datetimes for Python, written in Rust
BeeWare is not a single product, tool, or library; it’s a collection of tools and libraries that work together to help you write cross-platform Python applications with a native GUI. It includes:
Toga, a cross-platform widget toolkit;
Briefcase, a tool for packaging Python projects as distributable artefacts that can be shipped to end users;
Libraries (such as Rubicon ObjC) for accessing platform-native libraries;
Pre-compiled builds of Python that can be used on platforms where official Python installers aren’t available.
open-source smart on-demand image cropping, resizing and filters
A (Python) script to approximate the number of distinct values in a stream of elements using the (simple) Chakraborty/Vinodchandran/Meel algorithm (https://arxiv.org/pdf/2301.10191#section.2).
tldr:
Compared to sort/uniq:
– sort/uniq always uses less memory (about 30-50%).
– sort/uniq is about 5 times slower.
Compared to 'the awk construct':
– awk uses about the same amount of time (0.5x-2x).
– awk uses much more memory for large files: its usage grows roughly linearly with the file size, while ApproxiCount has an upper bound. For typical multi-GiB files this can mean factors of 20x-150x, e.g. 5GiB (awk) vs. 40MiB (aprxc).
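For readers unfamiliar with the algorithm, here is a rough pure-Python sketch of the Chakraborty/Vinodchandran/Meel idea. It is not the script's actual code: the function name, the `max_size` and `seed` parameters, and the choice to keep halving when the buffer is still full (the paper reports failure in that case) are all assumptions made for illustration.

```python
import random


def approx_distinct(stream, max_size=1000, seed=None):
    """Estimate the number of distinct items in `stream` with bounded memory."""
    rng = random.Random(seed)
    p = 1.0          # current sampling probability
    buffer = set()   # holds at most max_size items

    for item in stream:
        # Forget any earlier decision about this item, then re-sample it
        # with the current probability.
        buffer.discard(item)
        if rng.random() < p:
            buffer.add(item)

        # Buffer full: keep each element with probability 1/2 and halve p.
        # Assumption: loop until there is room instead of failing.
        while len(buffer) >= max_size:
            buffer = {x for x in buffer if rng.random() < 0.5}
            p /= 2

    # Each distinct item ends up in the buffer with probability p.
    return len(buffer) / p


# With fewer distinct values than max_size, p stays 1 and the count is exact.
small = approx_distinct(["a", "b", "a", "c"])

# A larger stream with 20,000 distinct values, each seen twice.
large = approx_distinct((i % 20_000 for i in range(40_000)), seed=0)
```

Memory is bounded by `max_size` regardless of the stream length, which is the property behind the memory comparison with awk above.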
Vega-Altair is a declarative visualization library for Python. Its simple, friendly and consistent API, built on top of the powerful Vega-Lite grammar, empowers you to spend less time writing code and more time exploring your data.
diffoscope is a tool to get to the bottom of what makes files or directories different. It recursively unpacks archives of many kinds and transforms various binary formats into more human readable forms to compare them.
django-q2: A multiprocessing distributed task queue for Django based on Django-Q.
A lightweight message queue. Like AWS SQS and RSMQ but on Postgres.