How to use the partd.Snappy function in partd

To help you get started, we’ve selected a few partd examples, based on popular ways it is used in public projects.


Source: dask/dask, dask/bag/core.py (GitHub)
def groupby_disk(b, grouper, npartitions=None, blocksize=2 ** 20):
    if npartitions is None:
        npartitions = b.npartitions
    token = tokenize(b, grouper, npartitions, blocksize)

    import partd

    p = ("partd-" + token,)
    dirname = config.get("temporary_directory", None)
    if dirname:
        file = (apply, partd.File, (), {"dir": dirname})
    else:
        file = (partd.File,)
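    # partd.Snappy may be missing (for example when the optional snappy
    # bindings are not installed); fall back to uncompressed storage then.
    try:
        dsk1 = {p: (partd.Python, (partd.Snappy, file))}
    except AttributeError:
        dsk1 = {p: (partd.Python, file)}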

    # Partition data on disk
    name = "groupby-part-{0}-{1}".format(funcname(grouper), token)
    dsk2 = dict(
        ((name, i), (partition, grouper, (b.name, i), npartitions, p, blocksize))
        for i in range(b.npartitions)
    )

    # Barrier
    barrier_token = "groupby-barrier-" + token

    dsk3 = {barrier_token: (chunk.barrier,) + tuple(dsk2)}

    # Collect groups
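
The excerpt above never calls partd.Snappy directly. It stores nested tuples such as (partd.Python, (partd.Snappy, file)) in a task graph, and the store is only built when the graph is evaluated. The following is a minimal sketch of that idea using only the standard library. The names evaluate, Layer, build_store_task and the have_snappy flag are hypothetical, and File, Snappy, Python and apply are simplified stand-ins rather than the real partd or dask objects.

```python
def evaluate(task):
    """Evaluate a task tuple: (callable, arg1, arg2, ...), innermost first."""
    if isinstance(task, tuple) and task and callable(task[0]):
        func, *args = task
        return func(*(evaluate(arg) for arg in args))
    return task  # plain value (empty tuple, dict, string, callable, ...)


def apply(func, args, kwargs=None):
    """Stand-in for a helper that calls func with positional and keyword args."""
    return func(*args, **(kwargs or {}))


class Layer:
    """Hypothetical stand-in that only records how storage layers are nested."""

    def __init__(self, name, inner=None, **options):
        self.name = name
        self.inner = inner
        self.options = options

    def describe(self):
        if self.inner is None:
            return self.name
        return f"{self.name}({self.inner.describe()})"


# Stand-ins for partd.File, partd.Snappy and partd.Python (assumptions).
def File(dir=None):
    return Layer("File", dir=dir)


def Snappy(inner):
    return Layer("Snappy", inner)


def Python(inner):
    return Layer("Python", inner)


def build_store_task(dirname=None, have_snappy=True):
    """Build the nested task tuple, mirroring the shape used in the excerpt."""
    file = (apply, File, (), {"dir": dirname}) if dirname else (File,)
    if have_snappy:
        return (Python, (Snappy, file))
    return (Python, file)  # uncompressed fallback


store = evaluate(build_store_task())
print(store.describe())
```

Building the task as data first means the compression layer can be swapped or dropped without touching the code that later consumes the store.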

partd

Appendable key-value storage

BSD-3-Clause
Latest version published 7 months ago
