SequentialZfpCompression
Documentation for SequentialZfpCompression.
This package provides an interface for compressing multiple arrays of the same size in sequence. These arrays can be up to 4D. The intended application is to store snapshots of an iterative process such as a simulation or optimization. Since these processes may require many iterations, compressing the snapshots can save RAM. This package uses the ZFP compression algorithm.
A few comments before you start reading the code.
This code implements a vector-like interface to access compressed arrays at different time indices, so to understand the code you should first read the Julia documentation on indexing interfaces. Basically, I had to implement a method for the Base.getindex function, which governs whether a type can be indexed like an array or vector. I also wrote a method for the function Base.append! to add new arrays to the sequential collection of compressed arrays.
I also use functions like fill and map, so reading the documentation on these functions might also help.
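The indexing interface described above can be illustrated with a toy type. This is only a minimal sketch, not the package's implementation: ToySeq is a hypothetical name, and it stores the raw bytes of each snapshot instead of ZFP-compressed bytes. It shows how methods for Base.append! and Base.getindex make a custom type behave like a vector of snapshots.

```julia
# Hypothetical toy type (not part of the package): keeps every snapshot as raw
# bytes in one buffer and remembers where each snapshot starts and ends.
struct ToySeq{T,N}
    data::Vector{UInt8}          # all snapshots, stored back to back
    headpositions::Vector{Int}   # first byte of each snapshot in `data`
    tailpositions::Vector{Int}   # last byte of each snapshot in `data`
    spacedim::NTuple{N,Int}      # size of every snapshot
end

ToySeq(::Type{T}, dims::Vararg{Integer,N}) where {T,N} =
    ToySeq{T,N}(UInt8[], Int[], Int[], map(Int, dims))

# Adding a method to Base.append! lets callers write `append!(seq, array)`.
function Base.append!(seq::ToySeq{T,N}, array::AbstractArray{T,N}) where {T,N}
    size(array) == seq.spacedim || throw(DimensionMismatch("snapshot has the wrong size"))
    bytes = reinterpret(UInt8, vec(collect(array)))  # a real codec would compress here
    push!(seq.headpositions, length(seq.data) + 1)
    append!(seq.data, bytes)
    push!(seq.tailpositions, length(seq.data))
    return seq
end

# Adding a method to Base.getindex enables the `seq[timeidx]` syntax.
function Base.getindex(seq::ToySeq{T,N}, timeidx::Int) where {T,N}
    bytes = seq.data[seq.headpositions[timeidx]:seq.tailpositions[timeidx]]
    return reshape(collect(reinterpret(T, bytes)), seq.spacedim)
end

seq = ToySeq(Float32, 4, 4)
A1 = rand(Float32, 4, 4)
A2 = rand(Float32, 4, 4)
append!(seq, A1)
append!(seq, A2)
second = seq[2]   # dispatches to the Base.getindex method defined above
```

The head and tail position vectors play the same role as the fields of the same name documented for the compressed types further down: they locate one time slice inside a flat byte buffer.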
Example
Here is a simple example of its usage. Imagine that the arrays A1 to A3 are snapshots of an iterative process.
using SequentialZfpCompression
using Test
# Let's define a few arrays to compress
A1 = rand(Float32, 100,100,100)
A2 = rand(Float32, 100,100,100)
A3 = rand(Float32, 100,100,100)
# Initializing the compressed array sequence
compSeq = SeqCompressor(Float32, 100, 100, 100)
# Compressing the arrays
append!(compSeq, A1)
append!(compSeq, A2)
append!(compSeq, A3)
# Asserting the decompressed array is the same
@test compSeq[1] == A1
@test compSeq[2] == A2
@test compSeq[3] == A3
# Dumping to a file
save("myarrays.szfp", compSeq)
# Reading it back
compSeq2 = load("myarrays.szfp")
# Asserting the loaded data is the same
@test compSeq[:] == compSeq2[:]
# output
Test Passed
Lossy compression
Lossy compression is achieved by specifying additional keyword arguments for SeqCompressor, which are tol::Real, precision::Int, and rate::Real. If none are specified (as in the example above) the compression is lossless (i.e. reversible). Lossy compression parameters are
- tol defines the maximum absolute error that is tolerated.
- precision controls the precision, bounding a weak relative error (see the ZFP FAQ).
- rate fixes the bits used per value.
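As a minimal sketch of the tol keyword described above, the snippet below compresses one array with an absolute error tolerance and measures the error after decompression. The tolerance value 1e-2 is an arbitrary choice for illustration, and the exact error you observe depends on the data.

```julia
using SequentialZfpCompression

A = rand(Float32, 100, 100, 100)

# Arbitrary tolerance chosen for this illustration.
tol = 1e-2

# Passing `tol` switches from lossless to lossy (fixed-accuracy) compression.
lossySeq = SeqCompressor(Float32, 100, 100, 100; tol=tol)
append!(lossySeq, A)

# Measure the largest deviation between the original and decompressed data.
maxerr = maximum(abs.(lossySeq[1] .- A))
println("maximum absolute error: ", maxerr)
```

If the bound holds as described, maxerr should not exceed tol, while lossySeq[1] == A is no longer expected to be true.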
Multi file out-of-core parallel compression and decompression
This package has two workflows for compression. It can compress the array into a Vector{UInt8} and keep it in memory, or it can slice the array and compress each slice, saving each slice to different files, one per thread.
To use this out-of-core approach, you have four options:
- Use the inmemory=false keyword to SeqCompressor. This will create the files for you in tmpdir(),
- Specify the filepaths::Vector{String} keyword argument with a list of folders, one for each thread,
- Specify the filepaths::String keyword argument with just one folder that will hold all the files,
- Specify the envVarPath::String keyword argument with the name of an environment variable that holds the path to the folder that will hold all the files. This might be useful if you are using a SLURM cluster, which allows you to access the local node storage via the SLURM_TMPDIR environment variable.
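The options above could be used roughly as follows. This is a sketch under a few assumptions: MY_SCRATCH_DIR is a hypothetical environment variable set here only for the demonstration, the folders are fresh temporary directories, and inmemory=false is passed explicitly alongside filepaths and envVarPath to make the file-backed choice unambiguous. The qualified name SequentialZfpCompression.cleanup! is used because this page does not say whether cleanup! is exported.

```julia
using SequentialZfpCompression

A = rand(Float32, 100, 100, 100)

# Let the package create its files in a temporary location.
seqTmp = SeqCompressor(Float32, 100, 100, 100; inmemory=false)

# One folder that holds all the files (a fresh temporary folder here).
seqDir = SeqCompressor(Float32, 100, 100, 100; inmemory=false, filepaths=mktempdir())

# Folder taken from an environment variable (hypothetical name, set for this demo).
ENV["MY_SCRATCH_DIR"] = mktempdir()
seqEnv = SeqCompressor(Float32, 100, 100, 100; inmemory=false, envVarPath="MY_SCRATCH_DIR")

seqs = (seqTmp, seqDir, seqEnv)
for seq in seqs
    append!(seq, A)
end

# Read everything back before the backing files are removed.
restored = [seq[1] for seq in seqs]

for seq in seqs
    SequentialZfpCompression.cleanup!(seq)   # close streams, delete backing files
end
```

On a SLURM cluster the same pattern would presumably use envVarPath="SLURM_TMPDIR" without setting the variable by hand.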
- SequentialZfpCompression.CompressedArraySeq
- SequentialZfpCompression.CompressedMmapArraySeq
- SequentialZfpCompression.CompressedMultiFileArraySeq
- Base.append!
- Base.getindex
- Base.ndims
- Base.size
- SequentialZfpCompression.SeqCompressor
- SequentialZfpCompression.cleanup!
- SequentialZfpCompression.cleanup!
- SequentialZfpCompression.finalizeMultiFile!
- SequentialZfpCompression.totalsize
SequentialZfpCompression.CompressedArraySeq — Type
CompressedArraySeq{T,Nx}
A mutable structure for storing time-dependent arrays in a compressed format.
Fields
- data::Vector{UInt8}: Compressed data in byte form.
- headpositions::Vector{Int64}: Positions of the beginning of each time slice in data.
- tailpositions::Vector{Int64}: Positions of the end of each time slice in data.
- spacedim::NTuple{Nx,Int32}: Dimensions of the spatial grid.
- timedim::Int32: Number of time steps.
- eltype::Type{T}: Element type of the uncompressed array.
- tol::Float32: Maximum absolute error that is tolerated.
- precision::Int64: Controls the precision, bounding a weak relative error.
- rate::Int64: Fixes the bits used per value.
SequentialZfpCompression.CompressedMmapArraySeq — Type
CompressedMmapArraySeq{T,Nx}
A compressed time-dependent array stored in per-thread temporary files, accessed via memory mapping for zero-copy reads.
The write path is identical to CompressedMultiFileArraySeq: each append! call compresses a spatial slice and writes it to the backing file with standard IO. After every write the mmap views are refreshed to cover the grown file, so subsequent getindex calls read directly from OS-mapped pages without an extra allocation or copy.
Fields
- files::Vector{IOStream}: Write channel, one file per thread.
- mmaps::Vector{Vector{UInt8}}: Live mmap views into each file; refreshed after every append!.
- headpositions::Vector{Int64}: 1-indexed end byte of each (time, thread) chunk.
- tailpositions::Vector{Int64}: 1-indexed start byte of each (time, thread) chunk.
- spacedim::NTuple{Nx,Int32}: Spatial dimensions.
- timedim::Int32: Number of time steps appended so far.
- eltype::Type{T}: Element type of the uncompressed array.
- tol::Float32, precision::Int64, rate::Int64: ZFP compression parameters.
- nth::Int16: Number of threads (and files).
- filePaths::Vector{String}: Paths to the backing temporary files.
Arguments exclusive for the constructor
filepaths::Union{Vector{String}, String}="/tmp/seqcomp": Directory (or per-thread paths) where temporary files are created.
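The write-then-remap pattern described for this type can be sketched with Julia's Mmap standard library alone. This is a generic illustration of the idea, not the package's code: the chunk contents are made-up bytes standing in for compressed data, and a single temporary file stands in for the per-thread files.

```julia
using Mmap

# Backing file plus a write stream, standing in for one per-thread file.
path, io = mktemp()

# Write a first "compressed chunk" with ordinary IO and map the file.
chunk1 = UInt8[0x01, 0x02, 0x03]
write(io, chunk1)
flush(io)
view1 = Mmap.mmap(path, Vector{UInt8}, (filesize(path),))

# After the file grows, the old mapping still covers only the old length,
# so a new mapping is created to cover the grown file.
chunk2 = UInt8[0x0a, 0x0b]
write(io, chunk2)
flush(io)
view2 = Mmap.mmap(path, Vector{UInt8}, (filesize(path),))

# Byte range of the second chunk inside the file (1-indexed, inclusive).
first_byte = length(chunk1) + 1
last_byte = length(chunk1) + length(chunk2)

# A view into the mapping reads the bytes without copying them.
second = @view view2[first_byte:last_byte]

close(io)
```

Reading through a view of the mapped array avoids allocating a separate buffer for each chunk, which is the benefit the docstring above attributes to this backend.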
SequentialZfpCompression.CompressedMultiFileArraySeq — Type
CompressedMultiFileArraySeq{T,Nx}
A compressed time-dependent array that is stored in multiple files, one per thread.
Fields
- files::Vector{IOStream}: IO object for each array slice.
- headpositions::Vector{Int64}: Positions of the beginning of each time slice in data.
- tailpositions::Vector{Int64}: Positions of the end of each time slice in data.
- spacedim::NTuple{Nx,Int32}: Dimensions of the spatial grid.
- timedim::Int32: Number of time steps.
- eltype::Type{T}: Element type of the uncompressed array.
- tol::Float32: Maximum absolute error that is tolerated.
- precision::Int64: Controls the precision, bounding a weak relative error.
- rate::Int64: Fixes the bits used per value.
Arguments exclusive for the constructor
filepaths::Union{Vector{String}, String}="/tmp/seqcomp": Path(s) to the files where the compressed data will be stored. If only one string is passed, the same path will be used for all threads.
Base.append! — Method
append!(compArray::CompressedArraySeq{T,N}, array::AbstractArray{T,N})
Append a new time slice to compArray, compressing array in the process.
Arguments
compArray::CompressedArraySeq{T,N}: Existing compressed array.
array::AbstractArray{T,N}: Uncompressed array to append.
Base.getindex — Method
getindex(compArray::AbstractCompArraySeq, timeidx::Int)
Retrieve and decompress a single time slice from compArray at timeidx.
Base.ndims — Method
ndims(compArray::AbstractCompArraySeq)
Returns the number of dimensions of the uncompressed array, including the time dimension.
SequentialZfpCompression.SeqCompressor — Method
SeqCompressor(dtype::DataType, spacedim::Integer...;
inmemory::Bool=true, mmap::Bool=false,
rate::Int=0, tol::Real=0, precision::Real=0,
filepaths::Union{Vector{String}, String}="",
envVarPath::String="", nthreads::Integer=-1, nt::Integer=1)
Construct a compressed sequential array, choosing the backend based on the arguments:
| Condition | Backend |
|---|---|
| mmap=true | CompressedMmapArraySeq: file-backed, zero-copy reads via mmap. Call refreshMmaps! once after all append! calls and before any indexing. |
| inmemory=true (default) | CompressedArraySeq: all data held in a Vector{UInt8} in RAM. |
| inmemory=false | CompressedMultiFileArraySeq: file-backed with standard seek/read IO. |
Arguments
- dtype::DataType: element type of the arrays to compress (e.g. Float32, Float64).
- spacedim::Integer...: spatial dimensions of each time slice.
- inmemory::Bool=true: store compressed data in memory (ignored when mmap=true).
- mmap::Bool=false: use memory-mapped file backend for zero-copy reads.
- rate::Int=0: fixed-rate mode, bits per value.
- tol::Real=0: fixed-accuracy mode, maximum absolute error.
- precision::Real=0: fixed-precision mode, number of uncompressed bits per value.
- filepaths::Union{Vector{String}, String}="": directory or per-thread file paths for file-backed backends. A single string is used as a directory for all threads; a vector must have one entry per thread. Ignored for inmemory=true.
- envVarPath::String="": name of an environment variable whose value is used as the file path (useful with SLURM's SLURM_TMPDIR). Takes precedence over filepaths.
- nthreads::Integer=-1: maximum number of threads to use. Defaults to Threads.nthreads().
- nt::Integer=1: expected number of time steps, used to pre-allocate the in-memory buffer.
Examples
In-memory (default):
julia> using SequentialZfpCompression
julia> A = SeqCompressor(Float64, 4, 4)
SequentialZfpCompression.CompressedArraySeq{Float64, 2}(UInt8[], [0], [0], (4, 4), 0, Float64, 0.0f0, 0, 0)
julia> append!(A, ones(Float64, 4, 4));
julia> A[1]
4×4 Matrix{Float64}:
1.0 1.0 1.0 1.0
1.0 1.0 1.0 1.0
1.0 1.0 1.0 1.0
1.0 1.0 1.0 1.0
julia> size(A)
(4, 4, 1)
Memory-mapped (write all, then refresh once before reading):
A = SeqCompressor(Float32, 64, 64; mmap=true)
for t in 1:100
append!(A, my_slice(t))
end
refreshMmaps!(A) # map the completed files into memory
A[50] # zero-copy read
SequentialZfpCompression.cleanup! — Method
cleanup!(comp::CompressedMmapArraySeq)
Close file streams and remove all backing temporary files.
SequentialZfpCompression.cleanup! — Method
cleanup!(comp::CompressedMultiFileArraySeq)
Cleanup function for CompressedMultiFileArraySeq. Closes all open file streams and removes the temporary files from disk.
Arguments
comp: A CompressedMultiFileArraySeq object containing the file streams and paths to clean up.
SequentialZfpCompression.finalizeMultiFile! — Method
finalizeMultiFile!(comp)
Cleanup function for CompressedMultiFileArraySeq. Closes all open file streams and removes the temporary files from disk.
Arguments
comp: A CompressedMultiFileArraySeq object containing the file streams and paths to clean up.
SequentialZfpCompression.totalsize — Method
totalsize(compArray::CompressedMultiFileArraySeq)
Returns the total size of the compressed data in bytes.
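One use of totalsize is estimating the compression ratio of a file-backed sequence. The sketch below assumes a smooth synthetic field made up for this illustration, and it calls totalsize and cleanup! through the module name because this page does not say whether they are exported. The ratio you get depends entirely on the data.

```julia
using SequentialZfpCompression

nx, ny, nt = 64, 64, 10

# File-backed sequence, since totalsize is documented for CompressedMultiFileArraySeq.
comp = SeqCompressor(Float32, nx, ny; inmemory=false)

for t in 1:nt
    # Smooth synthetic snapshot (made up for illustration).
    snapshot = Float32[sin(0.1f0 * i + 0.05f0 * t) * cos(0.1f0 * j) for i in 1:nx, j in 1:ny]
    append!(comp, snapshot)
end

# Size of the same data if it were stored uncompressed.
rawbytes = nx * ny * nt * sizeof(Float32)

# Size of the compressed data on disk, in bytes.
compbytes = SequentialZfpCompression.totalsize(comp)

println("compression ratio: ", rawbytes / compbytes)

SequentialZfpCompression.cleanup!(comp)
```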