GGUF API¶

GGUF is a binary format used by llama.cpp for storing models and their associated metadata and weights, designed for efficient memory mapping of model weights, making it particularly suitable for large language models and other AI applications.

See GGUF specification.

The GGUF API provides methods to read and write GGUF files, access and manage metadata, handle tensor information, and support various data types and array structures.

While the API manages the file structure and metadata, it does not directly write tensor data to the file. Users are responsible for writing tensor data at the correct file offsets specified by the API, ensuring proper data alignment and following the tensor specifications (type, dimensions, and layout) defined in the metadata.

This design choice separates the concerns of file format management from the actual tensor data handling, allowing users to implement custom tensor writing strategies and optimize for their specific use cases. The API provides the necessary offset information and tensor specifications, but the actual tensor data writing must be handled by the user's implementation. For example, after creating a GGUF file with metadata and tensor definitions, users must:

Get tensor offsets from the API via GGUF#tensorDataOffset() and TensorInfo#offset()
Seek to the correct file position
Write tensor data in the correct format
Ensure proper memory alignment according to GGUF#getAlignment()

Get Started¶

To use the GGUF library in your project, add the following dependency using your preferred build tool. The library requires Java 11 or higher and has no external dependencies.

MavenGradleSBTMill

pom.xml

<dependency>
    <groupId>com.llama4j</groupId>
    <artifactId>gguf</artifactId>
    <version>0.1.1</version>
</dependency>

build.gradle

implementation 'com.llama4j:gguf:0.1.1'

build.sbt

libraryDependencies += "com.llama4j" % "gguf" % "0.1.1"

build.sc

ivy"com.llama4j::gguf:0.1.1"

Reading GGUF files¶

import com.llama4j.gguf.GGUF;

Path modelPath = Path.of("/path/to/Llama-3.2-1B-Instruct-Q8_0.gguf");
GGUF gguf = GGUF.read(modelPath);

System.out.println(gguf.getMetadataKeys());

Note

When reading GGUF files, the order of metadata keys and tensors within the file is preserved.

Basic Information¶

// Get GGUF format version
int version = gguf.getVersion();

// Get alignment value (default or specified)
int alignment = gguf.getAlignment();

// Get tensor data offset
long tensorDataOffset = gguf.getTensorDataOffset();

Accessing Metadata¶

// Get all metadata keys
Set<String> keys = gguf.getMetadataKeys();

// Check if key exists
boolean hasKey = gguf.containsKey("key");

// Get metadata type
MetadataValueType type = gguf.getType("key");

// Get component type for arrays
MetadataValueType arrayType = gguf.getComponentType("key");

// Get value with type casting
int intValue = gguf.getValue(int.class, "numberKey");
String text = gguf.getValue(String.class, "textKey");
float[] floats = gguf.getValue(float[].class, "floatArrayKey");

// Generic access if type is unknown
Object value = gguf.getValue(Object.class, "key");

// Get value with default fallback
int theValue = gguf.getValueOrDefault(int.class, "key", 0);

// String array
String[] strings = gguf.getValue(String[].class, "stringArray");

// Numeric arrays
int[] integers = gguf.getValue(int[].class, "intArray");
float[] floats = gguf.getValue(float[].class, "floatArray");
double[] doubles = gguf.getValue(double[].class, "doubleArray");

// Boolean array
boolean[] bools = gguf.getValue(boolean[].class, "boolArray");

Accessing Tensors¶

// Get all tensor information, tensor order is preserved
Collection<TensorInfo> tensors = gguf.getTensors();

// Get specific tensor info
TensorInfo tensor = gguf.getTensor("token_embeddings");

// Check tensor existence
boolean hasTensor = gguf.containsTensor("token_embeddings");

Best Practices¶

Always close channels after reading/writing:

try (var channel = Files.newByteChannel(path, StandardOpenOption.READ)) {
    GGUF gguf = GGUF.read(channel);
    // Process GGUF data
}

Check for key/tensor existence before accessing values:

if (gguf.containsKey("key")) {
    int value = gguf.getValue(int.class, "key");
}
// Or use default values
float ropeTetha = gguf.getValueOrDefault(float.class, "key", 1e-5f);

Reading from URLs¶

No need to download large GGUF files or using a browser to peek at the GGUF metadata - GGUF metadata can be easily read from various sources, including URLs.

static GGUF readFromHuggingFace(String user, String repo, String file) throws IOException {
    URL url = new URL("https://hf.co/%s/%s/resolve/main/%s".formatted(user, repo, file));
    try (var ch = Channels.newChannel(
            new BufferedInputStream(url.openConnection().getInputStream()))) {
        return GGUF.read(ch);
    }
}

// Example usage
String user = "mukel";
String repo = "Llama-3.2-1B-Instruct-GGUF";
String file = "Llama-3.2-1B-Instruct-Q8_0.gguf";
GGUF gguf = readFromHuggingFace(user, repo, file);

System.out.println(gguf.getMetadataKeys());

Writing GGUF files¶

// Write to file
GGUF gguf = ...;
GGUF.write(gguf, Path.of("model.gguf"));

The Builder interface provides a fluent API for creating and modifying GGUF metadata.

Creating a New GGUF File¶

// Create a new empty builder
Builder builder = Builder.newBuilder()
    .setVersion(3)     // (optionsl) set GGUF version
    .setAlignment(32); // (optional) set alignment (must be power of 2)

// Add metadata
builder
    .putString("name", "my-model")
    .putInteger("num_layers", 32)
    .putFloat("learning_rate", 0.001f)
    .putArrayOfString("vocab", new String[]{"token1", "token2"});

// Add tensor information
builder.putTensor(
    TensorInfo.create(
        "weights",    // tensor name
        new long[]{1024, 1024}, // shape
        GGMLType.F32, // GGML type
        offset        // tensor offset w.r.t. to gguf.tensorDataOffset()
));

// Build the GGUF instance
GGUF gguf = builder.build();

// Write to file
GGUF.write(gguf, Path.of("model.gguf"));

Modifying Existing GGUF Files¶

// Create builder from existing GGUF
GGUF existing = GGUF.read(Path.of("model.gguf"));
Builder builder = Builder.newBuilder(existing);

// Modify metadata
builder
    .putString("description", "Updated model")
    .removeKey("old_key")
    .putInteger("num_layers", 64);

// Modify tensors
builder
    .removeTensor("old_tensor")
    .putTensor(TensorInfo.create("new_tensor", ...));

// Build and save
GGUF updated = builder.build();
GGUF.write(updated, Path.of("updated-model.gguf"));

Re-compute tensors offsets¶

By default, the tensor offsets are re-computed on .build().

// Tensors are packed in the same order, respecting the alignment
GGUF gguf1 = builder.build(); // recomputeTensorOffsets = true

// Leave tensors as-is, do not re-compute tensors offsets
GGUF gguf2 = builder.build(false); // recomputeTensorOffsets = false

Warning

When recomputeTensorOffsets is false, you must ensure tensor offsets are correctly set manually.

Metadata Types¶

The API supports various data types for metadata values, unsigned types are always stored using Java signed types.

GGUF Type	Java Type	Description
INT8	byte	Signed 8-bit integer
UINT8	byte	Unsigned 8-bit integer (requires manual conversion)
INT16	short	Signed 16-bit integer
UINT16	short	Unsigned 16-bit integer (requires manual conversion)
INT32	int	Signed 32-bit integer
UINT32	int	Unsigned 32-bit integer (requires manual conversion)
INT64	long	Signed 64-bit integer
UINT64	long	Unsigned 64-bit integer (requires manual conversion)
FLOAT32	float	32-bit floating point
FLOAT64	double	64-bit floating point
BOOL	boolean	Boolean value
STRING	String	Text string
ARRAY	String[] or primitive arrays	Array of supported types

Warning

Reading/writing arrays of arrays is currently NOT supported.

Builder.newBuilder()
    .putBoolean("flag", true)
    .putByte("byte_val", (byte) 1)
    .putShort("short_val", (short) 100)
    .putInteger("int_val", 1000)
    .putLong("long_val", 10000L)
    .putFloat("float_val", 0.5f)
    .putDouble("double_val", 0.75)
    .putBoolean("string_val", "hello")
    // Unsigned types
    .putUnsignedByte("ubyte", (byte) 255)
    .putUnsignedShort("ushort", (short) 65535)
    .putUnsignedInteger("uint", 0xFFFFFFFF)
    .putUnsignedLong("ulong", -1L)  // Represents max unsigned long
    // Arrays
    .putArrayOfString("strings", new String[]{"a", "b", "c"})
    .putArrayOfInteger("ints", new int[]{1, 2, 3})
    .putArrayOfFloat("floats", new float[]{0.1f, 0.2f, 0.3f})    
    .putArrayOfBoolean("flags", new boolean[]{true, false, true});