GGUF API¶
GGUF is a binary format used by llama.cpp for storing models and their associated metadata and weights, designed for efficient memory mapping of model weights, making it particularly suitable for large language models and other AI applications.
See GGUF specification.
The GGUF API provides methods to read and write GGUF files, access and manage metadata, handle tensor information, and support various data types and array structures.
While the API manages the file structure and metadata, it does not directly write tensor data to the file. Users are responsible for writing tensor data at the correct file offsets specified by the API, ensuring proper data alignment and following the tensor specifications (type, dimensions, and layout) defined in the metadata.
This design choice separates the concerns of file format management from the actual tensor data handling, allowing users to implement custom tensor writing strategies and optimize for their specific use cases. The API provides the necessary offset information and tensor specifications, but the actual tensor data writing must be handled by the user's implementation. For example, after creating a GGUF file with metadata and tensor definitions, users must:
- Get tensor offsets from the API via
GGUF#tensorDataOffset()
andTensorInfo#offset()
- Seek to the correct file position
- Write tensor data in the correct format
- Ensure proper memory alignment according to
GGUF#getAlignment()
Get Started¶
To use the GGUF library in your project, add the following dependency using your preferred build tool. The library requires Java 11 or higher and has no external dependencies.
Reading GGUF files¶
import com.llama4j.gguf.GGUF;
Path modelPath = Path.of("/path/to/Llama-3.2-1B-Instruct-Q8_0.gguf");
GGUF gguf = GGUF.read(modelPath);
System.out.println(gguf.getMetadataKeys());
Note
When reading GGUF files, the order of metadata keys and tensors within the file is preserved.
Basic Information¶
// Get GGUF format version
int version = gguf.getVersion();
// Get alignment value (default or specified)
int alignment = gguf.getAlignment();
// Get tensor data offset
long tensorDataOffset = gguf.getTensorDataOffset();
Accessing Metadata¶
// Get all metadata keys
Set<String> keys = gguf.getMetadataKeys();
// Check if key exists
boolean hasKey = gguf.containsKey("key");
// Get metadata type
MetadataValueType type = gguf.getType("key");
// Get component type for arrays
MetadataValueType arrayType = gguf.getComponentType("key");
// Get value with type casting
int intValue = gguf.getValue(int.class, "numberKey");
String text = gguf.getValue(String.class, "textKey");
float[] floats = gguf.getValue(float[].class, "floatArrayKey");
// Generic access if type is unknown
Object value = gguf.getValue(Object.class, "key");
// Get value with default fallback
int theValue = gguf.getValueOrDefault(int.class, "key", 0);
// String array
String[] strings = gguf.getValue(String[].class, "stringArray");
// Numeric arrays
int[] integers = gguf.getValue(int[].class, "intArray");
float[] floats = gguf.getValue(float[].class, "floatArray");
double[] doubles = gguf.getValue(double[].class, "doubleArray");
// Boolean array
boolean[] bools = gguf.getValue(boolean[].class, "boolArray");
Accessing Tensors¶
// Get all tensor information, tensor order is preserved
Collection<TensorInfo> tensors = gguf.getTensors();
// Get specific tensor info
TensorInfo tensor = gguf.getTensor("token_embeddings");
// Check tensor existence
boolean hasTensor = gguf.containsTensor("token_embeddings");
Best Practices¶
-
Always close channels after reading/writing:
-
Check for key/tensor existence before accessing values:
Reading from URLs¶
No need to download large GGUF files or using a browser to peek at the GGUF metadata - GGUF metadata can be easily read from various sources, including URLs.
static GGUF readFromHuggingFace(String user, String repo, String file) throws IOException {
URL url = new URL("https://hf.co/%s/%s/resolve/main/%s".formatted(user, repo, file));
try (var ch = Channels.newChannel(
new BufferedInputStream(url.openConnection().getInputStream()))) {
return GGUF.read(ch);
}
}
// Example usage
String user = "mukel";
String repo = "Llama-3.2-1B-Instruct-GGUF";
String file = "Llama-3.2-1B-Instruct-Q8_0.gguf";
GGUF gguf = readFromHuggingFace(user, repo, file);
System.out.println(gguf.getMetadataKeys());
Writing GGUF files¶
The Builder interface provides a fluent API for creating and modifying GGUF metadata.
Creating a New GGUF File¶
// Create a new empty builder
Builder builder = Builder.newBuilder()
.setVersion(3) // (optionsl) set GGUF version
.setAlignment(32); // (optional) set alignment (must be power of 2)
// Add metadata
builder
.putString("name", "my-model")
.putInteger("num_layers", 32)
.putFloat("learning_rate", 0.001f)
.putArrayOfString("vocab", new String[]{"token1", "token2"});
// Add tensor information
builder.putTensor(
TensorInfo.create(
"weights", // tensor name
new long[]{1024, 1024}, // shape
GGMLType.F32, // GGML type
offset // tensor offset w.r.t. to gguf.tensorDataOffset()
));
// Build the GGUF instance
GGUF gguf = builder.build();
// Write to file
GGUF.write(gguf, Path.of("model.gguf"));
Modifying Existing GGUF Files¶
// Create builder from existing GGUF
GGUF existing = GGUF.read(Path.of("model.gguf"));
Builder builder = Builder.newBuilder(existing);
// Modify metadata
builder
.putString("description", "Updated model")
.removeKey("old_key")
.putInteger("num_layers", 64);
// Modify tensors
builder
.removeTensor("old_tensor")
.putTensor(TensorInfo.create("new_tensor", ...));
// Build and save
GGUF updated = builder.build();
GGUF.write(updated, Path.of("updated-model.gguf"));
Re-compute tensors offsets¶
By default, the tensor offsets are re-computed on .build()
.
// Tensors are packed in the same order, respecting the alignment
GGUF gguf1 = builder.build(); // recomputeTensorOffsets = true
// Leave tensors as-is, do not re-compute tensors offsets
GGUF gguf2 = builder.build(false); // recomputeTensorOffsets = false
Warning
When recomputeTensorOffsets
is false, you must ensure tensor offsets are correctly set manually.
Metadata Types¶
The API supports various data types for metadata values, unsigned types are always stored using Java signed types.
GGUF Type | Java Type | Description |
---|---|---|
INT8 | byte | Signed 8-bit integer |
UINT8 | byte | Unsigned 8-bit integer (requires manual conversion) |
INT16 | short | Signed 16-bit integer |
UINT16 | short | Unsigned 16-bit integer (requires manual conversion) |
INT32 | int | Signed 32-bit integer |
UINT32 | int | Unsigned 32-bit integer (requires manual conversion) |
INT64 | long | Signed 64-bit integer |
UINT64 | long | Unsigned 64-bit integer (requires manual conversion) |
FLOAT32 | float | 32-bit floating point |
FLOAT64 | double | 64-bit floating point |
BOOL | boolean | Boolean value |
STRING | String | Text string |
ARRAY | String[] or primitive arrays | Array of supported types |
Warning
Reading/writing arrays of arrays is currently NOT supported.
Builder.newBuilder()
.putBoolean("flag", true)
.putByte("byte_val", (byte) 1)
.putShort("short_val", (short) 100)
.putInteger("int_val", 1000)
.putLong("long_val", 10000L)
.putFloat("float_val", 0.5f)
.putDouble("double_val", 0.75)
.putBoolean("string_val", "hello")
// Unsigned types
.putUnsignedByte("ubyte", (byte) 255)
.putUnsignedShort("ushort", (short) 65535)
.putUnsignedInteger("uint", 0xFFFFFFFF)
.putUnsignedLong("ulong", -1L) // Represents max unsigned long
// Arrays
.putArrayOfString("strings", new String[]{"a", "b", "c"})
.putArrayOfInteger("ints", new int[]{1, 2, 3})
.putArrayOfFloat("floats", new float[]{0.1f, 0.2f, 0.3f})
.putArrayOfBoolean("flags", new boolean[]{true, false, true});