2024 llama.cpp 7840u
Revision as of 14:38, 13 April 2024
I briefly had a MacBook M3 Max with 64GB. It was pretty good at running local LLMs, but I couldn't stand the ergonomics or not being able to run Linux, so I returned it.
I picked up a ThinkPad P16s with an AMD 7840U to give Linux hardware a chance to catch up with Apple silicon. It's an amazing computer for the price, and it can run LLMs. Here's how I set up llama.cpp to use ROCm.
Install ROCm, then set an environment variable for the 780M: export HSA_OVERRIDE_GFX_VERSION=11.0.0
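To make the 11.0.0 value less of a magic number, here is a minimal sketch of how a gfx target name appears to map onto the dotted form. It assumes the override is read as major.minor.stepping and that the 780M is being presented as a gfx1100 part, which is what the "compute capability 11.0" log line suggests. The helper name gfx_to_override is hypothetical and not part of ROCm.

```shell
# Hypothetical helper: convert a gfx target name (e.g. gfx1100) into the
# dotted major.minor.stepping form used by HSA_OVERRIDE_GFX_VERSION.
# Assumption: the last two characters are single hex digits (minor, stepping)
# and everything before them is the decimal major version.
gfx_to_override() {
  _digits=${1#gfx}                    # strip the "gfx" prefix
  _step=${_digits#"${_digits%?}"}     # last character
  _rest=${_digits%?}                  # everything except the last character
  _minor=${_rest#"${_rest%?}"}        # second-to-last character
  _major=${_rest%?}                   # remaining leading digits
  printf '%d.%d.%d\n' "$_major" "0x$_minor" "0x$_step"
}

# Present the 780M to ROCm as a gfx1100 device.
HSA_OVERRIDE_GFX_VERSION=$(gfx_to_override gfx1100)
export HSA_OVERRIDE_GFX_VERSION
echo "$HSA_OVERRIDE_GFX_VERSION"   # prints 11.0.0
```

Putting the export line in a shell profile keeps the override in place for later llama.cpp runs.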
Clone llama.cpp and compile it:
make -j16 LLAMA_HIPBLAS=1 LLAMA_HIP_UMA=1 AMDGPU_TARGETS=gfx1030 DLLAMA_HIP_UMA=ON
Run it like this:
./main -m /home/vid/jan/models/mistral-ins-7b-q4/mistral-7b-instruct-v0.2.Q4_K_M.gguf -p "example code for a lit Web Component that reverses a string" -n 50 -e -ngl 16 -n -1
ggml_cuda_init: found 1 ROCm devices:
Device 0: AMD Radeon Graphics, compute capability 11.0, VMM: no
llm_load_tensors: ggml ctx size = 0.22 MiB
llm_load_tensors: offloading 24 repeating layers to GPU
llm_load_tensors: offloaded 24/33 layers to GPU
llm_load_tensors: ROCm0 buffer size = 2978.91 MiB
llm_load_tensors: CPU buffer size = 4165.37 MiB
...............................................................................................
llama_new_context_with_model: n_ctx = 512
llama_new_context_with_model: n_batch = 512
llama_new_context_with_model: n_ubatch = 512
llama_new_context_with_model: freq_base = 1000000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init: ROCm0 KV buffer size = 48.00 MiB
llama_kv_cache_init: ROCm_Host KV buffer size = 16.00 MiB
llama_new_context_with_model: KV self size = 64.00 MiB, K (f16): 32.00 MiB, V (f16): 32.00 MiB
llama_new_context_with_model: ROCm_Host output buffer size = 0.12 MiB
llama_new_context_with_model: ROCm0 compute buffer size = 173.04 MiB
llama_new_context_with_model: ROCm_Host compute buffer size = 9.01 MiB
llama_new_context_with_model: graph nodes = 1030
llama_new_context_with_model: graph splits = 92
system_info: n_threads = 8 / 16 | AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 1 | AVX512_VBMI = 1 | AVX512_VNNI = 1 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 |
sampling:
repeat_last_n = 64, repeat_penalty = 1.000, frequency_penalty = 0.000, presence_penalty = 0.000
top_k = 40, tfs_z = 1.000, top_p = 0.950, min_p = 0.050, typical_p = 1.000, temp = 0.800
mirostat = 0, mirostat_lr = 0.100, mirostat_ent = 5.000
sampling order:
CFG -> Penalties -> top_k -> tfs_z -> typical_p -> top_p -> min_p -> temperature
generate: n_ctx = 512, n_batch = 2048, n_predict = -1, n_keep = 1
simple example code for a lit Web Component that reverses a string input
Hi there! Here's a simple example of a Lit Web Component that reverses a string input:
```javascript
import { Component, html, css } from 'lit';
class ReverseString extends Component {
static styles = css`
input {
padding: 10px;
margin-bottom: 20px;
}
`;
static properties = {
value: { type: String }
};
constructor() {
super();
this.value = '';
}
render() {
return html`
<style>${this.constructor.styles}</style>
<input @input=${this._handleInputChange} value=${this.value} type="text">
< p>Reversed string: ${this._reverseString(this.value)}< /p>
`;
}
_reverseString(str) {
return str.split('').reverse().join('');
}
_handleInputChange(event) {
this.value = event.target.value;
}
}
customElements.define('reverse-string', ReverseString);
```
In this example, we define a custom Web Component called `reverse-string` that uses Lit for rendering and handling user input. The `ReverseString` class defines a render method that returns an HTML template with an input field and a paragraph that displays the reversed string. The component also defines a `_reverseString` method that reverses a given string using the `split`, `reverse`, and `join` array methods, and a `_handleInputChange` method that updates the component's value whenever the input changes. Finally, we use the `customElements.define` method to register our component with the browser.
You can use this component in your HTML like this:
```html
<reverse-string></reverse-string>
``` [end of text]
llama_print_timings: load time = 2865.16 ms
llama_print_timings: sample time = 13.64 ms / 442 runs ( 0.03 ms per token, 32407.07 tokens per second)
llama_print_timings: prompt eval time = 1281.98 ms / 14 tokens ( 91.57 ms per token, 10.92 tokens per second)
llama_print_timings: eval time = 100829.12 ms / 441 runs ( 228.64 ms per token, 4.37 tokens per second)
llama_print_timings: total time = 102397.06 ms / 455 tokens
Log end
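The tokens-per-second figures in the llama_print_timings lines are just the token count divided by the elapsed time in seconds. A small sketch that reproduces them from the numbers in the log follows; the function name tokens_per_second is my own, not something llama.cpp provides.

```shell
# Hypothetical helper: tokens_per_second ELAPSED_MS TOKEN_COUNT
# Divides the token count by the elapsed time converted to seconds.
tokens_per_second() {
  awk -v ms="$1" -v n="$2" 'BEGIN { printf "%.2f\n", n / (ms / 1000) }'
}

tokens_per_second 100829.12 441   # generation (eval time): prints 4.37
tokens_per_second 1281.98 14      # prompt eval time: prints 10.92
```

Generation at roughly 4.4 tokens per second is the number that matters for interactive use, since the 14-token prompt is too short to say much about prompt processing speed.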
Blikied on April 13 12, 2024