2024 llama.cpp 7840u

From zooid Wiki



Revision as of 17:25, 14 April 2024

https://wiki.zooid.org/images/c/c5/Ipectinstat1-sq-trans2.png

I briefly had a MacBook M3 Max with 64GB. It was pretty good at running local LLMs, but I couldn't stand the ergonomics or not being able to run Linux, so I returned it.

I picked up a Thinkpad P16s with an AMD 7840u to give Linux hardware a chance to catch up with Apple silicon. It's an amazing computer for the price, and can run LLMs. Here's how I set up llama.cpp to use ROCm.

Install ROCm, then set an environment variable so the ROCm runtime treats the 780M iGPU as a gfx 11.0.0 device: export HSA_OVERRIDE_GFX_VERSION=11.0.0
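As a minimal sketch of that step, the snippet below sets the override for the current shell and, if ROCm's `rocminfo` tool happens to be on the PATH, lists the gfx targets it reports. The check is optional and is my assumption about a convenient way to confirm the GPU is visible, not part of the original setup.

```shell
# Set the override for the current shell session (value taken from the text above).
export HSA_OVERRIDE_GFX_VERSION=11.0.0

# Optional sanity check: rocminfo ships with ROCm and lists detected agents.
# The check is skipped quietly when rocminfo is not installed.
if command -v rocminfo >/dev/null 2>&1; then
    rocminfo | grep -i 'gfx' || true
fi

echo "HSA_OVERRIDE_GFX_VERSION=${HSA_OVERRIDE_GFX_VERSION}"
```

To make the setting persistent, the same `export` line can go in a shell startup file such as `~/.bashrc`.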

Clone llama.cpp and compile it:

make -j16 LLAMA_HIPBLAS=1 LLAMA_HIP_UMA=1 AMDGPU_TARGETS=gfx1030
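The clone-and-build step could be scripted roughly as follows. This is a sketch under assumptions: the function name `build_llama_rocm` and the default checkout directory are hypothetical, the repository URL is the upstream GitHub project, and the make variables are the ones used on this page in April 2024 (newer llama.cpp releases may use different build options). The commands are wrapped in a function so nothing runs until you call it.

```shell
# Hypothetical helper: clone llama.cpp and build it with the ROCm (hipBLAS) backend.
# Usage: build_llama_rocm [target-directory]
build_llama_rocm() {
    dest="${1:-llama.cpp}"

    # Clone only if the directory does not exist yet.
    if [ ! -d "$dest" ]; then
        git clone https://github.com/ggerganov/llama.cpp "$dest" || return 1
    fi

    # LLAMA_HIP_UMA=1 lets the iGPU use shared system memory.
    # nproc picks the job count from the number of available cores.
    ( cd "$dest" && make -j"$(nproc)" LLAMA_HIPBLAS=1 LLAMA_HIP_UMA=1 AMDGPU_TARGETS=gfx1030 )
}
```

Call it as `build_llama_rocm` (or `build_llama_rocm ~/src/llama.cpp`) after ROCm is installed and the environment variable is set.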

Run it like this:

./main -m /home/vid/jan/models/mistral-ins-7b-q4/mistral-7b-instruct-v0.2.Q4_K_M.gguf -p "example code for a lit Web Component that reverses a string" -n 50 -e -ngl 33 -n -1

ggml_cuda_init: found 1 ROCm devices:
  Device 0: AMD Radeon Graphics, compute capability 11.0, VMM: no
llm_load_tensors: ggml ctx size =    0.22 MiB
llm_load_tensors: offloading 32 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloaded 33/33 layers to GPU
llm_load_tensors:      ROCm0 buffer size =  4095.05 MiB
llm_load_tensors:        CPU buffer size =    70.31 MiB
..............................................................................................
llama_new_context_with_model: n_ctx      = 512
llama_new_context_with_model: n_batch    = 512
llama_new_context_with_model: n_ubatch   = 512
llama_new_context_with_model: freq_base  = 1000000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init:      ROCm0 KV buffer size =    64.00 MiB
llama_new_context_with_model: KV self size  =   64.00 MiB, K (f16):   32.00 MiB, V (f16):   32.00 MiB
llama_new_context_with_model:  ROCm_Host  output buffer size =     0.12 MiB
llama_new_context_with_model:      ROCm0 compute buffer size =    81.00 MiB
llama_new_context_with_model:  ROCm_Host compute buffer size =     9.01 MiB
llama_new_context_with_model: graph nodes  = 1030
llama_new_context_with_model: graph splits = 2

system_info: n_threads = 8 / 16 | AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 1 | AVX512_VBMI = 1 | AVX512_VNNI = 1 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 |
sampling:
        repeat_last_n = 64, repeat_penalty = 1.000, frequency_penalty = 0.000, presence_penalty = 0.000
        top_k = 40, tfs_z = 1.000, top_p = 0.950, min_p = 0.050, typical_p = 1.000, temp = 0.800
        mirostat = 0, mirostat_lr = 0.100, mirostat_ent = 5.000
sampling order:
CFG -> Penalties -> top_k -> tfs_z -> typical_p -> top_p -> min_p -> temperature
generate: n_ctx = 512, n_batch = 2048, n_predict = -1, n_keep = 1


 simple example code for a lit Web Component that reverses a string

```javascript
import { LitElement, html } from 'lit';
import { customElement, property } from 'lit/decorators.js';

@customElement('reverse-string')
class ReverseString extends LitElement {

  static styles = css`
    :host {
      display: block;
    }
  `;

  @property({ type: String }) input = '';

  render() {
    return html`
      <input type="text" value=${this.input} @input=${this._handleInput} />
      <button @click=${this._reverse}>Reverse</button>
      <p>${this._reversed}</p>
    `;
  }

  _handleInput(event) {
    this.input = event.target.value;
  }

  _reverse() {
    this._reversed = this.input.split('').reverse().join('');
  }

  @property private _reversed = '';
}
```

```css
:host {
  display: block;
}
```

This is a simple example of a Lit Web Component that reverses a string. The component has an input field for the user to enter a string, and a button to reverse the string when clicked. The reversed string is displayed below the button.

The component uses the Lit library to define the custom element, and uses the `@customElement` decorator to define the element's name as 'reverse-string'. The `@property` decorator is used to define the input property, and the `static styles` property is used to define the component's styles.

In the `render` method, the input field and button are created using template literals, and the reversed string is displayed using a reactive property `_reversed`.

The input field's value is updated in the `_handleInput` method when the user types in the field, and the string is reversed in the `_reverse` method when the button is clicked. The reversed string is then assigned to the `_reversed` property, which updates the displayed string. [end of text]
 
llama_print_timings:        load time =    2488.76 ms
llama_print_timings:      sample time =      24.92 ms /   485 runs   (    0.05 ms per token, 19462.28 tokens per second)
llama_print_timings: prompt eval time =     576.45 ms /    14 tokens (   41.18 ms per token,    24.29 tokens per second)
llama_print_timings:        eval time =   38985.11 ms /   484 runs   (   80.55 ms per token,    12.41 tokens per second)
llama_print_timings:       total time =   39890.17 ms /   498 tokens
Log end
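The 64 MiB KV buffer reported in the log can be reproduced with a little arithmetic, which is handy for estimating how much shared memory a larger context would need. This sketch assumes the published Mistral 7B architecture (32 layers, 8 key/value heads, head dimension 128) and f16 cache entries; the variable names are mine.

```shell
# Model and context parameters (architecture values are assumptions from the
# published Mistral 7B config; n_ctx comes from the log above).
n_layer=32
n_ctx=512
n_kv_heads=8
head_dim=128
bytes_per_elem=2   # f16

# Size of the K cache in bytes; the V cache is the same size.
k_bytes=$(( n_layer * n_ctx * n_kv_heads * head_dim * bytes_per_elem ))
k_mib=$(( k_bytes / 1024 / 1024 ))
kv_mib=$(( 2 * k_mib ))

echo "K: ${k_mib} MiB, V: ${k_mib} MiB, total: ${kv_mib} MiB"
```

The cache grows linearly with context length, so under the same assumptions a 4096-token context would need about 512 MiB, on top of the roughly 4 GiB of model weights.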


It's definitely not going to win any speed prizes, even though this is a smaller model (about 12 tokens per second for generation in the run above), but it could be OK for non-time-sensitive results, or where using a tiny, faster model is useful.
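The throughput figures come straight from the timing lines: tokens divided by elapsed seconds. A small sketch, using a hypothetical helper named `tokens_per_second` and the numbers from the log above:

```shell
# tokens_per_second MS COUNT
# Convert elapsed milliseconds and a token count into tokens per second.
tokens_per_second() {
    LC_ALL=C awk -v ms="$1" -v n="$2" 'BEGIN { printf "%.2f\n", n / (ms / 1000) }'
}

# Values copied from the llama_print_timings lines.
gen_tps=$(tokens_per_second 38985.11 484)    # eval time / runs
prompt_tps=$(tokens_per_second 576.45 14)    # prompt eval time / tokens

echo "generation: ${gen_tps} tokens/s"
echo "prompt:     ${prompt_tps} tokens/s"
```

At about 80 ms per generated token, a 1000-token answer would take roughly 80 seconds on this setup.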




Blikied on April 13, 2024