2024 llama.cpp 7840u
I briefly had a MacBook M3 Max with 64GB. It was pretty good at running local LLMs, but I couldn't stand the ergonomics or being unable to run Linux, so I returned it.
I picked up a ThinkPad P16s with an AMD 7840u to give Linux hardware a chance to catch up with Apple silicon. It's an amazing computer for the price, and it can run LLMs. Here's how I set up llama.cpp to use ROCm.
Install ROCm, then set an env variable for the 780M: export HSA_OVERRIDE_GFX_VERSION=11.0.0
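The 780M identifies itself as gfx1103, which ROCm doesn't officially support; the override makes the runtime treat it as gfx1100 (RDNA3), which ROCm does ship kernels for. A quick sanity check that the iGPU is visible, assuming rocminfo came along with your ROCm install:

```bash
# With the override set, the HSA runtime reports the 780M as gfx1100
export HSA_OVERRIDE_GFX_VERSION=11.0.0

# rocminfo ships with ROCm; the iGPU should show up as an agent
rocminfo | grep -i gfx
```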
Clone llama.cpp and compile it:
make -j16 LLAMA_HIPBLAS=1 LLAMA_HIP_UMA=1 AMDGPU_TARGETS=gfx1100
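LLAMA_HIP_UMA is the flag that matters on an APU: it makes llama.cpp allocate GPU buffers in unified system memory, so the iGPU isn't limited to the small VRAM carve-out set in the BIOS. If you prefer CMake, the equivalent build from llama.cpp's README of that era looked roughly like this (a sketch; the option names have since been renamed upstream, so check your tree's README):

```bash
# CMake equivalent of the make flags above (llama.cpp circa April 2024)
CC=/opt/rocm/llvm/bin/clang CXX=/opt/rocm/llvm/bin/clang++ \
  cmake -B build -DLLAMA_HIPBLAS=ON -DLLAMA_HIP_UMA=ON -DAMDGPU_TARGETS=gfx1100
cmake --build build -j 16
```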
Run it like this:
./main -m /home/vid/jan/models/mistral-ins-7b-q4/mistral-7b-instruct-v0.2.Q4_K_M.gguf -p "example code for a lit Web Component that reverses a string" -e -ngl 16 -n -1
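For reference: -ngl sets how many layers to offload to the GPU, -n -1 generates until the model emits an end-of-text token, and -e processes escape sequences in the prompt. The same build also produces a small HTTP server if you'd rather query the model from other tools; a sketch with the same model and offload settings (flags as of llama.cpp around April 2024):

```bash
# Serve the model over HTTP (examples/server in the llama.cpp tree)
./server -m /home/vid/jan/models/mistral-ins-7b-q4/mistral-7b-instruct-v0.2.Q4_K_M.gguf \
  -ngl 16 --host 127.0.0.1 --port 8080

# Then query the /completion endpoint:
curl http://127.0.0.1:8080/completion -d '{"prompt": "Hello", "n_predict": 32}'
```

Here's what the ./main run above printed: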
ggml_cuda_init: found 1 ROCm devices:
  Device 0: AMD Radeon Graphics, compute capability 11.0, VMM: no
llm_load_tensors: ggml ctx size = 0.22 MiB
llm_load_tensors: offloading 24 repeating layers to GPU
llm_load_tensors: offloaded 24/33 layers to GPU
llm_load_tensors: ROCm0 buffer size = 2978.91 MiB
llm_load_tensors: CPU buffer size = 4165.37 MiB
...............................................................................................
llama_new_context_with_model: n_ctx = 512
llama_new_context_with_model: n_batch = 512
llama_new_context_with_model: n_ubatch = 512
llama_new_context_with_model: freq_base = 1000000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init: ROCm0 KV buffer size = 48.00 MiB
llama_kv_cache_init: ROCm_Host KV buffer size = 16.00 MiB
llama_new_context_with_model: KV self size = 64.00 MiB, K (f16): 32.00 MiB, V (f16): 32.00 MiB
llama_new_context_with_model: ROCm_Host output buffer size = 0.12 MiB
llama_new_context_with_model: ROCm0 compute buffer size = 173.04 MiB
llama_new_context_with_model: ROCm_Host compute buffer size = 9.01 MiB
llama_new_context_with_model: graph nodes = 1030
llama_new_context_with_model: graph splits = 92

system_info: n_threads = 8 / 16 | AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 1 | AVX512_VBMI = 1 | AVX512_VNNI = 1 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 |
sampling: repeat_last_n = 64, repeat_penalty = 1.000, frequency_penalty = 0.000, presence_penalty = 0.000
	top_k = 40, tfs_z = 1.000, top_p = 0.950, min_p = 0.050, typical_p = 1.000, temp = 0.800
	mirostat = 0, mirostat_lr = 0.100, mirostat_ent = 5.000
sampling order: CFG -> Penalties -> top_k -> tfs_z -> typical_p -> top_p -> min_p -> temperature
generate: n_ctx = 512, n_batch = 2048, n_predict = -1, n_keep = 1

simple example code for a lit Web Component that reverses a string input

Hi there! Here's a simple example of a Lit Web Component that reverses a string input:

```javascript
import { Component, html, css } from 'lit';

class ReverseString extends Component {
  static styles = css`
    input {
      padding: 10px;
      margin-bottom: 20px;
    }
  `;

  static properties = {
    value: { type: String }
  };

  constructor() {
    super();
    this.value = '';
  }

  render() {
    return html`
      <style>${this.constructor.styles}</style>
      <input @input=${this._handleInputChange} value=${this.value} type="text">
      <p>Reversed string: ${this._reverseString(this.value)}</p>
    `;
  }

  _reverseString(str) {
    return str.split('').reverse().join('');
  }

  _handleInputChange(event) {
    this.value = event.target.value;
  }
}

customElements.define('reverse-string', ReverseString);
```

In this example, we define a custom Web Component called `reverse-string` that uses Lit for rendering and handling user input. The `ReverseString` class defines a render method that returns an HTML template with an input field and a paragraph that displays the reversed string. The component also defines a `_reverseString` method that reverses a given string using the `split`, `reverse`, and `join` array methods, and a `_handleInputChange` method that updates the component's value whenever the input changes. Finally, we use the `customElements.define` method to register our component with the browser.
You can use this component in your HTML like this:

```html
<reverse-string></reverse-string>
```

[end of text]

llama_print_timings:        load time =    2865.16 ms
llama_print_timings:      sample time =      13.64 ms /   442 runs   (    0.03 ms per token, 32407.07 tokens per second)
llama_print_timings: prompt eval time =    1281.98 ms /    14 tokens (   91.57 ms per token,    10.92 tokens per second)
llama_print_timings:        eval time =  100829.12 ms /   441 runs   (  228.64 ms per token,     4.37 tokens per second)
llama_print_timings:       total time =  102397.06 ms /   455 tokens
Log end
It's definitely not going to win any speed prizes even though it's a smaller model, but it could be OK for non-time-sensitive results, or for cases where a tiny, faster model would be useful.
{{Blikied|April 13, 2024}}