Package Details: leela-zero 0.17-1

Git Clone URL: https://aur.archlinux.org/leela-zero.git (read-only)
Package Base: leela-zero
Description: Go engine with no human-provided knowledge, modeled after the AlphaGo Zero paper.
Upstream URL: https://github.com/leela-zero/leela-zero
Licenses: GPLv3
Submitter: apetresc
Maintainer: apetresc (algebro)
Last Packager: apetresc
Votes: 4
Popularity: 0.008442
First Submitted: 2018-04-25 03:12
Last Updated: 2019-04-04 19:46

Latest Comments


apetresc commented on 2019-08-05 17:53

@janwil You will have more luck posting your findings so far in the ticket linked by @Liso; it's likely this is just a bug with Leela-Zero itself, not this particular package of it.

(You might also want to give leela-zero-git a try, in case the problem has already been fixed upstream. Leela-Zero is very slow to tag actual releases, so this is not too unlikely.)

Hope that helps!

janwil commented on 2019-08-04 14:03

@Liso, I can confirm that my results from 'coredumpctl gdb' and 'where' are similar to yours. What does this mean, and what can I do to fix it? Is there anything I can do in the first place? Unfortunately, I am not very good at debugging and fixing C++ myself :(

Best regards, Jan

Liso commented on 2019-07-28 17:57

I have now found that this is the same as this issue: https://github.com/leela-zero/leela-zero/issues/2438

Liso commented on 2019-07-28 15:26

@janwil, it seems I have the same problem (in my case with an AMD Radeon RX 550X).

If I run coredumpctl list, I get:

Sun 2019-07-28 17:00:46 CEST 19240 1000 1000 11 present /var/tmp/pamac-build-i/leela-zero/src/leela-zero/build/tests

Then you can use the path from the EXE column to investigate the problem:

coredumpctl gdb /var/tmp/pamac-build-i/leela-zero/src/leela-zero/build/tests

Then type where at the gdb prompt to see the stack. I got this:

(gdb) where
#0 0x0000000000000000 in ?? ()
#1 0x00007f90acde36a2 in ?? () from /usr/lib/libMesaOpenCL.so.1
#2 0x00007f90acdd4e3f in ?? () from /usr/lib/libMesaOpenCL.so.1
#3 0x00007f90acdd5a13 in ?? () from /usr/lib/libMesaOpenCL.so.1
#4 0x00007f90acdd6291 in ?? () from /usr/lib/libMesaOpenCL.so.1
#5 0x00007f90acdd31a5 in ?? () from /usr/lib/libMesaOpenCL.so.1
#6 0x00007f90acdd0cde in ?? () from /usr/lib/libMesaOpenCL.so.1
#7 0x000055c2ad0bf72b in cl::CommandQueue::enqueueWriteBuffer (blocking=0, offset=0, events=0x0, event=0x0, ptr=<optimized out>, size=147456, buffer=..., this=<synthetic pointer>)
    at /var/tmp/pamac-build-i/leela-zero/src/leela-zero/src/CL/cl2.hpp:7166
#8 Tuner<half_float::half>::tune_sgemm[abi:cxx11] (this=0x7ffc804f9b50, m=8, n=25, k=8, batch_size=36, runs=<optimized out>)
    at /var/tmp/pamac-build-i/leela-zero/src/leela-zero/src/Tuner.cpp:491
#9 0x000055c2ad0c0c3f in Tuner<half_float::half>::load_sgemm_tuners[abi:cxx11] (this=0x7ffc804f9b50, m=8, n=25, k=8, batch_size=36)
    at /usr/include/c++/9.1.0/ext/new_allocator.h:89
#10 0x000055c2ad0d6d18 in OpenCL<half_float::half>::initialize (this=0x55c2ae675570, channels=8, batch_size=1) at /var/tmp/pamac-build-i/leela-zero/src/leela-zero/src/Tuner.cpp:722
#11 0x000055c2ad0d738e in OpenCLScheduler<half_float::half>::initialize (this=0x55c2ae675490, channels=8) at /usr/include/c++/9.1.0/bits/unique_ptr.h:357
#12 0x000055c2ad0eaacb in Network::init_net (this=0x7f90ace86010, channels=8, pipe=...) at /usr/include/c++/9.1.0/bits/unique_ptr.h:357
#13 0x000055c2ad0f2c64 in Network::select_precision (this=0x7f90ace86010, channels=8) at /usr/include/c++/9.1.0/bits/move.h:74
#14 0x000055c2ad0f35a8 in Network::initialize (this=0x7f90ace86010, playouts=<optimized out>, weightsfile=...) at /var/tmp/pamac-build-i/leela-zero/src/leela-zero/src/Network.cpp:573
#15 0x000055c2ad1245a5 in LeelaEnv::SetUp (this=<optimized out>) at /var/tmp/pamac-build-i/leela-zero/src/leela-zero/src/tests/gtests.cpp:87
#16 0x000055c2ad1470d1 in testing::internal::UnitTestImpl::RunAllTests() ()
#17 0x000055c2ad15255d in bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) ()
#18 0x000055c2ad147562 in testing::UnitTest::Run() ()
#19 0x000055c2ad07e8f6 in main () at /usr/include/c++/9.1.0/ext/new_allocator.h:89
(gdb)

Do you have the same problem?
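For comparing two backtraces quickly, it can help to count which shared libraries the frames come from. The following is only a rough Python sketch: the function name libraries_in_backtrace is made up for illustration, and the regular expression assumes gdb's usual "N address in function () from library" frame layout, with the leading "#" treated as optional because it is sometimes lost when pasting.

```python
import re
from collections import Counter

# Matches gdb backtrace frames that end in a "from <library>" clause, e.g.
#   "#1 0x00007f90acde36a2 in ?? () from /usr/lib/libMesaOpenCL.so.1"
# The leading "#" is optional (assumption: pasted traces sometimes lose it).
FRAME_RE = re.compile(r"^\s*#?(\d+)\s+.*?\sfrom\s+(\S+)\s*$")


def libraries_in_backtrace(text):
    """Count backtrace frames per shared library named in a 'from' clause.

    Frames without a 'from' clause (for example frames inside the main
    executable) are ignored.
    """
    counts = Counter()
    for line in text.splitlines():
        match = FRAME_RE.match(line)
        if match:
            counts[match.group(2)] += 1
    return counts
```

Fed the output of where, this returns a Counter keyed by library path. If most of the topmost frames belong to one library, that library is a reasonable place to start looking, although the count alone does not prove where the fault lies.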

janwil commented on 2019-07-23 15:41

@apetresc, I rebooted and did a clean git clone, but makepkg still gives the same error.

How should I debug this?

Best, Jan

apetresc commented on 2019-07-23 15:25

@janwil Hmm, interesting; I'm not sure, as I don't have access to an AMD GPU to test it myself.

One common culprit in similar cases: have you upgraded the kernel and headers since your last restart? This sort of thing sometimes happens because leela-zero is compiled against the newly installed AMD headers but runs against the old version of the module. If so, just try rebooting and let me know if that helps!
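A quick way to test for a pending reboot is to check whether a module directory still exists for the running kernel. This is only a sketch in Python, assuming the usual Arch layout where kernel modules live under /usr/lib/modules/<release> and the old directory is removed when the kernel package is upgraded. The function name running_kernel_has_modules is made up for illustration.

```python
import os
import platform


def running_kernel_has_modules(release=None, modules_root="/usr/lib/modules"):
    """Return True if a module directory exists for the given kernel release.

    release defaults to the running kernel, i.e. the same string that
    `uname -r` prints. modules_root is an assumption based on the usual
    Arch Linux layout.
    """
    if release is None:
        release = platform.release()
    return os.path.isdir(os.path.join(modules_root, release))
```

Calling running_kernel_has_modules() with no arguments checks the running kernel. A False result suggests the kernel was upgraded without a reboot. A True result does not rule out every mismatch, since it does not check whether the GPU module itself was rebuilt for that kernel.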

janwil commented on 2019-07-19 18:44

I have an AMD RX 480 GPU, which seems to be recognised during the test phase with OpenCL support working, but then a test fails:

Started OpenCL SGEMM tuner.
Will try 290 valid configurations.
/home/janwil/Documents/install/leela-zero/PKGBUILD: line 39: 2209 Segmentation fault (core dumped) ./tests
==> ERROR: A failure occurred in check().
    Aborting...

What am I missing?

Thanks in advance, Jan

sfranchi commented on 2019-01-13 16:07

Ah, success at last!

You were right @apetresc, that was the problem. It took me a few iterations of updating kernel/rebooting/reinstalling nvidia-390xx + opencl-nvidia-390xx but it finally passed the test.

I would suggest adding a note to the PKGBUILD recommending that users make sure their kernel and modules are in sync before installing the program.

sfranchi commented on 2019-01-13 08:09

@apetresc: Unfortunately that was not the problem; I am getting the same error in the test phase.

Do I have to do anything special to recompile the 390xx driver? I simply rebooted after the updates to the kernel were installed.

apetresc commented on 2019-01-01 19:48

@sfranchi: I've run this PKGBUILD with the nvidia-390xx driver before, so I don't think it's that...

The only times I've encountered your error message before, it's always been for the same reason: I'd recompiled my nvidia driver module after upgrading my kernel but before rebooting into that updated kernel. Since the module was compiled against the headers of the installed kernel, not the running one, it would fail to load with exactly that error until I rebooted so that the two matched again.

It's a bit of a long shot, but could that be the cause of your issue here?