ThreadTest sometimes fails
- Found in: 1.18
- CategoryFixed: fixed in 744 for release 1.19
There may be some thread safety issues with PlashGlibc. ThreadTest sometimes fails:
======================================================================
ERROR: tests_thread_test_py.ThreadTest.test_parallel
----------------------------------------------------------------------
Traceback (most recent call last):
  File "tests/thread_test.py", line 89, in test_parallel
    self._run_test("test_parallel")
  File "tests/thread_test.py", line 83, in _run_test
    plash.process_test.check_exit_status(status)
  File "/work/plash/python/plash/process_test.py", line 41, in check_exit_status
    raise ProcessExitError("Status %i" % status)
ProcessExitError: Status 139
This occurred once:
*** glibc detected *** /tmp/tmpXCQ_U0/test-case: double free or corruption (out): 0x00002aaaac004ee0 ***
fstat(fd, &st): No such file or directory
fd >= 0: Function not implemented
E....
======================================================================
ERROR: tests_thread_test_py.ThreadTest.test_parallel
----------------------------------------------------------------------
Traceback (most recent call last):
  File "tests/thread_test.py", line 89, in test_parallel
    self._run_test("test_parallel")
  File "tests/thread_test.py", line 83, in _run_test
    plash.process_test.check_exit_status(status)
  File "/work/plash/python/plash/process_test.py", line 39, in check_exit_status
    os.WEXITSTATUS(status))
ProcessExitError: Process exited with status 1
That said, the non-PlashGlibc version of this test has failed occasionally, so there may be a bug in the test case too:
fd >= 0: File exists
F...
======================================================================
FAIL: tests_thread_test_py.ThreadTest.test_parallel_native
----------------------------------------------------------------------
Traceback (most recent call last):
  File "tests/thread_test.py", line 65, in test_parallel_native
    assert rc == 0, rc
AssertionError: 1
Problems found:
- thread-test.c increments a counter in a non-thread-safe way. This explains failures such as the message "unlink(socket_filename): No such file or directory", and it explains why the non-PlashGlibc version of the test also fails. Fix: wrap use of the counter in a lock.
- dup2() and close() call plash_init() without the lock held, so two threads can clash at start-up time. Because the clash happens at start-up, running the tests for longer (by increasing the number of loop iterations) does not appear to make failure more likely. Running thread-test.c more times should increase the chance of failure.
- dup2() and close() also inspect the FD table without the lock held.
- fork() and execve() do not use the lock, although they are not covered by the thread test case.
- The req_and_reply() helper function and its variants in libc-comms.c also call plash_init() without the lock held.
Tasks:
- Fix dup2() and close()
- Fix fork()
- Fix execve()
- Fix req_and_reply() and variants
