Fix the issue that joining a detached thread might result in joining hang,
resolve the issue by adding wait_count for a thread's exec_env to indicate
whether a thread needs to detach itself or not when it exits.
And add checks for the input exec_env for cluster's join/detach/cancel thread.