Fix a hang that could occur when joining a detached thread. The fix adds a wait_count to each thread's exec_env to indicate whether the thread needs to detach itself when it exits. Also add checks on the input exec_env in the cluster's join/detach/cancel thread operations.