This paper introduces BasketVision, a novel benchmark for evaluating how well Multimodal Large Language Models (MLLMs) understand complex dynamic systems, using professional basketball as the target domain. The authors argue that basketball, with its structured rules, multi-agent interactions, and dynamic spatial-temporal structure, is an ideal testbed for probing MLLM reasoning.

The core contribution is a large-scale dataset of 6,000 curated questions spanning seven dimensions of perception, reasoning, and prediction: scene comprehension, object detection, spatial localization, event analysis, context understanding, tracking and trajectory analysis, and reasoning and strategy analysis. The benchmark includes both image and video formats, which increases its difficulty and its relevance to real-world scenarios.

A second substantial technical contribution is an automated data generation pipeline combining court recognition, perspective transformation, and player tracking, which allows spatially grounded questions to be produced at scale (a minimal sketch of the perspective-transformation step appears at the end of this review).

The authors evaluate 23 state-of-the-art MLLMs on BasketVision and find a substantial performance gap between human experts and the best-performing model. This gap underscores the limitations of current MLLMs in spatial reasoning and in handling complex, dynamic visual environments, and the empirical findings motivate further research into more robust reasoning capabilities in dynamic settings.

The work is significant because it moves beyond static images and generic video content to assess models in a structured, rule-governed environment, providing a more rigorous evaluation of MLLM capabilities, and the benchmark should serve the research community as a challenging, comprehensive testbed for future advances. That said, the focus on a single domain like basketball, while providing a controlled environment, raises questions about how well the findings generalize to other complex dynamic systems. Overall, the paper makes a valuable contribution: it exposes concrete limitations of current MLLMs in understanding complex dynamic systems and lays a solid foundation for future research in this area.
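To make the pipeline concrete, below is a minimal sketch of the kind of perspective transformation (image-to-court homography) that court recognition and player tracking typically feed into. The paper itself does not specify implementation details, so the use of OpenCV, the landmark coordinates, and all variable names here are illustrative assumptions rather than the authors' actual code.

```python
# Minimal sketch of an image-to-court perspective transformation.
# All coordinates and names are hypothetical, chosen only to illustrate
# how tracked pixel positions can be grounded on the court plane.
import cv2
import numpy as np

# Hypothetical pixel coordinates of four known court landmarks
# (e.g., the corners of the half-court) detected in a broadcast frame.
frame_pts = np.float32([[312, 248], [986, 241], [1153, 655], [148, 668]])

# The same four landmarks in real-world court coordinates (feet),
# using an NBA half-court (50 ft wide x 47 ft deep) as the reference plane.
court_pts = np.float32([[0, 0], [50, 0], [50, 47], [0, 47]])

# 3x3 homography mapping image pixels to the court plane.
H = cv2.getPerspectiveTransform(frame_pts, court_pts)

# Project tracked player foot positions (pixels) onto the court.
player_px = np.float32([[[640, 500]], [[820, 430]]])  # shape (N, 1, 2)
player_court = cv2.perspectiveTransform(player_px, H)
print(player_court.reshape(-1, 2))  # court-plane (x, y) positions in feet
```

Once player positions live in court coordinates rather than pixels, spatially grounded questions (e.g., distance to the basket, which player is closest to the ball handler) can be generated programmatically, which is presumably what enables the benchmark's scale.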