The success of any programming language hinges on its ability to facilitate code sharing and result reproduction. This article delves into the intricacies of the MathWorks ecosystem, shedding light on the challenges it poses in these regards, along with viable solutions. Code reproducibility is a fundamental part of any large ecosystem.
Sharing MATLAB Code: Navigating the Complexity
Imagine you have written a simple three-line MATLAB script and want to share it. The conventional approach might involve sending an email with an attached file or uploading it to a GitHub repository for collaboration. However, this seemingly straightforward process involves several pivotal considerations.
t = 0:1/1e3:2;
y = chirp(t,0,1,250);
eval("normaldata = normr"+"nd(0,1,100,1);")
Version of MATLAB
With biannual MATLAB updates, compatibility issues arise. A script containing functions that are either too advanced for older versions or unsupported in newer versions can obstruct smooth execution.
Best Practice #1: Explicitly specify the MATLAB version used, e.g., R2023a.
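One way to make the stated release visible to whoever runs the code is a small guard at the top of the script. The following is only a sketch: the variable names and the warning identifier are illustrative assumptions, and R2023a simply mirrors the example release above.

```matlab
% Hypothetical guard placed at the top of a shared script.
requiredRelease = "R2023a";                   % release the code was written with
currentRelease  = "R" + version('-release');  % release of the running session

if currentRelease ~= requiredRelease
    % Warn rather than error so the user can still try to run the code.
    warning("repro:releaseMismatch", ...
        "Written with %s but running on %s; results may differ.", ...
        requiredRelease, currentRelease);
end
```

A warning is used here instead of an error because code often still runs on neighbouring releases; a stricter project could call error instead.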
While sharing code within a uniform institutional environment might be uncomplicated, different licensing scenarios complicate matters. Ensuring your code runs across varying MathWorks licenses is essential.
Best Practice #3: Clearly list the products needed to reproduce the code. For the example above, these are MATLAB, Signal Processing Toolbox, and Statistics and Machine Learning Toolbox.
It is important to understand that MATLAB needs two things for a function to execute:
- the *.m file and its dependencies must be installed on your system and on the path
- the user must be licensed to use that feature
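The two conditions above can be checked separately from the command line. This is a minimal sketch for a single function; the variable names are illustrative, and Signal_Toolbox is assumed to be the license feature name for Signal Processing Toolbox.

```matlab
% Check both conditions for one function (names are illustrative).
funcName    = 'chirp';
featureName = 'Signal_Toolbox';   % assumed license feature name

% Condition 1: the function's file is installed and visible on the path.
isOnPath = ~isempty(which(funcName));

% Condition 2: a license for the feature exists (this does not check it out).
isLicensed = license('test', featureName) == 1;

fprintf('%s on path: %d, %s licensed: %d\n', ...
    funcName, isOnPath, featureName, isLicensed);
```

Note that license('test', ...) only reports whether a license exists; it says nothing about whether the function actually checks one out when it runs.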
MathWorks provides tools (see matlab.codetools.requiredFilesAndProducts) that help you understand which toolboxes are needed to execute a file:
[fList,pList] = matlab.codetools.requiredFilesAndProducts('examplefile.m')
struct2table(pList)
This returns a table with one row per product and the columns Name, Version, ProductNumber, and Certain.
This does not fully settle the question, because MATLAB cannot always guarantee which toolbox is required; notice the Certain column. For other commands, such as butter, the analysis correctly identifies that the Signal Processing Toolbox is needed. The problem here is chirp: although it is part of the Signal Processing Toolbox and needs the toolbox to be installed, MATLAB does not actually check out the license when running it; it only checks that the file is present. If you ran the script in a Docker container with only MATLAB and Statistics and Machine Learning Toolbox installed, it would error, saying that you need the Signal Processing Toolbox for chirp to work.
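To see which entries MATLAB is unsure about, the product list can be split on its Certain field. The sketch below recreates the three-line example in a temporary file so that it runs on its own; the temporary location and the variable names are assumptions, and the products reported will depend on what is installed.

```matlab
% Recreate the three-line example in a temporary file (location is an assumption).
scriptFile = fullfile(tempdir, 'examplefile.m');
codeLines = {'t = 0:1/1e3:2;', ...
             'y = chirp(t,0,1,250);', ...
             'eval("normaldata = normr"+"nd(0,1,100,1);")'};
fid = fopen(scriptFile, 'w');
fprintf(fid, '%s\n', codeLines{:});
fclose(fid);

% Ask MATLAB which products the script appears to need.
[~, pList] = matlab.codetools.requiredFilesAndProducts(scriptFile);
names     = string({pList.Name});
isCertain = logical([pList.Certain]);

% Separate confirmed products from those that need a manual check.
disp("Products MATLAB is certain about:");
disp(names(isCertain)');
disp("Products to verify by hand:");
disp(names(~isCertain)');
```

Anything in the second group, and anything hidden behind eval, still has to be confirmed by actually running the code.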
Best Practice #4: When installing MATLAB across an organisation, install all the needed toolboxes regardless of license access.
The example above is also somewhat artificial: eval is not a best practice, and the function name is split into two joined strings purely to make sure that MATLAB's analysis cannot detect the Statistics and Machine Learning Toolbox dependency.
Best Practice #5: To be sure which licenses are checked out, close MATLAB, reopen it, run your tests, and then list the licenses in use.
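One command that reports the licenses checked out in the current session is license('inuse'). Whether this is the exact command the author had in mind is an assumption; the sketch below shows how its output could be captured after a test run.

```matlab
% Run in a fresh MATLAB session, right after the tests have finished.
inUse = license('inuse');             % struct array with fields 'feature' and 'user'
features = string({inUse.feature});   % names of the checked-out license features

disp("License features checked out in this session:");
disp(features');
```

Starting from a fresh session matters because a license stays checked out once any earlier command has used it.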
For more advanced reporting of which tools are used, look at the Toolbox Usage Analyzer.
Dependent External Tools
Organizations often develop internal tools to enhance productivity, which can lead to external dependency issues when sharing code.
Best Practice #6: Leverage the Dependency Analyzer App.
Best Practice #7: Consider publishing your toolbox or app on File Exchange, backed by a GitHub repository configured with GitHub Actions for automated testing.
External Data Dependency
Some scripts necessitate specific data files or cloud access, making data sharing an integral part of code reproducibility.
Best Practice #6: Write how to access data in a README.md or Getting Started document.
Best Practice #7: Implement a MATLAB Project structure with startup.m for data access validation.
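A data check in startup.m might look like the following sketch. The folder and file name are hypothetical placeholders, as is the warning identifier; a real project would list its own required files.

```matlab
% startup.m (sketch): confirm that required data is reachable before any code runs.
% The data folder and file name below are hypothetical placeholders.
requiredData = fullfile(pwd, "data", "measurements.csv");

if isfile(requiredData)
    fprintf("Found data file: %s\n", requiredData);
else
    % Point the user at the documentation instead of failing later mid-analysis.
    warning("repro:missingData", ...
        "Missing %s. See README.md for how to obtain it.", requiredData);
end
```

Failing early with a pointer to the README is usually easier to diagnose than a file-not-found error deep inside an analysis script.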
Integrating C/C++ code into MATLAB projects requires compiling the source code for end-user execution.
Best Practice #8: Pre-compile and distribute the compiled binaries, or automate compilation through startup.m in projects.
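Automated compilation can be sketched as a build-if-missing step. The source file name below is a hypothetical placeholder, and the sketch assumes a supported C compiler has already been configured with mex -setup.

```matlab
% Build a MEX binary only when it is missing (file names are hypothetical).
srcFile = 'fastkernel.c';
mexFile = ['fastkernel.' mexext];   % mexext gives the platform-specific extension

if ~isfile(mexFile) && isfile(srcFile)
    mex(srcFile);                   % compile the C source into a MEX binary
end
```

Because the MEX extension differs per platform, pre-compiled binaries need to be shipped for every operating system the recipients use.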
Random Number Generation
Handling random number generation inconsistently can lead to variations in results when sharing code.
Best Practice #9: Set random number generator (rng) seeds in tests to ensure consistent outcomes.
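A minimal sketch of seeding is shown below. The seed value 42 is arbitrary, and randn is used instead of the normrnd call from the earlier example so that the sketch needs no toolbox.

```matlab
% Seed the generator, draw, then reseed and draw again.
rng(42, 'twister');          % fixed seed and generator type; 42 is arbitrary
first = randn(100, 1);

rng(42, 'twister');          % same seed and generator restore the same stream
second = randn(100, 1);

% Both draws are identical because the generator state was reset.
assert(isequal(first, second));
```

Naming the generator type as well as the seed guards against a different default generator in another session or release.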
Achieving code reproducibility in the MathWorks ecosystem necessitates a comprehensive approach. By addressing version compatibility, toolbox requirements, external dependencies, data sharing, compilation, and random number generation, MATLAB users can enhance collaboration and ensure consistent results across various environments and users.