
 
jasonbell
Topic Author
Posts: 310
Joined: May 6th, 2022, 4:16 pm
Location: Limavady, NI, UK

Introducing an AI Bill of Materials for model training proof.

October 19th, 2023, 1:55 pm

In a conversation with James Governor, co-founder and analyst at RedMonk, we were talking about how to prove what's in a machine learning model. The concept of an AI bill of materials came out of that discussion, as regulation creeps ever closer towards us.

Within the BOM I was hoping to cover the following in a machine-readable format, and I settled on a custom BibTeX-like format: easy enough for someone to read, but also parsable by software if needed.
  • Transparency: Providing clarity on the tools, hardware, data sources, and methodologies used in the development of AI systems.
  • Reproducibility: Offering enough information for researchers and developers to reproduce the models and results.
  • Accountability: Ensuring creators and users of AI systems are aware of their origins, components, and performance metrics.
  • Ethical and Responsible AI: Encouraging the documentation of training data sources, including any synthetic data used, to ensure there's knowledge about potential biases, limitations, or ethical considerations.
A sample BibTeX output is in the repository: https://github.com/jasebell/ai-bill-of- ... sample.bib
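
To illustrate the "parsable by software" point, here is a minimal Python sketch of reading a BibTeX-like entry and checking it for required fields. The entry type, the field names and the helper functions are hypothetical and chosen only for illustration. They are not taken from the repository's sample file, which may be structured differently.

```python
import re

# Hypothetical entry: the entry type and field names are illustrative
# assumptions, not copied from the repository's sample.bib.
SAMPLE = """
@model{example_model,
  framework = {scikit-learn},
  hardware = {CPU},
  training_data = {internal dataset, 2023 snapshot},
  synthetic_data = {none},
  accuracy = {0.91}
}
"""

# Matches "@type{key, ...body... }" for a single entry.
ENTRY_RE = re.compile(r"@(\w+)\s*\{\s*([^,\s]+)\s*,(.*)\}", re.DOTALL)
# Matches "name = {value}" where the value has no nested braces.
FIELD_RE = re.compile(r"(\w+)\s*=\s*\{([^{}]*)\}")


def parse_entry(text):
    """Parse one BibTeX-like entry into a dict of type, key and fields."""
    match = ENTRY_RE.search(text)
    if match is None:
        raise ValueError("no BibTeX-like entry found")
    entry_type, key, body = match.groups()
    fields = {name.lower(): value.strip()
              for name, value in FIELD_RE.findall(body)}
    return {"type": entry_type.lower(), "key": key, "fields": fields}


def missing_fields(entry, required):
    """Return the required field names that the entry does not provide."""
    return [name for name in required if name not in entry["fields"]]


entry = parse_entry(SAMPLE)
print(entry["fields"]["training_data"])
print(missing_fields(entry, ["framework", "training_data", "licence"]))
```

A check like `missing_fields` is one way a reviewer or a CI job could flag a BOM that omits, say, its training data sources. It only handles flat `name = {value}` fields, so nested braces or quoted values would need a proper BibTeX parser.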

I wanted to mention it here because I think it's something worth discussing within the realm of finance. While preparing for my talk, Data is Business, Business is Data: The 2023 AI Redux, which is this Saturday at the Northern Ireland Developers Conference (https://nidevconf.com), I was experimenting with some basic multiple linear regression and found 0.2% differences in accuracy scores depending on whether I used R or Python.

As more and more models are created (look how many Web3 experts became AI experts all of a sudden!), it's becoming more and more important to be able to trace a model's lineage back to its development. If the EU passes any AI regulation, this kind of thing will be important going forward.

The original article is here:
https://redmonk.com/jgovernor/2023/10/1 ... materials/

The core repository of the AI Bill of Materials:
https://github.com/jasebell/ai-bill-of-materials
Last edited by jasonbell on October 19th, 2023, 2:01 pm, edited 1 time in total.
Linkedin: https://www.linkedin.com/in/jasonbelldata/
Author of Machine Learning: Hands on for Developers and Technical Professionals (Wiley).
Contributor: Machine Learning in the City (Wiley).
 
jasonbell

Re: Introducing an AI Bill of Materials for model training proof.

October 19th, 2023, 1:58 pm

The forum software's parser is making the BibTeX a bit of a mess :)
 
katastrofa
Posts: 7929
Joined: August 16th, 2007, 5:36 am
Location: Event Horizon

Re: Introducing an AI Bill of Materials for model training proof.

October 20th, 2023, 9:09 am

There is a concept (and research area in the AI field) called FAT/ML (Fairness, Accountability, and Transparency in Machine Learning), which sounds similar.
I doubt transparency will ever go so far as to provide all the information necessary to reproduce the code and results. In some domains, especially where computationally heavy models are deployed in high-stakes applications, they promote the perspective that “code is a model”, i.e. source code is treated as a form of data. I guess it inspired projects such as code2vec, which are supposed to help understand and search code.
 
jasonbell

Re: Introducing an AI Bill of Materials for model training proof.

October 20th, 2023, 2:16 pm

Thank you for highlighting that to me, @katastrofa. I was not aware of this initiative, though it feels a bit short on projects. I know Hugging Face has a YAML model description file type, but it's down to the user/owner to keep it relevant and updated.

I'll certainly dig deeper into this. Thanks. 