Introducing an AI Bill of Materials for model training proof.

jasonbell · October 19th, 2023, 1:55 pm

In a conversation with James Governor, co-founder and analyst at RedMonk, we were talking about how to prove what's in a machine learning model. Out of that came the concept of an AI bill of materials (BOM), as regulation creeps ever closer towards us.

Within the BOM I was hoping to cover the following in a machine-readable format, and I settled on a custom BibTeX-like implementation. It's easy enough for someone to read but also parsable by software if needed.

Transparency: Providing clarity on the tools, hardware, data sources, and methodologies used in the development of AI systems.
Reproducibility: Offering enough information for researchers and developers to reproduce the models and results.
Accountability: Ensuring creators and users of AI systems are aware of their origins, components, and performance metrics.
Ethical and Responsible AI: Encouraging the documentation of training data sources, including any synthetic data used, to ensure there's knowledge about potential biases, limitations, or ethical considerations.

A sample BibTeX output is in the repository: https://github.com/jasebell/ai-bill-of- ... sample.bib
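To illustrate the "parsable by software" point, here is a minimal Python sketch that reads a BibTeX-like entry into a dictionary. The entry type, key, and field names (framework, hardware, training_data, and so on) are hypothetical, made up for illustration rather than taken from the repository's sample, and the parser only handles a single entry with flat, brace-delimited values.

```python
import re

# Hypothetical BOM entry: the entry type, key, and field names are
# illustrative assumptions, not the schema used in the repository.
SAMPLE = """
@model{example_model,
  framework = {scikit-learn},
  hardware = {CPU},
  training_data = {example.csv},
  synthetic_data = {false},
  accuracy = {0.91}
}
"""

# Matches "@type{key, ...body... }" for a single entry.
ENTRY_RE = re.compile(r"@(\w+)\s*\{\s*([^,\s]+)\s*,(.*)\}", re.DOTALL)
# Matches "name = {value}" pairs with no nested braces in the value.
FIELD_RE = re.compile(r"(\w+)\s*=\s*\{([^{}]*)\}")


def parse_entry(text):
    """Parse one BibTeX-like entry into a dict of type, key, and fields."""
    match = ENTRY_RE.search(text)
    if match is None:
        raise ValueError("no BibTeX-like entry found")
    entry_type, key, body = match.groups()
    fields = {
        name.lower(): value.strip()
        for name, value in FIELD_RE.findall(body)
    }
    return {"type": entry_type.lower(), "key": key, "fields": fields}


bom = parse_entry(SAMPLE)
for name, value in bom["fields"].items():
    print(f"{name}: {value}")
```

A real BOM file would likely need nested braces, quoted values, and multiple entries, so an existing BibTeX parsing library may be a better fit than hand-rolled regular expressions.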

I wanted to mention it here because I think it's worth discussing in the realm of finance. While preparing my talk, Data is Business, Business is Data: The 2023 AI Redux, which is this Saturday at the Northern Ireland Developers Conference (https://nidevconf.com), I was experimenting with some basic multiple linear regression and found 0.2% differences in accuracy scores depending on whether I used R or Python.

As more and more models are created (look how many Web3 experts became AI experts all of a sudden!), it's becoming increasingly important to be able to trace a model's lineage back to its development. If the EU passes any AI regulation, this kind of thing will be important going forward.

The original article is here:
https://redmonk.com/jgovernor/2023/10/1 ... materials/

The core repository of the AI Bill of Materials:
https://github.com/jasebell/ai-bill-of-materials

jasonbell · October 19th, 2023, 1:58 pm

The forum software's parser is making the BibTeX a bit of a mess.

katastrofa · October 20th, 2023, 9:09 am

There is a concept (and research area in the AI field) called FAT/ML (Fairness, Accountability, and Transparency in Machine Learning), which sounds similar.
I doubt the transparency will ever go so far as to provide all the information necessary to reproduce the code and results. In some domains, especially where computationally heavy models are deployed in high-stakes applications, people promote the perspective that "code is a model", i.e. source code is treated as a form of data. I guess it inspired projects such as code2vec, which are supposed to help understand and search the code.

jasonbell · October 20th, 2023, 2:16 pm

Thank you for highlighting that to me @katastrofa. I was not aware of this initiative, though it feels a bit short on projects. I know Hugging Face has a YAML model description file type, but it's down to the user/owner to keep it relevant and updated.

I'll certainly dig deeper into this. Thanks.
