DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models
Building on prior benchmarks such as Adversarial GLUE, DecodingTrust aims to provide a thorough assessment of trustworthiness in GPT models.
This research endeavor is designed to help researchers and practitioners better understand the capabilities, limitations, and potential risks involved in deploying these state-of-the-art Large Language Models (LLMs).
This project is organized around the following eight primary perspectives of trustworthiness:
- Toxicity
- Stereotype and bias
- Adversarial robustness
- Out-of-distribution robustness
- Privacy
- Robustness to adversarial demonstrations
- Machine ethics
- Fairness
Paper: https://arxiv.org/abs/2306.11698
Repo: https://github.com/AI-secure/DecodingTrust