As Machine Learning (ML) research and applications have increasing real-world impact, the likelihood of meaningful social benefit increases, as does the attendant risk of harm. Indeed, problems with data privacy, algorithmic bias, automation risk, and potential malicious uses of AI have been well documented, e.g., by Whittlestone et al. [2019].
In light of these findings, ML researchers can no longer assume that their research will have a net positive impact on the world [Hecht et al., 2018]. The research community should consider not only the potential benefits but also the potential negative societal consequences of ML research, and adopt measures that enable positive trajectories to unfold while mitigating the risk of harm. Hence, we expect authors to discuss the ethical and societal consequences of their work in their papers, while avoiding excessive speculation.
The AutoML Conference template provides a section for the broader impact statement. This document should be used by both authors and reviewers (including regular reviewers and ethics reviewers) to establish a shared understanding of the AutoML Conference’s ethics principles. The primary goal for reviewers should be to provide critical feedback for the authors to incorporate into their paper. We do not expect this feedback to be the deciding factor for acceptance in most cases, though papers could be rejected if substantial concerns are raised that the authors are not able to address adequately.
There are two aspects of ethics we consider: potential negative societal impacts (Section 1) and general ethical conduct in research (Section 2). Both sections provide authors and reviewers with prompts to reflect on a submission’s possible harms. The broader impact statement of a paper need not answer these exact questions, or all of them, but it should address both categories of ethics adequately.
J. Whittlestone, R. Nyrup, A. Alexandrova, K. Dihal, and S. Cave. (2019) Ethical and societal implications of algorithms, data, and artificial intelligence: a roadmap for research. London: Nuffield Foundation.
B. Hecht, L. Wilcox, J. P. Bigham, J. Schöning, E. Hoque, J. Ernst, Y. Bisk, L. De Russis, L. Yarosh, B. Anjum, D. Contractor, and C. Wu. (2018) It’s Time to Do Something: Mitigating the Negative Impacts of Computing Through a Change to the Peer Review Process. ACM Future of Computing Blog.
1. Potential Negative Societal Impacts
Submissions to the AutoML Conference are expected to include a discussion about potential negative societal implications of the proposed research artifact or application. (This corresponds to question 1c of the Reproducibility Checklist). Whenever these are identified, submissions should also include a discussion about how these risks can be mitigated.
Grappling with ethics is a difficult problem for the field, and thinking about ethics is still relatively new to many authors. Given the contested nature of these questions, we place a strong emphasis on transparency. In some cases, it will not be possible to draw a bright line between ethical and unethical. A paper should therefore discuss any potential issues openly, welcoming a broader discussion that engages the whole community.
A common difficulty with assessing ethical impact is its indirectness: most papers focus on general-purpose methodologies (e.g., face-recognition methodologies), whereas ethical concerns are more apparent when considering deployed applications (e.g., surveillance systems). Also, real-world impact (both positive and negative) often emerges from the cumulative progress of many papers, so it is difficult to attribute the impact to an individual paper.
The ethical consequences of a paper can stem from either its methodology or its application. On the methodology side, for example, a new adversarial attack might give unbalanced power to malicious entities; in this case, defenses and other mitigation strategies would be expected, as is standard in computer security. On the application side, the choice of application is sometimes incidental to the core contribution of the paper, and a potentially harmful application can be swapped out (as an extreme example, replacing ethnicity classification with bird classification), though the potential misuses should still be noted. In other cases, the core contribution might be inseparable from a questionable application (e.g., reconstructing a face given speech). In such cases, one should critically examine whether the scientific (and ethical) merits really outweigh the potential ethical harms.
A non-exhaustive list of potential negative societal impacts is included below. Consider whether the proposed methods and applications can:
- Directly facilitate injury to living beings. For example: could it be integrated into weapons or weapon systems?
- Raise safety or security concerns. For example: is there a risk that applications could cause serious accidents or open security vulnerabilities when deployed in real-world environments?
- Raise human rights concerns. For example: could the technology be used to discriminate, exclude, or otherwise negatively impact people, including impacts on the provision of vital services, such as healthcare and education, or limit access to opportunities like employment? Please consult the Toronto Declaration for further details.
- Have a detrimental effect on people’s livelihood or economic security. For example: could it harm people’s autonomy, dignity, or privacy at work, or threaten their economic security (e.g., via automation or the disruption of an industry)? Could it be used to increase worker surveillance, or to impose conditions that present a risk to the health and safety of employees?
- Develop or extend harmful forms of surveillance. For example: could it be used to collect or analyze bulk surveillance data to predict immigration status or other protected categories, or be used in any kind of criminal profiling?
- Severely damage the environment. For example: would the application incentivize significant environmental harms such as deforestation, fossil fuel extraction, or pollution?
- Deceive people in ways that cause harm. For example: could the approach be used to facilitate deceptive interactions that would cause harms such as theft, fraud, or harassment? Could it be used to impersonate public figures to influence political processes, or as a tool of hate speech or abuse?
2. General Ethical Conduct in Research
Submissions must adhere to ethical standards for responsible research practice and exercise due diligence in their conduct.
If the research uses human-derived data, consider whether that data might:
- Contain personally identifiable information (PII) or sensitive PII. For instance, does the dataset include features or labels that identify individuals by name? Did people consent to the collection of such data? Could the use of the data be degrading or embarrassing for some people?
- Contain information that could be deduced about individuals that they have not consented to share. For instance, a dataset for recommender systems could inadvertently disclose user information such as their name, depending on the features provided.
- Encode, contain, or potentially exacerbate bias against people of a certain gender, race, sexuality, age, nationality or who have other protected characteristics. For instance, does the dataset represent the diversity of the community where the approach is intended to be deployed? Does the dataset implicitly encode protected attributes through an individual’s residential address, languages spoken, occupation etc?
- Involve human-subject experimentation, and if so, whether it has been reviewed and approved by a relevant oversight board. For instance, studies predicting characteristics (e.g., health status) from human data (e.g., contacts with people infected by COVID-19) are expected to have been reviewed by an ethics board.
- Have been discredited by the creators. For instance, the DukeMTMC-ReID dataset has been taken down and should not be used in AutoML Conference submissions. Similarly, the following datasets have been deprecated by their creators:
- Full version of LAION-5B
- Versions of ImageNet 21k from prior to Dec 2019
- 80 Million Tiny Images
- MS-Celeb-1M
- DukeMTMC
- Brainwash
- MegaFace
- HRT-Transgender
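As one concrete, necessarily partial illustration of the PII prompt above, a lightweight scan for direct identifiers can flag records for manual review before release. This is a hedged sketch, not a complete PII detector: the field names, record format, and regular expressions here are illustrative assumptions, and a real audit should use a vetted PII-detection tool.

```python
import re

# Illustrative patterns for two common direct identifiers.
# These regexes are simplistic by design; they will miss many
# identifier formats and should not be relied on for compliance.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "phone": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}

def flag_pii(records):
    """Return (record_index, field, pii_type) triples for manual review."""
    flags = []
    for i, record in enumerate(records):
        for field, value in record.items():
            for pii_type, pattern in PII_PATTERNS.items():
                if isinstance(value, str) and pattern.search(value):
                    flags.append((i, field, pii_type))
    return flags

sample = [
    {"comment": "great product, mail me at jane.doe@example.com"},
    {"comment": "no identifiers here"},
]
print(flag_pii(sample))  # flags record 0's comment field as an email hit
```

A scan like this is best treated as a first pass that routes suspicious records to human reviewers, since indirect identifiers (addresses, rare attribute combinations) will not match any simple pattern.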
In general, there are other issues related to data that are worthy of consideration and review. These include:
- Consent to use or share the data. Explain whether you asked the data owner’s permission to use or share the data and what the outcome was. If you did not obtain consent, explain why the use of the data is nonetheless appropriate from an ethical standpoint. For instance, if the data was collected from a public forum, were its users asked for consent to use the data they produced, and if not, why?
- Domain specific considerations when working with high-risk groups. For example, if the research involves work with minors or vulnerable adults, have the relevant safeguards been put in place?
- Filtering of offensive content. For instance, when collecting a dataset, how are the authors filtering offensive content such as racist language or violent imagery?
- Compliance with GDPR and other data-related regulations. For instance, if the authors collect human-derived data, what is the mechanism to guarantee individuals’ right to be forgotten (removed from the dataset)?
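The "right to be forgotten" prompt above can be made concrete with a simple deletion mechanism. The sketch below assumes a hypothetical list-of-dicts dataset keyed by an illustrative `subject_id` field; real compliance is broader, since deletions must also propagate to derived artifacts such as trained models, caches, and backups.

```python
def forget_subject(dataset, subject_id):
    """Drop every record belonging to subject_id; report how many were removed.

    `subject_id` is an illustrative key for this sketch. Real datasets may
    link records to individuals only indirectly, which also needs handling.
    """
    kept = [r for r in dataset if r.get("subject_id") != subject_id]
    return kept, len(dataset) - len(kept)

data = [
    {"subject_id": "u1", "age": 34},
    {"subject_id": "u2", "age": 51},
    {"subject_id": "u1", "age": 35},
]
cleaned, removed = forget_subject(data, "u1")
print(removed)  # 2
```

Describing such a mechanism (and its limits) in the paper is one way to address the regulatory prompt concretely.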
This list is not intended to be exhaustive — it is included here as a prompt for author and reviewer reflection.