notesum.ai
Published on November 28
Perspective of Software Engineering Researchers on Machine Learning Practices Regarding Research, Review, and Education
cs.SE
cs.LG
Release Date: November 28, 2024
Authors: Anamaria Mojica-Hanke¹, David Nader Palacio², Denys Poshyvanyk², Mario Linares-Vásquez³, Steffen Herbold¹
Affiliations: ¹University of Passau, Germany; ²William & Mary, USA; ³Universidad de los Andes, Colombia
| ML Pipeline Stage | Description (Amershi et al., 2019) |
|---|---|
| Model Requirements | Designers decide which product features are feasible to implement with machine learning and which would be useful for an existing product or a new one. |
| Data Collection | Teams look for and integrate available datasets (e.g., internal or open source) or collect their own. |
| Data Cleaning | Involves removing inaccurate or noisy records from the dataset, an activity common to all forms of data science. |
| Data Labeling | Assigns ground truth labels to each record. |
| Feature Engineering | Refers to all activities that are performed to extract and select informative features for machine learning models. |
| Model Training | The chosen models (using the selected features) are trained and tuned on the clean, collected data and their respective labels. |
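To make the six stages above more concrete, here is a minimal, stdlib-only Python sketch that chains them on a toy review dataset. It is an illustration under stated assumptions, not the surveyed authors' method or Amershi et al.'s implementation: the records, the requirement dictionary, the star-rating labeling rule, the bag-of-words features, and the nearest-centroid model are all hypothetical choices made for brevity.

```python
"""Toy walk-through of the first six ML pipeline stages (hypothetical).

Every name, record, and rule here is an illustrative assumption; none of it
comes from Amershi et al. (2019) or from the surveyed studies.
"""
from collections import Counter

# Model requirements: a hypothetical *product* feature judged feasible for ML,
# plus a hypothetical acceptance threshold.
REQUIREMENTS = {"feature": "flag negative reviews", "min_train_accuracy": 0.75}


def collect():
    """Data collection: a small in-memory stand-in for internal/open data."""
    return [
        {"text": "great product", "stars": 5},
        {"text": "terrible support", "stars": 1},
        {"text": "   ", "stars": 3},                   # noisy: blank text
        {"text": "works as expected", "stars": 4},
        {"text": "broke after a day", "stars": None},  # incomplete: no rating
        {"text": "awful experience", "stars": 2},
    ]


def clean(records):
    """Data cleaning: drop records with blank text or a missing rating."""
    return [r for r in records if r["text"].strip() and r["stars"] is not None]


def label(records):
    """Data labeling: ground truth 1 = positive (4+ stars), 0 = negative."""
    return [(r["text"], 1 if r["stars"] >= 4 else 0) for r in records]


def build_vocab(texts):
    """Feature engineering: the sorted set of words seen in the data."""
    return sorted({w for t in texts for w in t.lower().split()})


def featurize(text, vocab):
    """Bag-of-words count vector over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    return [counts[w] for w in vocab]


def train(vectors, labels):
    """Model training: nearest-centroid classifier (one mean vector per class)."""
    centroids = {}
    for cls in set(labels):
        members = [v for v, y in zip(vectors, labels) if y == cls]
        centroids[cls] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def predict(centroids, vector):
    """Return the class whose centroid is closest in squared Euclidean distance."""
    def dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(vector, centroid))
    return min(centroids, key=lambda cls: dist(centroids[cls]))


examples = label(clean(collect()))
texts = [t for t, _ in examples]
labels = [y for _, y in examples]
vocab = build_vocab(texts)
vectors = [featurize(t, vocab) for t in texts]
model = train(vectors, labels)

# Accuracy on the training data only; held-out evaluation is a later stage.
accuracy = sum(predict(model, v) == y for v, y in zip(vectors, labels)) / len(labels)
print(len(examples), accuracy, accuracy >= REQUIREMENTS["min_train_accuracy"])
# prints: 4 1.0 True
```

The sketch deliberately stops at training: it performs no hyperparameter tuning and scores the model only on the data it was trained on, so the stages that follow model training in Amershi et al.'s pipeline (such as model evaluation) are out of its scope.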