Post by simranratry20244 on Feb 12, 2024 7:52:24 GMT 1
With all this, it seems that it is a good idea to invest in obtaining a good data set before launching into an AI project and, specifically, machine learning. A first measure is to adopt an appropriate methodology, such as CRISP-DM, which is a good example because it is easy to explain to project stakeholders with business responsibilities.
Regardless of the methodology chosen, it is almost mandatory to carry out a first "phase 0" data audit. The objective is to increase the chances of success from the beginning, since work does not begin until it is ensured that there is quality material. Cassie Kozyrkov goes further in a Medium article: if we all understand that designing quality data sets is non-trivial, shouldn't we have a specific position within the company tasked with designing, collecting, governing, documenting and preserving high-quality data sets in the organization?
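As an illustration, a minimal "phase 0" audit could be sketched as follows. This is not a prescribed procedure from the post; the field names, record format, and metrics are hypothetical assumptions, chosen only to show the kind of basic checks (row counts, duplicates, missing values) such an audit might run before any modeling begins.

```python
# Minimal sketch of a "phase 0" data audit, assuming tabular records
# arrive as a list of dicts. Field names and metrics are hypothetical.

def audit_records(records, required_fields):
    """Return basic quality metrics before any modeling work starts."""
    total = len(records)
    missing = {f: 0 for f in required_fields}
    seen, duplicates = set(), 0
    for rec in records:
        # Count empty or absent values for each required field.
        for f in required_fields:
            if rec.get(f) in (None, ""):
                missing[f] += 1
        # Detect exact duplicate records.
        key = tuple(sorted(rec.items()))
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)
    return {
        "rows": total,
        "duplicates": duplicates,
        "missing_ratio": {f: missing[f] / total if total else 0.0
                          for f in required_fields},
    }

sample = [
    {"id": 1, "label": "yes"},
    {"id": 2, "label": ""},      # missing label
    {"id": 1, "label": "yes"},   # exact duplicate of the first record
]
report = audit_records(sample, ["id", "label"])
```

A report like this gives the business stakeholders a concrete artifact to sign off on before training budgets are committed.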
It is good news that boards of directors are beginning to incorporate the figure of the chief data officer (CDO), a strategic position responsible for the management and exploitation of data. However, Kozyrkov complains that, at the most operational levels, these tasks end up being everyone's responsibility, which is like saying they are no one's responsibility. The importance of data in AI projects requires going further and not settling for clichés such as "data is the new oil." These commonplaces must be translated into effective data governance policies on which to build projects that have impact and are reliable.
However, any improvements or bug fixes require retraining the models with more examples, which can be costly in time, hardware, and people. According to Stanford University's AI Index 2023 report, training a language model such as GPT-3 or Megatron-Turing costs between one million and ten million dollars and requires several weeks to complete. In the case of GPT-4, this process lasted up to six months, since it required an iterative, highly human-intensive process to adjust the system's responses. Human experts create and select valid questions and answers and then evaluate the output of the language model when it operates unassisted.