OpenAI introduced a range of new APIs at its first developer day. Among them, DALL-E 3, OpenAI’s text-to-image model, is now available through an API after first launching in ChatGPT and Bing Chat. Like its predecessor, DALL-E 2, the DALL-E 3 API includes built-in moderation to help prevent misuse.
The DALL-E 3 API offers several format and quality options, with pricing starting at $0.04 per generated image.
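For readers who want to try the endpoint, a request might look roughly like the sketch below. It assumes the official `openai` Python package (version 1 or later) is installed and that an `OPENAI_API_KEY` environment variable is set. The helper names `build_image_request` and `generate_image` are illustrative and do not come from the announcement, and the default size and quality values are assumptions that should be checked against the current API reference.

```python
from typing import Any, Dict


def build_image_request(prompt: str,
                        size: str = "1024x1024",
                        quality: str = "standard") -> Dict[str, Any]:
    """Assemble keyword arguments for an image request (hypothetical helper)."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    return {
        "model": "dall-e-3",
        "prompt": prompt,
        "size": size,        # assumed default; other sizes may be offered
        "quality": quality,  # assumed default; a higher tier may cost more
        "n": 1,              # DALL-E 3 generates one image per request
    }


def generate_image(prompt: str) -> str:
    """Send the request and return the URL of the generated image."""
    # Imported lazily so this module loads even without the SDK installed.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.images.generate(**build_image_request(prompt))
    return response.data[0].url
```

Calling `generate_image("A watercolor painting of a lighthouse at dawn")` would send one billable request and return a link to the result. Prompts that trip the built-in moderation are expected to be rejected with an error rather than produce an image.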
OpenAI also introduced a text-to-speech API that offers six preset voices and two generative AI model variants. The API is available now, with pricing starting at $0.015 per 1,000 input characters.
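To put the per-character pricing in perspective, the short sketch below estimates the cost of converting a piece of text to speech. It assumes billing scales linearly with character count at the quoted starting rate, with no rounding or minimum charge. The function name `estimate_tts_cost` is illustrative, and the higher-quality model variant may be priced differently.

```python
def estimate_tts_cost(text: str, usd_per_1k_chars: float = 0.015) -> float:
    """Estimate the text-to-speech cost in USD for the given input text.

    Assumes a flat per-character rate (the quoted starting price by default).
    """
    if usd_per_1k_chars < 0:
        raise ValueError("rate must not be negative")
    return len(text) / 1000 * usd_per_1k_chars


# Example: a 5,000-character article at the starting rate.
article = "x" * 5000
print(f"Estimated cost: ${estimate_tts_cost(article):.3f}")
```

At the starting rate, a 5,000-character article works out to about $0.075, and a 100,000-character manuscript to about $1.50.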
During the announcement, OpenAI CEO Sam Altman emphasized how natural the generated speech sounds: “This is much more natural than anything else we’ve heard out there, which can make apps more natural to interact with and more accessible.” He added that the technology opens up a wide range of use cases, including language learning and voice assistance.
OpenAI also released Whisper large-v3, the latest version of its open-source automatic speech recognition model. The company says this version performs better across languages.