A.I.
Are you curious about BTS singing with a female voice?

This is possible if you use a ‘voice converter’.



ㆍThe hot new Super Rookie debuting at HYBE, the label BTS belongs to, is a singer created in collaboration with AI.

ㆍ‘Supertone’ is a South Korean startup that uses vocal synthesis technology to create an AI that sings like a human.

ㆍSupertone has established three principles to ensure copyright compliance in AI speech synthesis.




The next Super Rookie for BTS?

HYBE surpasses K-pop conventions with ‘MIDNATT’, the future of BTS. MIDNATT fluently speaks six languages, possesses a vocal range that spans both male and female registers, and is a musical prodigy. Let’s give it a listen.


MIDNATT ‘Masquerade’ Official MV


In fact, the super rookie was the result of HYBE and Supertone’s AI music project. The MIDNATT project used AI technology to partially modify the voice of HYBE artist ‘Lee Hyun’.


Supertone used its advanced multilingual pronunciation correction technology to render the song in six languages: Korean, English, Spanish, Japanese, Chinese, and Vietnamese. The technology extracted the timbre, pitch, and stress from the artist’s voice, then drew on the precise pronunciation of native speakers’ voices to synthesize the translated versions. Voice design technology was used to derive voices of different genders from the artist’s vocals. The song also features backing vocals and choruses in which Lee Hyun’s gravelly voice was transformed into a female voice. When asked about working with Korea’s first AI on a record, Lee Hyun expressed amazement at hearing his voice transformed into that of a woman.


MIDNATT’s song is unique in that it has no language or gender barrier. Because it is sung by a human, with AI technology used only as an assistant, it also avoids the controversial question of whether AI-generated work can be copyrighted. By using Supertone’s AI technology, HYBE has opened up new possibilities for K-pop, a genre that has been criticized for its difficulty in reaching a global audience due to language barriers. With Supertone’s voice conversion technology, we may even see BTS singing in female voices, and future albums may feature male-female duets based on the members’ voices.




Beyond the uncanny valley

Supertone’s AI does not sound like a machine. Siri, by contrast, speaks in human-like sentences but is easily recognized as a machine because of its awkward pauses and stiff voice. Supertone’s AI sings with human emotion, which makes it sound more natural and minimizes the sense of unfamiliarity and unpleasantness for the listener. Hee Doo Choi, the COO, stated that AI singers have surpassed the uncanny valley.


Supertone tasked the AI with three objectives to overcome the issue of the uncanny valley. Firstly, it had to learn the lyrics to accurately pronounce them. Secondly, it had to learn the audio information to replicate the singer’s voice. Lastly, Supertone aimed to improve the AI’s ability to understand human emotions. To achieve this, the AI was trained to read musical scores that incorporate phonetic elements like note lengths and pitches that vary based on emotion. After completing these objectives, Supertone’s AI can now be utilized to its full potential.
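To make the idea of a score that carries lyrics, note lengths, and pitches more concrete, here is a minimal sketch of how such input could be represented. The article does not describe Supertone’s actual score format, so the `Note` class, its field names, and the example phrase are all hypothetical. The only standard pieces are the MIDI convention (note 69 is A4 at 440 Hz) and the equal-temperament pitch formula.

```python
from dataclasses import dataclass


@dataclass
class Note:
    """One sung note. Hypothetical layout, not Supertone's format."""
    syllable: str    # lyric fragment sung on this note
    midi_pitch: int  # MIDI note number; 69 is A4 (440 Hz)
    beats: float     # note length in beats


def pitch_hz(midi_pitch: int) -> float:
    """Equal-temperament frequency for a MIDI note number."""
    return 440.0 * 2 ** ((midi_pitch - 69) / 12)


def duration_seconds(note: Note, tempo_bpm: float) -> float:
    """Convert a note length in beats to seconds at the given tempo."""
    return note.beats * 60.0 / tempo_bpm


# An invented three-note phrase, purely for illustration.
phrase = [Note("la", 69, 1.0), Note("la", 71, 0.5), Note("la", 72, 1.5)]
tempo_bpm = 120.0

for note in phrase:
    print(note.syllable,
          round(pitch_hz(note.midi_pitch), 1), "Hz",
          duration_seconds(note, tempo_bpm), "s")

total_seconds = sum(duration_seconds(n, tempo_bpm) for n in phrase)
```

A singer expressing emotion would stretch or shorten these lengths and bend these pitches, which is the kind of variation the paragraph above says the AI was trained to read.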




Key technologies

SVS stands for Singing Voice Synthesis. It’s a technology developed by Supertone that synthesizes lyrics, melodies, and beats into a song. Unlike TTS (Text-to-Speech), the ordinary voice synthesis technology that can only speak the text you type, SVS can isolate the unique diction and timbre of a voice for music. This means that with SVS, you can create a song that sounds like it’s sung by a specific person, even if they didn’t actually sing it. For example, you could take a Bruno Mars song and make it sound as if Billie Eilish were singing it.


Supertone’s AI has the unique ability to learn how to sing like a human, even with limited data. It achieves this through a technique called ‘transfer learning’, which reuses an existing model that was trained on rich data to build a model in a field with scarce training data. Supertone used this method to create a singing AI by pre-training its base model with 1,000 music files. The AI made headlines on a South Korean TV program for perfectly recreating the voice of the late singer Kim Kwang-seok. What’s impressive is that it only required 18 minutes of data and nine songs, compared to other AI speaker services that require over 40 hours of voice data to respond in celebrity voices.
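The transfer learning idea described above can be illustrated with a deliberately tiny sketch. This is not Supertone’s model: it is a toy straight-line fit, and every number in it (the two made-up tasks, the sample counts, the learning rate) is an assumption chosen only to show the effect. A model is first trained on plentiful data from one task, then briefly fine-tuned on a handful of samples from a related task, and compared with a model trained from scratch on that same handful.

```python
import random


def train(xs, ys, w, b, steps, lr):
    """Gradient descent on mean squared error for the line y = w*x + b."""
    n = len(xs)
    for _ in range(steps):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def mse(xs, ys, w, b):
    """Mean squared error of the line on the given points."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


rng = random.Random(0)

# "Rich" source task: 1,000 samples of y = 2.0x + 1.0 (made-up task).
src_x = [rng.uniform(-1, 1) for _ in range(1000)]
src_y = [2.0 * x + 1.0 for x in src_x]

# "Scarce" target task: only 9 samples of the related line y = 2.2x + 0.8.
tgt_x = [rng.uniform(-1, 1) for _ in range(9)]
tgt_y = [2.2 * x + 0.8 for x in tgt_x]

# Held-out points from the target task, used only for evaluation.
test_x = [i / 10 - 1 for i in range(21)]
test_y = [2.2 * x + 0.8 for x in test_x]

# 1) Pre-train a base model on the rich data.
base_w, base_b = train(src_x, src_y, 0.0, 0.0, steps=500, lr=0.1)

# 2) Transfer: start from the base model, take a few steps on scarce data.
ft_w, ft_b = train(tgt_x, tgt_y, base_w, base_b, steps=10, lr=0.1)

# 3) Baseline: the same few steps on scarce data, starting from nothing.
sc_w, sc_b = train(tgt_x, tgt_y, 0.0, 0.0, steps=10, lr=0.1)

ft_error = mse(test_x, test_y, ft_w, ft_b)
sc_error = mse(test_x, test_y, sc_w, sc_b)
print("fine-tuned error:", ft_error)
print("from-scratch error:", sc_error)
```

With the same small training budget, the fine-tuned model ends up much closer to the target than the one trained from scratch, because it starts from what the base model already learned.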


Supertone CLEAR separates the audio into three channels: ambience, voice, and voice reverb.




Copyright rules

Supertone’s AI voice synthesis technology has reached a level of maturity that enables it to be used in K-content. In the Netflix Original Series ‘Mask Girl’, the main character used Supertone’s technology to create a virtual voice for a third character. The technology was also used in a TV show that featured AI singers competing against their human counterparts. However, there is a debate about the legality of using AI to create a virtual human voice for content creation. In an effort to resolve the copyright issue, Supertone has committed to the following three principles:


1. Do not create any content without obtaining permission from the copyright holder. In the case of deceased individuals, it is important to get permission from their family members or legal representatives.

2. Develop an AI system that can identify whether voice synthesis technology has been used. For instance, create an AI system that can differentiate between cloned and synthesized voices. The ‘police AI’ currently being developed is more than 90% accurate on samples generated by Supertone.

3. Ensure that technology is not misused by monitoring it 24/7. The cloud system stores Supertone’s data and detects any unauthorized access by outsiders. This helps in preventing any unauthorized use of Supertone’s work.




Imagine a voice converter

Have you ever wished you could change your voice to sound like someone else’s? Imagine a ‘voice converter’ necklace that you could wear. This wearable device would analyze your vocal cords and pitch, and then synthesize that data with a chosen voice to create a completely different sound. With Supertone’s speech synthesis AI, you could sing like Ariana Grande or even have a conversation in the playful tones of the Minions. All this would be possible with this amazing device.


Supertone recently took part in G-STAR 2023, the largest gaming festival in South Korea. During the event, Supertone announced an AI-powered voice service that can convert a user’s voice into the voice of a game character in real time. This means that if you choose a female game character, you will be able to speak in her voice instantly. With this new feature, gamers can engage in real-time voice chat while sounding like their favorite game character. Supertone claims that the conversion delay is as little as 20 milliseconds, short enough that the listener won’t be able to tell the difference.
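To give a feel for how short 20 milliseconds is in audio terms, the arithmetic below counts how many audio samples fit in that window. The 20 ms figure comes from the claim above. The sample rates are common audio rates chosen as assumptions, since the article does not say which rate Supertone’s service uses or how the delay is measured.

```python
LATENCY_MS = 20  # figure quoted in the article

# Common audio sample rates; which one Supertone uses is not stated.
ASSUMED_SAMPLE_RATES_HZ = (16_000, 44_100, 48_000)


def samples_in_window(sample_rate_hz: int, window_ms: int) -> int:
    """Number of whole audio samples that fit in a time window."""
    return sample_rate_hz * window_ms // 1000


budget = {
    rate: samples_in_window(rate, LATENCY_MS)
    for rate in ASSUMED_SAMPLE_RATES_HZ
}

for rate, count in budget.items():
    print(f"{rate} Hz: {count} samples in {LATENCY_MS} ms")
```

At 48,000 samples per second, for example, the whole conversion would have to finish within 960 samples of audio.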


If the delay between speaking and hearing yourself through a speaker decreases, it would be easy to use a ‘voice converter’ with Supertone’s technology to change your voice in real time. However, the voices available for conversion would only be accessible by purchasing content on Supertone’s cloud computers or subscribing to a service. This means that the copyright for the converted voice belongs to the AI provider. That copyright protection could be used to prevent crimes such as voice phishing or deep voice impersonation.


The kinds of identities that an individual can express through a voice converter will become increasingly diverse. People of any age can now have both a young and a mature voice. This technological advancement is a significant step towards eliminating gender bias since individuals can now choose the voice that they want, irrespective of their gender. By embracing voice converters, we can move towards a gender-neutral society and eliminate all forms of discrimination. The use of voice converters will facilitate the transition to a completely different society.

2024-01-06
editor
Eunju Lee