AI@IA — Extracting Words Sung on 100 year-old 78rpm records

A post in the series about how the Internet Archive is using AI to help build the library.

Freely available Artificial Intelligence tools are now able to extract words sung on 78rpm records.  The results may not be full lyrics, but we hope it can help browsing, searching, and researching.

Whisper is an open source tool from OpenAI “that approaches human level robustness and accuracy on English speech recognition.”  We were surprised how far it could get with recognizing spoken words on noisy disks and even words being sung.https://archive.org/embed/edison-50043_01_3792

For instance in As We Parted At The Gate (1915) by  Donald Chalmers, Harvey Hindermyer, and E. Austin Keith, the tool found the words:

[…] we parted at the gate,
I thought my heart would shrink.
Often now I seem to hear her last goodbye.
And the stars that tune at night will
never die as bright as they did before we
parted at the gate.
Many years have passed and gone since I
went away once more, leaving far behind
the girl I love so well.
But I wander back once more, and today
I pass the door of the cottade well, my
sweetheart, here to dwell.
All the roads they flew at fair,
but the faith is missing there.
I hear a voice repeating, you’re to live.
And I think of days gone by
with a tear so from her eyes.
On the evening as we parted at the gate,
as we parted at the gate, I thought my
heart would shrink.
Often now I seem to hear her last goodbye.
And the stars that tune at night will
never die as bright as they did before we
parted at the gate.

All of the extracted texts are now available– we hope it is useful for understanding these early recordings.  Bear in mind these are historical materials so may be offensive and also possibly incorrectly transcribed.

We are grateful that University of California Santa Barbara Library donated an almost complete set of transfers of 100 year-old Edison recordings to the Internet Archive’s Great 78 Project this year.  The recordings and the transfers were so good that the automatic tools were able to make out many of the words.

Leave a Comment