If a recording of someone’s very rare voice is representable by mp4 or whatever, could monkeys typing out code randomly reproduce their exact timbre + tone + overall sound?

I don’t get how we can get rocks to think + exactly transcribe reality in the ways they do!

Edit: I don’t get how audio can be fossilized/reified into plaintext

  • cheese_greater@lemmy.world (OP)
    4 months ago

    I just think it’s crazy I can record a random recording right now of me speaking and that can be stored in what must ultimately be good old-fashioned plaintext or whatever.

    Like, that’s a rock thinking and turning sound right into stone, wayyyyy more impressive and beneficial than alchemy turning lead into gold.

    • Barack_Embalmer@lemmy.world
      4 months ago

      Yes, digital media, and computers in general, are miracles of science and engineering. Is there some reason digital audio in particular inspires you in this way, as opposed to digital images?

    • Bytemeister@lemmy.world
      4 months ago

      It doesn’t get encoded into plaintext. First, the microphone picks up the sound and turns it into an electrical signal whose voltage follows the changes in air pressure. An analog-to-digital converter measures that signal thousands of times per second, and recording software encodes (and usually compresses) those measurements as binary data. Then that binary data is saved onto storage. Depending on your storage, it’s then stored magnetically (cassette, floppy, HDD), as electric charge trapped in flash memory cells (USB, SSD), or as tiny marks on a disc that a laser reads back (CD/DVD).

      It’s not getting turned into rocks, it’s getting written on media.

      Also, some numbers for scale…
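
To make the pipeline above concrete, here is a minimal sketch in Python. It assumes a pure 440 Hz tone stands in for the microphone signal, and the sample rate, duration, and helper names (`record_tone`, `to_wav_bytes`) are illustrative choices, not anything from the thread. It samples the tone, rounds each sample to a 16-bit integer, and packs the result into the bytes of an uncompressed WAV file.

```python
import io
import math
import struct
import wave

SAMPLE_RATE = 44_100  # samples per second (assumed, CD-style rate)
DURATION_MS = 10      # length of the clip in milliseconds (assumed)
FREQ_HZ = 440.0       # a pure tone standing in for a voice (assumed)


def record_tone():
    """Sample a sine wave and quantize each sample to a signed 16-bit integer."""
    n = SAMPLE_RATE * DURATION_MS // 1000
    return [
        round(32767 * math.sin(2 * math.pi * FREQ_HZ * i / SAMPLE_RATE))
        for i in range(n)
    ]


def to_wav_bytes(samples):
    """Pack the integer samples into an in-memory mono 16-bit WAV file."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)            # mono
        w.setsampwidth(2)            # 2 bytes = 16 bits per sample
        w.setframerate(SAMPLE_RATE)
        w.writeframes(struct.pack(f"<{len(samples)}h", *samples))
    return buf.getvalue()


samples = record_tone()
wav = to_wav_bytes(samples)
print(samples[:5])        # the first few measurements, as plain integers
print(wav[:4], len(wav))  # file signature and total size in bytes
```

The "recording" ends up as a list of integers, and the file is those integers laid out as bytes behind a short header. Formats like MP3 or AAC add a compression step on top of this.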

      My computer has 3.5 GHz processors. Each can run on the order of 3.5 billion instructions every second. To put that in perspective, the smallest unit of time humans can perceive is ~13 ms. That processor can run ~45 million instructions in that time frame. Computers perform very simple tasks, extremely quickly, and it gives the impression of intelligence.
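
The scale can be checked with a few lines of arithmetic. This sketch assumes one instruction per clock cycle, which is a simplification (real processors can do more or fewer per cycle), and the constant names are made up for illustration.

```python
CLOCK_HZ = 3_500_000_000  # 3.5 GHz, cycles per second
WINDOW_MS = 13            # the perception window quoted above, in milliseconds

# cycles per second * seconds in the window
cycles_in_window = CLOCK_HZ * WINDOW_MS // 1000
print(cycles_in_window)  # 45500000
```

So a 13 ms window holds about 45.5 million cycles.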

      • AstralPath@lemmy.ca
        4 months ago

        It’s funny that human perception seems to be anecdotally tied to double-digit milliseconds when, if you ask any drummer or guitar player about input latency, they’ll tell you that the absolute maximum round-trip latency to be able to enjoy playing the instrument is in the range of 5 ms.

        Only once latency dips under 5ms does it start feeling “right”. Personally, I groan when I have to use anything over 3ms with my guitar as the second I hit high tempos the latency is unbearable.

        Below 3 ms it gets very hard to say that you can feel a difference. 16th notes at 250 bpm are 60 ms apart, so 5 ms of latency has you approaching 10% of the note separation time. It’s 100% perceivable.
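
The note-spacing figure can be worked out directly. A small sketch, with hypothetical helper names, assuming four 16th notes per beat:

```python
def sixteenth_note_ms(bpm):
    """Time between consecutive 16th notes in milliseconds (four per beat)."""
    return 60_000 / bpm / 4


def latency_fraction(latency_ms, bpm):
    """Latency as a fraction of the gap between 16th notes."""
    return latency_ms / sixteenth_note_ms(bpm)


gap = sixteenth_note_ms(250)
print(gap)                       # 60.0 ms between notes
print(latency_fraction(5, 250))  # a little over 0.08, i.e. roughly 8%
```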

        • PsychedSy@lemmy.dbzer0.com
          4 months ago

          It’s kind of apples to oranges. Smoothness or variance is noticeable above discrete human ‘limits’, for a variety of reasons.

          With music you have multiple types of feedback.

      • cheese_greater@lemmy.world (OP)
        4 months ago

        But how can it capture perfectly my exact voice or the exact timbre of whatever stuff is playing? Like, it’s mind-blowing to me and I have nothing I can analogize it to. It’s incredible we can even take pictures with pixels, sound is just a whole notha level that astounds me.

        • Barack_Embalmer@lemmy.world
          4 months ago

          Everything about the exact timbre of your voice is captured in the waveform that represents it. To the extent that the sampling rate and bit depth are good enough to mimic your actual voice without introducing digital artefacts (something analogous to a pixelated image) that’s all it takes to reproduce any sound with arbitrary precision.
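
As a rough illustration of how bit depth limits fidelity, the sketch below quantizes a test tone at several bit depths and reports the worst rounding error. The 440 Hz tone, the sample rate, and the function names are assumptions chosen for the example.

```python
import math


def quantize(x, bits):
    """Round a sample in [-1, 1] to the nearest level of a signed `bits`-bit integer."""
    scale = 2 ** (bits - 1) - 1
    return round(x * scale) / scale


def max_error(bits, sample_rate=44_100, freq_hz=440.0, n=1000):
    """Largest difference between the true and quantized samples of a sine tone."""
    worst = 0.0
    for i in range(n):
        x = math.sin(2 * math.pi * freq_hz * i / sample_rate)
        worst = max(worst, abs(x - quantize(x, bits)))
    return worst


for bits in (4, 8, 16):
    print(bits, max_error(bits))
```

Each extra bit roughly halves the worst-case error, which is why 16-bit audio sounds clean while very low bit depths sound audibly "pixelated".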

          Timbre is the result of having a specific set of frequencies playing simultaneously, a set that is characteristic of the specific shape and material properties of the object vibrating (be it a guitar string, drum skin, or vocal cords).

          As for how multiple frequencies can “exist” simultaneously at a single instant in time, you might want to read up on Fourier’s theorem and watch 3Blue1Brown’s brilliant series on differential equations that explores Fourier series: https://www.youtube.com/watch?v=spUNpyF58BY
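
A small sketch of the idea, under stated assumptions: the "timbre" here is a made-up mix of a 200 Hz fundamental and two overtones (`PARTIALS` is hypothetical, not measured from any real voice). The samples are just one number per instant, yet a plain discrete Fourier transform recovers which frequencies went into them and how strong each one was.

```python
import cmath
import math

SAMPLE_RATE = 4000  # samples per second (assumed)
N = 400             # 0.1 s of signal, so each DFT bin is 10 Hz wide

# Hypothetical timbre: frequency in Hz -> amplitude
PARTIALS = {200: 1.0, 400: 0.5, 600: 0.25}


def make_signal():
    """Add the partials together into a single waveform, one number per sample."""
    return [
        sum(a * math.sin(2 * math.pi * f * i / SAMPLE_RATE) for f, a in PARTIALS.items())
        for i in range(N)
    ]


def dft_magnitudes(signal):
    """Naive discrete Fourier transform; returns the amplitude at each frequency bin."""
    n = len(signal)
    mags = []
    for k in range(n // 2):
        s = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(signal))
        mags.append(2 * abs(s) / n)
    return mags


signal = make_signal()
mags = dft_magnitudes(signal)
bin_hz = SAMPLE_RATE / N
found = {round(k * bin_hz): round(m, 3) for k, m in enumerate(mags) if m > 0.1}
print(found)  # {200: 1.0, 400: 0.5, 600: 0.25}
```

Real recordings are analyzed with a fast Fourier transform rather than this slow loop, but the principle is the same.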