Week 0: Drawing Sound
For my final project, I’d like to draw on my experience building strange musical instruments, as well as my research group’s focus on multimodal AI. Basically, my idea is to design a musical instrument where one “plays” by drawing, and those drawings get sonified in some way using a generative AI audio model. The idea originated from my recent bouts of thinking a lot about the concept of play, and how wonderful/critically important it is for being a human. From a musical point of view, both in my own experience and in that of others, some of the most wonderful times are when the physicality of music-making and the stresses of it melt away and I’m left with just the joy of jamming with others. My hope in all my research is to create tools that enable these sorts of “music play” adventures, and so I set out to design an instrument that allows some cross-modal music-making to happen.
First step was aesthetic inspiration - I thought it would be really cool to base the physical design of the instrument off of an actual ornate painting frame, as if the sound you were hearing is emerging from some magical work of art you might see hanging on a wall in a museum somewhere:
I was also vaguely thinking of game consoles like the DS, given all the fun times I’ve had with its touch-based screen.
Next came sketches and drawings of my rough concept. I started by taking inventory of the possible software given all the many configurations (vector, raster, 2D, 3D, etc.), and settled on Illustrator, Photoshop, and Fusion as my primary tools.
After that came the spookiest of all: Fusion. I have very little CAD experience, and zero with Fusion, so I spent many an hour watching online tutorials like this which were infinitely helpful, as were Alfonso’s office hours.
It was especially hard for me to figure out things like getting diagonal 3D rectangles and the like - there was a lot of trial and error. Still, the most difficult thing is figuring out the most idiomatic way to draw the complex shapes in my head given the parameters of the software. I’m much more comfortable with programming and things like that than hardware, so it’s a bit frustrating to not know these things yet.
CAD is hard ^^ (me messing around with parameters/general concepts)
Here’s my full render of what I’m working with, which I’m currently (very WIP name) calling the “Sonic Painting”:
Finally, some details about how I intend the software to work. Basically, I plan on using RAVE, a recent autoencoder-based audio model, because it’s fast and high quality and has already been integrated successfully into other AI musical instruments. Autoencoders are cool because they work by condensing a data representation into a much smaller latent space, which in this case allows for expressive, somewhat controllable generations as one travels through it. What’s really cool is that while this latent space is usually just a bunch of n-dimensional vectors of numbers, there’s no reason why it couldn’t represent something like…coordinates, or perhaps… coordinates of lines and shapes that someone draws :o
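To make the coordinates-as-latents idea a bit more concrete, here’s a minimal sketch of one possible mapping, not a settled design. Everything in it is an assumption for illustration: the function names, the 800x480 screen size in the usage note, the 8 latent dimensions, and the ±3 range (picked on the guess that useful latent values sit within a few units of zero). It only turns drawn points into latent vectors; actually handing those vectors to the model’s decoder is left out.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) in screen pixels, origin at top-left


def point_to_latent(x: float, y: float, width: float, height: float,
                    latent_dim: int = 8, latent_range: float = 3.0) -> List[float]:
    """Map one touch point to a latent vector (hypothetical mapping)."""
    if latent_dim < 2:
        raise ValueError("need at least two latent dimensions")
    # Normalize screen coordinates to [0, 1], clamping stray touches.
    nx = min(max(x / width, 0.0), 1.0)
    ny = min(max(y / height, 0.0), 1.0)
    # Rescale to [-latent_range, +latent_range]; flip y so "up" is positive.
    z0 = (2.0 * nx - 1.0) * latent_range
    z1 = (1.0 - 2.0 * ny) * latent_range
    # Dimensions not driven by the drawing stay at 0 (the centre of the space).
    return [z0, z1] + [0.0] * (latent_dim - 2)


def stroke_to_latents(stroke: List[Point], width: float, height: float,
                      **kwargs) -> List[List[float]]:
    """Turn a drawn stroke into a path through latent space, one vector per point."""
    return [point_to_latent(x, y, width, height, **kwargs) for x, y in stroke]
```

For example, `stroke_to_latents([(0, 480), (400, 240), (800, 0)], 800, 480)` would trace a diagonal path from one corner of the first two latent dimensions, through the centre, to the opposite corner. Driving only two dimensions is the simplest choice; things like stroke speed or pressure could plausibly steer the others.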
In this way, sound gets mapped to drawing, hopefully in a way that is “learnable” - thus transforming the traditional paradigm of “user learns instrument” into “user learns AI model, which is then itself the instrument”. I also intend on augmenting this with some classic musical effects, like pitch shifters and loopers. Tentatively, my idea is that shapes that close into full loops get looped, allowing the user to make patterns and emergent soundscapes. I’d also like to divvy up the screen so that the right side controls the pitch shifter, perhaps even with the ability to make “pitch shapes” that allow one to dynamically shift everything that happens. Here’s a sketch of what I mean:
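The loop and pitch-zone ideas can also be roughed out in code. This is a minimal sketch under stated assumptions: the helper names, the 20-pixel closing tolerance, the pitch zone taking up the right quarter of the screen, and the ±12 semitone range are all placeholders I made up, not decisions.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[float, float]  # (x, y) in screen pixels, origin at top-left


def is_closed_loop(stroke: List[Point], tolerance: float = 20.0) -> bool:
    """True if the stroke ends within `tolerance` pixels of where it began."""
    if len(stroke) < 3:
        return False  # a tap or a tiny dash is not a loop
    (x0, y0), (x1, y1) = stroke[0], stroke[-1]
    return math.hypot(x1 - x0, y1 - y0) <= tolerance


def pitch_shift_semitones(x: float, y: float, width: float, height: float,
                          split: float = 0.75,
                          max_semitones: float = 12.0) -> Optional[float]:
    """Semitone shift for a touch in the right-hand pitch zone, else None."""
    if x < split * width:
        return None  # touch is in the drawing area, not the pitch zone
    ny = min(max(y / height, 0.0), 1.0)
    # Top of the screen shifts up, bottom shifts down, middle is no shift.
    return (1.0 - 2.0 * ny) * max_semitones
```

A stroke for which `is_closed_loop` returns true could be sent to the looper, while touches where `pitch_shift_semitones` returns a number would drive the pitch shifter instead of the drawing area.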
Notes/Warnings/Tips to Myself and Future Travelers
-Ask for help!! People who know what they’re doing can save you hours of frustration (especially with CAD)
-Even if you’re bad at drawing (like me), drawings are super helpful to translate head picture into slightly more real picture of thing