HTM(A)A — Charlie DeTar


Microphone Input: Voiced/Unvoiced classifier

"Measure something and display the result"

The Result:

Voiced/unvoiced classifier:

The Plan

My plan is to make a voiced/unvoiced classifier. For example, "s", "t", or "h" sounds should make one LED light up, whereas vowels and voiced consonants ("v", "r", "l") should make the other LED light up. The board won't do any fancy speech/non-speech classification (claps, whistles, and other noises will be classified as though they were speech). Magnitude (amplitude) and zero-crossing rate can be used for the classification, with the expectation that unvoiced sounds will have a higher zero-crossing rate and a lower amplitude.

While Neil wanted us to rely primarily on the computer for data display, I wanted to do the classification on board the microcontroller, since a simple classifier with two LEDs is cool and compelling. So I did both: some computer display and some on-microcontroller classification.

The Board

My initial board design was a minor modification of Neil's hello.mic.45.cad: I added two LEDs, one for voiced and one for unvoiced. The only challenge in milling and assembling the board was a troubling problem with the Z-axis gear on the Modela. It seems someone had stripped the Z gear, and consequently the machine had limited vertical travel. I was thus unable to cut out my board on the Modela and had to use the shop's scroll saw instead. That left a wider border around the board, which I cleaned up with pliers and an X-Acto knife.

The wider board proved to be an asset when I ran into the power issues described below and had to tack on another MTA connector.

The Code - first try: C

I initially set out to code this project in C, not because the complexity of this task demands it, but because the complexity of the future projects I'd like to build demands it. To do that, I would have to write a software UART for the ATtiny45 to mimic the serial functionality of Neil's hello code.

First, I tried to use Pascal Stang's Procyon avrlib, a library of useful C functions for AVR programming that includes software UART transmit and receive code. However, the Procyon avrlib seems to be written mainly with the ATmegas in mind, and it uses many mnemonic definitions that are specific to the megas. After struggling to get the code to compile for the tiny45 using iocompat.h from avr-libc's example projects (and adding some definitions that were missing), I gave up, deciding that it would be simpler to write my own software UART from scratch.

Thus, I sought out Atmel's application note on interrupt-driven software UARTs, in an effort to learn how the timer-based UART is constructed in assembler so I could do the same in C. I only needed the transmit half of the UART, not the receive half, so I started to hack Atmel's example code into my own version. However, my code didn't work; after a lot of effort trying to figure out why, I decided I needed to just get this week's assignment done and code it in assembler.

The Code - second try: assembler

Coding the assignment was initially fairly straightforward. The most interesting part was figuring out how to do what I needed with 16-bit numbers. I found Atmel's 16-bit arithmetics application note extremely helpful in teaching the basic pattern: perform the operation on the low byte, then perform it on the high byte "with carry".

My algorithm for voiced/unvoiced classification is very simple: examine the "magnitude" of the signal (on Eric's suggestion, this is simply the sum of the part of the waveform that lies above zero). If the magnitude is sufficiently high, look at the number of zero crossings: more zero crossings indicate an unvoiced signal, and fewer indicate a voiced signal. Both of these calculations depend on a fairly solid notion of what "zero" is.

The Trouble with "zero"

The ADC returns 10-bit values from 0 to 1023. So in theory, if everything were perfect, "zero" from an audio-signal perspective would be 512. However, it turns out that zero creeps around quite a bit, so any count of zero crossings varies widely depending on where zero happens to be at the moment. Here's a video showing zero drifting with temperature:

In order to get accurate zero crossings, I had to estimate the position of zero. I decided to simply take the minimum and maximum sample values in each frame and average them. This seemed to work rather well and consistently, so woohoo!

Voiced/unvoiced classifier: is the glottis engaged?

First board design, with power issues. (cad file)

The Trouble with Power

I realized right away that power was going to be a problem. As soon as I turned on the on-board LEDs, the board started behaving erratically: the LED would blink with a fixed period, and the sample values coming across the serial line would stutter at the same rate. It seemed that the board was drawing more current than the DTR line could supply.

To fix this without resorting to making a new board, I built a simple 9-volt battery cable that plugs into the 4-pin MTA connector, but only connects the +V and ground leads.

While using this cable, however, I couldn't get serial data in and out of the computer. So I hot-glued a second 4-pin MTA connector to the board and jumpered the transmit and ground lines to the appropriate places (see picture above).

Computer data display

My goal was just a voiced/unvoiced classifier, which the two on-board LEDs handle marvelously. For the assignment, Neil wanted a computer display, so I built a simple Python/Tk display on top of Neil's example code that shows the zero line, the max/min of the signal, zero crossings, and magnitude. The circle's diameter represents zero crossings, and its color (redness) represents magnitude. Both are calculated on the AVR and sent over serial to the Python code. See the top of this page for videos of the result.

The baseline signal while running on DTR power. Note the dip on the left-hand side and the high level of noise. (video)

The baseline signal while running on battery power. Much higher signal-to-noise ratio, no dip. (video)