Tuesday, November 1, 2011

I am currently working on fastSlow features that I hope will be more representative than the features I used in the previous fastSlow paper. These features relate to information rates and are intended to have perceptual analogs. While IOI (inter-onset interval) rate (e.g., per 6 sec. frame) is an important feature, it doesn't tell us whether everything in the piece is spectrally the same. Thus we're instead looking at distributions of frequencies within each IOI, and then considering statistics of change from one IOI to the next.
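To make the idea concrete, here is a minimal sketch of one way to measure change between consecutive IOIs. It assumes each IOI has already been reduced to a vector of band magnitudes, and it uses total variation distance as the change statistic. Both the function names and the choice of statistic are my assumptions for illustration, not necessarily what the final features will use.

```python
def normalize(spectrum):
    """Scale non-negative magnitudes so they sum to 1 (a distribution)."""
    total = sum(spectrum)
    if total == 0:
        # Silent segment: fall back to a flat distribution.
        return [1.0 / len(spectrum)] * len(spectrum)
    return [v / total for v in spectrum]


def change_between(p, q):
    """Total variation distance: 0 = identical, 1 = no overlap at all."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def ioi_change_track(ioi_spectra):
    """One change value per pair of consecutive IOIs."""
    dists = [normalize(s) for s in ioi_spectra]
    return [change_between(a, b) for a, b in zip(dists, dists[1:])]


# Toy magnitudes over 4 frequency bands, one row per IOI (made-up numbers).
ioi_spectra = [
    [8.0, 1.0, 1.0, 0.0],  # low-heavy
    [8.0, 1.0, 1.0, 0.0],  # spectrally identical to the previous IOI
    [0.0, 1.0, 1.0, 8.0],  # high-heavy
]
changes = ioi_change_track(ioi_spectra)
```

A piece that repeats the same sound at a high IOI rate would give values near 0 throughout, which is exactly the case that IOI rate alone cannot distinguish.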

Here is a spectrogram of 5 seconds of a slow dnb song:


This is the IOI-distribution-ogram of the same song. It seems quite telling.


While we aren't able to separate out the individual notes using this method (a multiple-Gaussian fit per IOI might work), I also don't think it's necessary. Right now we're running a fast-slow classification test using features extracted from the fastSlow dataset, and I'll make another post once this is done.

Thursday, October 6, 2011

Working on a solution for problems 1-3. More to come shortly...

Tuesday, October 4, 2011

Last week I presented an idea for my dissertation topic. I would like to be able to identify which breaks (short segments of 2 or more bars of solo percussion from funk/jazz/soul songs) are being used in jungle/drum and bass, a wildly popular form of dance music from the 1990s - 2000s. While at any given time in the music's progression the notable characteristics of the genre (i.e., tempo [fast, ~155-180 BPM], timbre [familiar drums, deep bass], overall dynamics [loud]) constrain the possibilities for creativity, true artistry is seen in the way drum machines and breaks are used and manipulated. As a first step in the creation of any dnb song, musicians select one or more breaks and slice them into individual hits. The samples are then distorted, pitched, layered with other breaks, then reordered. Several of these stages are done using compression and limiting, making the recognition of origin even more difficult. Yet the avid listener is capable of recognizing "Apache", "Amen", or "Think".

Standard methods of analyzing music (e.g., harmonic functions) or musicianship (e.g., performance characteristics) do not apply to this music, as the aim and structure of the music are seemingly rudimentary from this point of view, and the music is not often performed in a live context. The music is really about "the accelerated, chopped-up breakbeat rhythms...looped into a rolling flow". The melodic and harmonic content that generates excitement in other forms of music is replaced with the manipulation of the timing and timbre of the drums, as well as the interplay between the drums and other elements such as the bass.

This thesis topic would provide methods of analyzing these drums, providing:
1) an automated list of breaks used within the track under analysis
2) a possible map describing how these drums have been modified to fit the context of the piece

Towards these goals, I have begun working towards the separation of drums from the music.

Here's how I see it. I need to:
1) isolate sections of the track in which the drums are the most prominent part of the song
2) remove (if possible) non-drum related content
3) find these drums within a database of others
4) find out how they've been i) resequenced, ii) modified (pitch, distortion, reversed etc.)

I've started to work towards the first of these tasks.

Take this song: Freedom, by Big Bud

1) Foote's novelty function (via mirtoolbox) for segmentation is applied to identify where in the song drastic changes occur. Here is a plot of the waveform, followed by a plot of the generated novelty curve:


2) Then features are extracted from each section that will hopefully be able to be used to determine if drums are present. These features are mostly based on onsets (e.g., onset density). A modified version of the onset detector described in this blog was used.

3) Once the sections are identified, the onset points are again used to segment the audio.

4) The k-means algorithm is used to cluster the resulting slices into categories based on grouped spectral characteristics. At present k=3, which assumes a drum at each onset point. Drum classes are then assigned as follows: kicks occupy the class with the lowest center of spectral gravity, snares the class with the most widespread energy. Hats are then by default the remaining class.
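The class-assignment rule in step 4 can be sketched as follows. This is only an illustration: it assumes each of the 3 clusters has already been summarized by a spectral centroid and a spectral spread (my stand-in for "most widespread energy"), and it picks the kick first and then the snare from the two that remain. The function name and the toy numbers are hypothetical.

```python
def label_drum_clusters(clusters):
    """Map cluster ids to 'kick', 'snare' or 'hat'.

    clusters: dict of cluster_id -> (centroid_hz, spread_hz).
    Exactly 3 clusters are expected, matching k=3.
    """
    if len(clusters) != 3:
        raise ValueError("expected exactly 3 clusters")
    # Kick: lowest center of spectral gravity.
    kick = min(clusters, key=lambda c: clusters[c][0])
    remaining = [c for c in clusters if c != kick]
    # Snare: most widespread energy among the two clusters left.
    snare = max(remaining, key=lambda c: clusters[c][1])
    # Hat: whatever is left over.
    hat = next(c for c in remaining if c != snare)
    return {kick: "kick", snare: "snare", hat: "hat"}


# Made-up cluster summaries: (centroid in Hz, spread in Hz).
toy_clusters = {0: (2500.0, 3000.0), 1: (150.0, 120.0), 2: (9000.0, 1500.0)}
labels = label_drum_clusters(toy_clusters)
```

Note that the rule always hands out all three labels, so any cluster made of non-drum slices still gets called a drum.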

Here is the audio from the section of Big Bud's Freedom with the greatest onset density.

And here is the audio of all KICK drums from this section.

So there is obviously still quite a bit to do. Amongst others:

1) not all slices are drums. By using k=3 for k-means, I am assuming that each slice contains a drum. While this is likely for jungle, and much of drum and bass (dnb), it's not always the case, and it would be important to create some sort of filter for this, increase k, or come up with another solution entirely. Template matching?

2) determination of drum presence in section is not foolproof. Possible solution: add additional features. This shouldn't be too difficult.

3) how to remove non-drum spectral content.

4) break classification has not been touched yet.

Monday, October 3, 2011

Had some time to code something up on the train from Montreal to NY. Certainly not bad for drum sound separation... Will post up full details tomorrow in the AM. Too tired now!

Wednesday, September 28, 2011

Thinking about how to find the drum sections...(this was written prior to trying out using existing segmentation methods). Foote's 1999 segmentation method seems to work well for this, and will be covered in the next blogpost.

attempt 1: simple energy profile:

hypothesis: large structural changes will be visible in spectral features. For this music, we will see significant rises (with sustained activity) in standard deviation of spectral features such as spectral flux because of the full instrumentation during certain parts.

Each of the following 30 graphs is a drum and bass song from our constant tempo database. The feature used is the standard deviation of the windowed spectral flux. Window size here is approximately 6 seconds.
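For reference, a minimal sketch of the windowing step. It assumes the spectral flux has already been computed as one value per analysis frame, and it uses non-overlapping windows. The frame rate, the non-overlapping layout and the function name are assumptions made for the example.

```python
from statistics import pstdev


def windowed_std(values, frame_rate_hz, window_sec=6.0):
    """Standard deviation of a per-frame feature over consecutive windows."""
    win = max(1, int(round(window_sec * frame_rate_hz)))
    return [
        pstdev(values[i:i + win])
        for i in range(0, len(values) - win + 1, win)
    ]


# Toy flux at 2 frames per second: 6 s of steady signal, then 6 s of busy signal.
flux = [0.5] * 12 + [0.0, 2.0] * 6
profile = windowed_std(flux, frame_rate_hz=2.0)
```

The steady window gives 0 and the busy one gives 1, which is the kind of sustained rise the hypothesis is looking for.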




pros:
- in most songs, there is a clear representation of the degree to which a song changes, due to the added frequency content in the track.

cons:
- if a song doesn't have much variation from section to section, it is very difficult to tell where the sections are
- too coarse a view.

attempt 2: use onset detector to create feature(s) to use for sectioning.

Hypothesis: a large number of onsets will be indicative of the presence of drums. This is definitely true for this music. The problem is that there are many songs that contain a LOT of drums, all the way through. For example: Big Bud - Freedom, found in graph #3 (of 30) above, has an IOI output (x-axis = index of the IOI, y-axis = length of the IOI in seconds) that looks like this:


The large peak is the 25-second breakdown in the middle of the song, where there are no drums. Other than showing where in the song the drums are definitely not, this graph does nothing for identifying the larger sections. This song is representative of many dnb (abbreviation for drum and bass) songs of the era, so it can be assumed that this is not a reliable method of finding the sections.
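The IOI curve itself is simple to compute from a list of onset times. Here is a small sketch with made-up onset times (not taken from the actual track) showing how a drumless breakdown shows up as a single large value:

```python
def inter_onset_intervals(onset_times):
    """Differences between consecutive onset times, in seconds."""
    return [b - a for a, b in zip(onset_times, onset_times[1:])]


def longest_gap(onset_times):
    """Return (ioi_index, gap_start_time, gap_length) for the largest IOI."""
    iois = inter_onset_intervals(onset_times)
    if not iois:
        raise ValueError("need at least two onsets")
    idx = max(range(len(iois)), key=iois.__getitem__)
    return idx, onset_times[idx], iois[idx]


# Hypothetical onsets: steady hits, a 25 second hole, then steady hits again.
onsets = [0.0, 0.25, 0.5, 0.75, 25.75, 26.0, 26.25]
gap_index, gap_start, gap_length = longest_gap(onsets)
```

Everything outside the hole has the same short IOI, which is why the curve says nothing about where the other sections begin and end.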

Monday, August 29, 2011

tempoTracker has been adapted so that there is congruence between its output and that of the 3 other trackers (on at least 1 tempo octave). At this point there are still some "flaws": for example, the output is sometimes 2/3*tempo or 1.33*tempo.

On further inspection, I've noticed that this type of output is not supported by other shorter-interval peaks at lower beat levels (something the literature [L&H] states is necessary for the perception of beat). After wavelet decomposition into subbands, these flaws become really noticeable, as do the more correct beat levels.

Here are 2 such examples:

1) Basic Channel - Octagon

audio


Here is the output tempo from each subband. The actual beat level (@lag 37) results in 140 BPM. The 7th and 8th subbands (bottom 2 graphs) demonstrate this beat level very clearly. The first few subbands show the 112 BPM peak (@lag 46). Notice that this lag is not supported in the 7th and 8th subbands.


2) Big Bud - Freedom

audio


Here is the output tempo from each subband. In this audio example, the drums are playing at 162 BPM, then cut out for a breakdown section, before coming back in. The Viterbi backtrace does an excellent job here of filling in a best-fit tempo for this breakdown; however, you'll notice that in some bands this estimate drifts, because the prior weighting curve dominates when the peaks are weaker.
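As a side note on the lag values quoted for Octagon above: a lag in frames converts to BPM as 60 * frame_rate / lag. The sketch below assumes a 44.1 kHz signal analyzed with a hop of 512 samples. That hop size is my assumption, chosen because it reproduces the quoted figures, and is not something stated in this post.

```python
SAMPLE_RATE = 44100   # Hz
HOP = 512             # samples per frame (assumed, not stated in the post)
FRAME_RATE = SAMPLE_RATE / HOP  # about 86.13 frames per second


def lag_to_bpm(lag_frames, frame_rate=FRAME_RATE):
    """Convert a periodicity lag (in frames) to beats per minute."""
    return 60.0 * frame_rate / lag_frames


beat_level_bpm = lag_to_bpm(37)   # about 139.7
flawed_bpm = lag_to_bpm(46)       # about 112.3
```

Under this assumption the two lags land on roughly 140 and 112 BPM, a ratio of 37/46, or about 0.8.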

Monday, August 22, 2011

Of the original 202 tracks, 191 had agreement between 3 tempo trackers
on at least one metrical level. The tracker that "fails" to meet the
others in these cases is the Ellis tracker.

With the 11 non-agreeing tracks removed, I ran our tempo tracker on
the remaining 191. With the 0.04 tolerance window as explained in the
previous email, we miss a total of 18 of these. Increasing the
allowance window results in the following:

allow%:  0.04  0.05  0.06  0.07  0.08  0.09  0.10
wrong:     18    15    13     8     6     6     5
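
For illustration only, here is one way such an allowance check could be written. The exact definition of the window is in the earlier email and is not reproduced here, so this sketch assumes a relative tolerance around the reference tempo and treats half and double tempo as acceptable metrical levels. The function name and the set of levels are hypothetical.

```python
def agrees(estimate_bpm, reference_bpm, tolerance=0.04, levels=(0.5, 1.0, 2.0)):
    """True if the estimate is within the relative tolerance of the
    reference tempo at any of the allowed metrical levels."""
    return any(
        abs(estimate_bpm - reference_bpm * k) <= tolerance * reference_bpm * k
        for k in levels
    )


# An estimate of 92 against a reference of 87 misses at 4% but passes at 6%.
miss_at_4 = agrees(92.0, 87.0, tolerance=0.04)
hit_at_6 = agrees(92.0, 87.0, tolerance=0.06)
# Half tempo counts as agreement; an unrelated tempo does not.
octave_ok = agrees(80.0, 160.0)
unrelated = agrees(132.0, 163.0)
```

Widening the window only rescues near misses like the first pair. Estimates at unrelated ratios stay wrong at any reasonable allowance.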

My reasoning for this is the following. The three tempo trackers
contain 2 components: An initial tempo classifier, and a beat phase
calculation. In both Davies and Ellis, initial tempo is found using a
rough estimate. Dynamic programming is then used to find the best fit
of beats within the audio given this rough tempo calculation. A tempo
trace is then created from the inter-beat intervals (IBI). In ours,
however, as I was asked not to program any further and to just use the
tempo calculation, we are left with the rough tempo calculation from
the initial tempo estimate; no beat phase estimate is performed.
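
To show what the missing second stage buys, here is a minimal sketch of turning beat times into a tempo trace via the IBIs. The beat times are made up, and the function name is mine. This is not code from any of the trackers.

```python
from statistics import median


def tempo_trace(beat_times):
    """Instantaneous tempo (BPM) from each inter-beat interval.

    Assumes beat_times is strictly increasing, in seconds.
    """
    return [60.0 / (b - a) for a, b in zip(beat_times, beat_times[1:])]


# Hypothetical beats placed by a beat tracker, half a second apart.
beats = [0.0, 0.5, 1.0, 1.5, 2.0]
trace = tempo_trace(beats)
overall_bpm = median(trace)
```

Because the beats are fitted to the audio first, the resulting tempo is not limited to the grid of the initial rough estimate.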

To test this hypothesis I took a look at the 18 tracks that were "incorrect".

1) Antidote - Off Dub -> this is a weird one. But it's dubstep, and
all dubstep lies around 140 BPM (sometimes, but rarely, as slow
as 127, but no faster than 150). I opened this up in Logic, lined
up a metronome beat to it, and it was 140. Of the 4 trackers tested,
ours was the only one to get it right. All others found either 80 or
160.

2) Avry Tare - Ghosts of Books -> Klapuri gets it right at around 140
BPM. None of the other trackers get it. Ours guessed 120. The intro is
soft pads with slow attacks. My guess is that because the tempo
tracker is based on found onsets, 0 onsets = pre-emphasis curve only
with a peak at ~120. Will need to adjust code and look further into
this one.

3) Basic Channel - Octagon -> Strange that our tracker gets this one
wrong. It's a classic 139 BPM 4/4 minimal techno beat from the mid-90s.
There is some odd rhythm stuff that kicks in about 1/3 of the way into
the track that is much slower, but since onset detection isn't piecing
the instruments apart, this should have no effect.

4) Bibio - All the Flowers -> this is a simple (and beautiful)
acoustic guitar/bass/vocal piece that has been artificially
constructed in a sequencer. Using Logic as above, I hear the track
more as 6/8, which would play at 76 BPM. None of the trackers agree
with me. Klapuri's does, however, identify a 101 BPM 4/4 beat that
works harmonically when overlaid on the music, but doesn't quite
fit the vocals as cleanly as the 6/8 metronome. Ours found a similar
tempo to Davies and Ellis here.

5) Big Bud - Freedom -> drum and bass track from mid-late 90s. Our
tracker guessed 132 BPM. All 3 others had some level of 163 BPM
(correct) at some metrical level.

6) Boards of Canada - Bocuma -> slow ambient piece. All trackers other
than ours predict correctly 90 BPM. Ours guesses 115.

7) Commix - Japanese Electronics (Instra:Mental remix) -> modern drum
and bass piece set to 170 BPM. Ellis, Davies, and Klapuri have 170 BPM
on at least one metrical level. Ours does not. This track has a long
fade out of ambience and bass at the end. I'm assuming it's a similar
problem to track #2, so when I adjust the tracker observation
vector, I will test this again.

8) Digital and Morphy - Shanti -> 172 BPM drum and bass track. Ours
guesses 187 BPM.

9) El-P - Patriotism -> Hip hop track from the early 2000s. Three
trackers guess 85, ours guesses 88 BPM. I suspect that this is an
example of the hypothesized problem of "rough tempo" before beats are
accurately laid in. The problem is one of resolution, akin to trying
to identify an exact frequency from a 44.1kHz wavfile using a sliding
DFT with 512 points. While the audio can contain any frequency up to
44100/2 Hz, only 512 bins exist in the DFT.

10) Gerwin - Mistakes -> 140 BPM post-dubstep. The 3 others get ~140
BPM. Ours gets 112 BPM.

11) Getz & Nuage - Is The Way -> 170 BPM drum and bass track. 3 others
get it. Ours again guesses 112 BPM. There's something going on with
the 112 BPM predictions.

12) Low Limit - Turf Day -> Davies and Klapuri get 93 BPM for the 4/4
beat. Ours comes up with a similar answer to Ellis: ~116 BPM. There is
a synth zap sound that potentially causes a 5 feel. 5/4 * 93 = ~116 BPM.

13) Martsman - Marksman -> Klapuri is the only one that gets this one
right. It's 170 BPM (or some metrical level thereof). Davies = 127 BPM;
Ellis + Ours = 114 BPM

14) Sanderson - Beautiful Click (SND remix) -> straightforward 172 BPM
drum and bass. Ours guesses 123 BPM. Not sure why this happened. Will
have a look at observation vector.

15) Stunna - Optima -> Half-tempo drum and bass track. Probably
sequenced at 174 (with little drums hinting at the genre and speed),
but everything else about the track says 87 BPM. Ours gets 92. I think
that this is another example of rough-tempo calculation gone bad
(e.g., #9).

16) Teebee - Black Rain -> 175 BPM Drum and bass track. Somehow our
tracker is getting values in the 1000's. This definitely needs looking
into. The 3 others get it correct.

17) Triosk - Intensives Leben -> 150 BPM jazz track played to computerized
low-bit drums. 3 other algorithms get it correct. Ours says ~120 BPM.

18) Unknown - 0006 -> arpeggio at 120 BPM. Klapuri+Davies+Ours = 120
BPM, Davies = 95 BPM.
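
The DFT analogy used in #9 can be checked with a couple of lines. The bin spacing of an N-point DFT is sample_rate / N, so a 512-point DFT at 44.1 kHz can only report frequencies on a grid about 86 Hz wide. The 1000 Hz test tone below is an arbitrary value picked for the example.

```python
def dft_bin_spacing_hz(sample_rate, n_points):
    """Distance in Hz between adjacent DFT bins."""
    return sample_rate / n_points


def nearest_bin(freq_hz, sample_rate, n_points):
    """Index and center frequency of the bin closest to freq_hz."""
    spacing = dft_bin_spacing_hz(sample_rate, n_points)
    k = round(freq_hz / spacing)
    return k, k * spacing


spacing = dft_bin_spacing_hz(44100, 512)            # 86.1328125 Hz
bin_index, bin_freq = nearest_bin(1000.0, 44100, 512)
error_hz = abs(bin_freq - 1000.0)                   # over 33 Hz off
```

A rough tempo estimate behaves the same way: it can only land on the values its grid allows, which would explain near misses such as 88 versus 85 BPM.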

So, there are 17 errors as opposed to 18.

Next I'll check out why these errors are occurring by analyzing the
input observation vectors and output tempi of the tracks above. Will
report on that next.