Methods and a computer-readable storage device are disclosed for generating a frequency representation of a query audio file. The frequency representation represents information about at least a number of frequencies within a time range containing a number of time frames of the audio content information and a level associated with each of said frequencies. At least one area of data points in the frequency representation is selected. A query fingerprint for each selected area of data points is generated by applying a trained neural network to said selected area of data points, thereby generating a vector in a metric space. A distance between at least one of the generated query fingerprints and at least one reference fingerprint is calculated using a specified distance metric. A reference audio file whose associated reference fingerprints have produced at least one distance satisfying a predetermined threshold is identified.
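
The final matching step described above can be illustrated with a minimal sketch. It is not the patented implementation. It assumes the fingerprints have already been produced (the trained neural network is not modelled here), that the distance metric is Euclidean, and that "satisfying a predetermined threshold" means a distance at or below the threshold. The function name `identify_references`, the dictionary layout, and all numeric values are hypothetical.

```python
import math


def identify_references(query_fps, reference_db, threshold):
    """Return names of reference files that have at least one fingerprint
    within `threshold` (Euclidean distance) of any query fingerprint.

    Assumption: a smaller distance means a better match, so a distance
    "satisfies" the threshold when it is less than or equal to it.
    """
    matches = []
    for name, ref_fps in reference_db.items():
        # One sufficiently close (query, reference) pair is enough.
        if any(math.dist(q, r) <= threshold
               for q in query_fps
               for r in ref_fps):
            matches.append(name)
    return matches


# Toy 3-dimensional fingerprints; the values are invented for illustration.
query_fps = [(0.1, 0.9, 0.3), (0.7, 0.2, 0.5)]
reference_db = {
    "track_a": [(0.1, 0.8, 0.3), (0.0, 0.0, 1.0)],
    "track_b": [(0.9, 0.9, 0.9)],
}

matches = identify_references(query_fps, reference_db, threshold=0.2)
print(matches)
```

In this toy data only `track_a` has a fingerprint close to a query fingerprint. A brute-force comparison of every pair is used for clarity; a large reference collection would more plausibly rely on a nearest-neighbour index.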

@patent{Donier2020,
    title = "Audio fingerprint extraction and audio recognition using said fingerprints",
    author = "Donier, Jonathan and Hoffmann, Till",
    note = "US10657175B2",
    year = "2020",
}