There's been quite a lot of interest recently in AI bird sound recognition. It's been used in various nature reserves around the country and featured on BBC Springwatch in 2024, where AI identification was used to analyse recordings made at RSPB Arne for monitoring purposes.
I have a friend and old work colleague who is an inveterate tinkerer, and who introduced me to BirdNET-Pi, which is based on a Raspberry Pi and uses BirdNET to analyse the sounds. The BirdNET-Pi device is cool in that it uses WiFi to report in real time the birds it's detecting, so you can put it in your garden and see what it's hearing from the comfort of your warm house (see also BirdWeather), but this reliance on WiFi is a drawback if it's a remote setup you want. There are commercial bird sound recorders that can be deployed remotely and their recordings analysed back at base; this is what was used at RSPB Arne. These recorders start at £160 and go very much upwards. I thought I'd have a go at making my own, partly to save money and partly because it would be a challenge, and hopefully fun.
I've experimented with the cheap ESP32 microprocessor before, adding one to a humane mousetrap I'd made (thus creating possibly the world's first "mouse e-trap"!) which simply sends me an email to alert me to the fact there was a mouse in it - so that it didn't become an inhumane trap. The ESP32 is programmable via the readily available Arduino development framework. I use this on a Linux Mint system.
These are the main components of the recorder:
ESP32-WROOM-32 surface mounted module board
This is the ESP32 microprocessor mounted on a board with pins to allow connection to other components via breadboard or PCB/stripboard.
INMP441 I2S miniature microphone
An I2S MEMS (Micro Electro-Mechanical System) microphone. I2S is a standard communication protocol that is supported by the microprocessor. Another commonly available microphone, the MSM261S4030HO, can alternatively be used as it appears compatible.
DS3231 RTC module
Although the ESP32 has an internal clock, it gets reset when power is removed. The RTC (real time clock) module has a rechargeable battery that maintains the time, similar to (possibly even the same as) the component in your computers and other household devices that keeps their clocks correct. The recorder needs to be aware of the time so the sound files can be correctly time stamped.
Micro SD card module
Sound files are written to a micro SD card. The ESP32 program produces 16-bit WAV data at a configurable sampling rate.
In the image below, from left to right: ESP32, DS3231 RTC, MSM261S4030HO microphone, SD card module.
[Image: The main components]
This is the prototype put together on breadboard, using the MSM261S4030HO microphone:
and V1 on stripboard using the INMP441 microphone:
Enclosures
In both cases I used a food storage container as a cheap solution. They appear to have a sufficiently good seal to keep out the weather. The V1 box has survived a week in a cold and wet wood without any obvious wet inside. The microphone inevitably needs a small hole to let the sound in, and I tried to prevent any rain entering by fixing a small "ledge" above it.
Update: I tested putting a membrane (a piece of sellotape) across the hole to keep out the weather, and it didn't seem to affect the sound.
Power
The ESP32 can be powered from anything between 3.7V and 12V via its VIN pin, with the rest of the circuit powered from its 3V3 output pin. For the breadboard prototype I used a 600mAh 9V PP3 rechargeable battery, which gave about 200 minutes recording time. The stripboard version (same circuit, same power consumption) uses a rechargeable Li-ion battery. The 3500mAh one shown has powered it for 700 minutes recording time and the battery was not exhausted (although probably not far off). I have a 9900mAh one on order which should last proportionally longer. More on this later.
Programming
The recorder is centred around the microprocessor. The essential part of the programming is taking data bits from the microphone interface and writing them to the SD card in WAV format. Surrounding this is a bunch of support functionality:
- a CONFIG file on the SD card which is read to allow various parameters to be specified
- connection to the internet via WiFi, if necessary, to get the correct time and update the RTC module
- logging to a log file on the SD card
- configurable recording time and between-recording sleep time
Between recordings the ESP32 enters deep sleep mode, which uses little power.
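To illustrate the "writing them in WAV format" step away from the device, here is a minimal Python sketch using the standard `wave` module. It is not the recorder's Arduino code; the function name `write_wav` and the assumption of a single (mono) channel are mine.

```python
import io
import struct
import wave


def write_wav(samples, sample_rate_hz, target):
    """Write signed 16-bit samples to `target` (a path or file-like object)."""
    with wave.open(target, "wb") as wav:
        wav.setnchannels(1)               # assumption: one microphone, so mono
        wav.setsampwidth(2)               # 2 bytes = 16 bits per sample
        wav.setframerate(sample_rate_hz)
        # Pack the samples as little-endian signed 16-bit integers
        wav.writeframes(struct.pack("<%dh" % len(samples), *samples))


# Write five example samples at 24kHz into an in-memory buffer
buffer = io.BytesIO()
write_wav([0, 1000, -1000, 32767, -32768], 24000, buffer)
wav_bytes = buffer.getvalue()
print(len(wav_bytes), "bytes, starting with", wav_bytes[:4])
```

The result is a 44 byte header followed by two bytes per sample, which is why file size scales almost exactly with sample rate and recording time.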
The microphone challenge
The biggest challenge was processing the data from the I2S mic correctly. As you'd expect, most of the how-do-you-do-that? came from examples on the Internet, but I eventually found that none of the examples I could find appeared to be doing it correctly.
The I2S interface for a microphone can be configured to deliver data at a particular sample rate (e.g. 44kHz) and with a specified sample width (e.g. 32 bits). The INMP441, however, always delivers 24 bits of data in the high-order bits of a 32-bit sample, so attempts to configure the interface to deliver something different are likely to lead to incorrect processing. My code shifts the sampled bits to produce a true 32-bit value, and then reduces it to 16 bits. It samples at a rate of 24kHz, which gives a frequency range up to 12kHz (sample rate/2), sufficient to capture all the bird sounds I've come across. I consider this sample size and rate to be a reasonable compromise between quality and file size. A 2 minute recording produces a 5.8MB file (but see later).
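The bit handling can be sketched in a few lines. This is a Python illustration of the idea, not the recorder's actual code: it assumes the 24 data bits sit in the top of each 32-bit word as described, and the function name `i2s_word_to_int16` is hypothetical. Any gain adjustment is left out.

```python
def i2s_word_to_int16(word):
    """Convert one raw 32-bit I2S word to a signed 16-bit sample.

    Assumes 24 bits of data, left-justified in the 32-bit word.
    """
    word &= 0xFFFFFFFF
    if word & 0x80000000:        # top bit set: reinterpret as a negative number
        word -= 1 << 32
    sample24 = word >> 8         # arithmetic shift drops the 8 unused low bits
    sample16 = sample24 >> 8     # keep the 16 most significant bits
    return sample16


for raw in (0x7FFFFF00, 0x80000000, 0x00010000, 0xFFFFFF00):
    print(f"{raw:#010x} -> {i2s_word_to_int16(raw)}")
```

Reading the word as unsigned, or with the wrong width, puts the sign bit in the wrong place, which turns quiet audio into loud noise.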
As the sample rate is not hard-wired, it can be experimented with. A higher rate will give a higher frequency cut-off, but at the expense of file size and possible dropouts if it's pushing the processor too hard.
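The file size trade-off is simple arithmetic, sketched here in Python. The 44 byte figure is the size of a standard PCM WAV header; the function name is mine, and 16-bit mono recordings are assumed.

```python
WAV_HEADER_BYTES = 44  # standard PCM WAV header


def wav_size_bytes(sample_rate_hz, seconds, bytes_per_sample=2, channels=1):
    """Size of an uncompressed WAV file of the given length."""
    return WAV_HEADER_BYTES + sample_rate_hz * seconds * bytes_per_sample * channels


# Size of a 2 minute recording at each of several sample rates
for rate_khz in (24, 32, 40, 48):
    size_mb = wav_size_bytes(rate_khz * 1000, 120) / 1e6
    print(f"{rate_khz}kHz: {size_mb:.2f}MB per 2 minutes")
```

At 24kHz this gives 5.76MB for 2 minutes, and doubling the rate to 48kHz doubles the file size.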
More on batteries
I found that as the battery runs down, the recordings get shorter. I assume that (for a 3.7V battery) the voltage drops below the ESP32 threshold and somehow the recording stops. This isn't sufficient to power off the processor, as it goes to sleep after recording as programmed. I'm guessing that during sleep, the battery picks up again for the next recording, and this repeats until the battery dies completely.
These later recordings also get noisier, with fluttering and rhythmic "crackling". Again, I assume this is a result of insufficient voltage from the dying battery. I'm currently experimenting with a higher supply voltage, using either 2 x 3.7V Li-ion batteries in series or 2 x 9V PP3 batteries in parallel.
Battery life
Tests on battery life have so far shown:
- 3300mAh Li-ion battery until flat: 38 x 28 mins (with 2 mins sleep in between) = 1064 mins (about 17.7 hrs) total recording, although the last 25% was poor quality, probably caused by low voltage
- 2 x 600mAh rechargeable PP3 batteries until flat: 24 x 28 mins (with 2 mins sleep in between) = 672 mins (about 11 hrs) total recording. All recordings were good until the end (which tends to confirm the suspicion that low voltage is causing the problem with the 3.7V Li-ion)
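As a quick check of the totals, here is the arithmetic in Python. It assumes every recording ran for the full 28 minutes, which may not be true of the last one before the battery died; the function name is mine.

```python
def total_recording(recordings, minutes_each):
    """Return total recording time as (minutes, hours)."""
    minutes = recordings * minutes_each
    return minutes, minutes / 60


for label, count in (("Li-ion", 38), ("2 x PP3", 24)):
    minutes, hours = total_recording(count, 28)
    print(f"{label}: {minutes} mins = {hours:.1f} hrs")
```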
More on recording frequency
I started by using BirdNET-Analyzer to process the recordings. Then I discovered Chirpity, a tool designed to make the processing of large numbers of sound files easier. It uses BirdNET-Analyzer to do the analysis, and shows the detections found in a very easy-to-use form, as well as displaying a spectrogram of the sound. A really good program with lots of functionality.
BirdNET-Analyzer (according to Matt, the Chirpity author) works on files recorded at 48kHz, and will automatically upsample recordings made at lower sample rates to this before analysis. Ideally, therefore, recordings should be at this sample rate. Chirpity does this resampling itself if necessary before feeding the files to BirdNET, so BirdNET doesn't have to. But this does lead to differences, as Chirpity and BirdNET use different methods to do the resampling, with Chirpity seeming to do a better job (it uses ffmpeg, a tried and tested tool for audio file processing).
(BTW I need to be more careful about differentiating sample rate, in ksps, from frequency, in kHz. The maximum frequency in the recording is half the sample rate, i.e. a 32ksps sample rate gives a max frequency of 16kHz.)
The bottom line for me is that if I'm going to use BirdNET directly, I should use 48kHz recordings (or upsample to this myself with ffmpeg). For lower sample rates, use Chirpity. However, given the functionality in Chirpity, I can't see myself not using it all the time.
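For the "upsample it myself" route, a small Python wrapper around ffmpeg might look like the sketch below. The `-ar` option sets the output sample rate and `-y` overwrites an existing output file; the function names and file names are hypothetical, and ffmpeg needs to be installed and on the PATH.

```python
import shutil
import subprocess


def ffmpeg_resample_command(src, dst, rate_hz=48000):
    """Build the ffmpeg command line that resamples `src` into `dst`."""
    return ["ffmpeg", "-y", "-i", src, "-ar", str(rate_hz), dst]


def resample(src, dst, rate_hz=48000):
    """Run ffmpeg; raises if ffmpeg is missing or the conversion fails."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg not found on PATH")
    subprocess.run(ffmpeg_resample_command(src, dst, rate_hz), check=True)


# Show the command without running it (file names are examples only)
print(" ".join(ffmpeg_resample_command("recording.wav", "recording_48k.wav")))
```

Upsampling doesn't add any information above the original cut-off frequency; it only puts the file into the form the analyser expects.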
Until recently I'd been reluctant to believe identification was any better with sample rates greater than the 24ksps I've chosen, but discussions on the Chirpity discussion group have persuaded me that 32ksps would be better. This is because BirdNET uses frequencies up to 15kHz in its analysis (anything higher is thrown away), and a 24ksps recording contains nothing above 12kHz.
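The reasoning can be written down as a short Python sketch: find the lowest sample rate whose half-rate limit reaches the highest frequency the analyser uses. The list of rates is taken from the comment in the CONFIG file; the function names are mine.

```python
SUPPORTED_RATES_KSPS = (24, 32, 40, 48)  # rates listed in the CONFIG file comment


def max_frequency_khz(rate_ksps):
    """Highest frequency a recording can contain: half the sample rate."""
    return rate_ksps / 2


def lowest_rate_covering(freq_khz, rates=SUPPORTED_RATES_KSPS):
    """Lowest sample rate that can capture `freq_khz`, or None if none can."""
    for rate in sorted(rates):
        if max_frequency_khz(rate) >= freq_khz:
            return rate
    return None


print(lowest_rate_covering(15))  # 15kHz is the upper limit BirdNET analyses
```

For a 15kHz limit this picks 32ksps, since 24ksps stops at 12kHz.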
Configuration
// WiFi values for getting NTP time
// Don't specify if WiFi is not to be used
//SSID=Home;
//PASSWORD=xxxxxxxx;
//GMTOFFSET=0;
// we can save a bit of power by turning off the LEDs
//LEDS=0;
SAMPLE_RATE_KHZ=48; // 24, 32, 40 or 48
ATTENUATION_FACTOR=4; // turn down the gain
// Recording control for deployment
INITIAL_DELAY=90; // 1.5 minute in secs
RECORD_TIME=300; // 5 minutes in secs
RECORD_WAIT=1500; // 25 minutes in secs
WiFi is only used for setting the real-time clock. If these values are specified, the current date/time will be obtained from the Internet and set in the DS3231. As the DS3231 has its own rechargeable battery, this should rarely need to be done; changing to and from daylight saving time would be the main reason under normal circumstances.
The INITIAL_DELAY is to give you time to get the recorder into position before it starts.
The other values are pretty self-explanatory.
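For anyone wanting to check a CONFIG file on a computer before deploying, the format is easy to parse. The Python sketch below is my own reading of the format as shown above (KEY=value; with // comments), not the recorder's parser, and it would mangle a value that itself contains "//".

```python
SAMPLE_CONFIG = """\
// WiFi values for getting NTP time
//SSID=Home;
SAMPLE_RATE_KHZ=48; // 24, 32, 40 or 48
ATTENUATION_FACTOR=4; // turn down the gain
INITIAL_DELAY=90; // 1.5 minute in secs
RECORD_TIME=300; // 5 minutes in secs
RECORD_WAIT=1500; // 25 minutes in secs
"""


def parse_config(text):
    """Return a dict of the settings that are not commented out."""
    settings = {}
    for line in text.splitlines():
        line = line.split("//", 1)[0].strip()   # drop comments
        key, sep, value = line.partition("=")
        if not sep:
            continue                            # blank or comment-only line
        settings[key.strip()] = value.strip().rstrip(";").strip()
    return settings


config = parse_config(SAMPLE_CONFIG)
cycle_secs = int(config["RECORD_TIME"]) + int(config["RECORD_WAIT"])
print(config)
print("One record/sleep cycle:", cycle_secs // 60, "minutes")
```

With the values shown, one cycle is nominally 30 minutes, so 48 five-minute recordings a day.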
Log file
A sample log file:
2025/2/25 16:41:35 ===================================================
2025/2/25 16:41:35 Starting...
2025/2/25 16:41:35 SAMPLE_RATE_KHZ=48
2025/2/25 16:41:35 ATTENUATION_FACTOR=4
2025/2/25 16:41:35 INITIAL_DELAY=90
2025/2/25 16:41:35 RECORD_TIME=300
2025/2/25 16:41:35 RECORD_WAIT=1500
2025/2/25 16:41:35 Setting ESP32 clock from RTC: Tue Feb 25 16:41:35 2025
2025/2/25 16:41:35 Initial delay start
2025/2/25 16:43:5 Initial delay end
2025/2/25 16:43:6 Recording to file /13364305.wav
2025/2/25 16:48:6 Recording ended
2025/2/25 16:48:6 Temperature is 18.3C
2025/2/25 16:48:7 Going to sleep...
2025/2/25 17:12:16 ===================================================
2025/2/25 17:12:16 Starting...
2025/2/25 17:12:16 Recording to file /13371216.wav
2025/2/25 17:17:17 Recording ended
2025/2/25 17:17:17 Temperature is 10.3C
2025/2/25 17:17:18 Going to sleep...
2025/2/25 17:41:42 ===================================================
2025/2/25 17:41:42 Starting...
2025/2/25 17:41:43 Recording to file /13374143.wav
2025/2/25 17:46:43 Recording ended
2025/2/25 17:46:43 Temperature is 8.8C
2025/2/25 17:46:44 Going to sleep...
2025/2/25 18:11:8 ===================================================
...
The DS3231 module contains a temperature sensor, so why not use it?
File names
The SD card file system only supports 8.3 file names. I wanted the file names to be derived visibly from date and time, but the short file name makes this virtually impossible. The unique, ascending names that are used are created from the date and time but in a non-obvious way. When I upload the files to my computer I use a simple command to rename the files, using the file creation dates.
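The three file names in the sample log above are consistent with one particular scheme: four digits for the number of hours since the start of the year, followed by minutes and seconds. That is only an inference from those three names, not a description of the actual code, but as a Python sketch it would look like this (function names are mine):

```python
from datetime import datetime, timedelta


def encode_name(when):
    """Build an 8.3 name: hours since 1 January (4 digits) + MM + SS."""
    hours_into_year = (when.timetuple().tm_yday - 1) * 24 + when.hour
    return f"{hours_into_year:04d}{when.minute:02d}{when.second:02d}.wav"


def decode_name(name, year):
    """Recover the timestamp; the year is not stored in the name."""
    stem = name.rsplit(".", 1)[0]
    hours, minute, second = int(stem[:4]), int(stem[4:6]), int(stem[6:8])
    return datetime(year, 1, 1) + timedelta(hours=hours, minutes=minute, seconds=second)


print(encode_name(datetime(2025, 2, 25, 16, 43, 5)))
print(decode_name("13371216.wav", 2025))
```

Names built this way sort in time order within a year, which fits the "unique, ascending" description.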
And next...
I still consider this to be work in progress. Perhaps at some stage I will put details on GitHub. Meanwhile if you would like more details, contact me via email.