Audition is triggered by sound-induced nanoscale vibrations in the sensory epithelium of the inner ear. The epithelium is composed of three layers: the sensory hair-cell, supporting-cell, and extracellular-matrix layers. Conventional optical coherence tomography (OCT) systems scan the sample laterally to reconstruct the image and analyze the motions. However, because of the low scanning resolution, the planar distribution of the vibrations within the epithelium remains uncertain. To address this issue, we describe an advanced OCT imaging system. A high-resolution microscope and an ultra-high-speed CMOS camera were incorporated into the system, allowing a square image of 0.5 × 0.5 mm to be acquired at once with a lateral resolution of 2 μm. A supercontinuum broadband light source provided a depth resolution of 1.8 μm. A vibrometric technique that captures the motion stroboscopically allowed us to detect vibrations of up to 30 kHz in the object. Using the microscope-equipped system, we recorded the planar distribution of nanoscale vibrations on the extracellular-matrix layer in a live guinea pig. This system may help clarify a fundamental mechanism underlying hearing.
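
The stroboscopic detection mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a purely sinusoidal displacement, illumination flashes locked to the stimulus at evenly spaced phase delays, and noise-free readings. The function names, the 5 nm amplitude, the 0.7 rad phase, the number of phase steps, and the number of skipped cycles are all hypothetical; only the 30 kHz stimulus frequency is taken from the text.

```python
import math


def strobe_times(freq_hz, n_phases, cycles_between=100):
    """Flash times locked to the stimulus (hypothetical timing scheme).

    Flash k is delayed by k/n_phases of a period. cycles_between is the
    number of whole stimulus cycles skipped between flashes, which is what
    lets a slow detector sample a fast periodic vibration.
    """
    period = 1.0 / freq_hz
    return [(k * cycles_between + k / n_phases) * period for k in range(n_phases)]


def displacement_nm(t, freq_hz, amplitude_nm, phase_rad):
    """Assumed sinusoidal displacement of the sample at time t (seconds)."""
    return amplitude_nm * math.sin(2.0 * math.pi * freq_hz * t + phase_rad)


def fit_sinusoid(samples):
    """Recover amplitude and phase from samples at evenly spaced phases.

    Uses a single-frequency Fourier projection; needs at least 3 samples
    spanning exactly one period.
    """
    n = len(samples)
    if n < 3:
        raise ValueError("need at least 3 phase steps")
    s = (2.0 / n) * sum(d * math.sin(2.0 * math.pi * k / n) for k, d in enumerate(samples))
    c = (2.0 / n) * sum(d * math.cos(2.0 * math.pi * k / n) for k, d in enumerate(samples))
    return math.hypot(s, c), math.atan2(c, s)


# Illustrative values only: 30 kHz is from the text, the rest are assumptions.
FREQ_HZ = 30_000.0
times = strobe_times(FREQ_HZ, n_phases=8)
samples = [displacement_nm(t, FREQ_HZ, amplitude_nm=5.0, phase_rad=0.7) for t in times]
amplitude, phase = fit_sinusoid(samples)
print(f"amplitude = {amplitude:.3f} nm, phase = {phase:.3f} rad")
```

In this sketch the eight flashes span 700 stimulus cycles, so the detector only has to respond at a small fraction of the vibration frequency. Applying the same fit independently at every camera pixel would give a planar map of amplitude and phase, which is the kind of distribution the abstract describes.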