Household Dirt & Stain Data (Floor-Stains)

Background

In the pursuit of truly autonomous cleaning, the industry faces a "Long-Tail" problem. While robots can easily navigate around furniture, they often lack the ability to distinguish the nature of the debris on the floor. Is it a harmless dust bunny, a wet juice spill, or a sticky dried stain? Current datasets typically lack this granular variety. To bridge this gap, Robotin has shifted focus to constructing a massive, crowdsourced repository of Real-World Floor Contaminants. By leveraging the power of user-generated content, we capture the messy, unstructured reality of daily life—data that laboratory simulations simply cannot replicate.

Core Features of the Dataset

Unlike traditional robotic datasets captured by specific sensors, this dataset embraces the diversity of real-world capture conditions.

  • Crowdsourced Realism (Smartphone RGB): Data is captured via users' smartphones, providing high-fidelity images under natural, uncontrolled lighting. This approach ensures a vast diversity of textures, reflections, and floor types (wood, tile, carpet).

  • Multi-View Flexibility:

    • Robot-Centric View: Users are guided to shoot from low angles (10–40 cm) to mimic the perspective of standard vacuum robots.

    • Human-Centric View: Includes supplementary high-angle shots, offering 3D context useful for humanoid robots or surveillance systems.

  • Rich Semantic Metadata: Beyond visual data, each sample is tagged with user-described categories (e.g., "coffee," "pet hair," "mud"). This allows models to learn not just where the dirt is, but what it is.

  • Scalable Annotation (Bounding Box): We provide efficient 2D Bounding Box annotations to support rapid object detection training.

  • Dynamic "Before & After" Paired Data: A unique feature of this dataset is the inclusion of pre-cleaning and post-cleaning comparison shots. These pairs are critical for training "Reward Models" in Embodied AI, helping robots learn to verify if a surface is truly clean.
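The annotation features above (capture metadata, semantic labels, 2D bounding boxes, and before/after pairing) can be illustrated with a hypothetical sample record. The field names and structure here are illustrative assumptions for the sketch, not the dataset's published schema.

```python
# Hypothetical annotation record combining the features described above:
# capture metadata, a user-described semantic label, a 2D bounding box,
# and a link to the post-cleaning comparison shot.
# All field names are illustrative, not the dataset's official schema.
sample = {
    "image_id": "000123_pre",
    "capture": {
        "view": "robot-centric",         # low-angle shot, ~10-40 cm height
        "floor_type": "tile",
        "lighting": "natural",
    },
    "annotations": [
        {
            "bbox": [412, 260, 95, 70],  # [x, y, width, height] in pixels
            "label": "coffee",           # user-described stain category
            "state": "wet",
        }
    ],
    "paired_image_id": "000123_post",    # post-cleaning comparison shot
}

def is_cleaning_pair(pre, post):
    """Check that two records form a before/after pair for reward modeling."""
    return post["image_id"] == pre["paired_image_id"]

post = {"image_id": "000123_post", "paired_image_id": "000123_pre"}
print(is_cleaning_pair(sample, post))  # True
```

A reward model trained on such pairs can score the "after" image against the "before" one, giving the robot a learned signal for whether a surface is actually clean.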

Potential Applications

  • Smart Cleaning Strategy (Dry vs. Wet): Enables robots to classify stain types effectively—identifying liquid spills so they can retract vacuum brushes (preventing damage) or spotting dried stains that call for extra mopping pressure.

  • Visual Verification & Closed-Loop Control: Leveraging the "Before/After" data, robots can self-assess their cleaning performance. If residue is detected after a pass, the system can autonomously trigger a re-cleaning cycle.

  • Hazard & Anomaly Detection: Improves the robot's ability to distinguish between cleanable debris and hazardous items (e.g., cables, pet waste) or floor damage, ensuring safe operation in complex home environments.
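The first two applications above amount to a classify–act–verify loop. The following is a minimal sketch of that loop; the function names, stain categories, and pass limit are illustrative assumptions, not a reference implementation.

```python
# Minimal sketch of a classify -> act -> verify control loop for a cleaning
# robot, assuming a stain classifier and a residue detector are available.
# All names, categories, and thresholds here are illustrative.

def choose_action(stain_type):
    """Map a predicted stain category to a cleaning strategy."""
    if stain_type in ("juice", "water", "coffee"):   # liquid spill
        return "retract_brush_and_mop"               # protect the vacuum brush
    if stain_type in ("dried_stain", "sticky"):
        return "extra_mop_pressure"
    return "vacuum"                                  # default: dry debris

def clean_until_verified(stain_type, detect_residue, max_passes=3):
    """Re-clean until a post-pass check finds no residue (closed loop)."""
    action = choose_action(stain_type)
    for n in range(1, max_passes + 1):
        # ... execute `action` on the hardware here ...
        if not detect_residue():                     # "after" shot is clean
            return action, n
    return action, max_passes

# Simulated residue detector: residue remains after pass 1, gone after pass 2.
results = iter([True, False])
action, passes = clean_until_verified("coffee", lambda: next(results))
print(action, passes)  # retract_brush_and_mop 2
```

In a deployed system, `detect_residue` would be backed by a model trained on the "Before & After" pairs described earlier, closing the loop between perception and action.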

Market & Research Value

  • Academic Value: Provides a challenging benchmark for fine-grained classification and anomaly detection in unstructured environments.

  • Industrial Value: Offers critical "corner case" data (transparent liquids, sticky residues) needed to reduce failure rates in consumer cleaning robots.

  • Community-Driven Growth: As a crowdsourced project, the dataset continuously evolves, capturing regional and seasonal variations in household environments.

Vision: Building a global embodied intelligence data asset through crowdsourced real-world complexity.

This dataset shifts the focus of robotic training from simple spatial navigation to granular environmental understanding. By capturing the chaotic, unscripted reality of household surfaces, we empower robots to move beyond merely "avoiding" obstacles to meaningfully "interacting" with and maintaining their environments. This is a crucial step toward building truly autonomous, context-aware embodied intelligence.
