Scene context is known to facilitate object recognition and guide visual search, but little work has focused on isolating image-based cues and evaluating their contributions to eye movement guidance and search performance. Here, we explore three types of contextual cues (a co-occurring object, the configuration of other objects, and the superordinate category of background elements) and assess their joint contributions to search performance in the framework of cue-combination and the temporal unfolding of their extraction. We also assess whether observers’ ability to extract each contextual cue in the visual periphery is a bottleneck that determines the utilization and contribution of each cue to search guidance and decision accuracy. We find that during the first four fixations of a visual search task observers first utilize the configuration of objects for coarse eye movement guidance and later use co- occurring object information for finer guidance. In the absence of contextual cues, observers were suboptimally biased to report the target object as being absent. The presence of the co-occurring object was the only contextual cue that had a significant effect in reducing decision bias. The early influence of object-based cues on eye movements is corroborated by a clear demonstration of observers’ ability to extract object cues up to 168 into the visual periphery. The joint contributions of the cues to decision search accuracy approximates that expected from the combination of statistically independent cues and optimal cue combination. Finally, the lack of utilization and contribution of the background-based contextual cue to search guidance cannot be explained by the availability of the contextual cue in the visual periphery; instead it is related to background cues providing the least inherent information about the precise location of the target in the scene.