The strength of visual backward masking depends on the stimulus onset asynchrony (SOA) between target and mask. Recently, it was shown that the conjoint spatial layout of target and mask is as crucial as SOA. Particularly, masking strength depends on whether target and mask group with each other. The same is true in crowding where the global spatial layout of the flankers and target- flanker grouping determine crowding strength. Here, we presented a vernier target followed by different flanker configurations at varying SOAs. Similar to crowding, masking of a red vernier target was strongly reduced for arrays of 10 green compared with 10 red flanking lines. Unlike crowding, single green lines flanking the red vernier showed strong masking. Irregularly arranged flanking lines yielded stronger masking than did regularly arranged lines, again similar to crowding. While cuboid flankers reduced crowding compared with single lines, this was not the case in masking. We propose that, first, masking is reduced when the flankers are part of a larger spatial structure. Second, spatial factors counteract color differences between the target and the flankers. Third, complex Gestalts, such as cuboids, seem to need longer processing times to show ungrouping effects as observed in crowding. Strong parallels between masking and crowding suggest similar underlying mechanism; however, temporal factors in masking additionally modulate performance, acting as an additional grouping cue.