A Technique for Extracting Sub-Source Similarities from Information Sources Having Different Formats