C# Webservice for getting Link details like facebook with Open Graph Support


I was customizing CKEditor for a site I’m working on, and it occurred to me that people used to sites like Facebook, where you just paste a link and it is automatically formatted with details, are not going to want to use the actual link button. That’s too much work. I started looking around for a CKEditor plugin that did the ‘Facebook magic’ for me but couldn’t find one. So I looked at how Facebook does it, dug through their API, and decided I needed to build a web service to do this.

Here’s what I did:

1.  Use HtmlAgilityPack for HTML parsing.
2.  Get the basic info from the standard ‘head’ tags.
3.  Check whether the site supports Facebook’s Open Graph tags.
4.  If we still don’t have an image, loop through all the images on the page, download them and check that they are at least 50 pixels wide, then ‘score’ them by comparing the alt text to the title and h1 tag.
5.  If the link is itself an image, set the link type to 2. (A link type of 1 means HTML; 0 means neither HTML nor an image.)

So far I just have the web service. Pass the url without the http:// prefix (the service adds it), for example /ajaxwebservice/webapi.asmx/GetDetails?url=www.cnn.com/2011/US/02/18/arkansas.tremors.increase/index.html, and it’ll return the XML you need:

<LinkDetails>
<Title>4.3 quake shakes tiny, tremor-plagued Arkansas town</Title>
<Url>http://www.cnn.com/2011/US/02/18/arkansas.tremors.increase/index.html</Url>
<Type>article</Type>
<Image>
  <Width>90</Width>
  <Height>51</Height>
  <Url>http://i.cdn.turner.com/cnn/.element/img/3.0/newsscanner/no_image_cnn_90x51.jpg</Url>
</Image>
<Images/>
<Description>
When Mark Barrett moved to Guy, Arkansas, he had no idea the tiny town of less than 300 people was nearly as rocking as the Southern California community he'd left behind.
</Description>
<ContentLength>52889</ContentLength>
<MimeType>text/html</MimeType>
<LinkType>1</LinkType>
</LinkDetails>
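
If you want to consume that response from C# rather than from script, here is a minimal sketch using LINQ to XML. The LinkPreview and LinkPreviewReader names are hypothetical and not part of the service. Elements are matched by local name on the assumption that the real response may carry an XML namespace that the sample above does not show.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical client-side holder for the few fields a UI usually needs.
public sealed class LinkPreview
{
    public string Title { get; set; }
    public string Url { get; set; }
    public string ImageUrl { get; set; }
    public int LinkType { get; set; }
}

public static class LinkPreviewReader
{
    // Finds a direct child by local name, ignoring any XML namespace.
    private static XElement Child(XElement parent, string localName)
    {
        return parent == null
            ? null
            : parent.Elements().FirstOrDefault(e => e.Name.LocalName == localName);
    }

    // Returns the trimmed text of a child element, or null when it is missing.
    private static string Text(XElement parent, string localName)
    {
        XElement element = Child(parent, localName);
        return element == null ? null : element.Value.Trim();
    }

    public static LinkPreview Parse(string xml)
    {
        XElement root = XDocument.Parse(xml).Root;

        // A missing or non-numeric LinkType falls back to 0 (neither HTML nor image).
        int linkType;
        int.TryParse(Text(root, "LinkType"), out linkType);

        return new LinkPreview
        {
            Title = Text(root, "Title"),
            Url = Text(root, "Url"),
            ImageUrl = Text(Child(root, "Image"), "Url"),
            LinkType = linkType
        };
    }
}
```

Call LinkPreviewReader.Parse with the response body; ImageUrl comes back null when the service found no image.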

The code

using System;
using System.Collections.Generic;
using System.Linq;
using System.Web;
using System.Web.Services;
using HtmlAgilityPack;
using System.IO;
using System.Drawing;

namespace MyNameSpace.AjaxWebservice
{
    /// <summary>
    /// Web service that returns preview details (title, description, image) for a link.
    /// </summary>
    [WebService(Namespace = "http://tempuri.org/")]
    [WebServiceBinding(ConformsTo = WsiProfiles.BasicProfile1_1)]
    [System.ComponentModel.ToolboxItem(false)]
    [System.Web.Script.Services.ScriptService]
    [System.Web.Script.Services.GenerateScriptType(typeof(LinkDetails))]
    [System.Web.Script.Services.GenerateScriptType(typeof(ImageLink))]
    public class WebApi : System.Web.Services.WebService
    {

        [WebMethod]
        public LinkDetails GetDetails(string url)
        {
            LinkDetails linkDetails = new LinkDetails();
            //http://htmlagilitypack.codeplex.com/

            // The caller passes the url without a scheme, so http:// is assumed.
            linkDetails.Url = "http://" + url;
            linkDetails = GetHeaders(linkDetails.Url, linkDetails);

            // The Content-Type header can be missing, so guard against null.
            string mimeType = (linkDetails.MimeType ?? string.Empty).ToLower();

            if (mimeType.Contains("text/html"))
            {
                linkDetails.LinkType = 1;
                HtmlDocument htmlDocument = new HtmlDocument();
                string download;
                using (System.Net.WebClient webClient = new System.Net.WebClient())
                {
                    download = webClient.DownloadString(linkDetails.Url);
                }

                htmlDocument.LoadHtml(download);
                HtmlNode htmlNode = htmlDocument.DocumentNode.SelectSingleNode("html/head");

                // Pages without a head element would otherwise throw in the helpers.
                if (htmlNode != null)
                {
                    linkDetails = GetStandardInfo(linkDetails, htmlNode);
                    linkDetails = GetOpenGraphInfo(linkDetails, htmlNode);
                }

                if (linkDetails.Image == null)
                {
                    linkDetails = GuessImage(htmlDocument, linkDetails);
                }
            }

            if (mimeType.Contains("image/"))
            {
                linkDetails.LinkType = 2;
                linkDetails.Image = new ImageLink(linkDetails.Url);
            }
            return linkDetails;
        }
        //get info off of basic meta tags
        private LinkDetails GetStandardInfo(LinkDetails linkDetails, HtmlNode head)
        {
            foreach (HtmlNode headNode in head.ChildNodes)
            {
                switch (headNode.Name.ToLower())
                {
                    case "link" : break;
                    case "title" :
                        linkDetails.Title = HttpUtility.HtmlDecode(headNode.InnerText);
                        break;
                    case "meta" :
                        if (headNode.Attributes["name"] != null && headNode.Attributes["content"] != null )
                        {
                            switch (headNode.Attributes["name"].Value.ToLower())
                            {
                                case "description" :
                                    linkDetails.Description = HttpUtility.HtmlDecode(headNode.Attributes["content"].Value);
                                break;
                            }
                        }
                        break;
                }

            }
            // look for apple touch icon in header
            HtmlNode imageNode = head.SelectSingleNode("link[@rel='apple-touch-icon']");
            if (imageNode != null)
            {
                if (imageNode.Attributes["href"] != null) { linkDetails.Image = new ImageLink(imageNode.Attributes["href"].Value, linkDetails.Url); }
                if (imageNode.Attributes["src"] != null) { linkDetails.Image = new ImageLink(imageNode.Attributes["src"].Value, linkDetails.Url); }
            }
            //look for link image in header
            imageNode = head.SelectSingleNode("link[@rel='image_src']");
            if (imageNode != null)
            {
                if (imageNode.Attributes["href"] != null) { linkDetails.Image = new ImageLink(imageNode.Attributes["href"].Value, linkDetails.Url); }
                if (imageNode.Attributes["src"] != null) { linkDetails.Image = new ImageLink(imageNode.Attributes["src"].Value, linkDetails.Url); }
            }

            return linkDetails;
        }

        //get info using open graph
        private LinkDetails GetOpenGraphInfo(LinkDetails linkDetails, HtmlNode head)
        {
            foreach (HtmlNode headNode in head.ChildNodes)
            {
                switch (headNode.Name.ToLower())
                {
                    case "link" : break;

                    case "meta" :
                        if (headNode.Attributes["property"] != null && headNode.Attributes["content"] != null )
                        {
                            switch (headNode.Attributes["property"].Value.ToLower())
                            {
                                case "og:title" :
                                    linkDetails.Title = HttpUtility.HtmlDecode(headNode.Attributes["content"].Value);
                                    break;
                                case "og:type" :
                                    linkDetails.Type = headNode.Attributes["content"].Value;
                                    break;
                                case "og:url" :
                                    linkDetails.Url = headNode.Attributes["content"].Value;
                                    break;
                                case "og:image" :
                                    linkDetails.Image = new ImageLink(headNode.Attributes["content"].Value, linkDetails.Url);
                                    break;
                                case "og:site_name" :
                                    linkDetails.SiteName = HttpUtility.HtmlDecode(headNode.Attributes["content"].Value);
                                    break;
                                case "og:description" :
                                    linkDetails.Description = HttpUtility.HtmlDecode(headNode.Attributes["content"].Value);
                                    break;
                                case "og:email" :
                                    linkDetails.Email = HttpUtility.HtmlDecode(headNode.Attributes["content"].Value);
                                    break;
                                case "og:phone_number" :
                                    linkDetails.PhoneNumber = HttpUtility.HtmlDecode(headNode.Attributes["content"].Value);
                                    break;
                                case "og:fax_number" :
                                    linkDetails.FaxNumber = HttpUtility.HtmlDecode(headNode.Attributes["content"].Value);
                                    break;

                            }
                        }
                        break;
                }

            }
            return linkDetails;
        }

        //try to guess at the images
        private LinkDetails GuessImage(HtmlDocument htmlDocument, LinkDetails linkDetails)
        {
            LinkDetails detail = linkDetails;
            HtmlNodeCollection imageNodes = htmlDocument.DocumentNode.SelectNodes("//img");
            string h1 = string.Empty;
            HtmlNode h1Node = htmlDocument.DocumentNode.SelectSingleNode("//h1");
            if (h1Node != null)
            {
                h1 = h1Node.InnerText;
            }
            int bestScore = -1;
            if (imageNodes != null)
            {
                foreach (HtmlNode imageNode in imageNodes)
                {
                    if (imageNode != null && imageNode.Attributes["src"] != null && imageNode.Attributes["alt"] != null)
                    {
                        ImageLink imageLink = new ImageLink(imageNode.Attributes["src"].Value, detail.Url);
                        if (!(imageLink.Width > 0 && imageLink.Width < 50)) //if we don't have a width go with it but if we know width is less than 50 don't use it
                        {
                            int myScore = ScoreImage( imageNode.Attributes["alt"].Value, linkDetails.Title);
                            myScore += ScoreImage(imageNode.Attributes["alt"].Value, h1);

                            if (myScore > bestScore)
                            {
                                detail.Image = imageLink;
                                bestScore = myScore;
                            }

                            // ImageLink does not override Equals, so compare by url to avoid duplicates.
                            if (!detail.Images.Any(i => i.Url == imageLink.Url)) { detail.Images.Add(imageLink); }
                        }
                    }
                }
            }

            return detail;
        }
        //score the image based on matches in comparing alt to title and h1 tag
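        private int ScoreImage(string text, string compare)
        {
            int score = 0;
            // Check for null before calling Replace; the page may have no title or h1.
            if (string.IsNullOrEmpty(text) || string.IsNullOrEmpty(compare))
            {
                return score;
            }
            text = text.Replace("\r\n", string.Empty).Replace("\t", string.Empty);
            compare = compare.Replace("\r\n", string.Empty).Replace("\t", string.Empty);

            // Skip empty tokens: Contains("") is always true and would inflate the score.
            string[] c = compare.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

            foreach (string test in c)
            {
                if (text.Contains(test)) { score++; }
            }
            return score;
        }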

        public LinkDetails GetHeaders(string link, LinkDetails linkDetails)
        {
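            try
            {
                using (System.Net.WebClient wc = new System.Net.WebClient())
                using (Stream stream = wc.OpenRead(link))
                {
                    // Only the response headers are needed; the body is not read here.
                    linkDetails.ContentLength = Convert.ToInt64(wc.ResponseHeaders["Content-Length"]);
                    linkDetails.MimeType = wc.ResponseHeaders["Content-Type"];
                }
            }
            catch
            {
                // Any failure (bad url, timeout, unparsable header) marks the link as one we won't download.
                linkDetails.MimeType = "Don't Download";
            }
            return linkDetails;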
        }

    }

    public class LinkDetails
    {
        public LinkDetails()
        {
            Images = new List<ImageLink>();
        }
        public string Title { get; set; }
        public string Url { get; set; }
        public string Type { get; set; }
        public ImageLink Image { get; set; }
        public List<ImageLink> Images { get; set; }
        public string SiteName { get; set; }
        public string Description { get; set; }
        public string Email { get; set; }
        public string PhoneNumber { get; set; }
        public string FaxNumber { get; set; }
        public Int64 ContentLength { get; set; }
        public string MimeType { get; set; }
        public int LinkType { get; set; } // 0=bad, 1=html, 2=image

    }

    public class ImageLink
    {
        public int Width { get; set; }
        public int Height { get; set; }
        public string Url { get; set; }

        public ImageLink()
        {
        }

        public ImageLink(string url, string siteUrl)
        {
            SetImageLink(FullyQualifiedImage(url, siteUrl));
        }

        public ImageLink(string url)
        {
            SetImageLink(url);
        }

        private void SetImageLink(string url)
        {
            this.Url = url;
            try
            {
                using (System.Net.WebClient webClient = new System.Net.WebClient())
                {
                    byte[] imageData = webClient.DownloadData(url);
                    using (MemoryStream stream = new MemoryStream(imageData))
                    using (Image img = Image.FromStream(stream))
                    {
                        this.Width = img.Width;
                        this.Height = img.Height;
                    }
                }
            }
            catch
            {
                // Leave Width and Height at 0 when the image cannot be downloaded or decoded.
            }
        }

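        // Build an absolute image url. Handles absolute urls, urls that begin with // and urls that begin with /.
        // Other relative urls (e.g. img/a.png) are not resolved correctly; I'm too lazy to make that work.
        private string FullyQualifiedImage(string imageUrl, string siteUrl)
        {
            // StartsWith rather than Contains, so a relative url with "http:" in its query string is not mistaken for an absolute one.
            if (imageUrl.StartsWith("http:", StringComparison.OrdinalIgnoreCase) || imageUrl.StartsWith("https:", StringComparison.OrdinalIgnoreCase))
            {
                return imageUrl;
            }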

            if (imageUrl.IndexOf("//") == 0)
            {
                return "http:" + imageUrl;
            }
            try
            {
                string baseurl = siteUrl.Replace("http://", string.Empty).Replace("https://", string.Empty);
                baseurl = baseurl.Split('/')[0];
                return string.Format("http://{0}{1}", baseurl, imageUrl);

            }
            catch { }

            return imageUrl;

        }
    }
}
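
For relative image urls that FullyQualifiedImage does not handle, one option is to let System.Uri do the resolution. This is only a sketch and is not wired into the service above; the ImageUrlResolver class and its Resolve method are hypothetical names.

```csharp
using System;

public static class ImageUrlResolver
{
    // Resolves imageUrl against the page it was found on.
    // Returns null when either value cannot be turned into a usable url.
    public static string Resolve(string pageUrl, string imageUrl)
    {
        if (string.IsNullOrWhiteSpace(imageUrl))
        {
            return null;
        }

        Uri baseUri;
        if (!Uri.TryCreate(pageUrl, UriKind.Absolute, out baseUri))
        {
            return null;
        }

        // Handles absolute, root-relative (/img/a.png) and
        // document-relative (img/a.png) references against the page url.
        Uri resolved;
        if (!Uri.TryCreate(baseUri, imageUrl.Trim(), out resolved))
        {
            return null;
        }

        return resolved.AbsoluteUri;
    }
}
```

For example, resolving img/a.png against http://example.com/news/story.html gives http://example.com/news/img/a.png, and an image url that is already absolute is returned unchanged.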

Let me know in the comments if you get it working automagically in CKEditor before I do, or if you have any suggestions.

About Kevin Buckley
.Net web developer with a lot of experience in CMS. Currently working at Sitecore as Solutions Engineer.

2 Responses to C# Webservice for getting Link details like facebook with Open Graph Support

  1. After implementing into CK Editor I updated this to have support for different link types.
    1 = Html
    2 = Image
    0 = Other

    I don’t want to work with Other cuz it might point to an executable or something malicious.

    I also updated the Image to have a width and height so I can code against them in the UI.

  2. Pingback: Meta Tags for public facing websites « Web Content Management and Delivery
