How to Access the Windows Clipboard with the Windows API (CString, ANSI, and Unicode Issues Explained)

Abstract of Windows Clipboard Handling:

This article shows how to write text to the Windows system clipboard by walking through three versions of the code, two flawed and one that works. The experiments also raise the question of how CString (a dynamic array of TCHAR) relates to ANSI and Unicode, so we then sort out the CString-related string types in Visual C++: (LPSTR, LPCSTR), (LPWSTR, LPCWSTR), (LPTSTR, LPCTSTR), and BSTR.

Keywords: Clipboard; CString, TCHAR; ANSI, Unicode; LPSTR, LPCSTR; LPWSTR, LPCWSTR; LPTSTR, LPCTSTR; BSTR

Three Attempts at Windows Clipboard Handling:

All three versions of the code below have the same goal: write a simple CString text value to the Windows system clipboard. In my own experiments, only version (3), which I built on top of versions (1) and (2), worked on my system.

With code copy (1), pasting yields only the single character “E” instead of the expected text “Error test by Sigmainfy!”. The code always treats the source string as an ANSI string, in which each element occupies one byte, whereas in a Unicode build the CString actually holds a Unicode (UTF-16) string, in which each element occupies two bytes. The following figure is from MSDN. For a character such as “E”, the high-order byte of the Unicode code is zero, and on little-endian Windows it is stored right after the low-order byte. When the Unicode string is read as an ANSI string, the system therefore sees a single character followed by a terminating ‘\0’. (The differences between ANSI and Unicode are discussed in more detail in the next section.)

Figure 1.   Character codes for “A” in ANSI, Unicode, and DBCS
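
The effect is easy to reproduce outside the clipboard. The following is a minimal sketch, not part of the original experiments: it assumes a little-endian machine (as Windows on x86/x64 is) and uses the standard char16_t type as a stand-in for the 16-bit wchar_t of Windows. The helper name ansiView is hypothetical.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical helper: returns what a byte-oriented (ANSI) reader such as
// strcpy or strlen sees when it is handed 16-bit Unicode text.
// Assumes little-endian byte order, as on x86/x64 Windows.
std::string ansiView(const char16_t* wide)
{
    assert(wide != nullptr);
    // Reading the same memory as char is allowed; the scan stops at the
    // first zero byte, which is the high-order byte of the first character.
    return std::string(reinterpret_cast<const char*>(wide));
}
```

Under those assumptions, ansiView(u"Error test by Sigmainfy!") yields a one-character string holding 'E', which matches what the paste produced.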


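// Code copy (1): incorrect code (pastes only "E" in a Unicode build)
void CMyListView::OnEditCopy()
{
    CString source = _T("Error test by Sigmainfy!");
    if( OpenClipboard() )
    {
        HGLOBAL clipbuffer;
        char * buffer;
        EmptyClipboard();
        // Problem: the size is counted in characters, not bytes (GMEM_DDESHARE is also obsolete)
        clipbuffer = GlobalAlloc(GMEM_DDESHARE, source.GetLength()+1);
        buffer = (char*)GlobalLock(clipbuffer);
        // Problem: strcpy stops at the first zero byte, so only one character is copied
        strcpy(buffer, LPCSTR(source));
        GlobalUnlock(clipbuffer);
        // Problem: CF_TEXT declares the data as ANSI regardless of the build
        SetClipboardData(CF_TEXT,clipbuffer);
        CloseClipboard();
    }
}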

Code copy (2) is taken from the official MSDN example. It works for the narrow string literal it copies, but it is still not a good fit here: it hard-codes an ANSI string and CF_TEXT, so it cannot carry the contents of a CString in a Unicode build. This is not general and not encouraged.

// Code copy (2) From MSDN
void CMyListView::OnEditCopy()
{
    if ( !OpenClipboard() )
    {
         AfxMessageBox( _T("Cannot open the Clipboard") );
         return;
    }
    // Remove the current Clipboard contents
    if( !EmptyClipboard() )
    {
         AfxMessageBox( _T("Cannot empty the Clipboard") );
         return;
    }
    // Get the currently selected data
    HGLOBAL hGlob = GlobalAlloc(GMEM_FIXED, 64);
    strcpy_s((char*)hGlob, 64, "Current selection\r\n");
    // For the appropriate data formats...
    if ( ::SetClipboardData( CF_TEXT, hGlob ) == NULL )
    {
        CString msg;
        msg.Format(_T("Unable to set Clipboard data, error: %d"), GetLastError());
        AfxMessageBox( msg );
        CloseClipboard();
        GlobalFree(hGlob);
        return;
    }
    CloseClipboard();
}

Code copy (3) is my own version based on the two above. It is better and more general because it relies on build-independent string handling:

a) the _T macro,

b) TCHAR and LPTSTR,

c) memcpy, which copies a raw block of bytes, rather than strcpy, which only handles ANSI (char) strings,

d) and a choice between CF_UNICODETEXT and CF_TEXT that matches the build.

Together these four techniques let code copy (3) compile and run in both ANSI and Unicode builds without any modification.

// Code copy (3)
// My own general version of code to handle windows clipboard
void CMyListView::OnEditCopy()
{
    // open clipboard
    if (!OpenClipboard() ) {
        ::AfxMessageBox( _T("Cannot open the Clipboard!") );
        return;
    }

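    // clear clipboard
    if( !EmptyClipboard() )
    {
        ::AfxMessageBox( _T("Cannot empty the Clipboard!") );
        ::CloseClipboard();  // do not leave the clipboard open on failure
        return;
    }

    // prepare data: the byte count covers every TCHAR plus the terminating null
    CString strClipboardData(_T("Clipboard Test Data From Sigmainfy"));
    size_t iDataSize = sizeof(TCHAR)*(1 + strClipboardData.GetLength());

    // global memory block; SetClipboardData expects GMEM_MOVEABLE memory
    HGLOBAL hDataPool = GlobalAlloc(GMEM_MOVEABLE, iDataSize);
    if (hDataPool == NULL) {
        ::AfxMessageBox( _T("Cannot allocate memory for the Clipboard!") );
        ::CloseClipboard();
        return;
    }
    LPTSTR lptstrDataPoolCopy = (LPTSTR)GlobalLock(hDataPool);
    // read-only access through the LPCTSTR operator; GetBuffer() would need a matching ReleaseBuffer()
    memcpy(lptstrDataPoolCopy, (LPCTSTR)strClipboardData, iDataSize);
    GlobalUnlock(hDataPool);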
#ifndef _UNICODE
    if ( NULL == ::SetClipboardData( CF_TEXT, hDataPool) )
#else
    if ( NULL == ::SetClipboardData( CF_UNICODETEXT, hDataPool) )
#endif
    {
        CString strMessage;
        strMessage.Format(_T("Unable to set Clipboard data, error: %d"), GetLastError());
        ::AfxMessageBox( strMessage );
        ::CloseClipboard();
        GlobalFree(hDataPool);
        return;
    }
    CloseClipboard();
}
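
The size arithmetic and the memcpy step can be tried in isolation. The sketch below is an illustration in portable C++ rather than the clipboard code itself: a std::vector stands in for the locked global memory block, std::basic_string stands in for CString, and the names clipboardByteCount and copyToBlock are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Bytes needed for the text plus its terminating null, for any character type.
// Mirrors sizeof(TCHAR) * (1 + GetLength()) from code copy (3).
template <typename CharT>
std::size_t clipboardByteCount(const std::basic_string<CharT>& text)
{
    return sizeof(CharT) * (text.size() + 1);
}

// Copies the text and its terminator into a raw byte block.
// The vector is a stand-in for the memory returned by GlobalLock.
template <typename CharT>
std::vector<unsigned char> copyToBlock(const std::basic_string<CharT>& text)
{
    std::vector<unsigned char> block(clipboardByteCount(text));
    // memcpy copies a fixed number of bytes and does not stop at zero bytes,
    // so it works the same for one-byte and two-byte characters.
    std::memcpy(block.data(), text.c_str(), block.size());
    assert(block.back() == 0);  // the terminator was copied as well
    return block;
}
```

For the three-character text "abc", the count is 4 bytes with char and 8 bytes with char16_t on platforms where char16_t is two bytes wide, which is why sizeof(TCHAR) has to appear in the allocation size.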

Explanation of CString, ANSI, and Unicode:

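This section has little to do with the system clipboard itself; it covers a topic I wanted to sort out while working on Windows clipboard handling. After a lot of searching online, the summary below is what I consider the most reliable account of the subject. I have edited it lightly. I could not find the original source, since most of what I saw had already been reposted several times; if the original author reads this, please contact me so that I can add proper attribution.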

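(1) Differences and relationships among CString, LPCTSTR, and BSTR:
CString is a dynamic array of TCHAR; BSTR is a string in a proprietary format (it must be manipulated with system-provided functions); LPCTSTR is just a pointer to constant TCHAR. CString is a fully self-contained class: a dynamic TCHAR array that wraps operators such as + and the usual string manipulation methods.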

typedef OLECHAR FAR* BSTR;
typedef const TCHAR * LPCTSTR;  // const char * in ANSI builds, const wchar_t * in Unicode builds

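(2) How strings are represented in VC++:
First, char* is a pointer to an array of ANSI characters, each of which occupies 8 bits (plain ASCII uses only the low 7 bits). This keeps compatibility with traditional C and C++.

LP stands for long pointer. LPSTR is a pointer to an ANSI character array terminated by ‘\0’; it is interchangeable with char*, and Win32 code uses LPSTR more often. The extra “C” in LPCSTR stands for “CONSTANT”: it indicates that an API function receiving the value cannot modify it. Apart from that, LPCSTR is equivalent to LPSTR.

  1. LP means long pointer. Win16 distinguished long pointers (LP) from short pointers (P); on Win32 there is no difference, since both are 32 bits wide, so LP and P are equivalent here.
  2. C means const.
  3. T refers to TCHAR, which compiles to wchar_t in a Unicode build and to char in an ordinary (ANSI) build.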

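To meet the needs of software internationalization, the industry introduced the Unicode standard, which provides a simple and consistent way to represent strings. In the form Windows uses (UTF-16), every code unit is a 16-bit value, enough to encode the characters of almost all the world's written languages. Using Unicode (type wchar_t) is the encouraged practice when developing programs.

LPWSTR and LPCWSTR follow from this. They mean the same as LPSTR and LPCSTR, except that the character data is 16-bit wchar_t rather than char.

TCHAR was then introduced so that one source can serve both encodings.
If _UNICODE is defined, it is declared as:
typedef wchar_t TCHAR;
If _UNICODE is not defined, it is declared as:
typedef char TCHAR;

In LPTSTR and LPCTSTR, each character is a TCHAR of this kind.

The characters inside the CString class are declared as TCHAR; the class provides a ready-made wrapper that is convenient to use.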

LPCTSTR:
#ifdef _UNICODE
typedef const wchar_t * LPCTSTR;
#else
typedef const char * LPCTSTR;
#endif
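
How one source serves two builds can be sketched without the Windows headers. The names DEMO_UNICODE, DemoTChar, DEMO_T, DemoLPCTSTR, and demoLength below are hypothetical stand-ins for _UNICODE, TCHAR, _T, LPCTSTR, and a string length function; the real definitions live in the Windows SDK headers and differ in detail.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for the generic-text definitions.
#ifdef DEMO_UNICODE
typedef wchar_t DemoTChar;        // wide build: like TCHAR with _UNICODE defined
#define DEMO_T(x) L##x            // turns "text" into L"text"
#else
typedef char DemoTChar;           // narrow build: like TCHAR without _UNICODE
#define DEMO_T(x) x               // leaves the literal unchanged
#endif
typedef const DemoTChar* DemoLPCTSTR;

// Counts characters up to the terminating null for whichever
// character type the build selected.
std::size_t demoLength(DemoLPCTSTR s)
{
    assert(s != nullptr);
    std::size_t n = 0;
    while (s[n] != 0) {
        ++n;
    }
    return n;
}
```

Compiled as is, DemoTChar is char. Compiled with DEMO_UNICODE defined, the same call demoLength(DEMO_T("Sigmainfy")) operates on a wchar_t string without any source change, which is the point of the T-prefixed types.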

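Summary:
(LPSTR, LPCSTR) form one pair and point to ANSI strings;
(LPWSTR, LPCWSTR) form one pair and point to wide-character strings (each character occupies two bytes);
(LPTSTR, LPCTSTR) form the generic pair, whose string type depends on whether _UNICODE is defined for the build. Using this T-prefixed generic mechanism is encouraged.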

Written on January 27, 2014