Inline assembly is used in Linux kernel to optimize performance or access hardware. So I decided to check it first. Before digging deeper, you may wanna read the GCC Inline Assembly HOWTO to get a general understanding. In C, a simple add function looks like:

Its inline assembly version may be:

Or simpler:

Here’s its generated code by gcc:

Output:

Our inline assembly is surrounded by #APP and #NO_APP comments. Redundant gcc directives are already removed, the remaining are just function prolog/epilog code. add2() and add3() works fine using default gcc flags. But it is not the case when -O2 optimize flag is passed. From the output of gcc -S -O2(try it yourself), I found these 2 function calls are inlined in their caller, no function call at all. These 2 issues prevent the inline assembly from working: – Depending on %eax to be the return value. But it is silently ignored in -O2. – Depending on 12(%ebp) and 8(%ebp) as parameters of function. But it is not guaranteed that parameters are there in -O2. To solve issue 1, an explicit return should be used:

To solve issue 2, parameters are required to be loaded in registers first:

add5() now works in -O2. The default calling convention is cdecl for gcc. %eax, %ecx and %edx can be used from scratch in a function. It’s the function caller’s duty to preserve these registers. These registers are so-called scratch registers. So what if we specify to use other registers other than these scratch registers, like %esi and %edi?

Again with gcc -S:

It seems that code generation of gcc in default optimize level is not so efficient:) But you should actually noticed that %esi and %edi are pushed onto stack before their usage, and popped out when finishing. These code generation is automatically done by gcc, since you have specified to use %esi(“S”) and %edi(“D”) in input list of the inline assembly. Actually, the code can be simpler by specify %eax as both input and output:

We can tell gcc to use a general register(“r”) available in current context in inline assembly:

And wrong code generation again…:

%eax is moved to %eax? gcc selected %eax and %edx as general registers to use.  The code accidentally does the right job, but it is still a potential pitfall. Clobber list can be used to avoid this:

As commented inline: The clobber list tells gcc which registers(or memory) are changed by the asm, but not listed as an output. Now gcc does not use %eax as a candidate of general registers any more. gcc can also generate code to preserve(push onto stack) registers in clobber list if necessary.

I’ve googled a lot to find the answer. But none really solve the problem simply and gracefully, even on stackoverflow. So we’ll do ourselves here 🙂

Actually, std::string supports operation using multibytes characters. This is the base of our solution:

g_cs is a Chinese word(“你好” which means hello) encoded in UTF-8. The code works under both Windows(WinXP+VS2005) and Linux(Ubuntu12.04+gcc4.6). You may wanna open a.txt to check whether the string is correctly written.

NOTE: Under Linux, we print the string directly since the default console encoding is UTF-8, and we can view the string. While under Window, the console DOES NOT support UTF-8(codepage 65001) encoding. Printing to it simply causes typo. We just convert it to a std::wstring and use MessageBox() API to check the result. I will cover the encoding issue in windows console in my next post, maybe.

I began to investigate the problem, since I cannot find a solution to read/write a UTF-8 string to XML file using boost::property_tree. Actually, it’s a bug and is already fixed in boost 1.47 and later versions. Unfortunately, Ubuntu 12.04 came with boost 1.46.1. When reading non-ASCII characters, some bytes are incorrectly skipped. The failure function is boost::property_tree::detail::rapidxml::internal::get_index(). My test code looks like:

Almost the same structure with the previous function. And finally the utf8_to_ucs2() function:

Please add header files yourselves to make it compile 🙂

There’s a series of C++ FAQ: http://www.parashift.com/c++-faq/pointers-to-members.html. And one of it addresses some technical details:

Pointers to member functions and pointers to data are not necessarily represented in the same way. A pointer to a member function might be a data structure rather than a single pointer. Think about it: if it’s pointing at a virtual function, it might not actually be pointing at a statically resolvable pile of code, so it might not even be a normal address – it might be a different data structure of some sort.

Let’s write some demo code:

Output on WinXP/VS2005:

Output on Ubuntu12.04/gcc4.6:

Both are run on 32bit systems. Sizes of pointer-to-member-function are not confirmed to be equal to sizeof(void *). And you are not allowed to convert a pointer-to-member-function to void * or a plain pointer-to-function type. Only equality comparisons(=, !=) are supported. Thus, it can be used to implement the “Safe Bool Idiom” here: http://www.artima.com/cppsource/safebool.html.

Perhaps the best-known use of this technique comes from the C++ Standard the conversion that allows the state of iostreams to be queried uses it.

But at least in gcc/libstdc++, its implementation uses bool and void * conversion operations. In basic_ios class:

This allows std::cout to even be deleted. Hmmm…Just write down some details here.

Continue with last article, only paste code here:

IcmpSendEcho() is used to send ICMP messages which does not require administrator privilege. I summarize all cases in which raw socket, icmp api or system ping approach may fail:

raw socket icmp api system ping
Windows XP Administrator OK OK OK
User WSAEACCES in sendto() OK OK
Guest WSAEACCES in sendto() OK OK
Windows 7 Administrator WSAEACCES in socket() OK OK
User WSAEACCES in socket() OK OK
Guest WSAEACCES in socket() ERROR_ACCESS_DENIED in IcmpCreateFile() Unable to contact IP driver. General failure.
Run as Administrator OK OK OK

You may ask what’s the difference between “Administrator” and “Run as Administrator”, the answer comes from stackoverflow:

– When an user from the administrator group logs on, the user is allocated two tokens: a token with all privileges, and a token with reduced privileges. When that user creates a new process, the process is by default handed the reduced privilege token. So, although the user has administrator rights, she does not exercise them by default. This is a “Good Thing”.

– To exercise those rights the user must start the process with elevated rights. For example, by using the “Run as administrator” verb. When she does this, the full token is handed to the new process and the full range of rights can be exercised.

A ping utility is used to check the availability of a remote host. I wanted to implement the function in my project. But it is not so easy as I had expected. Since administrator/root privilege is required to create a raw socket under windows/linux, and this is not what I want.

Finally, I chose to utilize system’s ping via CreateProcess()/execve() function. Under windows, ping may used IcmpSendEcho() API to wrap the creation of a raw socket, and this does not require administrator privilege. But it is still not working when logged in as a guest. Under linux, ping is a +s(setuid) utility, which means it is always run with root privilege. Anyway, I still tried to implement ping by using raw socket and sending raw ICMP(Internet Control Message Protocol) messages.

A raw ICMP echo request message has a ICMP header, while a raw ICMP echo reply message has an additional IP header in front of the ICMP header. Say:

There are several ICMP message types defined in RFC 792, but we only care about the echo type. So here’s our definition of a IP header and a ICMP header:

Our customized ICMP echo request/reply definition with self-defined data field:

The raw socket is created with:

Sending a ICMP echo request:

We simply use the checksum algorithm found in the original ping program from Mike Muuss.

Now, receiving a ICMP echo reply:

You may have noticed the if/else clause in the receive function. The use_icmp_socket flag is used to tell which socket type is used when sending a ICMP message. In linux kernel 3.0, a new socket type is introduced to reduce the possibility to use a raw socket that only send ICMP echo messages. Thus, the classic ping utility can be no longer a +s(setuid) one. A ICMP socket can be created with:

Note the difference in the second parameter. A kernel parameter(/proc/sys/net/ipv4/ping_group_range) in comment above should be set to indicate which UID range is allowed to use a ICMP socket.

When using a raw socket, the TTL value is in the IP header. While, the TTL value is in the socket ancillary data when using a ICMP socket, the reply data does not contain IP header any more. And we must set a socket option explicitly to retrieve the TTL value:

Let’s put them all together:

All code compiles and works under Ubuntu 12.04(gcc4.6), Windows XP(VS2005) and Windows 7(VS2010) with administrator/root privilege. After enabling the ICMP socket parameter, root privilege is not required under linux. The output may look like:

Reference:
– RFC 791: http://tools.ietf.org/html/rfc791
– RFC 792: http://tools.ietf.org/html/rfc792
– Implement ping in C: http://www.ibm.com/developerworks/cn/linux/network/ping/
– Raw Socket and ICMP: http://courses.cs.vt.edu/cs4254/fall04/slides/raw_6.pdf
– Linux Kernel 3.0: http://kernelnewbies.org/Linux_3.0
– IPv4: Add ICMP Socket Kind: http://lwn.net/Articles/420800/
– Patch for Userspace ping: ftp://ftp.intelib.org/pub/segoon/iputils-ss020927-pingsock.diff
– Wine Implementation: http://fossies.org/dox/wine-1.4.1/icmp_8c_source.html