foolyc

multithreaded simple datatype access and atomic variables

Variables under multithreading: atomic operations

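While reading the pachi source code recently, I noticed calls such as __sync_fetch_and_add in places where several threads operate on global data. These are my notes from looking into it.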

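In fact, a simple operation such as global_int++ in a C program breaks down into three steps:

  • load the value from cache (memory) into a register
  • increment the value in the register
  • write the value from the register back to cache (memory)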
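
To see why these three steps matter, here is a minimal sketch that replays one unlucky interleaving of two threads inside a single thread, so the outcome is deterministic. All names here (shared_counter, load_step, add_step, store_step, simulate_lost_update) are made up for illustration and do not come from pachi.

```c
/* Hypothetical illustration: the three steps of `counter++`, written out. */
static int shared_counter = 0;   /* stands in for global_int */

/* Step 1: load the current value into a "register" (a local variable). */
static int load_step(void) { return shared_counter; }

/* Step 2: add one inside the register. */
static int add_step(int reg) { return reg + 1; }

/* Step 3: store the register back to memory. */
static void store_step(int reg) { shared_counter = reg; }

/* Replays the steps of two threads, A and B, in an order a real scheduler
 * could produce. Returns the final value of the counter. */
int simulate_lost_update(void)
{
    shared_counter = 0;

    int reg_a = load_step();    /* A loads 0 */
    int reg_b = load_step();    /* B loads 0 before A has stored anything */
    reg_a = add_step(reg_a);    /* A computes 1 */
    reg_b = add_step(reg_b);    /* B computes 1 */
    store_step(reg_a);          /* A stores 1 */
    store_step(reg_b);          /* B stores 1 and overwrites A's update */

    return shared_counter;      /* 1, although two increments ran */
}
```

Calling simulate_lost_update() yields 1 rather than 2: one of the two increments is lost, which is exactly the failure a lock or an atomic operation is meant to rule out.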

thread lock

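Because of this, when several threads operate on the same data in parallel, their steps can interleave and the result is no longer guaranteed to be correct. The usual remedy is a lock, which ensures that only one thread operates on the data at any given moment: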

pthread_mutex_lock(&count_lock);
global_int++;
pthread_mutex_unlock(&count_lock);
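
For context, the locked increment could be embedded in a small program along the following lines. This is only a sketch: the thread count, the iteration count and the names locked_counter, locked_worker and run_locked_counter are assumptions chosen for illustration.

```c
#include <pthread.h>
#include <stddef.h>

#define LOCKED_THREADS    4        /* illustrative values, not from the benchmark */
#define LOCKED_INCREMENTS 100000

static int locked_counter = 0;
static pthread_mutex_t locked_counter_lock = PTHREAD_MUTEX_INITIALIZER;

/* Each worker increments the shared counter, holding the mutex around
 * the whole load/add/store sequence. */
static void *locked_worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < LOCKED_INCREMENTS; i++) {
        pthread_mutex_lock(&locked_counter_lock);
        locked_counter++;
        pthread_mutex_unlock(&locked_counter_lock);
    }
    return NULL;
}

/* Starts the workers, waits for them, and returns the final counter,
 * or -1 if not every thread could be started. */
int run_locked_counter(void)
{
    pthread_t threads[LOCKED_THREADS];
    int started = 0;

    locked_counter = 0;
    for (; started < LOCKED_THREADS; started++) {
        if (pthread_create(&threads[started], NULL, locked_worker, NULL) != 0)
            break;
    }
    for (int i = 0; i < started; i++)
        pthread_join(threads[i], NULL);

    return started == LOCKED_THREADS ? locked_counter : -1;
}
```

With the mutex in place, run_locked_counter() should return LOCKED_THREADS * LOCKED_INCREMENTS on every run. Build with -pthread.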

atomic variables

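__sync_fetch_and_add, by contrast, gets the same guarantee at the level of the processor's instruction set. On x86 this family of operations compiles to lock-prefixed instructions. Historically the lock prefix locked the FSB, the bus between the processor and RAM, which keeps other processors or cores from accessing RAM while the instruction runs; on more recent processors the lock is usually confined to the cache line holding the variable when that is possible. Either way, these operations only apply to small pieces of memory. Atomic variables have long been used widely inside the kernel, but in user space they have only been available since the GCC 4.1 series.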
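
As a rough sketch of what this looks like in user code, the counter below is shared by several threads and incremented only through the builtin, with no mutex at all. The thread count, the iteration count and the names atomic_counter, atomic_worker and run_atomic_counter are assumptions made for this example.

```c
#include <pthread.h>
#include <stddef.h>

#define ATOMIC_THREADS    4        /* illustrative values */
#define ATOMIC_INCREMENTS 100000

static int atomic_counter = 0;

/* Each worker increments the shared counter with a single atomic
 * read-modify-write; no lock is taken. */
static void *atomic_worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < ATOMIC_INCREMENTS; i++)
        __sync_fetch_and_add(&atomic_counter, 1);
    return NULL;
}

/* Starts the workers, waits for them, and returns the final counter,
 * or -1 if not every thread could be started. */
int run_atomic_counter(void)
{
    pthread_t threads[ATOMIC_THREADS];
    int started = 0;

    atomic_counter = 0;
    for (; started < ATOMIC_THREADS; started++) {
        if (pthread_create(&threads[started], NULL, atomic_worker, NULL) != 0)
            break;
    }
    for (int i = 0; i < started; i++)
        pthread_join(threads[i], NULL);   /* after join, a plain read is safe */

    return started == ATOMIC_THREADS ? atomic_counter : -1;
}
```

The builtin is specific to GCC and compatible compilers; build with -pthread.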

Performance difference

I benchmarked the lock against atomic variables; the code is in the appendix:

Starting 4 threads...
cost 1.415952 seconds value : 4000000 Expected value : 4000000
Starting 4 threads...
cost 0.288744 seconds value : 4000000 Expected value : 4000000
rate 4.903832
  • Both approaches are thread-safe
  • Atomic variables were 4.78 times faster than the lock on average (std 0.3468, over 5 runs)
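
One caveat when reading these numbers: the benchmark times itself with clock(), which reports the processor time consumed by the whole process, so with several busy threads the figure adds up across cores and is not elapsed time. If elapsed time is what you want, a helper along these lines could be used instead. This is a sketch; wall_seconds and cpu_seconds are names invented here.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* exposes clock_gettime under strict -std= modes */
#endif
#include <time.h>

/* Elapsed (wall-clock) time in seconds, from a clock that never jumps
 * backwards. Only differences between two calls are meaningful. */
double wall_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Processor time used by the process so far, in seconds. With several
 * running threads this grows faster than wall-clock time. */
double cpu_seconds(void)
{
    return (double)clock() / CLOCKS_PER_SEC;
}
```

Taking wall_seconds() before starting the threads and again after joining them gives the elapsed duration; the ratio between the two variants may then differ from the CPU-time ratio reported above.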

Miscellaneous

Other functions in this family

type __sync_fetch_and_add (type *ptr, type value);

type __sync_fetch_and_sub (type *ptr, type value);

type __sync_fetch_and_or (type *ptr, type value);

type __sync_fetch_and_and (type *ptr, type value);

type __sync_fetch_and_xor (type *ptr, type value);

type __sync_fetch_and_nand (type *ptr, type value);

type __sync_add_and_fetch (type *ptr, type value);

type __sync_sub_and_fetch (type *ptr, type value);

type __sync_or_and_fetch (type *ptr, type value);

type __sync_and_and_fetch (type *ptr, type value);

type __sync_xor_and_fetch (type *ptr, type value);

type __sync_nand_and_fetch (type *ptr, type value);
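
The two halves of this list differ only in what they return: the __sync_fetch_and_* forms return the value the variable held before the operation, while the __sync_*_and_fetch forms return the value after it. A small sketch, using wrapper names (demo_fetch_then_add, demo_add_then_fetch) invented for illustration:

```c
/* Adds delta to *counter atomically and returns the OLD value. */
int demo_fetch_then_add(int *counter, int delta)
{
    return __sync_fetch_and_add(counter, delta);
}

/* Adds delta to *counter atomically and returns the NEW value. */
int demo_add_then_fetch(int *counter, int delta)
{
    return __sync_add_and_fetch(counter, delta);
}
```

Starting from a variable that holds 10, demo_fetch_then_add(&v, 5) returns 10 and leaves 15 in v; a following demo_add_then_fetch(&v, 5) returns 20 and leaves 20 in v.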

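According to the GCC documentation, type can be any integral scalar or pointer type that is 1, 2, 4 or 8 bytes long. Commonly used types include: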

  • int
  • unsigned int
  • long
  • unsigned long
  • long long
  • unsigned long long
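
The GCC documentation describes these builtins as working on integral scalar or pointer types of 1, 2, 4 or 8 bytes, so fixed-width types other than the ones listed can be used as well. A short sketch, assuming a target such as x86-64 where 8-byte atomics are supported natively; bump_u8 and bump_i64 are hypothetical helper names.

```c
#include <stdint.h>

/* Atomically increments a 1-byte counter and returns the new value. */
uint8_t bump_u8(uint8_t *p)
{
    return __sync_add_and_fetch(p, 1);
}

/* Atomically increments an 8-byte counter and returns the new value. */
int64_t bump_i64(int64_t *p)
{
    return __sync_add_and_fetch(p, 1);
}
```

The compiler picks the instruction width from the pointed-to type, so the same builtin name serves both calls.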

Reference

Multithreaded simple data type access and atomic variables

Appendix

#define _GNU_SOURCE // must come before every #include so that <sched.h> exposes cpu_set_t and sched_setaffinity
#include <stdio.h>
#include <pthread.h>
#include <unistd.h>
#include <stdlib.h>
#include <sched.h>
#include <sys/syscall.h>
#include <errno.h>
#include <time.h>

#define INC_TO 1000000 // one million...

int global_int = 0;
pthread_mutex_t count_lock = PTHREAD_MUTEX_INITIALIZER;

// Kernel thread id of the calling thread. Not named gettid() because newer
// glibc versions declare a function of that name themselves.
pid_t get_thread_id( void )
{
    return (pid_t)syscall( __NR_gettid );
}

// Pin the calling thread to the given CPU. Returns 0 on success, -1 on failure.
int pin_to_cpu( int proc_num )
{
    cpu_set_t set;

    CPU_ZERO( &set );
    CPU_SET( proc_num, &set );
    if (sched_setaffinity( get_thread_id(), sizeof( cpu_set_t ), &set ))
    {
        perror( "sched_setaffinity" );
        return -1;
    }
    return 0;
}

// Version 1: every increment is protected by a mutex.
void *thread_routine0( void *arg )
{
    int i;

    if (pin_to_cpu( (int)(long)arg ))
        return NULL;
    for (i = 0; i < INC_TO; i++)
    {
        pthread_mutex_lock( &count_lock );
        global_int++;
        pthread_mutex_unlock( &count_lock );
    }
    return NULL;
}

// Version 2: every increment is a single atomic builtin call.
void *thread_routine( void *arg )
{
    int i;

    if (pin_to_cpu( (int)(long)arg ))
        return NULL;
    for (i = 0; i < INC_TO; i++)
    {
        //global_int++;
        __sync_fetch_and_add( &global_int, 1 );
    }
    return NULL;
}

int main()
{
    int procs = 0;
    int i, rc;
    pthread_t *thrs;
    clock_t start, finish;
    double duration0, duration1;

    // Getting number of CPUs; one thread is started per CPU
    procs = (int)sysconf( _SC_NPROCESSORS_ONLN );
    if (procs < 1)
    {
        perror( "sysconf" );
        return -1;
    }
    thrs = malloc( sizeof( pthread_t ) * procs );
    if (thrs == NULL)
    {
        perror( "malloc" );
        return -1;
    }

    // Run 1: mutex. Note that clock() reports CPU time summed over all
    // threads of the process, not elapsed wall-clock time.
    printf( "Starting %d threads...\n", procs );
    start = clock();
    for (i = 0; i < procs; i++)
    {
        rc = pthread_create( &thrs[i], NULL, thread_routine0, (void *)(long)i );
        if (rc)
        {
            errno = rc; // pthread_create returns the error code instead of setting errno
            perror( "pthread_create" );
            procs = i;
            break;
        }
    }
    for (i = 0; i < procs; i++)
        pthread_join( thrs[i], NULL );
    finish = clock();
    duration0 = (double)(finish - start) / CLOCKS_PER_SEC;
    printf( "cost %f seconds value : %d Expected value : %d\n", duration0, global_int, INC_TO * procs );

    // Run 2: atomic builtin, starting again from zero
    global_int = 0;
    printf( "Starting %d threads...\n", procs );
    start = clock();
    for (i = 0; i < procs; i++)
    {
        rc = pthread_create( &thrs[i], NULL, thread_routine, (void *)(long)i );
        if (rc)
        {
            errno = rc;
            perror( "pthread_create" );
            procs = i;
            break;
        }
    }
    for (i = 0; i < procs; i++)
        pthread_join( thrs[i], NULL );
    finish = clock();
    duration1 = (double)(finish - start) / CLOCKS_PER_SEC;
    printf( "cost %f seconds value : %d Expected value : %d\n", duration1, global_int, INC_TO * procs );

    // How many times faster the atomic version was
    printf( "rate %f\n", duration0 / duration1 );
    free( thrs );
    return 0;
}
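This article was written and published by foolyc and is licensed under the BY-NC-SA international license.
When reposting, please credit the author and the source. The author of this article is foolyc.
The title of this article is multithreaded simple datatype access and atomic variables
The link to this article is http://foolyc.com//2017/02/14/multithreaded-simple-data-type-access-and-atomic-variables/.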