Download Intel VTune Amplifier XE 2016 Update 2: software for measuring and analyzing code run time and resource usage

A thread is one of the smallest software units the operating system uses when scheduling the processor. Every running program is itself a process, and a process may in turn use several threads. While it is scheduled, a program's code occupies hardware such as the CPU and GPU for some amount of time, and that time is what a profiler measures.
Friday, 28 Esfand 1394

Intel, one of the largest makers of computer components and products, publishes VTune Amplifier XE, a tool that measures the run time and resource usage of serial and threaded code written in C, C++, Fortran, and other languages, along with the degree of concurrency and any bottlenecks. It offers a wide range of tools and features for optimizing CPU and GPU performance and utilization, thread performance and scalability, bandwidth, caching, and more. Analysis is also fast, because the tool recognizes a wide range of common threading models and presents clear, practical information that makes the code's behavior easier to interpret. Results can be sorted, filtered, or visualized on the timeline or on the source.

Key features of Intel VTune Amplifier XE:
- A sorted list of functions ranked by CPU usage
- Find the code that puts the heaviest load on the system in a few clicks
- Recognizes a wide range of threading models
- Analyzes code in languages including C, C++, C#, Fortran, Java, and ASM
- Automatic detection of Microsoft DirectX frames
- Support for Intel Xeon Phi products
- Works with standard compilers such as Microsoft, GCC, and Intel
- User task analysis
- Integrates with Microsoft Visual Studio and Eclipse
- A dedicated analysis type for finding the causes of slow threaded code
- GPU and platform data analysis
- Command line automation for regression analysis
- Bandwidth and memory analysis
- System-wide analysis covering kernel modules, multi-process applications, and drivers
- High-resolution profiling with low overhead
- Multi-rank analysis of MPI and OpenMP
- And more

 

Whether you are tuning for the first time or doing advanced performance optimization, Intel VTune Amplifier XE provides rich insight into CPU and GPU performance, threading performance and scalability, bandwidth, caching, and much more. Analysis is faster and easier because VTune Amplifier understands common threading models and presents information at a higher level that is easier to interpret. Use its powerful analysis to sort, filter, and visualize results on the timeline and on your source.

Here are some key features of "Intel VTune Amplifier XE":

What should I tune first? - Quickly locate code taking a lot of time
Hotspots analysis gives you a sorted list of the functions using a lot of CPU time. This is where tuning will give you the biggest benefit. Click [+] for the call stacks. Double click to see the source.
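The idea behind a hotspots list can be sketched in a few lines. This is only an illustration of sampling-based ranking, not VTune Amplifier's implementation: the `samples` list, the function names in it, and the 10 ms interval are all hypothetical.

```python
from collections import Counter

def hotspots(samples, interval_ms=10.0):
    """Return (function, estimated CPU ms) pairs, hottest first.

    `samples` is a hypothetical list with one function name per sampling
    tick, naming the function that was executing at that tick.
    """
    counts = Counter(samples)
    # Sort by sample count (descending), then by name for a stable order.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(name, n * interval_ms) for name, n in ranked]

# Made-up sample stream for illustration.
samples = ["solve", "solve", "io_read", "solve", "parse", "io_read"]
for name, cpu_ms in hotspots(samples):
    print(f"{name:10s} {cpu_ms:6.1f} ms")
```

The function at the top of the list is where tuning is likely to pay off most.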

Analyze results faster - See the profiling data on your source
A double click from the function list takes you to the hottest spot in the function.

Threaded Performance Is Critical in today’s Multicore World
VTune Amplifier’s built-in understanding of parallel programming models including Intel® Threading Building Blocks, OpenMP* 4.0 and Intel® Cilk™ Plus makes it easy to see and understand multi-threading concepts such as task begin/end, synchronization, and wait time. Locks & waits analysis is one example of how this is useful. Visualization on the timeline lets you easily see lock contention (lots of yellow transitions), load imbalance, and inadvertent serialization, all common causes of poor parallel performance.

Quickly find common causes of slow threaded code with “locks and waits” analysis
Waiting too long on a lock while the cores are underutilized during the wait is a common cause of slow performance in parallel programs. Profiles like "basic hotspots" and "locks & waits" use a software collector that works on both Intel and compatible processors.

Find the answer faster – Mine the data with timeline filtering.
Select a time range in the timeline to filter out data (e.g., application startup) that masks the information you need. When you select and filter in the timeline, the grid that lists functions using a lot of CPU time updates to show the list filtered for the selected time. Yellow lines in the timeline show transitions. A high density of transitions may indicate lock contention and poor parallel performance. Turn off CPU time marking to diagnose issues with spin locks: see just when threads are running or waiting and quickly spot inadvertent serialization.

Easy Profiling of Remote Systems – License only required on host, not target.
You can easily collect data on your current host or a remote system. Or collect data using the command line on the remote system and import the data for analysis locally.
Tip: For the best performance avoid VNC’s slow graphics. Run the UI locally. Import data from the remote target. No license is required for collecting data which makes for a simple lightweight install on remote systems. A license is required to view or analyze the data collected.

Tune Drivers. Get High Resolution with Low Overhead
Intel® processors have an on chip Performance Monitoring Unit (PMU). In addition to "basic hotspots" analysis that works on both Intel and compatible processors, VTune Amplifier XE has "advanced hotspots" analysis that uses the PMU to collect data with very low overhead. System wide analysis lets you analyze drivers. Increased resolution (~1 ms vs. ~10 ms) can find hot spots in small functions that run quickly.
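A quick calculation shows why the finer sampling interval matters for small functions. The 5 ms figure below is a hypothetical total CPU time chosen for illustration; the two intervals are the approximate resolutions quoted above.

```python
def expected_samples(cpu_time_ms, interval_ms):
    """Average number of samples landing in code that runs for cpu_time_ms."""
    if interval_ms <= 0:
        raise ValueError("interval_ms must be positive")
    return cpu_time_ms / interval_ms

small_function_ms = 5.0  # hypothetical total CPU time of a small function
for interval_ms in (10.0, 1.0):
    n = expected_samples(small_function_ms, interval_ms)
    print(f"interval {interval_ms:4.1f} ms -> about {n:.1f} samples")
```

At roughly 10 ms the function is expected to collect less than one sample and can be missed entirely. At roughly 1 ms it collects several, enough to show up in the results.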

Bandwidth and Memory Analysis Made Easy
Use the Memory Access analysis to identify memory-related issues, such as:
- Bandwidth-limited accesses. Quickly see a timeline of DRAM and Intel QPI bandwidth for your program. The consumers of memory bandwidth will generally vary as your program runs. By viewing the bandwidth in a graph, you can see where in your application memory usage spikes. Filter by selecting the area in the timeline where the spike occurs and see only the code that was active at that time. This lets you isolate the individual contributors to bandwidth consumption and tune effectively.
- Identify the source code and memory objects that are using bandwidth. As a general rule, a structure of arrays is more cache friendly than an array of structures, but it all depends upon how your program is accessing the data. Quickly identify data structures that can be reorganized to consume less bandwidth.
- For Linux targets, Memory Access analysis can be configured to attribute performance events to memory objects (data structures). You can see the parts of your code that are contributing to memory issues. Sorting results by average latency helps to prioritize your tuning efforts for maximum impact.
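The structure-of-arrays rule of thumb can be made concrete with a rough estimate of how many cache lines a scan touches. This is a simplified model, not a VTune Amplifier measurement: it assumes a 64-byte cache line, aligned data, records no larger than one line, and a loop that reads a single field of every record. The record and field sizes are hypothetical.

```python
import math

def cache_lines_touched(n_records, field_bytes, record_bytes, line_bytes=64):
    """Estimate cache lines read when scanning ONE field of every record.

    Returns (array_of_structures, structure_of_arrays).
    Assumes aligned data and record_bytes <= line_bytes, so every line in
    the array-of-structures layout holds at least one wanted field.
    """
    if not 0 < field_bytes <= record_bytes <= line_bytes:
        raise ValueError("expected 0 < field_bytes <= record_bytes <= line_bytes")
    aos = math.ceil(n_records * record_bytes / line_bytes)  # fields interleaved
    soa = math.ceil(n_records * field_bytes / line_bytes)   # field contiguous
    return aos, soa

# Hypothetical record: four 8-byte fields, of which the loop reads one.
aos, soa = cache_lines_touched(n_records=1000, field_bytes=8, record_bytes=32)
print(f"array of structures: {aos} lines, structure of arrays: {soa} lines")
```

If the loop reads every field of each record instead, both layouts move the same amount of data, which is why the rule depends on the access pattern.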

Opportunities Highlighted For Faster, Easier Analysis
The cell is highlighted in pink when there is a potential tuning opportunity. Hover to get suggestions.

New - Easier, More Effective OpenMP* and MPI Multi-Rank Tuning
The new summary report quickly gets you the top 4 answers you need to effectively improve OpenMP* performance. Additional details for each region are available by clicking the links.

Quickly See How to Improve OpenMP* Performance
Detailed data for each OpenMP* region highlights tuning opportunities. The region shown has the potential to run 34% faster if it is rebalanced.
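One simple way to estimate what rebalancing could gain is to compare the slowest thread with the average. This sketch is an assumption about how such a figure could be derived, not necessarily the metric VTune Amplifier reports. The per-thread times are made up so that the result matches the 34% in the example above.

```python
def potential_gain(thread_times_ms):
    """Fraction of a region's elapsed time saved by perfect rebalancing.

    The region currently ends when its slowest thread finishes. With the
    same total work spread evenly, it would end at the mean thread time.
    """
    if not thread_times_ms:
        raise ValueError("need at least one thread time")
    elapsed = max(thread_times_ms)
    balanced = sum(thread_times_ms) / len(thread_times_ms)
    return (elapsed - balanced) / elapsed

# Hypothetical per-thread busy times for one parallel region.
times_ms = [100.0, 60.0, 60.0, 44.0]
print(f"potential gain: {potential_gain(times_ms):.0%}")
```

A perfectly balanced region returns zero, so a larger value points to a region worth examining first.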

Easier Multi-Rank Analysis of MPI + OpenMP
VTune Amplifier’s summary view is enriched with a table of the top MPI ranks that will benefit from improved OpenMP* performance.
For hybrid MPI and OpenMP* applications, it is important to explore OpenMP* inefficiency along with MPI communication between ranks. The lower the communication spin time the more the rank was executing (vs. spinning) and the more impact OpenMP* tuning will have on the application elapsed time. Use Intel® Trace Analyzer and Collector to tune MPI and select ranks with low communication spin times for further analysis in VTune Amplifier. VTune Amplifier can be installed on a cluster.

New - Easier OpenCL™ and GPU Profiling. Now for both Windows* & Linux
When tuning OpenCL on newer processors the GPU Architecture Diagram makes it easier to understand GPU hardware metrics.

Analyze GPU and Platform Data
On newer Intel processors, optionally collect GPU and platform data for tuning OpenCL and media applications. Correlate GPU and CPU activities.

No special compilers, use your regular build
Use a production build with symbols from your normal compiler. Low collection overhead means accurate results you can count on.

Automate Using the Command Line
Use the included command line to automate regression analysis. It also permits a light weight install on remote systems for simple remote collection.
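A nightly regression check might be scripted along the following lines. Treat this as a sketch under assumptions: it relies on the `amplxe-cl` launcher with its `-collect` and `-result-dir` options, while the application path, result directory name, timings, and the 5% tolerance are hypothetical. Check the options against the documentation for your installed version.

```python
import subprocess

def collect_command(app_argv, result_dir, analysis="hotspots"):
    """Build an amplxe-cl collection command as an argument list."""
    return ["amplxe-cl", "-collect", analysis,
            "-result-dir", result_dir, "--", *app_argv]

def run_collection(cmd):
    """Run the collection; requires VTune Amplifier on the PATH."""
    subprocess.run(cmd, check=True)

def is_regression(baseline_s, current_s, tolerance=0.05):
    """Flag a run that is slower than the baseline by more than tolerance."""
    return current_s > baseline_s * (1.0 + tolerance)

# Hypothetical application and result directory names.
cmd = collect_command(["./my_app", "--size", "1000"], "r_nightly")
print(" ".join(cmd))
# run_collection(cmd)  # enable on a machine with VTune Amplifier installed
print(is_regression(baseline_s=10.0, current_s=11.0))
```

Building the command as a list rather than a single string avoids shell quoting problems when the application takes arguments.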

System Wide Analysis
Tune drivers, kernel modules and multi-process apps.

Auto Detect Microsoft DirectX* Frames
Got a slow spot in your Windows* game play? You don't just want to know where you are spending a lot of time; you want to know where you are spending a lot of time while the frame rate is slow. VTune Amplifier can automatically detect Microsoft DirectX* frames and filter results to show you what is happening in slow frames. Not using DirectX*? Just define the critical region using the API and frame analysis becomes a powerful tool for analyzing latency.
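The filtering idea behind frame analysis can be sketched as follows: keep only frames that miss a frame-rate target, then count the samples that fall inside them. This is an illustration, not the tool's implementation. The frame boundaries, the samples, and the 30 fps threshold are made up.

```python
from collections import Counter

def slow_frames(frames, min_fps=30.0):
    """Return the (start_ms, end_ms) frames that took longer than 1/min_fps."""
    budget_ms = 1000.0 / min_fps
    return [(s, e) for s, e in frames if (e - s) > budget_ms]

def hot_functions_in(frames, samples):
    """Count samples (timestamp_ms, function) that fall inside any frame."""
    counts = Counter(
        fn for t, fn in samples
        if any(start <= t < end for start, end in frames)
    )
    return counts.most_common()

# Hypothetical frame boundaries and sample stream.
frames = [(0.0, 16.0), (16.0, 66.0), (66.0, 82.0)]
samples = [(5.0, "render"), (20.0, "physics"), (40.0, "physics"),
           (60.0, "render"), (70.0, "render")]

slow = slow_frames(frames)
print(slow)
print(hot_functions_in(slow, samples))
```

Here `render` collects the most samples overall, but inside the one slow frame `physics` dominates, which is the distinction frame filtering is meant to expose.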

Low Overhead Java* Profiling
Analyze Java or mixed Java and native code. Results are mapped to the original Java source. Unlike some Java profilers that instrument the code, VTune Amplifier uses low overhead statistical sampling with either a hardware or software collector. Hardware collection has extremely low overhead because it uses the on-chip performance monitoring hardware.

Analyze User Tasks
The task annotation API is used to annotate your source so VTune Amplifier can display which tasks are executing. For example, if you label the stages of your pipeline, they will be marked in the timeline and hovering over them will reveal details. This makes profiling data much easier to understand.

Tune for Intel® Xeon Phi™ Products
Hardware profiling is supported for Intel® Xeon Phi™ products and can be launched from the graphical user interface. It can collect advanced hotspots and advanced event data and has time markers for correlation of data across multiple cards.
 
